2025-12-04T09:32:17.1329506Z Current runner version: '2.330.0'
2025-12-04T09:32:17.1337455Z Runner name: 'i-00bb8650059fae3eb'
2025-12-04T09:32:17.1338361Z Runner group name: 'default'
2025-12-04T09:32:17.1339357Z Machine name: 'ip-10-0-51-5'
2025-12-04T09:32:17.1342657Z ##[group]GITHUB_TOKEN Permissions
2025-12-04T09:32:17.1345291Z Contents: read
2025-12-04T09:32:17.1346008Z Metadata: read
2025-12-04T09:32:17.1346691Z ##[endgroup]
2025-12-04T09:32:17.1349263Z Secret source: Actions
2025-12-04T09:32:17.1350195Z Prepare workflow directory
2025-12-04T09:32:17.1957273Z Prepare all required actions
2025-12-04T09:32:17.2007949Z Getting action download info
2025-12-04T09:32:17.5603949Z Download action repository 'pytorch/test-infra@main' (SHA:39aa74d619174326f4e2fb0e216151c2f29d9ffd)
2025-12-04T09:32:19.8617136Z Download action repository 'pytorch/pytorch@main' (SHA:7716da9fb23f27a65b41f9f016a2afadf281c18f)
2025-12-04T09:32:34.3515375Z Download action repository 'actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065' (SHA:a26af69be951a213d495a4c3e4e4022e16d87065)
2025-12-04T09:32:34.7309301Z Download action repository 'aws-actions/configure-aws-credentials@ececac1a45f3b08a01d2dd070d28d111c5fe6722' (SHA:ececac1a45f3b08a01d2dd070d28d111c5fe6722)
2025-12-04T09:32:34.9213619Z Download action repository 'aws-actions/amazon-ecr-login@062b18b96a7aff071d4dc91bc00c4c1a7945b076' (SHA:062b18b96a7aff071d4dc91bc00c4c1a7945b076)
2025-12-04T09:32:35.1017632Z Download action repository 'seemethere/download-artifact-s3@1da556a7aa0a088e3153970611f6c432d58e80e6' (SHA:1da556a7aa0a088e3153970611f6c432d58e80e6)
2025-12-04T09:32:35.4087520Z Download action repository 'seemethere/upload-artifact-s3@baba72d0712b404f646cebe0730933554ebce96a' (SHA:baba72d0712b404f646cebe0730933554ebce96a)
2025-12-04T09:32:35.7010530Z Getting action download info
2025-12-04T09:32:35.8547548Z Download action repository 'actions/checkout@v4' (SHA:34e114876b0b11c390a56381ad16ebd13914f8d5)
2025-12-04T09:32:36.1506520Z Getting action download info
2025-12-04T09:32:36.3006713Z Download action repository 'nick-fields/retry@v3.0.0' (SHA:7152eba30c6575329ac0576536151aca5a72780e)
2025-12-04T09:32:36.5134161Z Getting action download info
2025-12-04T09:32:36.6269662Z Download action repository 'nick-fields/retry@3e91a01664abd3c5cd539100d10d33b9c5b68482' (SHA:3e91a01664abd3c5cd539100d10d33b9c5b68482)
2025-12-04T09:32:36.8922040Z Getting action download info
2025-12-04T09:32:37.0881561Z Uses: pytorch/pytorch/.github/workflows/_linux-test.yml@refs/heads/main (ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32)
2025-12-04T09:32:37.0886066Z ##[group] Inputs
2025-12-04T09:32:37.0886508Z   build-environment: linux-jammy-cuda12.4-py3.10-gcc11
2025-12-04T09:32:37.0893942Z   test-matrix: {"include": [{"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}]}
2025-12-04T09:32:37.0902145Z   docker-image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:32:37.0903159Z   sync-tag: 
2025-12-04T09:32:37.0904123Z   timeout-minutes: 240
2025-12-04T09:32:37.0904437Z   use-gha: 
2025-12-04T09:32:37.0904697Z   dashboard-tag: 
2025-12-04T09:32:37.0904969Z   s3-bucket: gha-artifacts
2025-12-04T09:32:37.0905291Z   aws-role-to-assume: 
2025-12-04T09:32:37.0905972Z   disable-monitor: false
2025-12-04T09:32:37.0906317Z   monitor-log-interval: 5
2025-12-04T09:32:37.0906679Z   monitor-data-collect-interval: 1
2025-12-04T09:32:37.0907071Z ##[endgroup]
2025-12-04T09:32:37.0907833Z Complete job name: linux-jammy-cuda12.4-py3.10-gcc11 / test (legacy_nvidia_driver, 1, 5, linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check, unstable)
2025-12-04T09:32:37.1519348Z A job started hook has been configured by the self-hosted runner administrator
2025-12-04T09:32:37.1637728Z ##[group]Run '/home/ec2-user/runner-scripts/before_job.sh'
2025-12-04T09:32:37.1648545Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:32:37.1649310Z ##[endgroup]
2025-12-04T09:32:38.6911482Z Runner Type: linux.g4dn.4xlarge.nvidia.gpu
2025-12-04T09:32:38.6912144Z Instance Type: g4dn.4xlarge
2025-12-04T09:32:38.6912459Z AMI Name: unknown
2025-12-04T09:32:38.6952334Z AMI ID: ami-08982f1c5bf93d976
2025-12-04T09:32:45.3265680Z ##[group]Run pytorch/test-infra/.github/actions/setup-ssh@main
2025-12-04T09:32:45.3266213Z with:
2025-12-04T09:32:45.3266880Z   github-secret: ***
2025-12-04T09:32:45.3267740Z   instructions: All testing is done inside the container, to start an interactive session run:
  docker exec -it $(docker container ps --format '{{.ID}}') bash

2025-12-04T09:32:45.3268701Z   activate-with-label: false
2025-12-04T09:32:45.3269032Z   label: with-ssh
2025-12-04T09:32:45.3269312Z   remove-existing-keys: true
2025-12-04T09:32:45.3269644Z   fail-silently: true
2025-12-04T09:32:45.3269940Z env:
2025-12-04T09:32:45.3270179Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:32:45.3270497Z ##[endgroup]
2025-12-04T09:32:45.4853692Z Please see https://github.com/pytorch/pytorch/wiki/Debugging-using-with-ssh-for-Github-Actions for more info.
2025-12-04T09:32:45.4855535Z Not on pull request and ciflow reference could not be extracted, skipping adding ssh keys
2025-12-04T09:32:45.5166195Z ##[group]Run pytorch/pytorch/.github/actions/checkout-pytorch@main
2025-12-04T09:32:45.5166727Z with:
2025-12-04T09:32:45.5166993Z   no-sudo: true
2025-12-04T09:32:45.5167273Z   submodules: recursive
2025-12-04T09:32:45.5167575Z   fetch-depth: 0
2025-12-04T09:32:45.5167867Z env:
2025-12-04T09:32:45.5168115Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:32:45.5168422Z ##[endgroup]
2025-12-04T09:32:45.5284862Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-12-04T09:32:45.5286016Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-12-04T09:32:45.5298830Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:32:45.5299670Z env:
2025-12-04T09:32:45.5300146Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:32:45.5300753Z ##[endgroup]
2025-12-04T09:32:45.5399589Z ##[group]Run # Use all available CPUs for fetching
2025-12-04T09:32:45.5400123Z # Use all available CPUs for fetching
2025-12-04T09:32:45.5400542Z cd "${GITHUB_WORKSPACE}"
2025-12-04T09:32:45.5400937Z git config --global fetch.parallel 0
2025-12-04T09:32:45.5401581Z git config --global submodule.fetchJobs 0
2025-12-04T09:32:45.5402000Z 
2025-12-04T09:32:45.5402418Z # Clean workspace. The default checkout action should also do this, but
2025-12-04T09:32:45.5402987Z # do it here as well just in case
2025-12-04T09:32:45.5403375Z if [[ -d .git ]]; then
2025-12-04T09:32:45.5403713Z   if [ -z "${NO_SUDO}" ]; then
2025-12-04T09:32:45.5404087Z     sudo git clean -ffdx
2025-12-04T09:32:45.5404420Z   else
2025-12-04T09:32:45.5404700Z     git clean -ffdx
2025-12-04T09:32:45.5404998Z   fi
2025-12-04T09:32:45.5405249Z fi
2025-12-04T09:32:45.5411925Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:32:45.5412369Z env:
2025-12-04T09:32:45.5412718Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:32:45.5413059Z   NO_SUDO: true
2025-12-04T09:32:45.5413310Z ##[endgroup]
2025-12-04T09:32:45.5546641Z ##[group]Run actions/checkout@v4
2025-12-04T09:32:45.5547032Z with:
2025-12-04T09:32:45.5547334Z   ref: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:32:45.5547723Z   fetch-depth: 0
2025-12-04T09:32:45.5548012Z   submodules: recursive
2025-12-04T09:32:45.5548321Z   show-progress: false
2025-12-04T09:32:45.5548639Z   repository: pytorch/pytorch
2025-12-04T09:32:45.5549106Z   token: ***
2025-12-04T09:32:45.5549370Z   ssh-strict: true
2025-12-04T09:32:45.5549649Z   ssh-user: git
2025-12-04T09:32:45.5549917Z   persist-credentials: true
2025-12-04T09:32:45.5550234Z   clean: true
2025-12-04T09:32:45.5550529Z   sparse-checkout-cone-mode: true
2025-12-04T09:32:45.5550868Z   fetch-tags: false
2025-12-04T09:32:45.5551141Z   lfs: false
2025-12-04T09:32:45.5551410Z   set-safe-directory: true
2025-12-04T09:32:45.5551715Z env:
2025-12-04T09:32:45.5551965Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:32:45.5552271Z ##[endgroup]
2025-12-04T09:32:45.6801168Z Syncing repository: pytorch/pytorch
2025-12-04T09:32:45.6802744Z ##[group]Getting Git version info
2025-12-04T09:32:45.6803349Z Working directory is '/home/ec2-user/actions-runner/_work/pytorch/pytorch'
2025-12-04T09:32:45.6804133Z [command]/usr/bin/git version
2025-12-04T09:32:45.6962656Z git version 2.50.1
2025-12-04T09:32:45.7007886Z ##[endgroup]
2025-12-04T09:32:45.7019200Z Copying '/home/ec2-user/.gitconfig' to '/home/ec2-user/actions-runner/_work/_temp/f7d10314-b94e-44a0-bd16-0b12211406dc/.gitconfig'
2025-12-04T09:32:45.7039395Z Temporarily overriding HOME='/home/ec2-user/actions-runner/_work/_temp/f7d10314-b94e-44a0-bd16-0b12211406dc' before making global git config changes
2025-12-04T09:32:45.7040584Z Adding repository directory to the temporary git global config as a safe directory
2025-12-04T09:32:45.7044875Z [command]/usr/bin/git config --global --add safe.directory /home/ec2-user/actions-runner/_work/pytorch/pytorch
2025-12-04T09:32:45.7089610Z Deleting the contents of '/home/ec2-user/actions-runner/_work/pytorch/pytorch'
2025-12-04T09:32:45.7093092Z ##[group]Initializing the repository
2025-12-04T09:32:45.7097770Z [command]/usr/bin/git init /home/ec2-user/actions-runner/_work/pytorch/pytorch
2025-12-04T09:32:45.7161525Z hint: Using 'master' as the name for the initial branch. This default branch name
2025-12-04T09:32:45.7162258Z hint: is subject to change. To configure the initial branch name to use in all
2025-12-04T09:32:45.7162939Z hint: of your new repositories, which will suppress this warning, call:
2025-12-04T09:32:45.7163426Z hint:
2025-12-04T09:32:45.7163777Z hint: 	git config --global init.defaultBranch <name>
2025-12-04T09:32:45.7164194Z hint:
2025-12-04T09:32:45.7164574Z hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
2025-12-04T09:32:45.7165268Z hint: 'development'. The just-created branch can be renamed via this command:
2025-12-04T09:32:45.7165788Z hint:
2025-12-04T09:32:45.7166034Z hint: 	git branch -m <name>
2025-12-04T09:32:45.7166345Z hint:
2025-12-04T09:32:45.7166793Z hint: Disable this message with "git config set advice.defaultBranchName false"
2025-12-04T09:32:45.7170682Z Initialized empty Git repository in /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/
2025-12-04T09:32:45.7181536Z [command]/usr/bin/git remote add origin https://github.com/pytorch/pytorch
2025-12-04T09:32:45.7219613Z ##[endgroup]
2025-12-04T09:32:45.7220132Z ##[group]Disabling automatic garbage collection
2025-12-04T09:32:45.7223510Z [command]/usr/bin/git config --local gc.auto 0
2025-12-04T09:32:45.7251151Z ##[endgroup]
2025-12-04T09:32:45.7251611Z ##[group]Setting up auth
2025-12-04T09:32:45.7258661Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand
2025-12-04T09:32:45.7287966Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :"
2025-12-04T09:32:45.7633156Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader
2025-12-04T09:32:45.7663049Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :"
2025-12-04T09:32:45.7967570Z [command]/usr/bin/git config --local --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T09:32:45.7997259Z [command]/usr/bin/git submodule foreach --recursive git config --local --show-origin --name-only --get-regexp remote.origin.url
2025-12-04T09:32:45.8341456Z [command]/usr/bin/git config --local http.https://github.com/.extraheader AUTHORIZATION: basic ***
2025-12-04T09:32:45.8399936Z ##[endgroup]
2025-12-04T09:32:45.8400465Z ##[group]Fetching the repository
2025-12-04T09:32:45.8409416Z [command]/usr/bin/git -c protocol.version=2 fetch --prune --no-recurse-submodules origin +refs/heads/*:refs/remotes/origin/* +refs/tags/*:refs/tags/*
2025-12-04T09:33:41.1139450Z From https://github.com/pytorch/pytorch
2025-12-04T09:33:41.1140961Z  * [new branch]              2.6.0.dev20241004+          -> origin/2.6.0.dev20241004+
2025-12-04T09:33:41.1143039Z  * [new branch]              2.9.1                       -> origin/2.9.1
2025-12-04T09:33:41.1143872Z  * [new branch]              AaronWang04_addmmfusion_perftest -> origin/AaronWang04_addmmfusion_perftest
2025-12-04T09:33:41.1144644Z  * [new branch]              Flamefire-patch-1           -> origin/Flamefire-patch-1
2025-12-04T09:33:41.1145382Z  * [new branch]              HDCharles-2.6.0-release-notes -> origin/HDCharles-2.6.0-release-notes
2025-12-04T09:33:41.1146077Z  * [new branch]              HOPrintFunc                 -> origin/HOPrintFunc
2025-12-04T09:33:41.1146797Z  * [new branch]              IvanKobzarev/stack/1        -> origin/IvanKobzarev/stack/1
2025-12-04T09:33:41.1149336Z  * [new branch]              NicoshevSVE128              -> origin/NicoshevSVE128
2025-12-04T09:33:41.1150561Z  * [new branch]              PR-AOTInductorNoneBug       -> origin/PR-AOTInductorNoneBug
2025-12-04T09:33:41.1152421Z  * [new branch]              PR-AOTInductorNoneBugFix    -> origin/PR-AOTInductorNoneBugFix
2025-12-04T09:33:41.1153656Z  * [new branch]              PR-FixConfigsIssue          -> origin/PR-FixConfigsIssue
2025-12-04T09:33:41.1155008Z  * [new branch]              PR-NoneBugFix-viable        -> origin/PR-NoneBugFix-viable
2025-12-04T09:33:41.1156584Z  * [new branch]              PR-ResetToZero              -> origin/PR-ResetToZero
2025-12-04T09:33:41.1158123Z  * [new branch]              Update-Flash-Packaging      -> origin/Update-Flash-Packaging
2025-12-04T09:33:41.1159680Z  * [new branch]              VLA_exp                     -> origin/VLA_exp
2025-12-04T09:33:41.1161328Z  * [new branch]              activation_bench            -> origin/activation_bench
2025-12-04T09:33:41.1163467Z  * [new branch]              addmm-heuristic             -> origin/addmm-heuristic
2025-12-04T09:33:41.1165588Z  * [new branch]              adi/onednn_aarch64          -> origin/adi/onednn_aarch64
2025-12-04T09:33:41.1167135Z  * [new branch]              adi/test                    -> origin/adi/test
2025-12-04T09:33:41.1168582Z  * [new branch]              adi/test_bgemm              -> origin/adi/test_bgemm
2025-12-04T09:33:41.1170137Z  * [new branch]              adi/test_m8g                -> origin/adi/test_m8g
2025-12-04T09:33:41.1171919Z  * [new branch]              adi/test_onednn             -> origin/adi/test_onednn
2025-12-04T09:33:41.1173497Z  * [new branch]              adi/test_onednn_v3.9        -> origin/adi/test_onednn_v3.9
2025-12-04T09:33:41.1175033Z  * [new branch]              adi/test_presve_change      -> origin/adi/test_presve_change
2025-12-04T09:33:41.1176405Z  * [new branch]              adi/test_timm               -> origin/adi/test_timm
2025-12-04T09:33:41.1178426Z  * [new branch]              adi/testpresve_change       -> origin/adi/testpresve_change
2025-12-04T09:33:41.1180909Z  * [new branch]              aditew01/test/vec_bf16      -> origin/aditew01/test/vec_bf16
2025-12-04T09:33:41.1182270Z  * [new branch]              ah-globalfeedback-hook      -> origin/ah-globalfeedback-hook
2025-12-04T09:33:41.1184103Z  * [new branch]              albanD-patch-1              -> origin/albanD-patch-1
2025-12-04T09:33:41.1185380Z  * [new branch]              also-surround-shimh         -> origin/also-surround-shimh
2025-12-04T09:33:41.1187644Z  * [new branch]              angelayi/aot_compile        -> origin/angelayi/aot_compile
2025-12-04T09:33:41.1189050Z  * [new branch]              angelayi/aoti_additional_files -> origin/angelayi/aoti_additional_files
2025-12-04T09:33:41.1190465Z  * [new branch]              angelayi/benchmark          -> origin/angelayi/benchmark
2025-12-04T09:33:41.1192105Z  * [new branch]              angelayi/change_pytree_serialization -> origin/angelayi/change_pytree_serialization
2025-12-04T09:33:41.1193249Z  * [new branch]              angelayi/cpp_loader         -> origin/angelayi/cpp_loader
2025-12-04T09:33:41.1194872Z  * [new branch]              angelayi/inductor_const     -> origin/angelayi/inductor_const
2025-12-04T09:33:41.1196064Z  * [new branch]              angelayi/lstm               -> origin/angelayi/lstm
2025-12-04T09:33:41.1198115Z  * [new branch]              angelayi/no_so_weight       -> origin/angelayi/no_so_weight
2025-12-04T09:33:41.1199985Z  * [new branch]              angelayi/scan_layers        -> origin/angelayi/scan_layers
2025-12-04T09:33:41.1201487Z  * [new branch]              angelayi/side_eff           -> origin/angelayi/side_eff
2025-12-04T09:33:41.1203052Z  * [new branch]              angelayi/state_dict         -> origin/angelayi/state_dict
2025-12-04T09:33:41.1204396Z  * [new branch]              angelayi/symint_input       -> origin/angelayi/symint_input
2025-12-04T09:33:41.1206219Z  * [new branch]              angelayi/symm_mem           -> origin/angelayi/symm_mem
2025-12-04T09:33:41.1207399Z  * [new branch]              angelayi/test_cpp           -> origin/angelayi/test_cpp
2025-12-04T09:33:41.1209002Z  * [new branch]              angelayi/torch_size         -> origin/angelayi/torch_size
2025-12-04T09:33:41.1210337Z  * [new branch]              annotate_assert             -> origin/annotate_assert
2025-12-04T09:33:41.1212019Z  * [new branch]              annotate_fallback_kernel    -> origin/annotate_fallback_kernel
2025-12-04T09:33:41.1213333Z  * [new branch]              annotation_deepcopy         -> origin/annotation_deepcopy
2025-12-04T09:33:41.1214870Z  * [new branch]              annotation_dynamo           -> origin/annotation_dynamo
2025-12-04T09:33:41.1216493Z  * [new branch]              aot_eager_stack_trace       -> origin/aot_eager_stack_trace
2025-12-04T09:33:41.1218018Z  * [new branch]              aoti-cuda-alloc             -> origin/aoti-cuda-alloc
2025-12-04T09:33:41.1219488Z  * [new branch]              aoti_const_device           -> origin/aoti_const_device
2025-12-04T09:33:41.1220987Z  * [new branch]              aoti_fqn_name_interface     -> origin/aoti_fqn_name_interface
2025-12-04T09:33:41.1222358Z  * [new branch]              aoti_package_weights_binary -> origin/aoti_package_weights_binary
2025-12-04T09:33:41.1223752Z  * [new branch]              aoti_target_windows         -> origin/aoti_target_windows
2025-12-04T09:33:41.1226341Z  * [new branch]              arsh/feat/inductor_check_profiling -> origin/arsh/feat/inductor_check_profiling
2025-12-04T09:33:41.1227622Z  * [new branch]              async_tp                    -> origin/async_tp
2025-12-04T09:33:41.1229359Z  * [new branch]              atalman-inductor-perf-cu124 -> origin/atalman-inductor-perf-cu124
2025-12-04T09:33:41.1230799Z  * [new branch]              atalman-inductor-perf-cu124.1 -> origin/atalman-inductor-perf-cu124.1
2025-12-04T09:33:41.1232486Z  * [new branch]              atalman-patch-2             -> origin/atalman-patch-2
2025-12-04T09:33:41.1234064Z  * [new branch]              atalman-patch-3             -> origin/atalman-patch-3
2025-12-04T09:33:41.1235606Z  * [new branch]              atalman-patch-4             -> origin/atalman-patch-4
2025-12-04T09:33:41.1237177Z  * [new branch]              atalman-patch-5             -> origin/atalman-patch-5
2025-12-04T09:33:41.1238761Z  * [new branch]              atalman-patch-6             -> origin/atalman-patch-6
2025-12-04T09:33:41.1240314Z  * [new branch]              atalman-patch-7             -> origin/atalman-patch-7
2025-12-04T09:33:41.1241957Z  * [new branch]              atalman-patch-8             -> origin/atalman-patch-8
2025-12-04T09:33:41.1243458Z  * [new branch]              atalman_inductor_2.3.1      -> origin/atalman_inductor_2.3.1
2025-12-04T09:33:41.1244812Z  * [new branch]              atalman_inductor_2.4.0      -> origin/atalman_inductor_2.4.0
2025-12-04T09:33:41.1246472Z  * [new branch]              atalman_inductor_2.4.x      -> origin/atalman_inductor_2.4.x
2025-12-04T09:33:41.1248250Z  * [new branch]              attention_benchmarking_clean -> origin/attention_benchmarking_clean
2025-12-04T09:33:41.1250276Z  * [new branch]              bahuang/dt_fix_scalar_add   -> origin/bahuang/dt_fix_scalar_add
2025-12-04T09:33:41.1251573Z  * [new branch]              bahuang/fix_debug_mode      -> origin/bahuang/fix_debug_mode
2025-12-04T09:33:41.1253083Z  * [new branch]              bahuang/fix_expand          -> origin/bahuang/fix_expand
2025-12-04T09:33:41.1254586Z  * [new branch]              bahuang/test                -> origin/bahuang/test
2025-12-04T09:33:41.1256862Z  * [new branch]              base/1.5                    -> origin/base/1.5
2025-12-04T09:33:41.1258741Z  * [new branch]              batching_sdpa_efficient_attention -> origin/batching_sdpa_efficient_attention
2025-12-04T09:33:41.1260028Z  * [new branch]              bench_scaled_mm_ops         -> origin/bench_scaled_mm_ops
2025-12-04T09:33:41.1261741Z  * [new branch]              benchmark-updates           -> origin/benchmark-updates
2025-12-04T09:33:41.1263045Z  * [new branch]              benchmarking-script         -> origin/benchmarking-script
2025-12-04T09:33:41.1265084Z  * [new branch]              bertmaher/pinbump26         -> origin/bertmaher/pinbump26
2025-12-04T09:33:41.1267037Z  * [new branch]              bertrand/cutlass            -> origin/bertrand/cutlass
2025-12-04T09:33:41.1268972Z  * [new branch]              bf/bug-static-input         -> origin/bf/bug-static-input
2025-12-04T09:33:41.1270268Z  * [new branch]              bf/cg-backend               -> origin/bf/cg-backend
2025-12-04T09:33:41.1271979Z  * [new branch]              bf/cg-nccl-test             -> origin/bf/cg-nccl-test
2025-12-04T09:33:41.1273264Z  * [new branch]              bf/cg-remove-check          -> origin/bf/cg-remove-check
2025-12-04T09:33:41.1274936Z  * [new branch]              bf/clean-torchbench-hf      -> origin/bf/clean-torchbench-hf
2025-12-04T09:33:41.1276233Z  * [new branch]              bf/combo-debug-log          -> origin/bf/combo-debug-log
2025-12-04T09:33:41.1277744Z  * [new branch]              bf/cudagraph                -> origin/bf/cudagraph
2025-12-04T09:33:41.1279743Z  * [new branch]              bf/cudagraph-disable-input-mutation -> origin/bf/cudagraph-disable-input-mutation
2025-12-04T09:33:41.1281542Z  * [new branch]              bf/cudagraph-enable-input-mutation-support-benchmark -> origin/bf/cudagraph-enable-input-mutation-support-benchmark
2025-12-04T09:33:41.1283073Z  * [new branch]              bf/cudagraph-partition      -> origin/bf/cudagraph-partition
2025-12-04T09:33:41.1284073Z  * [new branch]              bf/donated-buffer-bench     -> origin/bf/donated-buffer-bench
2025-12-04T09:33:41.1285615Z  * [new branch]              bf/dynamo-partition         -> origin/bf/dynamo-partition
2025-12-04T09:33:41.1287041Z  * [new branch]              bf/lite                     -> origin/bf/lite
2025-12-04T09:33:41.1288511Z  * [new branch]              bf/pa-non-divisible         -> origin/bf/pa-non-divisible
2025-12-04T09:33:41.1290198Z  * [new branch]              bf/partition-cache-free-symbols -> origin/bf/partition-cache-free-symbols
2025-12-04T09:33:41.1291585Z  * [new branch]              bf/partition-memory-plan    -> origin/bf/partition-memory-plan
2025-12-04T09:33:41.1293023Z  * [new branch]              bf/partition-move-cpu       -> origin/bf/partition-move-cpu
2025-12-04T09:33:41.1294610Z  * [new branch]              bf/partition-view-fallback  -> origin/bf/partition-view-fallback
2025-12-04T09:33:41.1295916Z  * [new branch]              bf/remove-check-55b0c39d    -> origin/bf/remove-check-55b0c39d
2025-12-04T09:33:41.1297518Z  * [new branch]              bf/timm-nov-26-2025         -> origin/bf/timm-nov-26-2025
2025-12-04T09:33:41.1298877Z  * [new branch]              bf/transformer-pin-4-57-3   -> origin/bf/transformer-pin-4-57-3
2025-12-04T09:33:41.1300471Z  * [new branch]              bisect_perf_hf_T5_3acc6eac492 -> origin/bisect_perf_hf_T5_3acc6eac492
2025-12-04T09:33:41.1301796Z  * [new branch]              bisect_perf_hf_T5_3fcf66f61fb -> origin/bisect_perf_hf_T5_3fcf66f61fb
2025-12-04T09:33:41.1303187Z  * [new branch]              bisect_perf_hf_T5_4009d154129 -> origin/bisect_perf_hf_T5_4009d154129
2025-12-04T09:33:41.1304536Z  * [new branch]              bisect_perf_hf_T5_40d0740e73d -> origin/bisect_perf_hf_T5_40d0740e73d
2025-12-04T09:33:41.1305934Z  * [new branch]              bisect_perf_hf_T5_5268754e  -> origin/bisect_perf_hf_T5_5268754e
2025-12-04T09:33:41.1307259Z  * [new branch]              bisect_perf_hf_T5_7d89a8d385c -> origin/bisect_perf_hf_T5_7d89a8d385c
2025-12-04T09:33:41.1308806Z  * [new branch]              bisect_perf_hf_T5_b7a25c1ee7c -> origin/bisect_perf_hf_T5_b7a25c1ee7c
2025-12-04T09:33:41.1310100Z  * [new branch]              bisect_perf_hf_T5_c25b201583f -> origin/bisect_perf_hf_T5_c25b201583f
2025-12-04T09:33:41.1311513Z  * [new branch]              bisect_perf_hf_T5_c93e57efac0 -> origin/bisect_perf_hf_T5_c93e57efac0
2025-12-04T09:33:41.1313344Z  * [new branch]              bisect_perf_hf_T5_ca9813ea149 -> origin/bisect_perf_hf_T5_ca9813ea149
2025-12-04T09:33:41.1314430Z  * [new branch]              bisect_perf_hf_T5_d65f194a  -> origin/bisect_perf_hf_T5_d65f194a
2025-12-04T09:33:41.1315819Z  * [new branch]              bisect_perf_hf_T5_da94ab0b  -> origin/bisect_perf_hf_T5_da94ab0b
2025-12-04T09:33:41.1317206Z  * [new branch]              bisect_perf_hf_T5_da94ab0b_new -> origin/bisect_perf_hf_T5_da94ab0b_new
2025-12-04T09:33:41.1318596Z  * [new branch]              bisect_perf_hf_T5_db4e8a1d8a8 -> origin/bisect_perf_hf_T5_db4e8a1d8a8
2025-12-04T09:33:41.1319909Z  * [new branch]              bisect_perf_hf_T5_e0d97e936a2 -> origin/bisect_perf_hf_T5_e0d97e936a2
2025-12-04T09:33:41.1321451Z  * [new branch]              bisect_perf_hf_T5_f23621ec563 -> origin/bisect_perf_hf_T5_f23621ec563
2025-12-04T09:33:41.1323412Z  * [new branch]              brister/fx_device_type      -> origin/brister/fx_device_type
2025-12-04T09:33:41.1324765Z  * [new branch]              brister/test_inductor_all_fx -> origin/brister/test_inductor_all_fx
2025-12-04T09:33:41.1326205Z  * [new branch]              brister/tiled_reduction_no_numel_check -> origin/brister/tiled_reduction_no_numel_check
2025-12-04T09:33:41.1327536Z  * [new branch]              bwd-backup                  -> origin/bwd-backup
2025-12-04T09:33:41.1329266Z  * [new branch]              c57382a49                   -> origin/c57382a49
2025-12-04T09:33:41.1330498Z  * [new branch]              ca_0431d47eaa               -> origin/ca_0431d47eaa
2025-12-04T09:33:41.1331980Z  * [new branch]              ca_fix_0431d47eaa           -> origin/ca_fix_0431d47eaa
2025-12-04T09:33:41.1334073Z  * [new branch]              camyllh/test_setup_hooks_push -> origin/camyllh/test_setup_hooks_push
2025-12-04T09:33:41.1335729Z  * [new branch]              cccclai-patch-1             -> origin/cccclai-patch-1
2025-12-04T09:33:41.1337626Z  * [new branch]              cherry-pick-159969-by-pytorch_bot_bot_ -> origin/cherry-pick-159969-by-pytorch_bot_bot_
2025-12-04T09:33:41.1338985Z  * [new branch]              cherry-pick-160586-by-pytorch_bot_bot_ -> origin/cherry-pick-160586-by-pytorch_bot_bot_
2025-12-04T09:33:41.1340572Z  * [new branch]              cherry-pick-162208-by-pytorch_bot_bot_ -> origin/cherry-pick-162208-by-pytorch_bot_bot_
2025-12-04T09:33:41.1342033Z  * [new branch]              cherry-pick-163169-by-pytorch_bot_bot_ -> origin/cherry-pick-163169-by-pytorch_bot_bot_
2025-12-04T09:33:41.1343525Z  * [new branch]              cherry-pick-165086-by-pytorch_bot_bot_ -> origin/cherry-pick-165086-by-pytorch_bot_bot_
2025-12-04T09:33:41.1345130Z  * [new branch]              cherry-pick-165514-by-pytorch_bot_bot_ -> origin/cherry-pick-165514-by-pytorch_bot_bot_
2025-12-04T09:33:41.1346546Z  * [new branch]              cherry-pick-165601-by-pytorch_bot_bot_ -> origin/cherry-pick-165601-by-pytorch_bot_bot_
2025-12-04T09:33:41.1348021Z  * [new branch]              cherry-pick-165667-by-pytorch_bot_bot_ -> origin/cherry-pick-165667-by-pytorch_bot_bot_
2025-12-04T09:33:41.1349615Z  * [new branch]              cherry-pick-165815-by-pytorch_bot_bot_ -> origin/cherry-pick-165815-by-pytorch_bot_bot_
2025-12-04T09:33:41.1351220Z  * [new branch]              cherry-pick-165922-by-pytorch_bot_bot_ -> origin/cherry-pick-165922-by-pytorch_bot_bot_
2025-12-04T09:33:41.1352640Z  * [new branch]              cherry-pick-166148-by-pytorch_bot_bot_ -> origin/cherry-pick-166148-by-pytorch_bot_bot_
2025-12-04T09:33:41.1354076Z  * [new branch]              cherry-pick-166181-by-pytorch_bot_bot_ -> origin/cherry-pick-166181-by-pytorch_bot_bot_
2025-12-04T09:33:41.1355487Z  * [new branch]              cherry-pick-166404-by-pytorch_bot_bot_ -> origin/cherry-pick-166404-by-pytorch_bot_bot_
2025-12-04T09:33:41.1356982Z  * [new branch]              cherry-pick-166427-by-pytorch_bot_bot_ -> origin/cherry-pick-166427-by-pytorch_bot_bot_
2025-12-04T09:33:41.1358539Z  * [new branch]              cherry-pick-166480-by-pytorch_bot_bot_ -> origin/cherry-pick-166480-by-pytorch_bot_bot_
2025-12-04T09:33:41.1359858Z  * [new branch]              cherry-pick-166570-by-pytorch_bot_bot_ -> origin/cherry-pick-166570-by-pytorch_bot_bot_
2025-12-04T09:33:41.1361372Z  * [new branch]              cherry-pick-166993-by-pytorch_bot_bot_ -> origin/cherry-pick-166993-by-pytorch_bot_bot_
2025-12-04T09:33:41.1362827Z  * [new branch]              cherry-pick-167111-by-pytorch_bot_bot_ -> origin/cherry-pick-167111-by-pytorch_bot_bot_
2025-12-04T09:33:41.1364346Z  * [new branch]              cherry-pick-167478-by-pytorch_bot_bot_ -> origin/cherry-pick-167478-by-pytorch_bot_bot_
2025-12-04T09:33:41.1365589Z  * [new branch]              cherry_pick_166036_166040   -> origin/cherry_pick_166036_166040
2025-12-04T09:33:41.1367198Z  * [new branch]              cherry_pick_166457          -> origin/cherry_pick_166457
2025-12-04T09:33:41.1368790Z  * [new branch]              cherrypick_166338           -> origin/cherrypick_166338
2025-12-04T09:33:41.1370303Z  * [new branch]              cherrypick_166458           -> origin/cherrypick_166458
2025-12-04T09:33:41.1371984Z  * [new branch]              cherrypick_166586           -> origin/cherrypick_166586
2025-12-04T09:33:41.1373528Z  * [new branch]              cherrypick_166956           -> origin/cherrypick_166956
2025-12-04T09:33:41.1374886Z  * [new branch]              ci_attn                     -> origin/ci_attn
2025-12-04T09:33:41.1376481Z  * [new branch]              codex-testing               -> origin/codex-testing
2025-12-04T09:33:41.1378948Z  * [new branch]              codex/add-check_memory_overlap-helper-functions -> origin/codex/add-check_memory_overlap-helper-functions
2025-12-04T09:33:41.1380116Z  * [new branch]              codex/fix-issue-121219-in-pytorch -> origin/codex/fix-issue-121219-in-pytorch
2025-12-04T09:33:41.1382220Z  * [new branch]              codex/investigate-segfaults-in-get_tensor_storage_id -> origin/codex/investigate-segfaults-in-get_tensor_storage_id
2025-12-04T09:33:41.1383919Z  * [new branch]              codex/refactor-lintrunner-config-to-use-uv-run -> origin/codex/refactor-lintrunner-config-to-use-uv-run
2025-12-04T09:33:41.1385033Z  * [new branch]              compatiblpy39util           -> origin/compatiblpy39util
2025-12-04T09:33:41.1386609Z  * [new branch]              cond_hop_device             -> origin/cond_hop_device
2025-12-04T09:33:41.1388018Z  * [new branch]              context_test                -> origin/context_test
2025-12-04T09:33:41.1390116Z  * [new branch]              copilot/code-style-cleanup-python-pip -> origin/copilot/code-style-cleanup-python-pip
2025-12-04T09:33:41.1391890Z  * [new branch]              cpio/fix_new_ami_tests      -> origin/cpio/fix_new_ami_tests
2025-12-04T09:33:41.1393433Z  * [new branch]              cpp-docs-dependency-upgrade -> origin/cpp-docs-dependency-upgrade
2025-12-04T09:33:41.1395662Z  * [new branch]              crpa/typo-in-inductor_comm_lowering -> origin/crpa/typo-in-inductor_comm_lowering
2025-12-04T09:33:41.1397345Z  * [new branch]              csl/always_produce_xml      -> origin/csl/always_produce_xml
2025-12-04T09:33:41.1398618Z  * [new branch]              csl/build_test_more_procs   -> origin/csl/build_test_more_procs
2025-12-04T09:33:41.1400120Z  * [new branch]              csl/build_test_more_procs2  -> origin/csl/build_test_more_procs2
2025-12-04T09:33:41.1401430Z  * [new branch]              csl/clean_up                -> origin/csl/clean_up
2025-12-04T09:33:41.1403451Z  * [new branch]              csl/fix_retry_segfault_exit -> origin/csl/fix_retry_segfault_exit
2025-12-04T09:33:41.1404622Z  * [new branch]              csl/katex                   -> origin/csl/katex
2025-12-04T09:33:41.1406498Z  * [new branch]              csl/larger_runner           -> origin/csl/larger_runner
2025-12-04T09:33:41.1408277Z  * [new branch]              csl/lint_testing            -> origin/csl/lint_testing
2025-12-04T09:33:41.1410042Z  * [new branch]              csl/lint_thing              -> origin/csl/lint_thing
2025-12-04T09:33:41.1411731Z  * [new branch]              csl/lintrunner_stuff        -> origin/csl/lintrunner_stuff
2025-12-04T09:33:41.1413228Z  * [new branch]              csl/manually_gen_json       -> origin/csl/manually_gen_json
2025-12-04T09:33:41.1414730Z  * [new branch]              csl/mps_sharding            -> origin/csl/mps_sharding
2025-12-04T09:33:41.1416225Z  * [new branch]              csl/multistage_docker       -> origin/csl/multistage_docker
2025-12-04T09:33:41.1417866Z  * [new branch]              csl/print_timing            -> origin/csl/print_timing
2025-12-04T09:33:41.1419331Z  * [new branch]              csl/remove_experiment       -> origin/csl/remove_experiment
2025-12-04T09:33:41.1420874Z  * [new branch]              csl/remove_maybe_unused_var -> origin/csl/remove_maybe_unused_var
2025-12-04T09:33:41.1422498Z  * [new branch]              csl/remove_repo_specific_autolabel -> origin/csl/remove_repo_specific_autolabel
2025-12-04T09:33:41.1423858Z  * [new branch]              csl/remove_run_parallel     -> origin/csl/remove_run_parallel
2025-12-04T09:33:41.1425166Z  * [new branch]              csl/remove_unused_vars      -> origin/csl/remove_unused_vars
2025-12-04T09:33:41.1426667Z  * [new branch]              csl/revert_open             -> origin/csl/revert_open
2025-12-04T09:33:41.1428171Z  * [new branch]              csl/skip_build              -> origin/csl/skip_build
2025-12-04T09:33:41.1429540Z  * [new branch]              csl/smaller_avx_amx_runenrs -> origin/csl/smaller_avx_amx_runenrs
2025-12-04T09:33:41.1430846Z  * [new branch]              csl/td_job_level            -> origin/csl/td_job_level
2025-12-04T09:33:41.1432524Z  * [new branch]              csl/test_cuda_build_large_runner -> origin/csl/test_cuda_build_large_runner
2025-12-04T09:33:41.1434030Z  * [new branch]              csl/test_owners_autograd_dispatch_nn -> origin/csl/test_owners_autograd_dispatch_nn
2025-12-04T09:33:41.1435402Z  * [new branch]              csl/test_owners_higher_confidence -> origin/csl/test_owners_higher_confidence
2025-12-04T09:33:41.1436731Z  * [new branch]              csl/upload_json_running     -> origin/csl/upload_json_running
2025-12-04T09:33:41.1438301Z  * [new branch]              csl/win_sccache             -> origin/csl/win_sccache
2025-12-04T09:33:41.1439674Z  * [new branch]              csl/xml_stuff               -> origin/csl/xml_stuff
2025-12-04T09:33:41.1441173Z  * [new branch]              cublasrelax2                -> origin/cublasrelax2
2025-12-04T09:33:41.1442686Z  * [new branch]              cuda_mempool                -> origin/cuda_mempool
2025-12-04T09:33:41.1444151Z  * [new branch]              custom_lowering_dict        -> origin/custom_lowering_dict
2025-12-04T09:33:41.1446127Z  * [new branch]              d4l3k/debug_plane_frtrace   -> origin/d4l3k/debug_plane_frtrace
2025-12-04T09:33:41.1448076Z  * [new branch]              daxia6/2.8o3                -> origin/daxia6/2.8o3
2025-12-04T09:33:41.1449505Z  * [new branch]              debug-guard                 -> origin/debug-guard
2025-12-04T09:33:41.1451126Z  * [new branch]              delete-quant-docs           -> origin/delete-quant-docs
2025-12-04T09:33:41.1455752Z  * [new branch]              dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.0 -> origin/dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.0
2025-12-04T09:33:41.1457476Z  * [new branch]              dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.1 -> origin/dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.1
2025-12-04T09:33:41.1459053Z  * [new branch]              desertfire/test_cpp_wrapper -> origin/desertfire/test_cpp_wrapper
2025-12-04T09:33:41.1460691Z  * [new branch]              desertfire/triton-cpu-for-aarch64 -> origin/desertfire/triton-cpu-for-aarch64
2025-12-04T09:33:41.1462889Z  * [new branch]              dev/dhruva/flex_attn_opt    -> origin/dev/dhruva/flex_attn_opt
2025-12-04T09:33:41.1465265Z  * [new branch]              dev/joona/MPSNDArrayAdd     -> origin/dev/joona/MPSNDArrayAdd
2025-12-04T09:33:41.1467041Z  * [new branch]              dev/joona/Unranked          -> origin/dev/joona/Unranked
2025-12-04T09:33:41.1468754Z  * [new branch]              dev/joona/cat               -> origin/dev/joona/cat
2025-12-04T09:33:41.1470291Z  * [new branch]              dev/joona/embeddingbag      -> origin/dev/joona/embeddingbag
2025-12-04T09:33:41.1472065Z  * [new branch]              dev/joona/fix_sdpa_memtest  -> origin/dev/joona/fix_sdpa_memtest
2025-12-04T09:33:41.1473882Z  * [new branch]              dev/joona/getTensorsString  -> origin/dev/joona/getTensorsString
2025-12-04T09:33:41.1475605Z  * [new branch]              dev/joona/mps_linear_macos14 -> origin/dev/joona/mps_linear_macos14
2025-12-04T09:33:41.1477681Z  * [new branch]              dev/joona/scalar_clamp      -> origin/dev/joona/scalar_clamp
2025-12-04T09:33:41.1479688Z  * [new branch]              dev/joona/sdpa              -> origin/dev/joona/sdpa
2025-12-04T09:33:41.1481834Z  * [new branch]              dev/joona/sdpa_api          -> origin/dev/joona/sdpa_api
2025-12-04T09:33:41.1483581Z  * [new branch]              dev/joona/type_inf          -> origin/dev/joona/type_inf
2025-12-04T09:33:41.1485411Z  * [new branch]              dev/joona/ulpAssertClose    -> origin/dev/joona/ulpAssertClose
2025-12-04T09:33:41.1487088Z  * [new branch]              dev/joona/upsize3d          -> origin/dev/joona/upsize3d
2025-12-04T09:33:41.1488584Z  * [new branch]              disp_counter                -> origin/disp_counter
2025-12-04T09:33:41.1490164Z  * [new branch]              divyanshk-patch-1           -> origin/divyanshk-patch-1
2025-12-04T09:33:41.1491427Z  * [new branch]              docs                        -> origin/docs
2025-12-04T09:33:41.1493091Z  * [new branch]              documentation               -> origin/documentation
2025-12-04T09:33:41.1494606Z  * [new branch]              eager_model_benchmarks      -> origin/eager_model_benchmarks
2025-12-04T09:33:41.1496715Z  * [new branch]              embg/test_inductor_ci_control -> origin/embg/test_inductor_ci_control
2025-12-04T09:33:41.1498086Z  * [new branch]              embg/triton_l2_prefetch_128B -> origin/embg/triton_l2_prefetch_128B
2025-12-04T09:33:41.1499361Z  * [new branch]              embg/triton_l2_prefetch_256B -> origin/embg/triton_l2_prefetch_256B
2025-12-04T09:33:41.1500915Z  * [new branch]              eqy-patch-1                 -> origin/eqy-patch-1
2025-12-04T09:33:41.1502457Z  * [new branch]              eqy-patch-2                 -> origin/eqy-patch-2
2025-12-04T09:33:41.1504045Z  * [new branch]              eqy-patch-3                 -> origin/eqy-patch-3
2025-12-04T09:33:41.1505598Z  * [new branch]              eqy-patch-4                 -> origin/eqy-patch-4
2025-12-04T09:33:41.1507116Z  * [new branch]              eqy-patch-5                 -> origin/eqy-patch-5
2025-12-04T09:33:41.1508467Z  * [new branch]              eqy-patch-6                 -> origin/eqy-patch-6
2025-12-04T09:33:41.1510523Z  * [new branch]              exclamaforte/amd-ma         -> origin/exclamaforte/amd-ma
2025-12-04T09:33:41.1512154Z  * [new branch]              exclamaforte/combo-kernels-perf-run -> origin/exclamaforte/combo-kernels-perf-run
2025-12-04T09:33:41.1513482Z  * [new branch]              exclamaforte/do_bench_refactor -> origin/exclamaforte/do_bench_refactor
2025-12-04T09:33:41.1514913Z  * [new branch]              exclamaforte/enable-mem-dep-fusion -> origin/exclamaforte/enable-mem-dep-fusion
2025-12-04T09:33:41.1516410Z  * [new branch]              exclamaforte/fix-exhaustive-autotuning -> origin/exclamaforte/fix-exhaustive-autotuning
2025-12-04T09:33:41.1518251Z  * [new branch]              exclamaforte/fix-trace-parsing-fx-svg -> origin/exclamaforte/fix-trace-parsing-fx-svg
2025-12-04T09:33:41.1520180Z  * [new branch]              exclamaforte/force-pointwise-cat-perf-run -> origin/exclamaforte/force-pointwise-cat-perf-run
2025-12-04T09:33:41.1521446Z  * [new branch]              exclamaforte/fusion-data    -> origin/exclamaforte/fusion-data
2025-12-04T09:33:41.1523273Z  * [new branch]              exclamaforte/gemm-benchmark-run -> origin/exclamaforte/gemm-benchmark-run
2025-12-04T09:33:41.1524430Z  * [new branch]              exclamaforte/gemm-export-model -> origin/exclamaforte/gemm-export-model
2025-12-04T09:33:41.1525796Z  * [new branch]              exclamaforte/gemm-model     -> origin/exclamaforte/gemm-model
2025-12-04T09:33:41.1527550Z  * [new branch]              exclamaforte/gemm-model-all-data-collection -> origin/exclamaforte/gemm-model-all-data-collection
2025-12-04T09:33:41.1528669Z  * [new branch]              exclamaforte/gemm-to-amd    -> origin/exclamaforte/gemm-to-amd
2025-12-04T09:33:41.1530053Z  * [new branch]              exclamaforte/just-gemm-model -> origin/exclamaforte/just-gemm-model
2025-12-04T09:33:41.1531818Z  * [new branch]              exclamaforte/just-gemm-model-no-refactor -> origin/exclamaforte/just-gemm-model-no-refactor
2025-12-04T09:33:41.1533149Z  * [new branch]              exclamaforte/profile-diff-algo -> origin/exclamaforte/profile-diff-algo
2025-12-04T09:33:41.1534671Z  * [new branch]              exclamaforte/profiler-visualization -> origin/exclamaforte/profiler-visualization
2025-12-04T09:33:41.1536107Z  * [new branch]              exclamaforte/test_cpp_wrapper_mode -> origin/exclamaforte/test_cpp_wrapper_mode
2025-12-04T09:33:41.1537779Z  * [new branch]              exclamaforte/update-autotune-configs -> origin/exclamaforte/update-autotune-configs
2025-12-04T09:33:41.1539263Z  * [new branch]              exclamaforte/update-autotune-configs-2 -> origin/exclamaforte/update-autotune-configs-2
2025-12-04T09:33:41.1540543Z  * [new branch]              exec                        -> origin/exec
2025-12-04T09:33:41.1542332Z  * [new branch]              experimental-mosaic         -> origin/experimental-mosaic
2025-12-04T09:33:41.1543820Z  * [new branch]              export-D61047529            -> origin/export-D61047529
2025-12-04T09:33:41.1545373Z  * [new branch]              export-D71412006            -> origin/export-D71412006
2025-12-04T09:33:41.1547056Z  * [new branch]              export-D73042989            -> origin/export-D73042989
2025-12-04T09:33:41.1548495Z  * [new branch]              export-D78957093            -> origin/export-D78957093
2025-12-04T09:33:41.1549942Z  * [new branch]              export-D78996107            -> origin/export-D78996107
2025-12-04T09:33:41.1551377Z  * [new branch]              export-D80823877            -> origin/export-D80823877
2025-12-04T09:33:41.1552965Z  * [new branch]              export-D80958642            -> origin/export-D80958642
2025-12-04T09:33:41.1554416Z  * [new branch]              export-D81054193            -> origin/export-D81054193
2025-12-04T09:33:41.1555861Z  * [new branch]              export-D81204584            -> origin/export-D81204584
2025-12-04T09:33:41.1557154Z  * [new branch]              export-D81429090            -> origin/export-D81429090
2025-12-04T09:33:41.1558842Z  * [new branch]              export-D82250826            -> origin/export-D82250826
2025-12-04T09:33:41.1560354Z  * [new branch]              export-D82253817            -> origin/export-D82253817
2025-12-04T09:33:41.1561641Z  * [new branch]              export-D83541846            -> origin/export-D83541846
2025-12-04T09:33:41.1563208Z  * [new branch]              export-D83627170            -> origin/export-D83627170
2025-12-04T09:33:41.1564672Z  * [new branch]              export-D83766701            -> origin/export-D83766701
2025-12-04T09:33:41.1566138Z  * [new branch]              export-D83768878            -> origin/export-D83768878
2025-12-04T09:33:41.1567575Z  * [new branch]              export-D83769447            -> origin/export-D83769447
2025-12-04T09:33:41.1569167Z  * [new branch]              export-D84089824            -> origin/export-D84089824
2025-12-04T09:33:41.1570316Z  * [new branch]              export-D84213020            -> origin/export-D84213020
2025-12-04T09:33:41.1572752Z  * [new branch]              export-D84373821            -> origin/export-D84373821
2025-12-04T09:33:41.1574422Z  * [new branch]              export-D84612194            -> origin/export-D84612194
2025-12-04T09:33:41.1575802Z  * [new branch]              export-D84890985            -> origin/export-D84890985
2025-12-04T09:33:41.1577361Z  * [new branch]              export-D85122326            -> origin/export-D85122326
2025-12-04T09:33:41.1578933Z  * [new branch]              export-D86256198            -> origin/export-D86256198
2025-12-04T09:33:41.1580404Z  * [new branch]              export-D86460608            -> origin/export-D86460608
2025-12-04T09:33:41.1582009Z  * [new branch]              export-D86474796            -> origin/export-D86474796
2025-12-04T09:33:41.1583697Z  * [new branch]              export-D86712396            -> origin/export-D86712396
2025-12-04T09:33:41.1585255Z  * [new branch]              export-D87022129            -> origin/export-D87022129
2025-12-04T09:33:41.1586779Z  * [new branch]              export-D87838959            -> origin/export-D87838959
2025-12-04T09:33:41.1588315Z  * [new branch]              export-D88319437            -> origin/export-D88319437
2025-12-04T09:33:41.1590010Z  * [new branch]              exported-model-train-idempotent -> origin/exported-model-train-idempotent
2025-12-04T09:33:41.1591354Z  * [new branch]              ezyang-titan-october        -> origin/ezyang-titan-october
2025-12-04T09:33:41.1592811Z  * [new branch]              ezyang-titan-october2       -> origin/ezyang-titan-october2
2025-12-04T09:33:41.1594098Z  * [new branch]              ezyang-war                  -> origin/ezyang-war
2025-12-04T09:33:41.1596279Z  * [new branch]              ezyang/wip-aot-descriptors  -> origin/ezyang/wip-aot-descriptors
2025-12-04T09:33:41.1597482Z  * [new branch]              fa_u8_brgemm                -> origin/fa_u8_brgemm
2025-12-04T09:33:41.1599584Z  * [new branch]              fadeputr/sequence_fbgemm    -> origin/fadeputr/sequence_fbgemm
2025-12-04T09:33:41.1601043Z  * [new branch]              fastmath_baseline           -> origin/fastmath_baseline
2025-12-04T09:33:41.1603080Z  * [new branch]              fbcode/warm                 -> origin/fbcode/warm
2025-12-04T09:33:41.1604719Z  * [new branch]              fca                         -> origin/fca
2025-12-04T09:33:41.1606138Z  * [new branch]              fca2_ca5984c                -> origin/fca2_ca5984c
2025-12-04T09:33:41.1607629Z  * [new branch]              fca5                        -> origin/fca5
2025-12-04T09:33:41.1609648Z  * [new branch]              feature/justknobs-cpp       -> origin/feature/justknobs-cpp
2025-12-04T09:33:41.1611029Z  * [new branch]              feature/numa-forkserver     -> origin/feature/numa-forkserver
2025-12-04T09:33:41.1613055Z  * [new branch]              ffast_math_baseline         -> origin/ffast_math_baseline
2025-12-04T09:33:41.1614498Z  * [new branch]              ffast_math_target           -> origin/ffast_math_target
2025-12-04T09:33:41.1616548Z  * [new branch]              findhao/base_commit         -> origin/findhao/base_commit
2025-12-04T09:33:41.1618063Z  * [new branch]              findhao/base_commit1        -> origin/findhao/base_commit1
2025-12-04T09:33:41.1619535Z  * [new branch]              findhao/multistream2        -> origin/findhao/multistream2
2025-12-04T09:33:41.1620778Z  * [new branch]              findhao/multistream5        -> origin/findhao/multistream5
2025-12-04T09:33:41.1622080Z  * [new branch]              findhao/multistream6        -> origin/findhao/multistream6
2025-12-04T09:33:41.1623422Z  * [new branch]              findhao/operatorbench3      -> origin/findhao/operatorbench3
2025-12-04T09:33:41.1624726Z  * [new branch]              findhao/operatorbench5      -> origin/findhao/operatorbench5
2025-12-04T09:33:41.1625996Z  * [new branch]              findhao/tritonparse         -> origin/findhao/tritonparse
2025-12-04T09:33:41.1627655Z  * [new branch]              fix-ck-gemm-template-format -> origin/fix-ck-gemm-template-format
2025-12-04T09:33:41.1629120Z  * [new branch]              fix-config-ignore           -> origin/fix-config-ignore
2025-12-04T09:33:41.1630543Z  * [new branch]              fix-dict-guard              -> origin/fix-dict-guard
2025-12-04T09:33:41.1632188Z  * [new branch]              fix_addmm_issue             -> origin/fix_addmm_issue
2025-12-04T09:33:41.1633717Z  * [new branch]              fix_amd_missing_cluster_dims -> origin/fix_amd_missing_cluster_dims
2025-12-04T09:33:41.1634992Z  * [new branch]              fix_bench_bwd_pass          -> origin/fix_bench_bwd_pass
2025-12-04T09:33:41.1636564Z  * [new branch]              fix_mem_profiler_config     -> origin/fix_mem_profiler_config
2025-12-04T09:33:41.1637871Z  * [new branch]              fix_nvrtc_discovery         -> origin/fix_nvrtc_discovery
2025-12-04T09:33:41.1639373Z  * [new branch]              fix_op_runner               -> origin/fix_op_runner
2025-12-04T09:33:41.1641066Z  * [new branch]              fix_ubn_159469              -> origin/fix_ubn_159469
2025-12-04T09:33:41.1642427Z  * [new branch]              fixes-triage                -> origin/fixes-triage
2025-12-04T09:33:41.1643893Z  * [new branch]              fixflashinfer               -> origin/fixflashinfer
2025-12-04T09:33:41.1645328Z  * [new branch]              flash_decoding_cpu          -> origin/flash_decoding_cpu
2025-12-04T09:33:41.1647146Z  * [new branch]              flex-flash                  -> origin/flex-flash
2025-12-04T09:33:41.1648689Z  * [new branch]              flex_attention_functorch_grad -> origin/flex_attention_functorch_grad
2025-12-04T09:33:41.1649986Z  * [new branch]              flex_flash                  -> origin/flex_flash
2025-12-04T09:33:41.1652234Z  * [new branch]              fmassa/fix_memeff_sharding_rule -> origin/fmassa/fix_memeff_sharding_rule
2025-12-04T09:33:41.1683495Z  * [new branch]              fmassa/tests_comm_compute_scheduler -> origin/fmassa/tests_comm_compute_scheduler
2025-12-04T09:33:41.1684424Z  * [new branch]              forkserver_fix              -> origin/forkserver_fix
2025-12-04T09:33:41.1685075Z  * [new branch]              fsdp2_trace_rules           -> origin/fsdp2_trace_rules
2025-12-04T09:33:41.1685652Z  * [new branch]              fx_cpp                      -> origin/fx_cpp
2025-12-04T09:33:41.1686219Z  * [new branch]              fy/fix-win                  -> origin/fy/fix-win
2025-12-04T09:33:41.1686818Z  * [new branch]              galv-patch-1                -> origin/galv-patch-1
2025-12-04T09:33:41.1687593Z  * [new branch]              galv/cudagraphs-conditional-nodes-4 -> origin/galv/cudagraphs-conditional-nodes-4
2025-12-04T09:33:41.1688443Z  * [new branch]              georgehong/cmakelists-patch -> origin/georgehong/cmakelists-patch
2025-12-04T09:33:41.1689155Z  * [new branch]              gh/AlnisM/1/base            -> origin/gh/AlnisM/1/base
2025-12-04T09:33:41.1689776Z  * [new branch]              gh/AlnisM/1/head            -> origin/gh/AlnisM/1/head
2025-12-04T09:33:41.1690407Z  * [new branch]              gh/EikanWang/67/base        -> origin/gh/EikanWang/67/base
2025-12-04T09:33:41.1691047Z  * [new branch]              gh/EikanWang/67/head        -> origin/gh/EikanWang/67/head
2025-12-04T09:33:41.1691683Z  * [new branch]              gh/Gasoonjia/1/base         -> origin/gh/Gasoonjia/1/base
2025-12-04T09:33:41.1692308Z  * [new branch]              gh/Gasoonjia/1/head         -> origin/gh/Gasoonjia/1/head
2025-12-04T09:33:41.1692933Z  * [new branch]              gh/H-Huang/131/base         -> origin/gh/H-Huang/131/base
2025-12-04T09:33:41.1693530Z  * [new branch]              gh/H-Huang/131/head         -> origin/gh/H-Huang/131/head
2025-12-04T09:33:41.1694134Z  * [new branch]              gh/H-Huang/131/orig         -> origin/gh/H-Huang/131/orig
2025-12-04T09:33:41.1694735Z  * [new branch]              gh/H-Huang/132/base         -> origin/gh/H-Huang/132/base
2025-12-04T09:33:41.1695339Z  * [new branch]              gh/H-Huang/132/head         -> origin/gh/H-Huang/132/head
2025-12-04T09:33:41.1695940Z  * [new branch]              gh/H-Huang/132/orig         -> origin/gh/H-Huang/132/orig
2025-12-04T09:33:41.1696839Z  * [new branch]              gh/H-Huang/180/base         -> origin/gh/H-Huang/180/base
2025-12-04T09:33:41.1697447Z  * [new branch]              gh/H-Huang/180/head         -> origin/gh/H-Huang/180/head
2025-12-04T09:33:41.1698052Z  * [new branch]              gh/H-Huang/180/orig         -> origin/gh/H-Huang/180/orig
2025-12-04T09:33:41.1698658Z  * [new branch]              gh/H-Huang/182/base         -> origin/gh/H-Huang/182/base
2025-12-04T09:33:41.1699267Z  * [new branch]              gh/H-Huang/182/head         -> origin/gh/H-Huang/182/head
2025-12-04T09:33:41.1699875Z  * [new branch]              gh/H-Huang/182/orig         -> origin/gh/H-Huang/182/orig
2025-12-04T09:33:41.1701714Z  * [new branch]              gh/H-Huang/226/base         -> origin/gh/H-Huang/226/base
2025-12-04T09:33:41.1703105Z  * [new branch]              gh/H-Huang/226/head         -> origin/gh/H-Huang/226/head
2025-12-04T09:33:41.1704593Z  * [new branch]              gh/H-Huang/226/orig         -> origin/gh/H-Huang/226/orig
2025-12-04T09:33:41.1706581Z  * [new branch]              gh/H-Huang/228/base         -> origin/gh/H-Huang/228/base
2025-12-04T09:33:41.1708038Z  * [new branch]              gh/H-Huang/228/head         -> origin/gh/H-Huang/228/head
2025-12-04T09:33:41.1709499Z  * [new branch]              gh/H-Huang/228/orig         -> origin/gh/H-Huang/228/orig
2025-12-04T09:33:41.1712087Z  * [new branch]              gh/IvanKobzarev/150/base    -> origin/gh/IvanKobzarev/150/base
2025-12-04T09:33:41.1713408Z  * [new branch]              gh/IvanKobzarev/150/head    -> origin/gh/IvanKobzarev/150/head
2025-12-04T09:33:41.1714969Z  * [new branch]              gh/IvanKobzarev/150/orig    -> origin/gh/IvanKobzarev/150/orig
2025-12-04T09:33:41.1717125Z  * [new branch]              gh/IvanKobzarev/157/base    -> origin/gh/IvanKobzarev/157/base
2025-12-04T09:33:41.1718682Z  * [new branch]              gh/IvanKobzarev/157/head    -> origin/gh/IvanKobzarev/157/head
2025-12-04T09:33:41.1720186Z  * [new branch]              gh/IvanKobzarev/157/orig    -> origin/gh/IvanKobzarev/157/orig
2025-12-04T09:33:41.1722235Z  * [new branch]              gh/IvanKobzarev/159/base    -> origin/gh/IvanKobzarev/159/base
2025-12-04T09:33:41.1723724Z  * [new branch]              gh/IvanKobzarev/159/head    -> origin/gh/IvanKobzarev/159/head
2025-12-04T09:33:41.1725229Z  * [new branch]              gh/IvanKobzarev/159/orig    -> origin/gh/IvanKobzarev/159/orig
2025-12-04T09:33:41.1727237Z  * [new branch]              gh/IvanKobzarev/162/base    -> origin/gh/IvanKobzarev/162/base
2025-12-04T09:33:41.1728840Z  * [new branch]              gh/IvanKobzarev/162/head    -> origin/gh/IvanKobzarev/162/head
2025-12-04T09:33:41.1730217Z  * [new branch]              gh/IvanKobzarev/162/orig    -> origin/gh/IvanKobzarev/162/orig
2025-12-04T09:33:41.1732297Z  * [new branch]              gh/IvanKobzarev/163/base    -> origin/gh/IvanKobzarev/163/base
2025-12-04T09:33:41.1733575Z  * [new branch]              gh/IvanKobzarev/163/head    -> origin/gh/IvanKobzarev/163/head
2025-12-04T09:33:41.1735061Z  * [new branch]              gh/IvanKobzarev/163/orig    -> origin/gh/IvanKobzarev/163/orig
2025-12-04T09:33:41.1737285Z  * [new branch]              gh/IvanKobzarev/166/base    -> origin/gh/IvanKobzarev/166/base
2025-12-04T09:33:41.1738842Z  * [new branch]              gh/IvanKobzarev/166/head    -> origin/gh/IvanKobzarev/166/head
2025-12-04T09:33:41.1740168Z  * [new branch]              gh/IvanKobzarev/166/orig    -> origin/gh/IvanKobzarev/166/orig
2025-12-04T09:33:41.1742372Z  * [new branch]              gh/IvanKobzarev/167/base    -> origin/gh/IvanKobzarev/167/base
2025-12-04T09:33:41.1743692Z  * [new branch]              gh/IvanKobzarev/167/head    -> origin/gh/IvanKobzarev/167/head
2025-12-04T09:33:41.1745156Z  * [new branch]              gh/IvanKobzarev/167/orig    -> origin/gh/IvanKobzarev/167/orig
2025-12-04T09:33:41.1747127Z  * [new branch]              gh/IvanKobzarev/168/base    -> origin/gh/IvanKobzarev/168/base
2025-12-04T09:33:41.1749165Z  * [new branch]              gh/IvanKobzarev/168/head    -> origin/gh/IvanKobzarev/168/head
2025-12-04T09:33:41.1750058Z  * [new branch]              gh/IvanKobzarev/168/orig    -> origin/gh/IvanKobzarev/168/orig
2025-12-04T09:33:41.1752142Z  * [new branch]              gh/IvanKobzarev/169/base    -> origin/gh/IvanKobzarev/169/base
2025-12-04T09:33:41.1753638Z  * [new branch]              gh/IvanKobzarev/169/head    -> origin/gh/IvanKobzarev/169/head
2025-12-04T09:33:41.1755125Z  * [new branch]              gh/IvanKobzarev/169/orig    -> origin/gh/IvanKobzarev/169/orig
2025-12-04T09:33:41.1756956Z  * [new branch]              gh/IvanKobzarev/170/base    -> origin/gh/IvanKobzarev/170/base
2025-12-04T09:33:41.1758463Z  * [new branch]              gh/IvanKobzarev/170/head    -> origin/gh/IvanKobzarev/170/head
2025-12-04T09:33:41.1759810Z  * [new branch]              gh/IvanKobzarev/170/orig    -> origin/gh/IvanKobzarev/170/orig
2025-12-04T09:33:41.1762141Z  * [new branch]              gh/IvanKobzarev/171/base    -> origin/gh/IvanKobzarev/171/base
2025-12-04T09:33:41.1763643Z  * [new branch]              gh/IvanKobzarev/171/head    -> origin/gh/IvanKobzarev/171/head
2025-12-04T09:33:41.1764974Z  * [new branch]              gh/IvanKobzarev/171/orig    -> origin/gh/IvanKobzarev/171/orig
2025-12-04T09:33:41.1767016Z  * [new branch]              gh/IvanKobzarev/172/base    -> origin/gh/IvanKobzarev/172/base
2025-12-04T09:33:41.1768589Z  * [new branch]              gh/IvanKobzarev/172/head    -> origin/gh/IvanKobzarev/172/head
2025-12-04T09:33:41.1770080Z  * [new branch]              gh/IvanKobzarev/172/orig    -> origin/gh/IvanKobzarev/172/orig
2025-12-04T09:33:41.1772259Z  * [new branch]              gh/IvanKobzarev/173/base    -> origin/gh/IvanKobzarev/173/base
2025-12-04T09:33:41.1773774Z  * [new branch]              gh/IvanKobzarev/173/head    -> origin/gh/IvanKobzarev/173/head
2025-12-04T09:33:41.1775237Z  * [new branch]              gh/IvanKobzarev/173/orig    -> origin/gh/IvanKobzarev/173/orig
2025-12-04T09:33:41.1777335Z  * [new branch]              gh/IvanKobzarev/174/base    -> origin/gh/IvanKobzarev/174/base
2025-12-04T09:33:41.1778898Z  * [new branch]              gh/IvanKobzarev/174/head    -> origin/gh/IvanKobzarev/174/head
2025-12-04T09:33:41.1780286Z  * [new branch]              gh/IvanKobzarev/174/orig    -> origin/gh/IvanKobzarev/174/orig
2025-12-04T09:33:41.1782487Z  * [new branch]              gh/IvanKobzarev/175/base    -> origin/gh/IvanKobzarev/175/base
2025-12-04T09:33:41.1784038Z  * [new branch]              gh/IvanKobzarev/175/head    -> origin/gh/IvanKobzarev/175/head
2025-12-04T09:33:41.1785553Z  * [new branch]              gh/IvanKobzarev/175/orig    -> origin/gh/IvanKobzarev/175/orig
2025-12-04T09:33:41.1787749Z  * [new branch]              gh/IvanKobzarev/176/base    -> origin/gh/IvanKobzarev/176/base
2025-12-04T09:33:41.1789131Z  * [new branch]              gh/IvanKobzarev/176/head    -> origin/gh/IvanKobzarev/176/head
2025-12-04T09:33:41.1790734Z  * [new branch]              gh/IvanKobzarev/176/orig    -> origin/gh/IvanKobzarev/176/orig
2025-12-04T09:33:41.1793068Z  * [new branch]              gh/IvanKobzarev/177/base    -> origin/gh/IvanKobzarev/177/base
2025-12-04T09:33:41.1794638Z  * [new branch]              gh/IvanKobzarev/177/head    -> origin/gh/IvanKobzarev/177/head
2025-12-04T09:33:41.1795973Z  * [new branch]              gh/IvanKobzarev/177/orig    -> origin/gh/IvanKobzarev/177/orig
2025-12-04T09:33:41.1798241Z  * [new branch]              gh/IvanKobzarev/178/base    -> origin/gh/IvanKobzarev/178/base
2025-12-04T09:33:41.1799797Z  * [new branch]              gh/IvanKobzarev/178/head    -> origin/gh/IvanKobzarev/178/head
2025-12-04T09:33:41.1801312Z  * [new branch]              gh/IvanKobzarev/178/orig    -> origin/gh/IvanKobzarev/178/orig
2025-12-04T09:33:41.1803454Z  * [new branch]              gh/IvanKobzarev/179/base    -> origin/gh/IvanKobzarev/179/base
2025-12-04T09:33:41.1804798Z  * [new branch]              gh/IvanKobzarev/179/head    -> origin/gh/IvanKobzarev/179/head
2025-12-04T09:33:41.1806477Z  * [new branch]              gh/IvanKobzarev/179/orig    -> origin/gh/IvanKobzarev/179/orig
2025-12-04T09:33:41.1808386Z  * [new branch]              gh/IvanKobzarev/180/base    -> origin/gh/IvanKobzarev/180/base
2025-12-04T09:33:41.1809915Z  * [new branch]              gh/IvanKobzarev/180/head    -> origin/gh/IvanKobzarev/180/head
2025-12-04T09:33:41.1811291Z  * [new branch]              gh/IvanKobzarev/180/orig    -> origin/gh/IvanKobzarev/180/orig
2025-12-04T09:33:41.1813640Z  * [new branch]              gh/IvanKobzarev/181/base    -> origin/gh/IvanKobzarev/181/base
2025-12-04T09:33:41.1815278Z  * [new branch]              gh/IvanKobzarev/181/head    -> origin/gh/IvanKobzarev/181/head
2025-12-04T09:33:41.1816672Z  * [new branch]              gh/IvanKobzarev/181/orig    -> origin/gh/IvanKobzarev/181/orig
2025-12-04T09:33:41.1819095Z  * [new branch]              gh/IvanKobzarev/182/base    -> origin/gh/IvanKobzarev/182/base
2025-12-04T09:33:41.1820474Z  * [new branch]              gh/IvanKobzarev/182/head    -> origin/gh/IvanKobzarev/182/head
2025-12-04T09:33:41.1821962Z  * [new branch]              gh/IvanKobzarev/182/orig    -> origin/gh/IvanKobzarev/182/orig
2025-12-04T09:33:41.1824165Z  * [new branch]              gh/IvanKobzarev/183/base    -> origin/gh/IvanKobzarev/183/base
2025-12-04T09:33:41.1825586Z  * [new branch]              gh/IvanKobzarev/183/head    -> origin/gh/IvanKobzarev/183/head
2025-12-04T09:33:41.1827243Z  * [new branch]              gh/IvanKobzarev/183/orig    -> origin/gh/IvanKobzarev/183/orig
2025-12-04T09:33:41.1829254Z  * [new branch]              gh/IvanKobzarev/184/base    -> origin/gh/IvanKobzarev/184/base
2025-12-04T09:33:41.1830779Z  * [new branch]              gh/IvanKobzarev/184/head    -> origin/gh/IvanKobzarev/184/head
2025-12-04T09:33:41.1832292Z  * [new branch]              gh/IvanKobzarev/184/orig    -> origin/gh/IvanKobzarev/184/orig
2025-12-04T09:33:41.1834700Z  * [new branch]              gh/NikhilAPatel/1/base      -> origin/gh/NikhilAPatel/1/base
2025-12-04T09:33:41.1836339Z  * [new branch]              gh/NikhilAPatel/1/head      -> origin/gh/NikhilAPatel/1/head
2025-12-04T09:33:41.1838145Z  * [new branch]              gh/NikhilAPatel/2/base      -> origin/gh/NikhilAPatel/2/base
2025-12-04T09:33:41.1839714Z  * [new branch]              gh/NikhilAPatel/2/head      -> origin/gh/NikhilAPatel/2/head
2025-12-04T09:33:41.1841806Z  * [new branch]              gh/NikhilAPatel/4/base      -> origin/gh/NikhilAPatel/4/base
2025-12-04T09:33:41.1843420Z  * [new branch]              gh/NikhilAPatel/4/head      -> origin/gh/NikhilAPatel/4/head
2025-12-04T09:33:41.1845349Z  * [new branch]              gh/NikhilAPatel/5/base      -> origin/gh/NikhilAPatel/5/base
2025-12-04T09:33:41.1846918Z  * [new branch]              gh/NikhilAPatel/5/head      -> origin/gh/NikhilAPatel/5/head
2025-12-04T09:33:41.1848447Z  * [new branch]              gh/NikhilAPatel/5/orig      -> origin/gh/NikhilAPatel/5/orig
2025-12-04T09:33:41.1850787Z  * [new branch]              gh/PaliC/17/base            -> origin/gh/PaliC/17/base
2025-12-04T09:33:41.1852244Z  * [new branch]              gh/PaliC/17/head            -> origin/gh/PaliC/17/head
2025-12-04T09:33:41.1853734Z  * [new branch]              gh/PaliC/17/orig            -> origin/gh/PaliC/17/orig
2025-12-04T09:33:41.1855743Z  * [new branch]              gh/PaliC/18/base            -> origin/gh/PaliC/18/base
2025-12-04T09:33:41.1857369Z  * [new branch]              gh/PaliC/18/head            -> origin/gh/PaliC/18/head
2025-12-04T09:33:41.1858901Z  * [new branch]              gh/PaliC/18/orig            -> origin/gh/PaliC/18/orig
2025-12-04T09:33:41.1861015Z  * [new branch]              gh/PaliC/20/base            -> origin/gh/PaliC/20/base
2025-12-04T09:33:41.1862414Z  * [new branch]              gh/PaliC/20/head            -> origin/gh/PaliC/20/head
2025-12-04T09:33:41.1863920Z  * [new branch]              gh/PaliC/20/orig            -> origin/gh/PaliC/20/orig
2025-12-04T09:33:41.1865810Z  * [new branch]              gh/PaliC/21/base            -> origin/gh/PaliC/21/base
2025-12-04T09:33:41.1867432Z  * [new branch]              gh/PaliC/21/head            -> origin/gh/PaliC/21/head
2025-12-04T09:33:41.1868699Z  * [new branch]              gh/PaliC/21/orig            -> origin/gh/PaliC/21/orig
2025-12-04T09:33:41.1870650Z  * [new branch]              gh/PaliC/23/base            -> origin/gh/PaliC/23/base
2025-12-04T09:33:41.1872910Z  * [new branch]              gh/PaliC/23/head            -> origin/gh/PaliC/23/head
2025-12-04T09:33:41.1874380Z  * [new branch]              gh/PaliC/23/orig            -> origin/gh/PaliC/23/orig
2025-12-04T09:33:41.1876330Z  * [new branch]              gh/PaliC/24/base            -> origin/gh/PaliC/24/base
2025-12-04T09:33:41.1877872Z  * [new branch]              gh/PaliC/24/head            -> origin/gh/PaliC/24/head
2025-12-04T09:33:41.1879364Z  * [new branch]              gh/PaliC/24/orig            -> origin/gh/PaliC/24/orig
2025-12-04T09:33:41.1881209Z  * [new branch]              gh/PaliC/25/head            -> origin/gh/PaliC/25/head
2025-12-04T09:33:41.1882664Z  * [new branch]              gh/PaliC/25/next            -> origin/gh/PaliC/25/next
2025-12-04T09:33:41.1884231Z  * [new branch]              gh/PaliC/25/orig            -> origin/gh/PaliC/25/orig
2025-12-04T09:33:41.1886164Z  * [new branch]              gh/PaliC/26/head            -> origin/gh/PaliC/26/head
2025-12-04T09:33:41.1887387Z  * [new branch]              gh/PaliC/26/next            -> origin/gh/PaliC/26/next
2025-12-04T09:33:41.1889001Z  * [new branch]              gh/PaliC/26/orig            -> origin/gh/PaliC/26/orig
2025-12-04T09:33:41.1890931Z  * [new branch]              gh/PaliC/27/next            -> origin/gh/PaliC/27/next
2025-12-04T09:33:41.1892936Z  * [new branch]              gh/PaliC/28/head            -> origin/gh/PaliC/28/head
2025-12-04T09:33:41.1894193Z  * [new branch]              gh/PaliC/28/next            -> origin/gh/PaliC/28/next
2025-12-04T09:33:41.1895827Z  * [new branch]              gh/PaliC/28/orig            -> origin/gh/PaliC/28/orig
2025-12-04T09:33:41.1897874Z  * [new branch]              gh/PaliC/29/head            -> origin/gh/PaliC/29/head
2025-12-04T09:33:41.1899307Z  * [new branch]              gh/PaliC/29/next            -> origin/gh/PaliC/29/next
2025-12-04T09:33:41.1900796Z  * [new branch]              gh/PaliC/29/orig            -> origin/gh/PaliC/29/orig
2025-12-04T09:33:41.1902744Z  * [new branch]              gh/PaliC/30/head            -> origin/gh/PaliC/30/head
2025-12-04T09:33:41.1903982Z  * [new branch]              gh/PaliC/30/next            -> origin/gh/PaliC/30/next
2025-12-04T09:33:41.1905508Z  * [new branch]              gh/PaliC/30/orig            -> origin/gh/PaliC/30/orig
2025-12-04T09:33:41.1907504Z  * [new branch]              gh/PaliC/31/head            -> origin/gh/PaliC/31/head
2025-12-04T09:33:41.1908704Z  * [new branch]              gh/PaliC/31/next            -> origin/gh/PaliC/31/next
2025-12-04T09:33:41.1910586Z  * [new branch]              gh/PaliC/31/orig            -> origin/gh/PaliC/31/orig
2025-12-04T09:33:41.1912920Z  * [new branch]              gh/PaulZhang12/25/base      -> origin/gh/PaulZhang12/25/base
2025-12-04T09:33:41.1914543Z  * [new branch]              gh/PaulZhang12/25/head      -> origin/gh/PaulZhang12/25/head
2025-12-04T09:33:41.1916088Z  * [new branch]              gh/PaulZhang12/25/orig      -> origin/gh/PaulZhang12/25/orig
2025-12-04T09:33:41.1918087Z  * [new branch]              gh/PaulZhang12/28/base      -> origin/gh/PaulZhang12/28/base
2025-12-04T09:33:41.1919673Z  * [new branch]              gh/PaulZhang12/28/head      -> origin/gh/PaulZhang12/28/head
2025-12-04T09:33:41.1921250Z  * [new branch]              gh/PaulZhang12/28/orig      -> origin/gh/PaulZhang12/28/orig
2025-12-04T09:33:41.1923481Z  * [new branch]              gh/PaulZhang12/31/base      -> origin/gh/PaulZhang12/31/base
2025-12-04T09:33:41.1925047Z  * [new branch]              gh/PaulZhang12/31/head      -> origin/gh/PaulZhang12/31/head
2025-12-04T09:33:41.1928555Z  * [new branch]              gh/PaulZhang12/31/orig      -> origin/gh/PaulZhang12/31/orig
2025-12-04T09:33:41.1929258Z  * [new branch]              gh/PaulZhang12/37/base      -> origin/gh/PaulZhang12/37/base
2025-12-04T09:33:41.1929948Z  * [new branch]              gh/PaulZhang12/37/head      -> origin/gh/PaulZhang12/37/head
2025-12-04T09:33:41.1931121Z  * [new branch]              gh/PaulZhang12/37/orig      -> origin/gh/PaulZhang12/37/orig
2025-12-04T09:33:41.1933228Z  * [new branch]              gh/PaulZhang12/40/base      -> origin/gh/PaulZhang12/40/base
2025-12-04T09:33:41.1934562Z  * [new branch]              gh/PaulZhang12/40/head      -> origin/gh/PaulZhang12/40/head
2025-12-04T09:33:41.1936091Z  * [new branch]              gh/PaulZhang12/40/orig      -> origin/gh/PaulZhang12/40/orig
2025-12-04T09:33:41.1938185Z  * [new branch]              gh/PaulZhang12/42/base      -> origin/gh/PaulZhang12/42/base
2025-12-04T09:33:41.1939632Z  * [new branch]              gh/PaulZhang12/42/head      -> origin/gh/PaulZhang12/42/head
2025-12-04T09:33:41.1941600Z  * [new branch]              gh/PaulZhang12/43/base      -> origin/gh/PaulZhang12/43/base
2025-12-04T09:33:41.1943103Z  * [new branch]              gh/PaulZhang12/43/head      -> origin/gh/PaulZhang12/43/head
2025-12-04T09:33:41.1944612Z  * [new branch]              gh/PaulZhang12/43/orig      -> origin/gh/PaulZhang12/43/orig
2025-12-04T09:33:41.1946408Z  * [new branch]              gh/PaulZhang12/44/base      -> origin/gh/PaulZhang12/44/base
2025-12-04T09:33:41.1947913Z  * [new branch]              gh/PaulZhang12/44/head      -> origin/gh/PaulZhang12/44/head
2025-12-04T09:33:41.1950000Z  * [new branch]              gh/PaulZhang12/45/base      -> origin/gh/PaulZhang12/45/base
2025-12-04T09:33:41.1951487Z  * [new branch]              gh/PaulZhang12/45/head      -> origin/gh/PaulZhang12/45/head
2025-12-04T09:33:41.1952940Z  * [new branch]              gh/PaulZhang12/45/orig      -> origin/gh/PaulZhang12/45/orig
2025-12-04T09:33:41.1954905Z  * [new branch]              gh/PaulZhang12/46/base      -> origin/gh/PaulZhang12/46/base
2025-12-04T09:33:41.1956405Z  * [new branch]              gh/PaulZhang12/46/head      -> origin/gh/PaulZhang12/46/head
2025-12-04T09:33:41.1957960Z  * [new branch]              gh/PaulZhang12/46/orig      -> origin/gh/PaulZhang12/46/orig
2025-12-04T09:33:41.1960047Z  * [new branch]              gh/PaulZhang12/47/base      -> origin/gh/PaulZhang12/47/base
2025-12-04T09:33:41.1961515Z  * [new branch]              gh/PaulZhang12/47/head      -> origin/gh/PaulZhang12/47/head
2025-12-04T09:33:41.1963050Z  * [new branch]              gh/PaulZhang12/47/orig      -> origin/gh/PaulZhang12/47/orig
2025-12-04T09:33:41.1964883Z  * [new branch]              gh/PaulZhang12/48/base      -> origin/gh/PaulZhang12/48/base
2025-12-04T09:33:41.1966227Z  * [new branch]              gh/PaulZhang12/48/head      -> origin/gh/PaulZhang12/48/head
2025-12-04T09:33:41.1967738Z  * [new branch]              gh/PaulZhang12/48/orig      -> origin/gh/PaulZhang12/48/orig
2025-12-04T09:33:41.1970062Z  * [new branch]              gh/SamGinzburg/11/base      -> origin/gh/SamGinzburg/11/base
2025-12-04T09:33:41.1971743Z  * [new branch]              gh/SamGinzburg/11/head      -> origin/gh/SamGinzburg/11/head
2025-12-04T09:33:41.1974316Z  * [new branch]              gh/SherlockNoMad/1/base     -> origin/gh/SherlockNoMad/1/base
2025-12-04T09:33:41.1975843Z  * [new branch]              gh/SherlockNoMad/1/head     -> origin/gh/SherlockNoMad/1/head
2025-12-04T09:33:41.1978021Z  * [new branch]              gh/SherlockNoMad/10/base    -> origin/gh/SherlockNoMad/10/base
2025-12-04T09:33:41.1979526Z  * [new branch]              gh/SherlockNoMad/10/head    -> origin/gh/SherlockNoMad/10/head
2025-12-04T09:33:41.1981113Z  * [new branch]              gh/SherlockNoMad/10/orig    -> origin/gh/SherlockNoMad/10/orig
2025-12-04T09:33:41.1982891Z  * [new branch]              gh/SherlockNoMad/11/base    -> origin/gh/SherlockNoMad/11/base
2025-12-04T09:33:41.1984427Z  * [new branch]              gh/SherlockNoMad/11/head    -> origin/gh/SherlockNoMad/11/head
2025-12-04T09:33:41.1985942Z  * [new branch]              gh/SherlockNoMad/11/orig    -> origin/gh/SherlockNoMad/11/orig
2025-12-04T09:33:41.1987649Z  * [new branch]              gh/SherlockNoMad/12/base    -> origin/gh/SherlockNoMad/12/base
2025-12-04T09:33:41.1989043Z  * [new branch]              gh/SherlockNoMad/12/head    -> origin/gh/SherlockNoMad/12/head
2025-12-04T09:33:41.1990534Z  * [new branch]              gh/SherlockNoMad/12/orig    -> origin/gh/SherlockNoMad/12/orig
2025-12-04T09:33:41.1992573Z  * [new branch]              gh/SherlockNoMad/15/base    -> origin/gh/SherlockNoMad/15/base
2025-12-04T09:33:41.1994063Z  * [new branch]              gh/SherlockNoMad/15/head    -> origin/gh/SherlockNoMad/15/head
2025-12-04T09:33:41.1995456Z  * [new branch]              gh/SherlockNoMad/15/orig    -> origin/gh/SherlockNoMad/15/orig
2025-12-04T09:33:41.1997418Z  * [new branch]              gh/SherlockNoMad/17/base    -> origin/gh/SherlockNoMad/17/base
2025-12-04T09:33:41.1998759Z  * [new branch]              gh/SherlockNoMad/17/head    -> origin/gh/SherlockNoMad/17/head
2025-12-04T09:33:41.2000255Z  * [new branch]              gh/SherlockNoMad/17/orig    -> origin/gh/SherlockNoMad/17/orig
2025-12-04T09:33:41.2002416Z  * [new branch]              gh/SherlockNoMad/18/base    -> origin/gh/SherlockNoMad/18/base
2025-12-04T09:33:41.2004079Z  * [new branch]              gh/SherlockNoMad/18/head    -> origin/gh/SherlockNoMad/18/head
2025-12-04T09:33:41.2005456Z  * [new branch]              gh/SherlockNoMad/18/orig    -> origin/gh/SherlockNoMad/18/orig
2025-12-04T09:33:41.2007321Z  * [new branch]              gh/SherlockNoMad/19/base    -> origin/gh/SherlockNoMad/19/base
2025-12-04T09:33:41.2008974Z  * [new branch]              gh/SherlockNoMad/19/head    -> origin/gh/SherlockNoMad/19/head
2025-12-04T09:33:41.2010495Z  * [new branch]              gh/SherlockNoMad/19/orig    -> origin/gh/SherlockNoMad/19/orig
2025-12-04T09:33:41.2012301Z  * [new branch]              gh/SherlockNoMad/2/base     -> origin/gh/SherlockNoMad/2/base
2025-12-04T09:33:41.2013600Z  * [new branch]              gh/SherlockNoMad/2/head     -> origin/gh/SherlockNoMad/2/head
2025-12-04T09:33:41.2015473Z  * [new branch]              gh/SherlockNoMad/20/base    -> origin/gh/SherlockNoMad/20/base
2025-12-04T09:33:41.2017186Z  * [new branch]              gh/SherlockNoMad/20/head    -> origin/gh/SherlockNoMad/20/head
2025-12-04T09:33:41.2018445Z  * [new branch]              gh/SherlockNoMad/20/orig    -> origin/gh/SherlockNoMad/20/orig
2025-12-04T09:33:41.2020745Z  * [new branch]              gh/SherlockNoMad/21/base    -> origin/gh/SherlockNoMad/21/base
2025-12-04T09:33:41.2022318Z  * [new branch]              gh/SherlockNoMad/21/head    -> origin/gh/SherlockNoMad/21/head
2025-12-04T09:33:41.2023623Z  * [new branch]              gh/SherlockNoMad/21/orig    -> origin/gh/SherlockNoMad/21/orig
2025-12-04T09:33:41.2025473Z  * [new branch]              gh/SherlockNoMad/3/base     -> origin/gh/SherlockNoMad/3/base
2025-12-04T09:33:41.2026960Z  * [new branch]              gh/SherlockNoMad/3/head     -> origin/gh/SherlockNoMad/3/head
2025-12-04T09:33:41.2028707Z  * [new branch]              gh/SherlockNoMad/4/base     -> origin/gh/SherlockNoMad/4/base
2025-12-04T09:33:41.2030043Z  * [new branch]              gh/SherlockNoMad/4/head     -> origin/gh/SherlockNoMad/4/head
2025-12-04T09:33:41.2031890Z  * [new branch]              gh/SherlockNoMad/5/base     -> origin/gh/SherlockNoMad/5/base
2025-12-04T09:33:41.2033195Z  * [new branch]              gh/SherlockNoMad/5/head     -> origin/gh/SherlockNoMad/5/head
2025-12-04T09:33:41.2036222Z  * [new branch]              gh/Sidharth123-cpu/24/base  -> origin/gh/Sidharth123-cpu/24/base
2025-12-04T09:33:41.2038001Z  * [new branch]              gh/Sidharth123-cpu/25/base  -> origin/gh/Sidharth123-cpu/25/base
2025-12-04T09:33:41.2039830Z  * [new branch]              gh/Sidharth123-cpu/26/base  -> origin/gh/Sidharth123-cpu/26/base
2025-12-04T09:33:41.2041962Z  * [new branch]              gh/Sidharth123-cpu/27/base  -> origin/gh/Sidharth123-cpu/27/base
2025-12-04T09:33:41.2044510Z  * [new branch]              gh/StrongerXi/1/base        -> origin/gh/StrongerXi/1/base
2025-12-04T09:33:41.2045773Z  * [new branch]              gh/StrongerXi/1/head        -> origin/gh/StrongerXi/1/head
2025-12-04T09:33:41.2047783Z  * [new branch]              gh/StrongerXi/71/base       -> origin/gh/StrongerXi/71/base
2025-12-04T09:33:41.2049257Z  * [new branch]              gh/StrongerXi/71/head       -> origin/gh/StrongerXi/71/head
2025-12-04T09:33:41.2051066Z  * [new branch]              gh/StrongerXi/72/base       -> origin/gh/StrongerXi/72/base
2025-12-04T09:33:41.2052503Z  * [new branch]              gh/StrongerXi/72/head       -> origin/gh/StrongerXi/72/head
2025-12-04T09:33:41.2054387Z  * [new branch]              gh/StrongerXi/73/base       -> origin/gh/StrongerXi/73/base
2025-12-04T09:33:41.2055888Z  * [new branch]              gh/StrongerXi/73/head       -> origin/gh/StrongerXi/73/head
2025-12-04T09:33:41.2057654Z  * [new branch]              gh/StrongerXi/73/orig       -> origin/gh/StrongerXi/73/orig
2025-12-04T09:33:41.2060253Z  * [new branch]              gh/XilunWu/160/base         -> origin/gh/XilunWu/160/base
2025-12-04T09:33:41.2061711Z  * [new branch]              gh/XilunWu/160/head         -> origin/gh/XilunWu/160/head
2025-12-04T09:33:41.2063216Z  * [new branch]              gh/XilunWu/160/orig         -> origin/gh/XilunWu/160/orig
2025-12-04T09:33:41.2065224Z  * [new branch]              gh/XilunWu/163/base         -> origin/gh/XilunWu/163/base
2025-12-04T09:33:41.2066799Z  * [new branch]              gh/XilunWu/163/head         -> origin/gh/XilunWu/163/head
2025-12-04T09:33:41.2068246Z  * [new branch]              gh/XilunWu/163/orig         -> origin/gh/XilunWu/163/orig
2025-12-04T09:33:41.2070411Z  * [new branch]              gh/XilunWu/168/base         -> origin/gh/XilunWu/168/base
2025-12-04T09:33:41.2072016Z  * [new branch]              gh/XilunWu/168/head         -> origin/gh/XilunWu/168/head
2025-12-04T09:33:41.2073540Z  * [new branch]              gh/XilunWu/168/orig         -> origin/gh/XilunWu/168/orig
2025-12-04T09:33:41.2075438Z  * [new branch]              gh/XilunWu/169/base         -> origin/gh/XilunWu/169/base
2025-12-04T09:33:41.2076933Z  * [new branch]              gh/XilunWu/169/head         -> origin/gh/XilunWu/169/head
2025-12-04T09:33:41.2078413Z  * [new branch]              gh/XilunWu/169/orig         -> origin/gh/XilunWu/169/orig
2025-12-04T09:33:41.2080218Z  * [new branch]              gh/XilunWu/170/base         -> origin/gh/XilunWu/170/base
2025-12-04T09:33:41.2081703Z  * [new branch]              gh/XilunWu/170/head         -> origin/gh/XilunWu/170/head
2025-12-04T09:33:41.2083180Z  * [new branch]              gh/XilunWu/170/orig         -> origin/gh/XilunWu/170/orig
2025-12-04T09:33:41.2085328Z  * [new branch]              gh/XilunWu/171/base         -> origin/gh/XilunWu/171/base
2025-12-04T09:33:41.2086802Z  * [new branch]              gh/XilunWu/171/head         -> origin/gh/XilunWu/171/head
2025-12-04T09:33:41.2088288Z  * [new branch]              gh/XilunWu/171/orig         -> origin/gh/XilunWu/171/orig
2025-12-04T09:33:41.2090112Z  * [new branch]              gh/XilunWu/173/base         -> origin/gh/XilunWu/173/base
2025-12-04T09:33:41.2091660Z  * [new branch]              gh/XilunWu/173/head         -> origin/gh/XilunWu/173/head
2025-12-04T09:33:41.2093209Z  * [new branch]              gh/XilunWu/173/orig         -> origin/gh/XilunWu/173/orig
2025-12-04T09:33:41.2095076Z  * [new branch]              gh/XilunWu/175/base         -> origin/gh/XilunWu/175/base
2025-12-04T09:33:41.2096687Z  * [new branch]              gh/XilunWu/175/head         -> origin/gh/XilunWu/175/head
2025-12-04T09:33:41.2098236Z  * [new branch]              gh/XilunWu/175/orig         -> origin/gh/XilunWu/175/orig
2025-12-04T09:33:41.2100276Z  * [new branch]              gh/XilunWu/176/base         -> origin/gh/XilunWu/176/base
2025-12-04T09:33:41.2101756Z  * [new branch]              gh/XilunWu/176/head         -> origin/gh/XilunWu/176/head
2025-12-04T09:33:41.2103446Z  * [new branch]              gh/XilunWu/176/orig         -> origin/gh/XilunWu/176/orig
2025-12-04T09:33:41.2105772Z  * [new branch]              gh/XuehaiPan/14/base        -> origin/gh/XuehaiPan/14/base
2025-12-04T09:33:41.2107235Z  * [new branch]              gh/XuehaiPan/14/head        -> origin/gh/XuehaiPan/14/head
2025-12-04T09:33:41.2108696Z  * [new branch]              gh/XuehaiPan/14/orig        -> origin/gh/XuehaiPan/14/orig
2025-12-04T09:33:41.2110861Z  * [new branch]              gh/XuehaiPan/179/base       -> origin/gh/XuehaiPan/179/base
2025-12-04T09:33:41.2112337Z  * [new branch]              gh/XuehaiPan/179/head       -> origin/gh/XuehaiPan/179/head
2025-12-04T09:33:41.2114011Z  * [new branch]              gh/XuehaiPan/179/orig       -> origin/gh/XuehaiPan/179/orig
2025-12-04T09:33:41.2116026Z  * [new branch]              gh/XuehaiPan/249/base       -> origin/gh/XuehaiPan/249/base
2025-12-04T09:33:41.2117628Z  * [new branch]              gh/XuehaiPan/249/head       -> origin/gh/XuehaiPan/249/head
2025-12-04T09:33:41.2119220Z  * [new branch]              gh/XuehaiPan/249/orig       -> origin/gh/XuehaiPan/249/orig
2025-12-04T09:33:41.2121250Z  * [new branch]              gh/XuehaiPan/253/base       -> origin/gh/XuehaiPan/253/base
2025-12-04T09:33:41.2122687Z  * [new branch]              gh/XuehaiPan/253/head       -> origin/gh/XuehaiPan/253/head
2025-12-04T09:33:41.2124196Z  * [new branch]              gh/XuehaiPan/253/orig       -> origin/gh/XuehaiPan/253/orig
2025-12-04T09:33:41.2126188Z  * [new branch]              gh/XuehaiPan/254/base       -> origin/gh/XuehaiPan/254/base
2025-12-04T09:33:41.2127672Z  * [new branch]              gh/XuehaiPan/254/head       -> origin/gh/XuehaiPan/254/head
2025-12-04T09:33:41.2129179Z  * [new branch]              gh/XuehaiPan/254/orig       -> origin/gh/XuehaiPan/254/orig
2025-12-04T09:33:41.2131048Z  * [new branch]              gh/XuehaiPan/255/base       -> origin/gh/XuehaiPan/255/base
2025-12-04T09:33:41.2132513Z  * [new branch]              gh/XuehaiPan/255/head       -> origin/gh/XuehaiPan/255/head
2025-12-04T09:33:41.2134018Z  * [new branch]              gh/XuehaiPan/255/orig       -> origin/gh/XuehaiPan/255/orig
2025-12-04T09:33:41.2135963Z  * [new branch]              gh/XuehaiPan/271/base       -> origin/gh/XuehaiPan/271/base
2025-12-04T09:33:41.2137616Z  * [new branch]              gh/XuehaiPan/271/head       -> origin/gh/XuehaiPan/271/head
2025-12-04T09:33:41.2138925Z  * [new branch]              gh/XuehaiPan/271/orig       -> origin/gh/XuehaiPan/271/orig
2025-12-04T09:33:41.2140935Z  * [new branch]              gh/XuehaiPan/343/base       -> origin/gh/XuehaiPan/343/base
2025-12-04T09:33:41.2142390Z  * [new branch]              gh/XuehaiPan/343/head       -> origin/gh/XuehaiPan/343/head
2025-12-04T09:33:41.2143799Z  * [new branch]              gh/XuehaiPan/343/orig       -> origin/gh/XuehaiPan/343/orig
2025-12-04T09:33:41.2145871Z  * [new branch]              gh/XuehaiPan/347/base       -> origin/gh/XuehaiPan/347/base
2025-12-04T09:33:41.2147365Z  * [new branch]              gh/XuehaiPan/347/head       -> origin/gh/XuehaiPan/347/head
2025-12-04T09:33:41.2148880Z  * [new branch]              gh/XuehaiPan/347/orig       -> origin/gh/XuehaiPan/347/orig
2025-12-04T09:33:41.2150813Z  * [new branch]              gh/XuehaiPan/348/base       -> origin/gh/XuehaiPan/348/base
2025-12-04T09:33:41.2152288Z  * [new branch]              gh/XuehaiPan/348/head       -> origin/gh/XuehaiPan/348/head
2025-12-04T09:33:41.2153752Z  * [new branch]              gh/XuehaiPan/348/orig       -> origin/gh/XuehaiPan/348/orig
2025-12-04T09:33:41.2155707Z  * [new branch]              gh/XuehaiPan/350/base       -> origin/gh/XuehaiPan/350/base
2025-12-04T09:33:41.2157166Z  * [new branch]              gh/XuehaiPan/350/head       -> origin/gh/XuehaiPan/350/head
2025-12-04T09:33:41.2158630Z  * [new branch]              gh/XuehaiPan/350/orig       -> origin/gh/XuehaiPan/350/orig
2025-12-04T09:33:41.2160817Z  * [new branch]              gh/XuehaiPan/365/base       -> origin/gh/XuehaiPan/365/base
2025-12-04T09:33:41.2162122Z  * [new branch]              gh/XuehaiPan/365/head       -> origin/gh/XuehaiPan/365/head
2025-12-04T09:33:41.2163633Z  * [new branch]              gh/XuehaiPan/365/orig       -> origin/gh/XuehaiPan/365/orig
2025-12-04T09:33:41.2165672Z  * [new branch]              gh/XuehaiPan/366/base       -> origin/gh/XuehaiPan/366/base
2025-12-04T09:33:41.2167115Z  * [new branch]              gh/XuehaiPan/366/head       -> origin/gh/XuehaiPan/366/head
2025-12-04T09:33:41.2169462Z  * [new branch]              gh/XuehaiPan/370/base       -> origin/gh/XuehaiPan/370/base
2025-12-04T09:33:41.2171082Z  * [new branch]              gh/XuehaiPan/370/head       -> origin/gh/XuehaiPan/370/head
2025-12-04T09:33:41.2174077Z  * [new branch]              gh/XuehaiPan/370/orig       -> origin/gh/XuehaiPan/370/orig
2025-12-04T09:33:41.2176085Z  * [new branch]              gh/XuehaiPan/390/base       -> origin/gh/XuehaiPan/390/base
2025-12-04T09:33:41.2177720Z  * [new branch]              gh/XuehaiPan/390/head       -> origin/gh/XuehaiPan/390/head
2025-12-04T09:33:41.2179197Z  * [new branch]              gh/XuehaiPan/390/orig       -> origin/gh/XuehaiPan/390/orig
2025-12-04T09:33:41.2181178Z  * [new branch]              gh/XuehaiPan/391/base       -> origin/gh/XuehaiPan/391/base
2025-12-04T09:33:41.2182556Z  * [new branch]              gh/XuehaiPan/391/head       -> origin/gh/XuehaiPan/391/head
2025-12-04T09:33:41.2184064Z  * [new branch]              gh/XuehaiPan/391/orig       -> origin/gh/XuehaiPan/391/orig
2025-12-04T09:33:41.2186067Z  * [new branch]              gh/XuehaiPan/392/base       -> origin/gh/XuehaiPan/392/base
2025-12-04T09:33:41.2187523Z  * [new branch]              gh/XuehaiPan/392/head       -> origin/gh/XuehaiPan/392/head
2025-12-04T09:33:41.2188998Z  * [new branch]              gh/XuehaiPan/392/orig       -> origin/gh/XuehaiPan/392/orig
2025-12-04T09:33:41.2191593Z  * [new branch]              gh/XuehaiPan/394/base       -> origin/gh/XuehaiPan/394/base
2025-12-04T09:33:41.2193283Z  * [new branch]              gh/XuehaiPan/394/head       -> origin/gh/XuehaiPan/394/head
2025-12-04T09:33:41.2194569Z  * [new branch]              gh/XuehaiPan/394/orig       -> origin/gh/XuehaiPan/394/orig
2025-12-04T09:33:41.2196640Z  * [new branch]              gh/XuehaiPan/397/base       -> origin/gh/XuehaiPan/397/base
2025-12-04T09:33:41.2198177Z  * [new branch]              gh/XuehaiPan/397/head       -> origin/gh/XuehaiPan/397/head
2025-12-04T09:33:41.2199509Z  * [new branch]              gh/XuehaiPan/397/orig       -> origin/gh/XuehaiPan/397/orig
2025-12-04T09:33:41.2201582Z  * [new branch]              gh/XuehaiPan/398/base       -> origin/gh/XuehaiPan/398/base
2025-12-04T09:33:41.2203063Z  * [new branch]              gh/XuehaiPan/398/head       -> origin/gh/XuehaiPan/398/head
2025-12-04T09:33:41.2204549Z  * [new branch]              gh/XuehaiPan/398/orig       -> origin/gh/XuehaiPan/398/orig
2025-12-04T09:33:41.2206443Z  * [new branch]              gh/XuehaiPan/399/base       -> origin/gh/XuehaiPan/399/base
2025-12-04T09:33:41.2207988Z  * [new branch]              gh/XuehaiPan/399/head       -> origin/gh/XuehaiPan/399/head
2025-12-04T09:33:41.2209486Z  * [new branch]              gh/XuehaiPan/399/orig       -> origin/gh/XuehaiPan/399/orig
2025-12-04T09:33:41.2211562Z  * [new branch]              gh/XuehaiPan/400/base       -> origin/gh/XuehaiPan/400/base
2025-12-04T09:33:41.2213091Z  * [new branch]              gh/XuehaiPan/400/head       -> origin/gh/XuehaiPan/400/head
2025-12-04T09:33:41.2214569Z  * [new branch]              gh/XuehaiPan/400/orig       -> origin/gh/XuehaiPan/400/orig
2025-12-04T09:33:41.2217084Z  * [new branch]              gh/ZhiweiYan-96/39/base     -> origin/gh/ZhiweiYan-96/39/base
2025-12-04T09:33:41.2218628Z  * [new branch]              gh/ZhiweiYan-96/39/head     -> origin/gh/ZhiweiYan-96/39/head
2025-12-04T09:33:41.2220127Z  * [new branch]              gh/ZhiweiYan-96/39/orig     -> origin/gh/ZhiweiYan-96/39/orig
2025-12-04T09:33:41.2222284Z  * [new branch]              gh/ZhiweiYan-96/44/base     -> origin/gh/ZhiweiYan-96/44/base
2025-12-04T09:33:41.2223581Z  * [new branch]              gh/ZhiweiYan-96/44/head     -> origin/gh/ZhiweiYan-96/44/head
2025-12-04T09:33:41.2225543Z  * [new branch]              gh/ZhiweiYan-96/45/base     -> origin/gh/ZhiweiYan-96/45/base
2025-12-04T09:33:41.2226842Z  * [new branch]              gh/ZhiweiYan-96/45/head     -> origin/gh/ZhiweiYan-96/45/head
2025-12-04T09:33:41.2228931Z  * [new branch]              gh/ZhiweiYan-96/49/base     -> origin/gh/ZhiweiYan-96/49/base
2025-12-04T09:33:41.2230451Z  * [new branch]              gh/ZhiweiYan-96/49/head     -> origin/gh/ZhiweiYan-96/49/head
2025-12-04T09:33:41.2232411Z  * [new branch]              gh/ZhiweiYan-96/62/base     -> origin/gh/ZhiweiYan-96/62/base
2025-12-04T09:33:41.2233886Z  * [new branch]              gh/ZhiweiYan-96/62/head     -> origin/gh/ZhiweiYan-96/62/head
2025-12-04T09:33:41.2235927Z  * [new branch]              gh/ZhiweiYan-96/66/base     -> origin/gh/ZhiweiYan-96/66/base
2025-12-04T09:33:41.2237413Z  * [new branch]              gh/ZhiweiYan-96/66/head     -> origin/gh/ZhiweiYan-96/66/head
2025-12-04T09:33:41.2239329Z  * [new branch]              gh/ZhiweiYan-96/67/base     -> origin/gh/ZhiweiYan-96/67/base
2025-12-04T09:33:41.2240668Z  * [new branch]              gh/ZhiweiYan-96/67/head     -> origin/gh/ZhiweiYan-96/67/head
2025-12-04T09:33:41.2242590Z  * [new branch]              gh/ZhiweiYan-96/68/base     -> origin/gh/ZhiweiYan-96/68/base
2025-12-04T09:33:41.2243861Z  * [new branch]              gh/ZhiweiYan-96/68/head     -> origin/gh/ZhiweiYan-96/68/head
2025-12-04T09:33:41.2245407Z  * [new branch]              gh/ZhiweiYan-96/68/orig     -> origin/gh/ZhiweiYan-96/68/orig
2025-12-04T09:33:41.2247821Z  * [new branch]              gh/aakhundov/1/base         -> origin/gh/aakhundov/1/base
2025-12-04T09:33:41.2249394Z  * [new branch]              gh/aakhundov/1/head         -> origin/gh/aakhundov/1/head
2025-12-04T09:33:41.2251187Z  * [new branch]              gh/aakhundov/2/base         -> origin/gh/aakhundov/2/base
2025-12-04T09:33:41.2252756Z  * [new branch]              gh/aakhundov/2/head         -> origin/gh/aakhundov/2/head
2025-12-04T09:33:41.2254850Z  * [new branch]              gh/aditew01/openblas        -> origin/gh/aditew01/openblas
2025-12-04T09:33:41.2256141Z  * [new branch]              gh/aditew01/sbgemm          -> origin/gh/aditew01/sbgemm
2025-12-04T09:33:41.2257854Z  * [new branch]              gh/aditew01/vecbf16         -> origin/gh/aditew01/vecbf16
2025-12-04T09:33:41.2260105Z  * [new branch]              gh/albanD/4/base            -> origin/gh/albanD/4/base
2025-12-04T09:33:41.2261563Z  * [new branch]              gh/albanD/4/head            -> origin/gh/albanD/4/head
2025-12-04T09:33:41.2263104Z  * [new branch]              gh/albanD/4/orig            -> origin/gh/albanD/4/orig
2025-12-04T09:33:41.2265438Z  * [new branch]              gh/alexbrauckmann/paddedtensor_faketensor_init -> origin/gh/alexbrauckmann/paddedtensor_faketensor_init
2025-12-04T09:33:41.2267496Z  * [new branch]              gh/alexsamardzic/12/base    -> origin/gh/alexsamardzic/12/base
2025-12-04T09:33:41.2268875Z  * [new branch]              gh/alexsamardzic/12/head    -> origin/gh/alexsamardzic/12/head
2025-12-04T09:33:41.2270522Z  * [new branch]              gh/alexsamardzic/12/orig    -> origin/gh/alexsamardzic/12/orig
2025-12-04T09:33:41.2272748Z  * [new branch]              gh/alexsamardzic/14/base    -> origin/gh/alexsamardzic/14/base
2025-12-04T09:33:41.2274103Z  * [new branch]              gh/alexsamardzic/14/head    -> origin/gh/alexsamardzic/14/head
2025-12-04T09:33:41.2275747Z  * [new branch]              gh/alexsamardzic/14/orig    -> origin/gh/alexsamardzic/14/orig
2025-12-04T09:33:41.2277701Z  * [new branch]              gh/alexsamardzic/15/base    -> origin/gh/alexsamardzic/15/base
2025-12-04T09:33:41.2279072Z  * [new branch]              gh/alexsamardzic/15/head    -> origin/gh/alexsamardzic/15/head
2025-12-04T09:33:41.2280765Z  * [new branch]              gh/alexsamardzic/15/orig    -> origin/gh/alexsamardzic/15/orig
2025-12-04T09:33:41.2282921Z  * [new branch]              gh/amjames/18/base          -> origin/gh/amjames/18/base
2025-12-04T09:33:41.2284391Z  * [new branch]              gh/amjames/18/head          -> origin/gh/amjames/18/head
2025-12-04T09:33:41.2285846Z  * [new branch]              gh/amjames/18/orig          -> origin/gh/amjames/18/orig
2025-12-04T09:33:41.2288490Z  * [new branch]              gh/andrewor14/35/base       -> origin/gh/andrewor14/35/base
2025-12-04T09:33:41.2290085Z  * [new branch]              gh/andrewor14/35/head       -> origin/gh/andrewor14/35/head
2025-12-04T09:33:41.2291684Z  * [new branch]              gh/andrewor14/35/orig       -> origin/gh/andrewor14/35/orig
2025-12-04T09:33:41.2293855Z  * [new branch]              gh/andrewor14/50/base       -> origin/gh/andrewor14/50/base
2025-12-04T09:33:41.2295324Z  * [new branch]              gh/andrewor14/50/head       -> origin/gh/andrewor14/50/head
2025-12-04T09:33:41.2296924Z  * [new branch]              gh/andrewor14/50/orig       -> origin/gh/andrewor14/50/orig
2025-12-04T09:33:41.2299365Z  * [new branch]              gh/andyanwang/30/base       -> origin/gh/andyanwang/30/base
2025-12-04T09:33:41.2301132Z  * [new branch]              gh/andyanwang/30/orig       -> origin/gh/andyanwang/30/orig
2025-12-04T09:33:41.2303103Z  * [new branch]              gh/andyanwang/31/base       -> origin/gh/andyanwang/31/base
2025-12-04T09:33:41.2304815Z  * [new branch]              gh/andyanwang/31/orig       -> origin/gh/andyanwang/31/orig
2025-12-04T09:33:41.2306861Z  * [new branch]              gh/andyanwang/39/base       -> origin/gh/andyanwang/39/base
2025-12-04T09:33:41.2308457Z  * [new branch]              gh/andyanwang/39/head       -> origin/gh/andyanwang/39/head
2025-12-04T09:33:41.2309938Z  * [new branch]              gh/andyanwang/39/orig       -> origin/gh/andyanwang/39/orig
2025-12-04T09:33:41.2312181Z  * [new branch]              gh/andyanwang/42/base       -> origin/gh/andyanwang/42/base
2025-12-04T09:33:41.2313517Z  * [new branch]              gh/andyanwang/42/head       -> origin/gh/andyanwang/42/head
2025-12-04T09:33:41.2315145Z  * [new branch]              gh/andyanwang/42/orig       -> origin/gh/andyanwang/42/orig
2025-12-04T09:33:41.2317259Z  * [new branch]              gh/andyanwang/45/base       -> origin/gh/andyanwang/45/base
2025-12-04T09:33:41.2318825Z  * [new branch]              gh/andyanwang/45/head       -> origin/gh/andyanwang/45/head
2025-12-04T09:33:41.2320306Z  * [new branch]              gh/andyanwang/45/orig       -> origin/gh/andyanwang/45/orig
2025-12-04T09:33:41.2322746Z  * [new branch]              gh/angelayi/107/base        -> origin/gh/angelayi/107/base
2025-12-04T09:33:41.2324071Z  * [new branch]              gh/angelayi/107/head        -> origin/gh/angelayi/107/head
2025-12-04T09:33:41.2326274Z  * [new branch]              gh/angelayi/114/base        -> origin/gh/angelayi/114/base
2025-12-04T09:33:41.2327934Z  * [new branch]              gh/angelayi/114/head        -> origin/gh/angelayi/114/head
2025-12-04T09:33:41.2329436Z  * [new branch]              gh/angelayi/114/orig        -> origin/gh/angelayi/114/orig
2025-12-04T09:33:41.2331375Z  * [new branch]              gh/angelayi/116/base        -> origin/gh/angelayi/116/base
2025-12-04T09:33:41.2332887Z  * [new branch]              gh/angelayi/116/head        -> origin/gh/angelayi/116/head
2025-12-04T09:33:41.2334403Z  * [new branch]              gh/angelayi/116/orig        -> origin/gh/angelayi/116/orig
2025-12-04T09:33:41.2336604Z  * [new branch]              gh/angelayi/122/base        -> origin/gh/angelayi/122/base
2025-12-04T09:33:41.2338095Z  * [new branch]              gh/angelayi/122/head        -> origin/gh/angelayi/122/head
2025-12-04T09:33:41.2339572Z  * [new branch]              gh/angelayi/122/orig        -> origin/gh/angelayi/122/orig
2025-12-04T09:33:41.2341733Z  * [new branch]              gh/angelayi/124/base        -> origin/gh/angelayi/124/base
2025-12-04T09:33:41.2343103Z  * [new branch]              gh/angelayi/124/head        -> origin/gh/angelayi/124/head
2025-12-04T09:33:41.2344594Z  * [new branch]              gh/angelayi/124/orig        -> origin/gh/angelayi/124/orig
2025-12-04T09:33:41.2346721Z  * [new branch]              gh/angelayi/128/base        -> origin/gh/angelayi/128/base
2025-12-04T09:33:41.2348290Z  * [new branch]              gh/angelayi/128/head        -> origin/gh/angelayi/128/head
2025-12-04T09:33:41.2349786Z  * [new branch]              gh/angelayi/128/orig        -> origin/gh/angelayi/128/orig
2025-12-04T09:33:41.2351773Z  * [new branch]              gh/angelayi/131/base        -> origin/gh/angelayi/131/base
2025-12-04T09:33:41.2353274Z  * [new branch]              gh/angelayi/131/head        -> origin/gh/angelayi/131/head
2025-12-04T09:33:41.2354788Z  * [new branch]              gh/angelayi/131/orig        -> origin/gh/angelayi/131/orig
2025-12-04T09:33:41.2357125Z  * [new branch]              gh/angelayi/132/base        -> origin/gh/angelayi/132/base
2025-12-04T09:33:41.2358809Z  * [new branch]              gh/angelayi/132/head        -> origin/gh/angelayi/132/head
2025-12-04T09:33:41.2360501Z  * [new branch]              gh/angelayi/132/orig        -> origin/gh/angelayi/132/orig
2025-12-04T09:33:41.2362356Z  * [new branch]              gh/angelayi/133/base        -> origin/gh/angelayi/133/base
2025-12-04T09:33:41.2363859Z  * [new branch]              gh/angelayi/133/head        -> origin/gh/angelayi/133/head
2025-12-04T09:33:41.2365355Z  * [new branch]              gh/angelayi/133/orig        -> origin/gh/angelayi/133/orig
2025-12-04T09:33:41.2367761Z  * [new branch]              gh/angelayi/134/base        -> origin/gh/angelayi/134/base
2025-12-04T09:33:41.2369766Z  * [new branch]              gh/angelayi/134/head        -> origin/gh/angelayi/134/head
2025-12-04T09:33:41.2370790Z  * [new branch]              gh/angelayi/134/orig        -> origin/gh/angelayi/134/orig
2025-12-04T09:33:41.2375370Z  * [new branch]              gh/angelayi/135/base        -> origin/gh/angelayi/135/base
2025-12-04T09:33:41.2377003Z  * [new branch]              gh/angelayi/135/head        -> origin/gh/angelayi/135/head
2025-12-04T09:33:41.2378550Z  * [new branch]              gh/angelayi/135/orig        -> origin/gh/angelayi/135/orig
2025-12-04T09:33:41.2380521Z  * [new branch]              gh/angelayi/136/base        -> origin/gh/angelayi/136/base
2025-12-04T09:33:41.2382170Z  * [new branch]              gh/angelayi/136/head        -> origin/gh/angelayi/136/head
2025-12-04T09:33:41.2383608Z  * [new branch]              gh/angelayi/136/orig        -> origin/gh/angelayi/136/orig
2025-12-04T09:33:41.2385627Z  * [new branch]              gh/angelayi/137/base        -> origin/gh/angelayi/137/base
2025-12-04T09:33:41.2387033Z  * [new branch]              gh/angelayi/137/head        -> origin/gh/angelayi/137/head
2025-12-04T09:33:41.2388762Z  * [new branch]              gh/angelayi/137/orig        -> origin/gh/angelayi/137/orig
2025-12-04T09:33:41.2390709Z  * [new branch]              gh/angelayi/138/base        -> origin/gh/angelayi/138/base
2025-12-04T09:33:41.2392158Z  * [new branch]              gh/angelayi/138/head        -> origin/gh/angelayi/138/head
2025-12-04T09:33:41.2393573Z  * [new branch]              gh/angelayi/138/orig        -> origin/gh/angelayi/138/orig
2025-12-04T09:33:41.2395548Z  * [new branch]              gh/angelayi/139/base        -> origin/gh/angelayi/139/base
2025-12-04T09:33:41.2397061Z  * [new branch]              gh/angelayi/139/head        -> origin/gh/angelayi/139/head
2025-12-04T09:33:41.2398481Z  * [new branch]              gh/angelayi/139/orig        -> origin/gh/angelayi/139/orig
2025-12-04T09:33:41.2400575Z  * [new branch]              gh/angelayi/140/base        -> origin/gh/angelayi/140/base
2025-12-04T09:33:41.2402192Z  * [new branch]              gh/angelayi/140/head        -> origin/gh/angelayi/140/head
2025-12-04T09:33:41.2403682Z  * [new branch]              gh/angelayi/140/orig        -> origin/gh/angelayi/140/orig
2025-12-04T09:33:41.2406403Z  * [new branch]              gh/angelayi/141/base        -> origin/gh/angelayi/141/base
2025-12-04T09:33:41.2407705Z  * [new branch]              gh/angelayi/141/head        -> origin/gh/angelayi/141/head
2025-12-04T09:33:41.2409257Z  * [new branch]              gh/angelayi/141/orig        -> origin/gh/angelayi/141/orig
2025-12-04T09:33:41.2411272Z  * [new branch]              gh/angelayi/142/base        -> origin/gh/angelayi/142/base
2025-12-04T09:33:41.2412603Z  * [new branch]              gh/angelayi/142/head        -> origin/gh/angelayi/142/head
2025-12-04T09:33:41.2414192Z  * [new branch]              gh/angelayi/142/orig        -> origin/gh/angelayi/142/orig
2025-12-04T09:33:41.2416154Z  * [new branch]              gh/angelayi/143/base        -> origin/gh/angelayi/143/base
2025-12-04T09:33:41.2417734Z  * [new branch]              gh/angelayi/143/head        -> origin/gh/angelayi/143/head
2025-12-04T09:33:41.2419050Z  * [new branch]              gh/angelayi/143/orig        -> origin/gh/angelayi/143/orig
2025-12-04T09:33:41.2421165Z  * [new branch]              gh/angelayi/144/base        -> origin/gh/angelayi/144/base
2025-12-04T09:33:41.2422865Z  * [new branch]              gh/angelayi/144/head        -> origin/gh/angelayi/144/head
2025-12-04T09:33:41.2424166Z  * [new branch]              gh/angelayi/144/orig        -> origin/gh/angelayi/144/orig
2025-12-04T09:33:41.2426869Z  * [new branch]              gh/anijain2305/753/base     -> origin/gh/anijain2305/753/base
2025-12-04T09:33:41.2428198Z  * [new branch]              gh/anijain2305/753/head     -> origin/gh/anijain2305/753/head
2025-12-04T09:33:41.2429765Z  * [new branch]              gh/anijain2305/753/orig     -> origin/gh/anijain2305/753/orig
2025-12-04T09:33:41.2431942Z  * [new branch]              gh/anijain2305/810/base     -> origin/gh/anijain2305/810/base
2025-12-04T09:33:41.2433438Z  * [new branch]              gh/anijain2305/810/head     -> origin/gh/anijain2305/810/head
2025-12-04T09:33:41.2434974Z  * [new branch]              gh/anijain2305/810/orig     -> origin/gh/anijain2305/810/orig
2025-12-04T09:33:41.2436973Z  * [new branch]              gh/anijain2305/854/base     -> origin/gh/anijain2305/854/base
2025-12-04T09:33:41.2439034Z  * [new branch]              gh/anijain2305/854/head     -> origin/gh/anijain2305/854/head
2025-12-04T09:33:41.2440662Z  * [new branch]              gh/anijain2305/854/orig     -> origin/gh/anijain2305/854/orig
2025-12-04T09:33:41.2442851Z  * [new branch]              gh/anijain2305/864/base     -> origin/gh/anijain2305/864/base
2025-12-04T09:33:41.2444363Z  * [new branch]              gh/anijain2305/864/head     -> origin/gh/anijain2305/864/head
2025-12-04T09:33:41.2445897Z  * [new branch]              gh/anijain2305/864/orig     -> origin/gh/anijain2305/864/orig
2025-12-04T09:33:41.2447934Z  * [new branch]              gh/anijain2305/870/base     -> origin/gh/anijain2305/870/base
2025-12-04T09:33:41.2449193Z  * [new branch]              gh/anijain2305/870/head     -> origin/gh/anijain2305/870/head
2025-12-04T09:33:41.2450822Z  * [new branch]              gh/anijain2305/870/orig     -> origin/gh/anijain2305/870/orig
2025-12-04T09:33:41.2452914Z  * [new branch]              gh/anijain2305/873/base     -> origin/gh/anijain2305/873/base
2025-12-04T09:33:41.2454172Z  * [new branch]              gh/anijain2305/873/head     -> origin/gh/anijain2305/873/head
2025-12-04T09:33:41.2455676Z  * [new branch]              gh/anijain2305/873/orig     -> origin/gh/anijain2305/873/orig
2025-12-04T09:33:41.2457830Z  * [new branch]              gh/anijain2305/894/base     -> origin/gh/anijain2305/894/base
2025-12-04T09:33:41.2459162Z  * [new branch]              gh/anijain2305/894/head     -> origin/gh/anijain2305/894/head
2025-12-04T09:33:41.2460814Z  * [new branch]              gh/anijain2305/894/orig     -> origin/gh/anijain2305/894/orig
2025-12-04T09:33:41.2462798Z  * [new branch]              gh/anijain2305/895/base     -> origin/gh/anijain2305/895/base
2025-12-04T09:33:41.2464362Z  * [new branch]              gh/anijain2305/895/head     -> origin/gh/anijain2305/895/head
2025-12-04T09:33:41.2465906Z  * [new branch]              gh/anijain2305/895/orig     -> origin/gh/anijain2305/895/orig
2025-12-04T09:33:41.2467911Z  * [new branch]              gh/anijain2305/910/base     -> origin/gh/anijain2305/910/base
2025-12-04T09:33:41.2469453Z  * [new branch]              gh/anijain2305/910/head     -> origin/gh/anijain2305/910/head
2025-12-04T09:33:41.2471149Z  * [new branch]              gh/anijain2305/910/orig     -> origin/gh/anijain2305/910/orig
2025-12-04T09:33:41.2473292Z  * [new branch]              gh/anijain2305/919/base     -> origin/gh/anijain2305/919/base
2025-12-04T09:33:41.2474852Z  * [new branch]              gh/anijain2305/919/head     -> origin/gh/anijain2305/919/head
2025-12-04T09:33:41.2476312Z  * [new branch]              gh/anijain2305/919/orig     -> origin/gh/anijain2305/919/orig
2025-12-04T09:33:41.2478322Z  * [new branch]              gh/anijain2305/922/base     -> origin/gh/anijain2305/922/base
2025-12-04T09:33:41.2479912Z  * [new branch]              gh/anijain2305/922/head     -> origin/gh/anijain2305/922/head
2025-12-04T09:33:41.2481491Z  * [new branch]              gh/anijain2305/922/orig     -> origin/gh/anijain2305/922/orig
2025-12-04T09:33:41.2483559Z  * [new branch]              gh/anijain2305/932/base     -> origin/gh/anijain2305/932/base
2025-12-04T09:33:41.2485237Z  * [new branch]              gh/anijain2305/932/head     -> origin/gh/anijain2305/932/head
2025-12-04T09:33:41.2486774Z  * [new branch]              gh/anijain2305/932/orig     -> origin/gh/anijain2305/932/orig
2025-12-04T09:33:41.2488760Z  * [new branch]              gh/anijain2305/940/base     -> origin/gh/anijain2305/940/base
2025-12-04T09:33:41.2490109Z  * [new branch]              gh/anijain2305/940/head     -> origin/gh/anijain2305/940/head
2025-12-04T09:33:41.2491697Z  * [new branch]              gh/anijain2305/940/orig     -> origin/gh/anijain2305/940/orig
2025-12-04T09:33:41.2493715Z  * [new branch]              gh/anijain2305/941/base     -> origin/gh/anijain2305/941/base
2025-12-04T09:33:41.2495218Z  * [new branch]              gh/anijain2305/941/head     -> origin/gh/anijain2305/941/head
2025-12-04T09:33:41.2496622Z  * [new branch]              gh/anijain2305/941/orig     -> origin/gh/anijain2305/941/orig
2025-12-04T09:33:41.2498720Z  * [new branch]              gh/anijain2305/942/base     -> origin/gh/anijain2305/942/base
2025-12-04T09:33:41.2500245Z  * [new branch]              gh/anijain2305/942/head     -> origin/gh/anijain2305/942/head
2025-12-04T09:33:41.2501941Z  * [new branch]              gh/anijain2305/942/orig     -> origin/gh/anijain2305/942/orig
2025-12-04T09:33:41.2503977Z  * [new branch]              gh/anijain2305/943/base     -> origin/gh/anijain2305/943/base
2025-12-04T09:33:41.2505312Z  * [new branch]              gh/anijain2305/943/head     -> origin/gh/anijain2305/943/head
2025-12-04T09:33:41.2506856Z  * [new branch]              gh/anijain2305/943/orig     -> origin/gh/anijain2305/943/orig
2025-12-04T09:33:41.2509408Z  * [new branch]              gh/anijain2305/944/base     -> origin/gh/anijain2305/944/base
2025-12-04T09:33:41.2510749Z  * [new branch]              gh/anijain2305/944/head     -> origin/gh/anijain2305/944/head
2025-12-04T09:33:41.2513104Z  * [new branch]              gh/anijain2305/944/orig     -> origin/gh/anijain2305/944/orig
2025-12-04T09:33:41.2515084Z  * [new branch]              gh/anijain2305/945/base     -> origin/gh/anijain2305/945/base
2025-12-04T09:33:41.2516658Z  * [new branch]              gh/anijain2305/945/head     -> origin/gh/anijain2305/945/head
2025-12-04T09:33:41.2517989Z  * [new branch]              gh/anijain2305/945/orig     -> origin/gh/anijain2305/945/orig
2025-12-04T09:33:41.2520163Z  * [new branch]              gh/anijain2305/946/base     -> origin/gh/anijain2305/946/base
2025-12-04T09:33:41.2521482Z  * [new branch]              gh/anijain2305/946/head     -> origin/gh/anijain2305/946/head
2025-12-04T09:33:41.2523174Z  * [new branch]              gh/anijain2305/946/orig     -> origin/gh/anijain2305/946/orig
2025-12-04T09:33:41.2525268Z  * [new branch]              gh/anijain2305/947/base     -> origin/gh/anijain2305/947/base
2025-12-04T09:33:41.2526845Z  * [new branch]              gh/anijain2305/947/head     -> origin/gh/anijain2305/947/head
2025-12-04T09:33:41.2527956Z  * [new branch]              gh/anijain2305/947/orig     -> origin/gh/anijain2305/947/orig
2025-12-04T09:33:41.2530024Z  * [new branch]              gh/anijain2305/948/base     -> origin/gh/anijain2305/948/base
2025-12-04T09:33:41.2531497Z  * [new branch]              gh/anijain2305/948/head     -> origin/gh/anijain2305/948/head
2025-12-04T09:33:41.2532823Z  * [new branch]              gh/anijain2305/948/orig     -> origin/gh/anijain2305/948/orig
2025-12-04T09:33:41.2534901Z  * [new branch]              gh/anijain2305/949/base     -> origin/gh/anijain2305/949/base
2025-12-04T09:33:41.2536229Z  * [new branch]              gh/anijain2305/949/head     -> origin/gh/anijain2305/949/head
2025-12-04T09:33:41.2537915Z  * [new branch]              gh/anijain2305/949/orig     -> origin/gh/anijain2305/949/orig
2025-12-04T09:33:41.2539963Z  * [new branch]              gh/anijain2305/950/base     -> origin/gh/anijain2305/950/base
2025-12-04T09:33:41.2541459Z  * [new branch]              gh/anijain2305/950/head     -> origin/gh/anijain2305/950/head
2025-12-04T09:33:41.2543148Z  * [new branch]              gh/anijain2305/950/orig     -> origin/gh/anijain2305/950/orig
2025-12-04T09:33:41.2545129Z  * [new branch]              gh/anijain2305/951/base     -> origin/gh/anijain2305/951/base
2025-12-04T09:33:41.2546459Z  * [new branch]              gh/anijain2305/951/head     -> origin/gh/anijain2305/951/head
2025-12-04T09:33:41.2548070Z  * [new branch]              gh/anijain2305/951/orig     -> origin/gh/anijain2305/951/orig
2025-12-04T09:33:41.2550017Z  * [new branch]              gh/anijain2305/952/base     -> origin/gh/anijain2305/952/base
2025-12-04T09:33:41.2551456Z  * [new branch]              gh/anijain2305/952/head     -> origin/gh/anijain2305/952/head
2025-12-04T09:33:41.2552980Z  * [new branch]              gh/anijain2305/952/orig     -> origin/gh/anijain2305/952/orig
2025-12-04T09:33:41.2555455Z  * [new branch]              gh/anijain2305/953/base     -> origin/gh/anijain2305/953/base
2025-12-04T09:33:41.2556803Z  * [new branch]              gh/anijain2305/953/head     -> origin/gh/anijain2305/953/head
2025-12-04T09:33:41.2558367Z  * [new branch]              gh/anijain2305/953/orig     -> origin/gh/anijain2305/953/orig
2025-12-04T09:33:41.2560377Z  * [new branch]              gh/anijain2305/954/base     -> origin/gh/anijain2305/954/base
2025-12-04T09:33:41.2561963Z  * [new branch]              gh/anijain2305/954/head     -> origin/gh/anijain2305/954/head
2025-12-04T09:33:41.2563598Z  * [new branch]              gh/anijain2305/954/orig     -> origin/gh/anijain2305/954/orig
2025-12-04T09:33:41.2565704Z  * [new branch]              gh/anijain2305/955/base     -> origin/gh/anijain2305/955/base
2025-12-04T09:33:41.2567170Z  * [new branch]              gh/anijain2305/955/head     -> origin/gh/anijain2305/955/head
2025-12-04T09:33:41.2568727Z  * [new branch]              gh/anijain2305/955/orig     -> origin/gh/anijain2305/955/orig
2025-12-04T09:33:41.2570836Z  * [new branch]              gh/anijain2305/956/base     -> origin/gh/anijain2305/956/base
2025-12-04T09:33:41.2572637Z  * [new branch]              gh/anijain2305/956/head     -> origin/gh/anijain2305/956/head
2025-12-04T09:33:41.2574327Z  * [new branch]              gh/anijain2305/956/orig     -> origin/gh/anijain2305/956/orig
2025-12-04T09:33:41.2576185Z  * [new branch]              gh/anijain2305/957/base     -> origin/gh/anijain2305/957/base
2025-12-04T09:33:41.2577841Z  * [new branch]              gh/anijain2305/957/head     -> origin/gh/anijain2305/957/head
2025-12-04T09:33:41.2579180Z  * [new branch]              gh/anijain2305/957/orig     -> origin/gh/anijain2305/957/orig
2025-12-04T09:33:41.2581263Z  * [new branch]              gh/anijain2305/958/base     -> origin/gh/anijain2305/958/base
2025-12-04T09:33:41.2583026Z  * [new branch]              gh/anijain2305/958/head     -> origin/gh/anijain2305/958/head
2025-12-04T09:33:41.2584377Z  * [new branch]              gh/anijain2305/958/orig     -> origin/gh/anijain2305/958/orig
2025-12-04T09:33:41.2586446Z  * [new branch]              gh/anijain2305/959/base     -> origin/gh/anijain2305/959/base
2025-12-04T09:33:41.2587949Z  * [new branch]              gh/anijain2305/959/head     -> origin/gh/anijain2305/959/head
2025-12-04T09:33:41.2589222Z  * [new branch]              gh/anijain2305/959/orig     -> origin/gh/anijain2305/959/orig
2025-12-04T09:33:41.2591411Z  * [new branch]              gh/anijain2305/960/base     -> origin/gh/anijain2305/960/base
2025-12-04T09:33:41.2592972Z  * [new branch]              gh/anijain2305/960/head     -> origin/gh/anijain2305/960/head
2025-12-04T09:33:41.2594444Z  * [new branch]              gh/anijain2305/960/orig     -> origin/gh/anijain2305/960/orig
2025-12-04T09:33:41.2596620Z  * [new branch]              gh/anijain2305/961/base     -> origin/gh/anijain2305/961/base
2025-12-04T09:33:41.2597949Z  * [new branch]              gh/anijain2305/961/head     -> origin/gh/anijain2305/961/head
2025-12-04T09:33:41.2599544Z  * [new branch]              gh/anijain2305/961/orig     -> origin/gh/anijain2305/961/orig
2025-12-04T09:33:41.2601564Z  * [new branch]              gh/anijain2305/962/base     -> origin/gh/anijain2305/962/base
2025-12-04T09:33:41.2602996Z  * [new branch]              gh/anijain2305/962/head     -> origin/gh/anijain2305/962/head
2025-12-04T09:33:41.2604553Z  * [new branch]              gh/anijain2305/962/orig     -> origin/gh/anijain2305/962/orig
2025-12-04T09:33:41.2606959Z  * [new branch]              gh/anijain2305/963/base     -> origin/gh/anijain2305/963/base
2025-12-04T09:33:41.2608657Z  * [new branch]              gh/anijain2305/963/head     -> origin/gh/anijain2305/963/head
2025-12-04T09:33:41.2610272Z  * [new branch]              gh/anijain2305/963/orig     -> origin/gh/anijain2305/963/orig
2025-12-04T09:33:41.2612330Z  * [new branch]              gh/anijain2305/964/base     -> origin/gh/anijain2305/964/base
2025-12-04T09:33:41.2613645Z  * [new branch]              gh/anijain2305/964/head     -> origin/gh/anijain2305/964/head
2025-12-04T09:33:41.2615182Z  * [new branch]              gh/anijain2305/964/orig     -> origin/gh/anijain2305/964/orig
2025-12-04T09:33:41.2617481Z  * [new branch]              gh/anijain2305/965/base     -> origin/gh/anijain2305/965/base
2025-12-04T09:33:41.2618933Z  * [new branch]              gh/anijain2305/965/head     -> origin/gh/anijain2305/965/head
2025-12-04T09:33:41.2620435Z  * [new branch]              gh/anijain2305/965/orig     -> origin/gh/anijain2305/965/orig
2025-12-04T09:33:41.2622292Z  * [new branch]              gh/anijain2305/966/base     -> origin/gh/anijain2305/966/base
2025-12-04T09:33:41.2623747Z  * [new branch]              gh/anijain2305/966/head     -> origin/gh/anijain2305/966/head
2025-12-04T09:33:41.2625306Z  * [new branch]              gh/anijain2305/966/orig     -> origin/gh/anijain2305/966/orig
2025-12-04T09:33:41.2627277Z  * [new branch]              gh/anijain2305/967/base     -> origin/gh/anijain2305/967/base
2025-12-04T09:33:41.2628674Z  * [new branch]              gh/anijain2305/967/head     -> origin/gh/anijain2305/967/head
2025-12-04T09:33:41.2630311Z  * [new branch]              gh/anijain2305/967/orig     -> origin/gh/anijain2305/967/orig
2025-12-04T09:33:41.2632347Z  * [new branch]              gh/anijain2305/968/base     -> origin/gh/anijain2305/968/base
2025-12-04T09:33:41.2633735Z  * [new branch]              gh/anijain2305/968/head     -> origin/gh/anijain2305/968/head
2025-12-04T09:33:41.2635179Z  * [new branch]              gh/anijain2305/968/orig     -> origin/gh/anijain2305/968/orig
2025-12-04T09:33:41.2637160Z  * [new branch]              gh/anijain2305/969/base     -> origin/gh/anijain2305/969/base
2025-12-04T09:33:41.2638638Z  * [new branch]              gh/anijain2305/969/head     -> origin/gh/anijain2305/969/head
2025-12-04T09:33:41.2640348Z  * [new branch]              gh/anijain2305/969/orig     -> origin/gh/anijain2305/969/orig
2025-12-04T09:33:41.2642179Z  * [new branch]              gh/anijain2305/970/base     -> origin/gh/anijain2305/970/base
2025-12-04T09:33:41.2643700Z  * [new branch]              gh/anijain2305/970/head     -> origin/gh/anijain2305/970/head
2025-12-04T09:33:41.2645273Z  * [new branch]              gh/anijain2305/970/orig     -> origin/gh/anijain2305/970/orig
2025-12-04T09:33:41.2647899Z  * [new branch]              gh/anjali411/216/base       -> origin/gh/anjali411/216/base
2025-12-04T09:33:41.2649283Z  * [new branch]              gh/anjali411/216/head       -> origin/gh/anjali411/216/head
2025-12-04T09:33:41.2650806Z  * [new branch]              gh/anjali411/216/orig       -> origin/gh/anjali411/216/orig
2025-12-04T09:33:41.2653482Z  * [new branch]              gh/anshul-si/1/base         -> origin/gh/anshul-si/1/base
2025-12-04T09:33:41.2654809Z  * [new branch]              gh/anshul-si/1/head         -> origin/gh/anshul-si/1/head
2025-12-04T09:33:41.2656706Z  * [new branch]              gh/anshul-si/2/base         -> origin/gh/anshul-si/2/base
2025-12-04T09:33:41.2658088Z  * [new branch]              gh/anshul-si/2/head         -> origin/gh/anshul-si/2/head
2025-12-04T09:33:41.2659817Z  * [new branch]              gh/anshul-si/3/base         -> origin/gh/anshul-si/3/base
2025-12-04T09:33:41.2661776Z  * [new branch]              gh/anshul-si/3/head         -> origin/gh/anshul-si/3/head
2025-12-04T09:33:41.2663659Z  * [new branch]              gh/anshul-si/4/base         -> origin/gh/anshul-si/4/base
2025-12-04T09:33:41.2664981Z  * [new branch]              gh/anshul-si/4/head         -> origin/gh/anshul-si/4/head
2025-12-04T09:33:41.2666714Z  * [new branch]              gh/anshul-si/5/base         -> origin/gh/anshul-si/5/base
2025-12-04T09:33:41.2668174Z  * [new branch]              gh/anshul-si/5/head         -> origin/gh/anshul-si/5/head
2025-12-04T09:33:41.2670459Z  * [new branch]              gh/anshul-si/53/base        -> origin/gh/anshul-si/53/base
2025-12-04T09:33:41.2672494Z  * [new branch]              gh/anshul-si/53/head        -> origin/gh/anshul-si/53/head
2025-12-04T09:33:41.2674035Z  * [new branch]              gh/anshul-si/58/base        -> origin/gh/anshul-si/58/base
2025-12-04T09:33:41.2675362Z  * [new branch]              gh/anshul-si/58/head        -> origin/gh/anshul-si/58/head
2025-12-04T09:33:41.2677256Z  * [new branch]              gh/anshul-si/66/base        -> origin/gh/anshul-si/66/base
2025-12-04T09:33:41.2678782Z  * [new branch]              gh/anshul-si/66/head        -> origin/gh/anshul-si/66/head
2025-12-04T09:33:41.2680244Z  * [new branch]              gh/anshul-si/66/orig        -> origin/gh/anshul-si/66/orig
2025-12-04T09:33:41.2682126Z  * [new branch]              gh/anshul-si/67/base        -> origin/gh/anshul-si/67/base
2025-12-04T09:33:41.2683606Z  * [new branch]              gh/anshul-si/67/head        -> origin/gh/anshul-si/67/head
2025-12-04T09:33:41.2685174Z  * [new branch]              gh/anshul-si/67/orig        -> origin/gh/anshul-si/67/orig
2025-12-04T09:33:41.2687296Z  * [new branch]              gh/anshul-si/68/base        -> origin/gh/anshul-si/68/base
2025-12-04T09:33:41.2688608Z  * [new branch]              gh/anshul-si/68/head        -> origin/gh/anshul-si/68/head
2025-12-04T09:33:41.2690134Z  * [new branch]              gh/anshul-si/68/orig        -> origin/gh/anshul-si/68/orig
2025-12-04T09:33:41.2692346Z  * [new branch]              gh/anshul-si/69/base        -> origin/gh/anshul-si/69/base
2025-12-04T09:33:41.2693784Z  * [new branch]              gh/anshul-si/69/head        -> origin/gh/anshul-si/69/head
2025-12-04T09:33:41.2695273Z  * [new branch]              gh/anshul-si/69/orig        -> origin/gh/anshul-si/69/orig
2025-12-04T09:33:41.2697335Z  * [new branch]              gh/anshul-si/70/base        -> origin/gh/anshul-si/70/base
2025-12-04T09:33:41.2698936Z  * [new branch]              gh/anshul-si/70/head        -> origin/gh/anshul-si/70/head
2025-12-04T09:33:41.2701056Z  * [new branch]              gh/anshul-si/70/orig        -> origin/gh/anshul-si/70/orig
2025-12-04T09:33:41.2703074Z  * [new branch]              gh/anshul-si/71/base        -> origin/gh/anshul-si/71/base
2025-12-04T09:33:41.2704680Z  * [new branch]              gh/anshul-si/71/head        -> origin/gh/anshul-si/71/head
2025-12-04T09:33:41.2706199Z  * [new branch]              gh/anshul-si/71/orig        -> origin/gh/anshul-si/71/orig
2025-12-04T09:33:41.2708143Z  * [new branch]              gh/anshul-si/72/base        -> origin/gh/anshul-si/72/base
2025-12-04T09:33:41.2709688Z  * [new branch]              gh/anshul-si/72/head        -> origin/gh/anshul-si/72/head
2025-12-04T09:33:41.2711191Z  * [new branch]              gh/anshul-si/72/orig        -> origin/gh/anshul-si/72/orig
2025-12-04T09:33:41.2713224Z  * [new branch]              gh/anshul-si/73/base        -> origin/gh/anshul-si/73/base
2025-12-04T09:33:41.2714786Z  * [new branch]              gh/anshul-si/73/head        -> origin/gh/anshul-si/73/head
2025-12-04T09:33:41.2716271Z  * [new branch]              gh/anshul-si/73/orig        -> origin/gh/anshul-si/73/orig
2025-12-04T09:33:41.2718786Z  * [new branch]              gh/aorenste/132/base        -> origin/gh/aorenste/132/base
2025-12-04T09:33:41.2720203Z  * [new branch]              gh/aorenste/132/head        -> origin/gh/aorenste/132/head
2025-12-04T09:33:41.2722553Z  * [new branch]              gh/aorenste/134/base        -> origin/gh/aorenste/134/base
2025-12-04T09:33:41.2724234Z  * [new branch]              gh/aorenste/134/head        -> origin/gh/aorenste/134/head
2025-12-04T09:33:41.2725741Z  * [new branch]              gh/aorenste/134/orig        -> origin/gh/aorenste/134/orig
2025-12-04T09:33:41.2727788Z  * [new branch]              gh/aorenste/139/base        -> origin/gh/aorenste/139/base
2025-12-04T09:33:41.2729308Z  * [new branch]              gh/aorenste/139/head        -> origin/gh/aorenste/139/head
2025-12-04T09:33:41.2730805Z  * [new branch]              gh/aorenste/139/orig        -> origin/gh/aorenste/139/orig
2025-12-04T09:33:41.2732779Z  * [new branch]              gh/aorenste/141/base        -> origin/gh/aorenste/141/base
2025-12-04T09:33:41.2734058Z  * [new branch]              gh/aorenste/141/head        -> origin/gh/aorenste/141/head
2025-12-04T09:33:41.2736534Z  * [new branch]              gh/aorenste/145/base        -> origin/gh/aorenste/145/base
2025-12-04T09:33:41.2738134Z  * [new branch]              gh/aorenste/145/head        -> origin/gh/aorenste/145/head
2025-12-04T09:33:41.2739759Z  * [new branch]              gh/aorenste/145/orig        -> origin/gh/aorenste/145/orig
2025-12-04T09:33:41.2741931Z  * [new branch]              gh/aorenste/146/base        -> origin/gh/aorenste/146/base
2025-12-04T09:33:41.2743556Z  * [new branch]              gh/aorenste/146/head        -> origin/gh/aorenste/146/head
2025-12-04T09:33:41.2745078Z  * [new branch]              gh/aorenste/146/orig        -> origin/gh/aorenste/146/orig
2025-12-04T09:33:41.2747285Z  * [new branch]              gh/aorenste/147/base        -> origin/gh/aorenste/147/base
2025-12-04T09:33:41.2748945Z  * [new branch]              gh/aorenste/147/head        -> origin/gh/aorenste/147/head
2025-12-04T09:33:41.2750425Z  * [new branch]              gh/aorenste/147/orig        -> origin/gh/aorenste/147/orig
2025-12-04T09:33:41.2752496Z  * [new branch]              gh/aorenste/148/base        -> origin/gh/aorenste/148/base
2025-12-04T09:33:41.2753986Z  * [new branch]              gh/aorenste/148/head        -> origin/gh/aorenste/148/head
2025-12-04T09:33:41.2755554Z  * [new branch]              gh/aorenste/148/orig        -> origin/gh/aorenste/148/orig
2025-12-04T09:33:41.2757625Z  * [new branch]              gh/aorenste/149/base        -> origin/gh/aorenste/149/base
2025-12-04T09:33:41.2759095Z  * [new branch]              gh/aorenste/149/head        -> origin/gh/aorenste/149/head
2025-12-04T09:33:41.2760523Z  * [new branch]              gh/aorenste/149/orig        -> origin/gh/aorenste/149/orig
2025-12-04T09:33:41.2762658Z  * [new branch]              gh/aorenste/150/base        -> origin/gh/aorenste/150/base
2025-12-04T09:33:41.2763952Z  * [new branch]              gh/aorenste/150/head        -> origin/gh/aorenste/150/head
2025-12-04T09:33:41.2765638Z  * [new branch]              gh/aorenste/150/orig        -> origin/gh/aorenste/150/orig
2025-12-04T09:33:41.2767471Z  * [new branch]              gh/aorenste/151/base        -> origin/gh/aorenste/151/base
2025-12-04T09:33:41.2768961Z  * [new branch]              gh/aorenste/151/head        -> origin/gh/aorenste/151/head
2025-12-04T09:33:41.2770529Z  * [new branch]              gh/aorenste/151/orig        -> origin/gh/aorenste/151/orig
2025-12-04T09:33:41.2772664Z  * [new branch]              gh/aorenste/152/base        -> origin/gh/aorenste/152/base
2025-12-04T09:33:41.2774149Z  * [new branch]              gh/aorenste/152/head        -> origin/gh/aorenste/152/head
2025-12-04T09:33:41.2775658Z  * [new branch]              gh/aorenste/152/orig        -> origin/gh/aorenste/152/orig
2025-12-04T09:33:41.2777579Z  * [new branch]              gh/aorenste/153/base        -> origin/gh/aorenste/153/base
2025-12-04T09:33:41.2779077Z  * [new branch]              gh/aorenste/153/head        -> origin/gh/aorenste/153/head
2025-12-04T09:33:41.2780548Z  * [new branch]              gh/aorenste/153/orig        -> origin/gh/aorenste/153/orig
2025-12-04T09:33:41.2782841Z  * [new branch]              gh/aorenste/154/base        -> origin/gh/aorenste/154/base
2025-12-04T09:33:41.2784033Z  * [new branch]              gh/aorenste/154/head        -> origin/gh/aorenste/154/head
2025-12-04T09:33:41.2785334Z  * [new branch]              gh/aorenste/154/orig        -> origin/gh/aorenste/154/orig
2025-12-04T09:33:41.2787162Z  * [new branch]              gh/aorenste/155/base        -> origin/gh/aorenste/155/base
2025-12-04T09:33:41.2788721Z  * [new branch]              gh/aorenste/155/head        -> origin/gh/aorenste/155/head
2025-12-04T09:33:41.2789947Z  * [new branch]              gh/aorenste/155/orig        -> origin/gh/aorenste/155/orig
2025-12-04T09:33:41.2792023Z  * [new branch]              gh/aorenste/156/base        -> origin/gh/aorenste/156/base
2025-12-04T09:33:41.2793224Z  * [new branch]              gh/aorenste/156/head        -> origin/gh/aorenste/156/head
2025-12-04T09:33:41.2794721Z  * [new branch]              gh/aorenste/156/orig        -> origin/gh/aorenste/156/orig
2025-12-04T09:33:41.2797035Z  * [new branch]              gh/aorenste/157/base        -> origin/gh/aorenste/157/base
2025-12-04T09:33:41.2798531Z  * [new branch]              gh/aorenste/157/head        -> origin/gh/aorenste/157/head
2025-12-04T09:33:41.2799853Z  * [new branch]              gh/aorenste/157/orig        -> origin/gh/aorenste/157/orig
2025-12-04T09:33:41.2801951Z  * [new branch]              gh/aorenste/158/base        -> origin/gh/aorenste/158/base
2025-12-04T09:33:41.2803478Z  * [new branch]              gh/aorenste/158/head        -> origin/gh/aorenste/158/head
2025-12-04T09:33:41.2804715Z  * [new branch]              gh/aorenste/158/orig        -> origin/gh/aorenste/158/orig
2025-12-04T09:33:41.2806645Z  * [new branch]              gh/aorenste/159/base        -> origin/gh/aorenste/159/base
2025-12-04T09:33:41.2808259Z  * [new branch]              gh/aorenste/159/head        -> origin/gh/aorenste/159/head
2025-12-04T09:33:41.2809538Z  * [new branch]              gh/aorenste/159/orig        -> origin/gh/aorenste/159/orig
2025-12-04T09:33:41.2812004Z  * [new branch]              gh/avikchaudhuri/1/base     -> origin/gh/avikchaudhuri/1/base
2025-12-04T09:33:41.2813396Z  * [new branch]              gh/avikchaudhuri/1/head     -> origin/gh/avikchaudhuri/1/head
2025-12-04T09:33:41.2815303Z  * [new branch]              gh/avikchaudhuri/2/base     -> origin/gh/avikchaudhuri/2/base
2025-12-04T09:33:41.2816720Z  * [new branch]              gh/avikchaudhuri/2/head     -> origin/gh/avikchaudhuri/2/head
2025-12-04T09:33:41.2818165Z  * [new branch]              gh/avikchaudhuri/2/orig     -> origin/gh/avikchaudhuri/2/orig
2025-12-04T09:33:41.2821204Z  * [new branch]              gh/bdhirsh/666/base         -> origin/gh/bdhirsh/666/base
2025-12-04T09:33:41.2822442Z  * [new branch]              gh/bdhirsh/666/head         -> origin/gh/bdhirsh/666/head
2025-12-04T09:33:41.2824075Z  * [new branch]              gh/bdhirsh/666/orig         -> origin/gh/bdhirsh/666/orig
2025-12-04T09:33:41.2826104Z  * [new branch]              gh/bdhirsh/668/base         -> origin/gh/bdhirsh/668/base
2025-12-04T09:33:41.2827558Z  * [new branch]              gh/bdhirsh/668/head         -> origin/gh/bdhirsh/668/head
2025-12-04T09:33:41.2828974Z  * [new branch]              gh/bdhirsh/668/orig         -> origin/gh/bdhirsh/668/orig
2025-12-04T09:33:41.2831199Z  * [new branch]              gh/bdhirsh/669/base         -> origin/gh/bdhirsh/669/base
2025-12-04T09:33:41.2832481Z  * [new branch]              gh/bdhirsh/669/head         -> origin/gh/bdhirsh/669/head
2025-12-04T09:33:41.2834021Z  * [new branch]              gh/bdhirsh/669/orig         -> origin/gh/bdhirsh/669/orig
2025-12-04T09:33:41.2836212Z  * [new branch]              gh/bdhirsh/670/base         -> origin/gh/bdhirsh/670/base
2025-12-04T09:33:41.2837830Z  * [new branch]              gh/bdhirsh/670/head         -> origin/gh/bdhirsh/670/head
2025-12-04T09:33:41.2839397Z  * [new branch]              gh/bdhirsh/670/orig         -> origin/gh/bdhirsh/670/orig
2025-12-04T09:33:41.2841487Z  * [new branch]              gh/bdhirsh/672/base         -> origin/gh/bdhirsh/672/base
2025-12-04T09:33:41.2843008Z  * [new branch]              gh/bdhirsh/672/head         -> origin/gh/bdhirsh/672/head
2025-12-04T09:33:41.2844459Z  * [new branch]              gh/bdhirsh/672/orig         -> origin/gh/bdhirsh/672/orig
2025-12-04T09:33:41.2846771Z  * [new branch]              gh/bdhirsh/675/base         -> origin/gh/bdhirsh/675/base
2025-12-04T09:33:41.2848506Z  * [new branch]              gh/bdhirsh/675/head         -> origin/gh/bdhirsh/675/head
2025-12-04T09:33:41.2850018Z  * [new branch]              gh/bdhirsh/675/orig         -> origin/gh/bdhirsh/675/orig
2025-12-04T09:33:41.2852030Z  * [new branch]              gh/bdhirsh/676/base         -> origin/gh/bdhirsh/676/base
2025-12-04T09:33:41.2853743Z  * [new branch]              gh/bdhirsh/676/head         -> origin/gh/bdhirsh/676/head
2025-12-04T09:33:41.2854980Z  * [new branch]              gh/bdhirsh/676/orig         -> origin/gh/bdhirsh/676/orig
2025-12-04T09:33:41.2857067Z  * [new branch]              gh/bdhirsh/677/base         -> origin/gh/bdhirsh/677/base
2025-12-04T09:33:41.2859158Z  * [new branch]              gh/bdhirsh/677/head         -> origin/gh/bdhirsh/677/head
2025-12-04T09:33:41.2860613Z  * [new branch]              gh/bdhirsh/677/orig         -> origin/gh/bdhirsh/677/orig
2025-12-04T09:33:41.2862971Z  * [new branch]              gh/bdhirsh/678/base         -> origin/gh/bdhirsh/678/base
2025-12-04T09:33:41.2864583Z  * [new branch]              gh/bdhirsh/678/head         -> origin/gh/bdhirsh/678/head
2025-12-04T09:33:41.2866074Z  * [new branch]              gh/bdhirsh/678/orig         -> origin/gh/bdhirsh/678/orig
2025-12-04T09:33:41.2868253Z  * [new branch]              gh/bdhirsh/679/base         -> origin/gh/bdhirsh/679/base
2025-12-04T09:33:41.2869851Z  * [new branch]              gh/bdhirsh/679/head         -> origin/gh/bdhirsh/679/head
2025-12-04T09:33:41.2871491Z  * [new branch]              gh/bdhirsh/679/orig         -> origin/gh/bdhirsh/679/orig
2025-12-04T09:33:41.2873595Z  * [new branch]              gh/bdhirsh/680/base         -> origin/gh/bdhirsh/680/base
2025-12-04T09:33:41.2875142Z  * [new branch]              gh/bdhirsh/680/head         -> origin/gh/bdhirsh/680/head
2025-12-04T09:33:41.2876627Z  * [new branch]              gh/bdhirsh/680/orig         -> origin/gh/bdhirsh/680/orig
2025-12-04T09:33:41.2878505Z  * [new branch]              gh/bdhirsh/681/base         -> origin/gh/bdhirsh/681/base
2025-12-04T09:33:41.2880130Z  * [new branch]              gh/bdhirsh/681/head         -> origin/gh/bdhirsh/681/head
2025-12-04T09:33:41.2881745Z  * [new branch]              gh/bdhirsh/681/orig         -> origin/gh/bdhirsh/681/orig
2025-12-04T09:33:41.2884301Z  * [new branch]              gh/benjaminglass1/101/base  -> origin/gh/benjaminglass1/101/base
2025-12-04T09:33:41.2885873Z  * [new branch]              gh/benjaminglass1/101/head  -> origin/gh/benjaminglass1/101/head
2025-12-04T09:33:41.2887503Z  * [new branch]              gh/benjaminglass1/101/orig  -> origin/gh/benjaminglass1/101/orig
2025-12-04T09:33:41.2889673Z  * [new branch]              gh/benjaminglass1/102/base  -> origin/gh/benjaminglass1/102/base
2025-12-04T09:33:41.2891064Z  * [new branch]              gh/benjaminglass1/102/head  -> origin/gh/benjaminglass1/102/head
2025-12-04T09:33:41.2892568Z  * [new branch]              gh/benjaminglass1/102/orig  -> origin/gh/benjaminglass1/102/orig
2025-12-04T09:33:41.2894524Z  * [new branch]              gh/benjaminglass1/106/base  -> origin/gh/benjaminglass1/106/base
2025-12-04T09:33:41.2896014Z  * [new branch]              gh/benjaminglass1/106/head  -> origin/gh/benjaminglass1/106/head
2025-12-04T09:33:41.2897635Z  * [new branch]              gh/benjaminglass1/106/orig  -> origin/gh/benjaminglass1/106/orig
2025-12-04T09:33:41.2899560Z  * [new branch]              gh/benjaminglass1/107/base  -> origin/gh/benjaminglass1/107/base
2025-12-04T09:33:41.2901069Z  * [new branch]              gh/benjaminglass1/107/head  -> origin/gh/benjaminglass1/107/head
2025-12-04T09:33:41.2902684Z  * [new branch]              gh/benjaminglass1/107/orig  -> origin/gh/benjaminglass1/107/orig
2025-12-04T09:33:41.2904655Z  * [new branch]              gh/benjaminglass1/108/base  -> origin/gh/benjaminglass1/108/base
2025-12-04T09:33:41.2906140Z  * [new branch]              gh/benjaminglass1/108/head  -> origin/gh/benjaminglass1/108/head
2025-12-04T09:33:41.2907604Z  * [new branch]              gh/benjaminglass1/108/orig  -> origin/gh/benjaminglass1/108/orig
2025-12-04T09:33:41.2909572Z  * [new branch]              gh/benjaminglass1/109/base  -> origin/gh/benjaminglass1/109/base
2025-12-04T09:33:41.2911057Z  * [new branch]              gh/benjaminglass1/109/head  -> origin/gh/benjaminglass1/109/head
2025-12-04T09:33:41.2912600Z  * [new branch]              gh/benjaminglass1/109/orig  -> origin/gh/benjaminglass1/109/orig
2025-12-04T09:33:41.2914555Z  * [new branch]              gh/benjaminglass1/97/base   -> origin/gh/benjaminglass1/97/base
2025-12-04T09:33:41.2916130Z  * [new branch]              gh/benjaminglass1/97/head   -> origin/gh/benjaminglass1/97/head
2025-12-04T09:33:41.2917666Z  * [new branch]              gh/benjaminglass1/97/orig   -> origin/gh/benjaminglass1/97/orig
2025-12-04T09:33:41.2920003Z  * [new branch]              gh/bobrenjc93/570/base      -> origin/gh/bobrenjc93/570/base
2025-12-04T09:33:41.2921651Z  * [new branch]              gh/bobrenjc93/570/head      -> origin/gh/bobrenjc93/570/head
2025-12-04T09:33:41.2923127Z  * [new branch]              gh/bobrenjc93/570/orig      -> origin/gh/bobrenjc93/570/orig
2025-12-04T09:33:41.2925085Z  * [new branch]              gh/bobrenjc93/604/base      -> origin/gh/bobrenjc93/604/base
2025-12-04T09:33:41.2926616Z  * [new branch]              gh/bobrenjc93/604/head      -> origin/gh/bobrenjc93/604/head
2025-12-04T09:33:41.2928170Z  * [new branch]              gh/bobrenjc93/604/orig      -> origin/gh/bobrenjc93/604/orig
2025-12-04T09:33:41.2930075Z  * [new branch]              gh/bobrenjc93/638/base      -> origin/gh/bobrenjc93/638/base
2025-12-04T09:33:41.2931553Z  * [new branch]              gh/bobrenjc93/638/head      -> origin/gh/bobrenjc93/638/head
2025-12-04T09:33:41.2933020Z  * [new branch]              gh/bobrenjc93/638/orig      -> origin/gh/bobrenjc93/638/orig
2025-12-04T09:33:41.2934964Z  * [new branch]              gh/bobrenjc93/653/base      -> origin/gh/bobrenjc93/653/base
2025-12-04T09:33:41.2936581Z  * [new branch]              gh/bobrenjc93/653/head      -> origin/gh/bobrenjc93/653/head
2025-12-04T09:33:41.2938086Z  * [new branch]              gh/bobrenjc93/653/orig      -> origin/gh/bobrenjc93/653/orig
2025-12-04T09:33:41.2940244Z  * [new branch]              gh/bobrenjc93/654/base      -> origin/gh/bobrenjc93/654/base
2025-12-04T09:33:41.2941770Z  * [new branch]              gh/bobrenjc93/654/head      -> origin/gh/bobrenjc93/654/head
2025-12-04T09:33:41.2943188Z  * [new branch]              gh/bobrenjc93/654/orig      -> origin/gh/bobrenjc93/654/orig
2025-12-04T09:33:41.2945133Z  * [new branch]              gh/bobrenjc93/657/base      -> origin/gh/bobrenjc93/657/base
2025-12-04T09:33:41.2946570Z  * [new branch]              gh/bobrenjc93/657/head      -> origin/gh/bobrenjc93/657/head
2025-12-04T09:33:41.2948028Z  * [new branch]              gh/bobrenjc93/657/orig      -> origin/gh/bobrenjc93/657/orig
2025-12-04T09:33:41.2950020Z  * [new branch]              gh/bobrenjc93/672/base      -> origin/gh/bobrenjc93/672/base
2025-12-04T09:33:41.2951430Z  * [new branch]              gh/bobrenjc93/672/head      -> origin/gh/bobrenjc93/672/head
2025-12-04T09:33:41.2952891Z  * [new branch]              gh/bobrenjc93/672/orig      -> origin/gh/bobrenjc93/672/orig
2025-12-04T09:33:41.2954859Z  * [new branch]              gh/bobrenjc93/679/base      -> origin/gh/bobrenjc93/679/base
2025-12-04T09:33:41.2956741Z  * [new branch]              gh/bobrenjc93/679/head      -> origin/gh/bobrenjc93/679/head
2025-12-04T09:33:41.2958195Z  * [new branch]              gh/bobrenjc93/679/orig      -> origin/gh/bobrenjc93/679/orig
2025-12-04T09:33:41.2960228Z  * [new branch]              gh/bobrenjc93/680/base      -> origin/gh/bobrenjc93/680/base
2025-12-04T09:33:41.2961809Z  * [new branch]              gh/bobrenjc93/680/head      -> origin/gh/bobrenjc93/680/head
2025-12-04T09:33:41.2963609Z  * [new branch]              gh/bobrenjc93/680/orig      -> origin/gh/bobrenjc93/680/orig
2025-12-04T09:33:41.2965413Z  * [new branch]              gh/bobrenjc93/681/base      -> origin/gh/bobrenjc93/681/base
2025-12-04T09:33:41.2966957Z  * [new branch]              gh/bobrenjc93/681/head      -> origin/gh/bobrenjc93/681/head
2025-12-04T09:33:41.2968540Z  * [new branch]              gh/bobrenjc93/681/orig      -> origin/gh/bobrenjc93/681/orig
2025-12-04T09:33:41.2970429Z  * [new branch]              gh/bobrenjc93/682/base      -> origin/gh/bobrenjc93/682/base
2025-12-04T09:33:41.2972158Z  * [new branch]              gh/bobrenjc93/682/head      -> origin/gh/bobrenjc93/682/head
2025-12-04T09:33:41.2973646Z  * [new branch]              gh/bobrenjc93/682/orig      -> origin/gh/bobrenjc93/682/orig
2025-12-04T09:33:41.2975677Z  * [new branch]              gh/bobrenjc93/683/base      -> origin/gh/bobrenjc93/683/base
2025-12-04T09:33:41.2977322Z  * [new branch]              gh/bobrenjc93/683/head      -> origin/gh/bobrenjc93/683/head
2025-12-04T09:33:41.2978877Z  * [new branch]              gh/bobrenjc93/683/orig      -> origin/gh/bobrenjc93/683/orig
2025-12-04T09:33:41.2980825Z  * [new branch]              gh/bobrenjc93/684/base      -> origin/gh/bobrenjc93/684/base
2025-12-04T09:33:41.2982629Z  * [new branch]              gh/bobrenjc93/684/head      -> origin/gh/bobrenjc93/684/head
2025-12-04T09:33:41.2984351Z  * [new branch]              gh/bobrenjc93/684/orig      -> origin/gh/bobrenjc93/684/orig
2025-12-04T09:33:41.2986117Z  * [new branch]              gh/bobrenjc93/685/base      -> origin/gh/bobrenjc93/685/base
2025-12-04T09:33:41.2988465Z  * [new branch]              gh/bobrenjc93/685/head      -> origin/gh/bobrenjc93/685/head
2025-12-04T09:33:41.2991933Z  * [new branch]              gh/bobrenjc93/685/orig      -> origin/gh/bobrenjc93/685/orig
2025-12-04T09:33:41.2992339Z  * [new branch]              gh/bobrenjc93/686/base      -> origin/gh/bobrenjc93/686/base
2025-12-04T09:33:41.2993187Z  * [new branch]              gh/bobrenjc93/686/head      -> origin/gh/bobrenjc93/686/head
2025-12-04T09:33:41.2994804Z  * [new branch]              gh/bobrenjc93/686/orig      -> origin/gh/bobrenjc93/686/orig
2025-12-04T09:33:41.2996698Z  * [new branch]              gh/bobrenjc93/687/base      -> origin/gh/bobrenjc93/687/base
2025-12-04T09:33:41.2998631Z  * [new branch]              gh/bobrenjc93/687/head      -> origin/gh/bobrenjc93/687/head
2025-12-04T09:33:41.3000010Z  * [new branch]              gh/bobrenjc93/687/orig      -> origin/gh/bobrenjc93/687/orig
2025-12-04T09:33:41.3002550Z  * [new branch]              gh/bobrenjc93/688/base      -> origin/gh/bobrenjc93/688/base
2025-12-04T09:33:41.3004180Z  * [new branch]              gh/bobrenjc93/688/head      -> origin/gh/bobrenjc93/688/head
2025-12-04T09:33:41.3005717Z  * [new branch]              gh/bobrenjc93/688/orig      -> origin/gh/bobrenjc93/688/orig
2025-12-04T09:33:41.3007568Z  * [new branch]              gh/bobrenjc93/689/base      -> origin/gh/bobrenjc93/689/base
2025-12-04T09:33:41.3009148Z  * [new branch]              gh/bobrenjc93/689/head      -> origin/gh/bobrenjc93/689/head
2025-12-04T09:33:41.3010663Z  * [new branch]              gh/bobrenjc93/689/orig      -> origin/gh/bobrenjc93/689/orig
2025-12-04T09:33:41.3012521Z  * [new branch]              gh/bobrenjc93/690/base      -> origin/gh/bobrenjc93/690/base
2025-12-04T09:33:41.3013995Z  * [new branch]              gh/bobrenjc93/690/head      -> origin/gh/bobrenjc93/690/head
2025-12-04T09:33:41.3015511Z  * [new branch]              gh/bobrenjc93/690/orig      -> origin/gh/bobrenjc93/690/orig
2025-12-04T09:33:41.3018552Z  * [new branch]              gh/bobrenjc93/691/base      -> origin/gh/bobrenjc93/691/base
2025-12-04T09:33:41.3020417Z  * [new branch]              gh/bobrenjc93/691/head      -> origin/gh/bobrenjc93/691/head
2025-12-04T09:33:41.3022475Z  * [new branch]              gh/bobrenjc93/691/orig      -> origin/gh/bobrenjc93/691/orig
2025-12-04T09:33:41.3025351Z  * [new branch]              gh/bobrenjc93/692/base      -> origin/gh/bobrenjc93/692/base
2025-12-04T09:33:41.3026989Z  * [new branch]              gh/bobrenjc93/692/head      -> origin/gh/bobrenjc93/692/head
2025-12-04T09:33:41.3028524Z  * [new branch]              gh/bobrenjc93/692/orig      -> origin/gh/bobrenjc93/692/orig
2025-12-04T09:33:41.3030439Z  * [new branch]              gh/bobrenjc93/693/base      -> origin/gh/bobrenjc93/693/base
2025-12-04T09:33:41.3031918Z  * [new branch]              gh/bobrenjc93/693/head      -> origin/gh/bobrenjc93/693/head
2025-12-04T09:33:41.3033517Z  * [new branch]              gh/bobrenjc93/693/orig      -> origin/gh/bobrenjc93/693/orig
2025-12-04T09:33:41.3035557Z  * [new branch]              gh/bobrenjc93/694/base      -> origin/gh/bobrenjc93/694/base
2025-12-04T09:33:41.3037105Z  * [new branch]              gh/bobrenjc93/694/head      -> origin/gh/bobrenjc93/694/head
2025-12-04T09:33:41.3038653Z  * [new branch]              gh/bobrenjc93/694/orig      -> origin/gh/bobrenjc93/694/orig
2025-12-04T09:33:41.3040542Z  * [new branch]              gh/bobrenjc93/695/base      -> origin/gh/bobrenjc93/695/base
2025-12-04T09:33:41.3042024Z  * [new branch]              gh/bobrenjc93/695/head      -> origin/gh/bobrenjc93/695/head
2025-12-04T09:33:41.3043481Z  * [new branch]              gh/bobrenjc93/695/orig      -> origin/gh/bobrenjc93/695/orig
2025-12-04T09:33:41.3046089Z  * [new branch]              gh/c00w/23/base             -> origin/gh/c00w/23/base
2025-12-04T09:33:41.3047653Z  * [new branch]              gh/c00w/23/head             -> origin/gh/c00w/23/head
2025-12-04T09:33:41.3049744Z  * [new branch]              gh/c00w/53/base             -> origin/gh/c00w/53/base
2025-12-04T09:33:41.3051150Z  * [new branch]              gh/c00w/53/head             -> origin/gh/c00w/53/head
2025-12-04T09:33:41.3052629Z  * [new branch]              gh/c00w/53/orig             -> origin/gh/c00w/53/orig
2025-12-04T09:33:41.3054413Z  * [new branch]              gh/c00w/54/base             -> origin/gh/c00w/54/base
2025-12-04T09:33:41.3056605Z  * [new branch]              gh/c00w/54/head             -> origin/gh/c00w/54/head
2025-12-04T09:33:41.3058283Z  * [new branch]              gh/c00w/54/orig             -> origin/gh/c00w/54/orig
2025-12-04T09:33:41.3060778Z  * [new branch]              gh/c00w/56/base             -> origin/gh/c00w/56/base
2025-12-04T09:33:41.3062344Z  * [new branch]              gh/c00w/56/head             -> origin/gh/c00w/56/head
2025-12-04T09:33:41.3063762Z  * [new branch]              gh/c00w/56/orig             -> origin/gh/c00w/56/orig
2025-12-04T09:33:41.3065684Z  * [new branch]              gh/c00w/57/base             -> origin/gh/c00w/57/base
2025-12-04T09:33:41.3067285Z  * [new branch]              gh/c00w/57/head             -> origin/gh/c00w/57/head
2025-12-04T09:33:41.3068884Z  * [new branch]              gh/c00w/57/orig             -> origin/gh/c00w/57/orig
2025-12-04T09:33:41.3070760Z  * [new branch]              gh/c00w/58/base             -> origin/gh/c00w/58/base
2025-12-04T09:33:41.3075812Z  * [new branch]              gh/c00w/58/head             -> origin/gh/c00w/58/head
2025-12-04T09:33:41.3077238Z  * [new branch]              gh/c00w/58/orig             -> origin/gh/c00w/58/orig
2025-12-04T09:33:41.3079617Z  * [new branch]              gh/clee2000/1/base          -> origin/gh/clee2000/1/base
2025-12-04T09:33:41.3081292Z  * [new branch]              gh/clee2000/1/head          -> origin/gh/clee2000/1/head
2025-12-04T09:33:41.3082823Z  * [new branch]              gh/clee2000/1/orig          -> origin/gh/clee2000/1/orig
2025-12-04T09:33:41.3085360Z  * [new branch]              gh/coconutruben/1/base      -> origin/gh/coconutruben/1/base
2025-12-04T09:33:41.3087010Z  * [new branch]              gh/coconutruben/1/head      -> origin/gh/coconutruben/1/head
2025-12-04T09:33:41.3089496Z  * [new branch]              gh/coconutruben/55/base     -> origin/gh/coconutruben/55/base
2025-12-04T09:33:41.3090977Z  * [new branch]              gh/coconutruben/55/head     -> origin/gh/coconutruben/55/head
2025-12-04T09:33:41.3092547Z  * [new branch]              gh/coconutruben/55/orig     -> origin/gh/coconutruben/55/orig
2025-12-04T09:33:41.3094751Z  * [new branch]              gh/coconutruben/57/base     -> origin/gh/coconutruben/57/base
2025-12-04T09:33:41.3096526Z  * [new branch]              gh/coconutruben/57/head     -> origin/gh/coconutruben/57/head
2025-12-04T09:33:41.3098245Z  * [new branch]              gh/coconutruben/57/orig     -> origin/gh/coconutruben/57/orig
2025-12-04T09:33:41.3100295Z  * [new branch]              gh/coconutruben/70/base     -> origin/gh/coconutruben/70/base
2025-12-04T09:33:41.3101874Z  * [new branch]              gh/coconutruben/70/head     -> origin/gh/coconutruben/70/head
2025-12-04T09:33:41.3103568Z  * [new branch]              gh/coconutruben/70/orig     -> origin/gh/coconutruben/70/orig
2025-12-04T09:33:41.3105410Z  * [new branch]              gh/coconutruben/71/base     -> origin/gh/coconutruben/71/base
2025-12-04T09:33:41.3106989Z  * [new branch]              gh/coconutruben/71/head     -> origin/gh/coconutruben/71/head
2025-12-04T09:33:41.3108488Z  * [new branch]              gh/coconutruben/71/orig     -> origin/gh/coconutruben/71/orig
2025-12-04T09:33:41.3110873Z  * [new branch]              gh/coconutruben/72/base     -> origin/gh/coconutruben/72/base
2025-12-04T09:33:41.3112064Z  * [new branch]              gh/coconutruben/72/head     -> origin/gh/coconutruben/72/head
2025-12-04T09:33:41.3113591Z  * [new branch]              gh/coconutruben/72/orig     -> origin/gh/coconutruben/72/orig
2025-12-04T09:33:41.3115427Z  * [new branch]              gh/coconutruben/73/base     -> origin/gh/coconutruben/73/base
2025-12-04T09:33:41.3116999Z  * [new branch]              gh/coconutruben/73/head     -> origin/gh/coconutruben/73/head
2025-12-04T09:33:41.3118581Z  * [new branch]              gh/coconutruben/73/orig     -> origin/gh/coconutruben/73/orig
2025-12-04T09:33:41.3120733Z  * [new branch]              gh/coconutruben/74/base     -> origin/gh/coconutruben/74/base
2025-12-04T09:33:41.3122530Z  * [new branch]              gh/coconutruben/74/head     -> origin/gh/coconutruben/74/head
2025-12-04T09:33:41.3123929Z  * [new branch]              gh/coconutruben/74/orig     -> origin/gh/coconutruben/74/orig
2025-12-04T09:33:41.3126095Z  * [new branch]              gh/coconutruben/79/base     -> origin/gh/coconutruben/79/base
2025-12-04T09:33:41.3127820Z  * [new branch]              gh/coconutruben/79/head     -> origin/gh/coconutruben/79/head
2025-12-04T09:33:41.3129231Z  * [new branch]              gh/coconutruben/79/orig     -> origin/gh/coconutruben/79/orig
2025-12-04T09:33:41.3131460Z  * [new branch]              gh/coconutruben/80/base     -> origin/gh/coconutruben/80/base
2025-12-04T09:33:41.3132938Z  * [new branch]              gh/coconutruben/80/head     -> origin/gh/coconutruben/80/head
2025-12-04T09:33:41.3134592Z  * [new branch]              gh/coconutruben/80/orig     -> origin/gh/coconutruben/80/orig
2025-12-04T09:33:41.3136750Z  * [new branch]              gh/coconutruben/82/base     -> origin/gh/coconutruben/82/base
2025-12-04T09:33:41.3138313Z  * [new branch]              gh/coconutruben/82/head     -> origin/gh/coconutruben/82/head
2025-12-04T09:33:41.3139769Z  * [new branch]              gh/coconutruben/82/orig     -> origin/gh/coconutruben/82/orig
2025-12-04T09:33:41.3142499Z  * [new branch]              gh/coconutruben/83/base     -> origin/gh/coconutruben/83/base
2025-12-04T09:33:41.3143335Z  * [new branch]              gh/coconutruben/83/head     -> origin/gh/coconutruben/83/head
2025-12-04T09:33:41.3144931Z  * [new branch]              gh/coconutruben/83/orig     -> origin/gh/coconutruben/83/orig
2025-12-04T09:33:41.3147038Z  * [new branch]              gh/coconutruben/84/base     -> origin/gh/coconutruben/84/base
2025-12-04T09:33:41.3148674Z  * [new branch]              gh/coconutruben/84/head     -> origin/gh/coconutruben/84/head
2025-12-04T09:33:41.3150162Z  * [new branch]              gh/coconutruben/84/orig     -> origin/gh/coconutruben/84/orig
2025-12-04T09:33:41.3152538Z  * [new branch]              gh/coconutruben/85/base     -> origin/gh/coconutruben/85/base
2025-12-04T09:33:41.3153922Z  * [new branch]              gh/coconutruben/85/head     -> origin/gh/coconutruben/85/head
2025-12-04T09:33:41.3155537Z  * [new branch]              gh/coconutruben/85/orig     -> origin/gh/coconutruben/85/orig
2025-12-04T09:33:41.3157496Z  * [new branch]              gh/coconutruben/86/base     -> origin/gh/coconutruben/86/base
2025-12-04T09:33:41.3159017Z  * [new branch]              gh/coconutruben/86/head     -> origin/gh/coconutruben/86/head
2025-12-04T09:33:41.3160550Z  * [new branch]              gh/coconutruben/86/orig     -> origin/gh/coconutruben/86/orig
2025-12-04T09:33:41.3163075Z  * [new branch]              gh/colinchan15/1/base       -> origin/gh/colinchan15/1/base
2025-12-04T09:33:41.3164686Z  * [new branch]              gh/colinchan15/1/head       -> origin/gh/colinchan15/1/head
2025-12-04T09:33:41.3166502Z  * [new branch]              gh/colinchan15/2/base       -> origin/gh/colinchan15/2/base
2025-12-04T09:33:41.3167938Z  * [new branch]              gh/colinchan15/2/head       -> origin/gh/colinchan15/2/head
2025-12-04T09:33:41.3169773Z  * [new branch]              gh/colinchan15/3/base       -> origin/gh/colinchan15/3/base
2025-12-04T09:33:41.3171557Z  * [new branch]              gh/colinchan15/3/head       -> origin/gh/colinchan15/3/head
2025-12-04T09:33:41.3173399Z  * [new branch]              gh/colinchan15/6/base       -> origin/gh/colinchan15/6/base
2025-12-04T09:33:41.3174903Z  * [new branch]              gh/colinchan15/6/head       -> origin/gh/colinchan15/6/head
2025-12-04T09:33:41.3177452Z  * [new branch]              gh/d4l3k/1/base             -> origin/gh/d4l3k/1/base
2025-12-04T09:33:41.3178918Z  * [new branch]              gh/d4l3k/1/head             -> origin/gh/d4l3k/1/head
2025-12-04T09:33:41.3180954Z  * [new branch]              gh/d4l3k/2/base             -> origin/gh/d4l3k/2/base
2025-12-04T09:33:41.3182415Z  * [new branch]              gh/d4l3k/2/head             -> origin/gh/d4l3k/2/head
2025-12-04T09:33:41.3183897Z  * [new branch]              gh/d4l3k/2/orig             -> origin/gh/d4l3k/2/orig
2025-12-04T09:33:41.3185799Z  * [new branch]              gh/d4l3k/3/base             -> origin/gh/d4l3k/3/base
2025-12-04T09:33:41.3187308Z  * [new branch]              gh/d4l3k/3/head             -> origin/gh/d4l3k/3/head
2025-12-04T09:33:41.3188967Z  * [new branch]              gh/d4l3k/3/orig             -> origin/gh/d4l3k/3/orig
2025-12-04T09:33:41.3190880Z  * [new branch]              gh/d4l3k/4/base             -> origin/gh/d4l3k/4/base
2025-12-04T09:33:41.3192441Z  * [new branch]              gh/d4l3k/4/head             -> origin/gh/d4l3k/4/head
2025-12-04T09:33:41.3194006Z  * [new branch]              gh/d4l3k/4/orig             -> origin/gh/d4l3k/4/orig
2025-12-04T09:33:41.3195880Z  * [new branch]              gh/d4l3k/5/base             -> origin/gh/d4l3k/5/base
2025-12-04T09:33:41.3197335Z  * [new branch]              gh/d4l3k/5/orig             -> origin/gh/d4l3k/5/orig
2025-12-04T09:33:41.3199859Z  * [new branch]              gh/davidberard98/392/base   -> origin/gh/davidberard98/392/base
2025-12-04T09:33:41.3201360Z  * [new branch]              gh/davidberard98/392/head   -> origin/gh/davidberard98/392/head
2025-12-04T09:33:41.3202806Z  * [new branch]              gh/davidberard98/392/orig   -> origin/gh/davidberard98/392/orig
2025-12-04T09:33:41.3204901Z  * [new branch]              gh/davidberard98/399/base   -> origin/gh/davidberard98/399/base
2025-12-04T09:33:41.3206481Z  * [new branch]              gh/davidberard98/399/head   -> origin/gh/davidberard98/399/head
2025-12-04T09:33:41.3207987Z  * [new branch]              gh/davidberard98/399/orig   -> origin/gh/davidberard98/399/orig
2025-12-04T09:33:41.3210415Z  * [new branch]              gh/desertfire/605/base      -> origin/gh/desertfire/605/base
2025-12-04T09:33:41.3211893Z  * [new branch]              gh/desertfire/605/head      -> origin/gh/desertfire/605/head
2025-12-04T09:33:41.3213444Z  * [new branch]              gh/desertfire/605/orig      -> origin/gh/desertfire/605/orig
2025-12-04T09:33:41.3215338Z  * [new branch]              gh/desertfire/606/base      -> origin/gh/desertfire/606/base
2025-12-04T09:33:41.3216847Z  * [new branch]              gh/desertfire/606/head      -> origin/gh/desertfire/606/head
2025-12-04T09:33:41.3218624Z  * [new branch]              gh/desertfire/606/orig      -> origin/gh/desertfire/606/orig
2025-12-04T09:33:41.3220490Z  * [new branch]              gh/desertfire/607/base      -> origin/gh/desertfire/607/base
2025-12-04T09:33:41.3221942Z  * [new branch]              gh/desertfire/607/head      -> origin/gh/desertfire/607/head
2025-12-04T09:33:41.3223529Z  * [new branch]              gh/desertfire/607/orig      -> origin/gh/desertfire/607/orig
2025-12-04T09:33:41.3225433Z  * [new branch]              gh/desertfire/608/base      -> origin/gh/desertfire/608/base
2025-12-04T09:33:41.3226874Z  * [new branch]              gh/desertfire/608/head      -> origin/gh/desertfire/608/head
2025-12-04T09:33:41.3228555Z  * [new branch]              gh/desertfire/608/orig      -> origin/gh/desertfire/608/orig
2025-12-04T09:33:41.3230544Z  * [new branch]              gh/desertfire/609/base      -> origin/gh/desertfire/609/base
2025-12-04T09:33:41.3232075Z  * [new branch]              gh/desertfire/609/head      -> origin/gh/desertfire/609/head
2025-12-04T09:33:41.3233596Z  * [new branch]              gh/desertfire/609/orig      -> origin/gh/desertfire/609/orig
2025-12-04T09:33:41.3235805Z  * [new branch]              gh/desertfire/610/base      -> origin/gh/desertfire/610/base
2025-12-04T09:33:41.3237853Z  * [new branch]              gh/desertfire/610/head      -> origin/gh/desertfire/610/head
2025-12-04T09:33:41.3239429Z  * [new branch]              gh/desertfire/610/orig      -> origin/gh/desertfire/610/orig
2025-12-04T09:33:41.3257826Z  * [new branch]              gh/desertfire/611/base      -> origin/gh/desertfire/611/base
2025-12-04T09:33:41.3258688Z  * [new branch]              gh/desertfire/611/head      -> origin/gh/desertfire/611/head
2025-12-04T09:33:41.3258990Z  * [new branch]              gh/desertfire/611/orig      -> origin/gh/desertfire/611/orig
2025-12-04T09:33:41.3259278Z  * [new branch]              gh/desertfire/612/base      -> origin/gh/desertfire/612/base
2025-12-04T09:33:41.3259971Z  * [new branch]              gh/desertfire/612/head      -> origin/gh/desertfire/612/head
2025-12-04T09:33:41.3260243Z  * [new branch]              gh/desertfire/612/orig      -> origin/gh/desertfire/612/orig
2025-12-04T09:33:41.3260523Z  * [new branch]              gh/desertfire/613/base      -> origin/gh/desertfire/613/base
2025-12-04T09:33:41.3260799Z  * [new branch]              gh/desertfire/613/head      -> origin/gh/desertfire/613/head
2025-12-04T09:33:41.3261077Z  * [new branch]              gh/desertfire/613/orig      -> origin/gh/desertfire/613/orig
2025-12-04T09:33:41.3261341Z  * [new branch]              gh/desertfire/614/base      -> origin/gh/desertfire/614/base
2025-12-04T09:33:41.3261605Z  * [new branch]              gh/desertfire/614/head      -> origin/gh/desertfire/614/head
2025-12-04T09:33:41.3261955Z  * [new branch]              gh/desertfire/614/orig      -> origin/gh/desertfire/614/orig
2025-12-04T09:33:41.3262218Z  * [new branch]              gh/desertfire/615/base      -> origin/gh/desertfire/615/base
2025-12-04T09:33:41.3262505Z  * [new branch]              gh/desertfire/615/head      -> origin/gh/desertfire/615/head
2025-12-04T09:33:41.3263916Z  * [new branch]              gh/desertfire/615/orig      -> origin/gh/desertfire/615/orig
2025-12-04T09:33:41.3265726Z  * [new branch]              gh/desertfire/616/base      -> origin/gh/desertfire/616/base
2025-12-04T09:33:41.3267345Z  * [new branch]              gh/desertfire/616/head      -> origin/gh/desertfire/616/head
2025-12-04T09:33:41.3268854Z  * [new branch]              gh/desertfire/616/orig      -> origin/gh/desertfire/616/orig
2025-12-04T09:33:41.3270664Z  * [new branch]              gh/desertfire/617/base      -> origin/gh/desertfire/617/base
2025-12-04T09:33:41.3273948Z  * [new branch]              gh/desertfire/617/head      -> origin/gh/desertfire/617/head
2025-12-04T09:33:41.3275342Z  * [new branch]              gh/desertfire/617/orig      -> origin/gh/desertfire/617/orig
2025-12-04T09:33:41.3277718Z  * [new branch]              gh/dharakk/1/base           -> origin/gh/dharakk/1/base
2025-12-04T09:33:41.3279408Z  * [new branch]              gh/dharakk/1/head           -> origin/gh/dharakk/1/head
2025-12-04T09:33:41.3281716Z  * [new branch]              gh/drisspg/170/base         -> origin/gh/drisspg/170/base
2025-12-04T09:33:41.3283189Z  * [new branch]              gh/drisspg/170/head         -> origin/gh/drisspg/170/head
2025-12-04T09:33:41.3284671Z  * [new branch]              gh/drisspg/170/orig         -> origin/gh/drisspg/170/orig
2025-12-04T09:33:41.3286608Z  * [new branch]              gh/drisspg/182/base         -> origin/gh/drisspg/182/base
2025-12-04T09:33:41.3288251Z  * [new branch]              gh/drisspg/182/head         -> origin/gh/drisspg/182/head
2025-12-04T09:33:41.3290557Z  * [new branch]              gh/drisspg/183/base         -> origin/gh/drisspg/183/base
2025-12-04T09:33:41.3291948Z  * [new branch]              gh/drisspg/183/head         -> origin/gh/drisspg/183/head
2025-12-04T09:33:41.3294175Z  * [new branch]              gh/drisspg/184/base         -> origin/gh/drisspg/184/base
2025-12-04T09:33:41.3295582Z  * [new branch]              gh/drisspg/184/head         -> origin/gh/drisspg/184/head
2025-12-04T09:33:41.3297805Z  * [new branch]              gh/drisspg/185/base         -> origin/gh/drisspg/185/base
2025-12-04T09:33:41.3299286Z  * [new branch]              gh/drisspg/185/head         -> origin/gh/drisspg/185/head
2025-12-04T09:33:41.3301222Z  * [new branch]              gh/drisspg/194/base         -> origin/gh/drisspg/194/base
2025-12-04T09:33:41.3302763Z  * [new branch]              gh/drisspg/194/head         -> origin/gh/drisspg/194/head
2025-12-04T09:33:41.3304257Z  * [new branch]              gh/drisspg/194/orig         -> origin/gh/drisspg/194/orig
2025-12-04T09:33:41.3306156Z  * [new branch]              gh/drisspg/200/base         -> origin/gh/drisspg/200/base
2025-12-04T09:33:41.3307748Z  * [new branch]              gh/drisspg/200/head         -> origin/gh/drisspg/200/head
2025-12-04T09:33:41.3309446Z  * [new branch]              gh/drisspg/200/orig         -> origin/gh/drisspg/200/orig
2025-12-04T09:33:41.3311296Z  * [new branch]              gh/drisspg/218/base         -> origin/gh/drisspg/218/base
2025-12-04T09:33:41.3312817Z  * [new branch]              gh/drisspg/218/head         -> origin/gh/drisspg/218/head
2025-12-04T09:33:41.3314314Z  * [new branch]              gh/drisspg/218/orig         -> origin/gh/drisspg/218/orig
2025-12-04T09:33:41.3316222Z  * [new branch]              gh/drisspg/219/base         -> origin/gh/drisspg/219/base
2025-12-04T09:33:41.3317685Z  * [new branch]              gh/drisspg/219/head         -> origin/gh/drisspg/219/head
2025-12-04T09:33:41.3319220Z  * [new branch]              gh/drisspg/219/orig         -> origin/gh/drisspg/219/orig
2025-12-04T09:33:41.3321098Z  * [new branch]              gh/drisspg/220/base         -> origin/gh/drisspg/220/base
2025-12-04T09:33:41.3322582Z  * [new branch]              gh/drisspg/220/head         -> origin/gh/drisspg/220/head
2025-12-04T09:33:41.3324064Z  * [new branch]              gh/drisspg/220/orig         -> origin/gh/drisspg/220/orig
2025-12-04T09:33:41.3326048Z  * [new branch]              gh/drisspg/221/base         -> origin/gh/drisspg/221/base
2025-12-04T09:33:41.3327629Z  * [new branch]              gh/drisspg/221/head         -> origin/gh/drisspg/221/head
2025-12-04T09:33:41.3329079Z  * [new branch]              gh/drisspg/221/orig         -> origin/gh/drisspg/221/orig
2025-12-04T09:33:41.3330993Z  * [new branch]              gh/drisspg/222/base         -> origin/gh/drisspg/222/base
2025-12-04T09:33:41.3332511Z  * [new branch]              gh/drisspg/222/head         -> origin/gh/drisspg/222/head
2025-12-04T09:33:41.3334016Z  * [new branch]              gh/drisspg/222/orig         -> origin/gh/drisspg/222/orig
2025-12-04T09:33:41.3335946Z  * [new branch]              gh/drisspg/223/base         -> origin/gh/drisspg/223/base
2025-12-04T09:33:41.3337519Z  * [new branch]              gh/drisspg/223/head         -> origin/gh/drisspg/223/head
2025-12-04T09:33:41.3339058Z  * [new branch]              gh/drisspg/223/orig         -> origin/gh/drisspg/223/orig
2025-12-04T09:33:41.3341005Z  * [new branch]              gh/drisspg/224/base         -> origin/gh/drisspg/224/base
2025-12-04T09:33:41.3342485Z  * [new branch]              gh/drisspg/224/head         -> origin/gh/drisspg/224/head
2025-12-04T09:33:41.3343962Z  * [new branch]              gh/drisspg/224/orig         -> origin/gh/drisspg/224/orig
2025-12-04T09:33:41.3345905Z  * [new branch]              gh/drisspg/225/base         -> origin/gh/drisspg/225/base
2025-12-04T09:33:41.3347491Z  * [new branch]              gh/drisspg/225/head         -> origin/gh/drisspg/225/head
2025-12-04T09:33:41.3348993Z  * [new branch]              gh/drisspg/225/orig         -> origin/gh/drisspg/225/orig
2025-12-04T09:33:41.3350872Z  * [new branch]              gh/drisspg/226/base         -> origin/gh/drisspg/226/base
2025-12-04T09:33:41.3352300Z  * [new branch]              gh/drisspg/226/head         -> origin/gh/drisspg/226/head
2025-12-04T09:33:41.3353890Z  * [new branch]              gh/drisspg/226/orig         -> origin/gh/drisspg/226/orig
2025-12-04T09:33:41.3356378Z  * [new branch]              gh/drisspg/227/base         -> origin/gh/drisspg/227/base
2025-12-04T09:33:41.3357806Z  * [new branch]              gh/drisspg/227/head         -> origin/gh/drisspg/227/head
2025-12-04T09:33:41.3359323Z  * [new branch]              gh/drisspg/227/orig         -> origin/gh/drisspg/227/orig
2025-12-04T09:33:41.3361302Z  * [new branch]              gh/drisspg/228/base         -> origin/gh/drisspg/228/base
2025-12-04T09:33:41.3362809Z  * [new branch]              gh/drisspg/228/head         -> origin/gh/drisspg/228/head
2025-12-04T09:33:41.3364253Z  * [new branch]              gh/drisspg/228/orig         -> origin/gh/drisspg/228/orig
2025-12-04T09:33:41.3366194Z  * [new branch]              gh/drisspg/229/base         -> origin/gh/drisspg/229/base
2025-12-04T09:33:41.3367797Z  * [new branch]              gh/drisspg/229/head         -> origin/gh/drisspg/229/head
2025-12-04T09:33:41.3369365Z  * [new branch]              gh/drisspg/229/orig         -> origin/gh/drisspg/229/orig
2025-12-04T09:33:41.3371579Z  * [new branch]              gh/drisspg/230/base         -> origin/gh/drisspg/230/base
2025-12-04T09:33:41.3373102Z  * [new branch]              gh/drisspg/230/head         -> origin/gh/drisspg/230/head
2025-12-04T09:33:41.3374583Z  * [new branch]              gh/drisspg/230/orig         -> origin/gh/drisspg/230/orig
2025-12-04T09:33:41.3377019Z  * [new branch]              gh/dsjohns2/1/base          -> origin/gh/dsjohns2/1/base
2025-12-04T09:33:41.3378602Z  * [new branch]              gh/dsjohns2/1/head          -> origin/gh/dsjohns2/1/head
2025-12-04T09:33:41.3381090Z  * [new branch]              gh/dzmitry-huba/1/base      -> origin/gh/dzmitry-huba/1/base
2025-12-04T09:33:41.3382577Z  * [new branch]              gh/dzmitry-huba/1/head      -> origin/gh/dzmitry-huba/1/head
2025-12-04T09:33:41.3384829Z  * [new branch]              gh/dzmitry-huba/12/base     -> origin/gh/dzmitry-huba/12/base
2025-12-04T09:33:41.3386798Z  * [new branch]              gh/dzmitry-huba/12/head     -> origin/gh/dzmitry-huba/12/head
2025-12-04T09:33:41.3388072Z  * [new branch]              gh/dzmitry-huba/12/orig     -> origin/gh/dzmitry-huba/12/orig
2025-12-04T09:33:41.3390215Z  * [new branch]              gh/dzmitry-huba/13/base     -> origin/gh/dzmitry-huba/13/base
2025-12-04T09:33:41.3391742Z  * [new branch]              gh/dzmitry-huba/13/head     -> origin/gh/dzmitry-huba/13/head
2025-12-04T09:33:41.3393246Z  * [new branch]              gh/dzmitry-huba/13/orig     -> origin/gh/dzmitry-huba/13/orig
2025-12-04T09:33:41.3395146Z  * [new branch]              gh/dzmitry-huba/14/base     -> origin/gh/dzmitry-huba/14/base
2025-12-04T09:33:41.3396717Z  * [new branch]              gh/dzmitry-huba/14/head     -> origin/gh/dzmitry-huba/14/head
2025-12-04T09:33:41.3398215Z  * [new branch]              gh/dzmitry-huba/14/orig     -> origin/gh/dzmitry-huba/14/orig
2025-12-04T09:33:41.3400297Z  * [new branch]              gh/dzmitry-huba/15/base     -> origin/gh/dzmitry-huba/15/base
2025-12-04T09:33:41.3401784Z  * [new branch]              gh/dzmitry-huba/15/head     -> origin/gh/dzmitry-huba/15/head
2025-12-04T09:33:41.3403143Z  * [new branch]              gh/dzmitry-huba/15/orig     -> origin/gh/dzmitry-huba/15/orig
2025-12-04T09:33:41.3405263Z  * [new branch]              gh/dzmitry-huba/16/base     -> origin/gh/dzmitry-huba/16/base
2025-12-04T09:33:41.3406918Z  * [new branch]              gh/dzmitry-huba/16/head     -> origin/gh/dzmitry-huba/16/head
2025-12-04T09:33:41.3408583Z  * [new branch]              gh/dzmitry-huba/16/orig     -> origin/gh/dzmitry-huba/16/orig
2025-12-04T09:33:41.3410533Z  * [new branch]              gh/dzmitry-huba/17/base     -> origin/gh/dzmitry-huba/17/base
2025-12-04T09:33:41.3411991Z  * [new branch]              gh/dzmitry-huba/17/head     -> origin/gh/dzmitry-huba/17/head
2025-12-04T09:33:41.3413452Z  * [new branch]              gh/dzmitry-huba/17/orig     -> origin/gh/dzmitry-huba/17/orig
2025-12-04T09:33:41.3415194Z  * [new branch]              gh/dzmitry-huba/2/base      -> origin/gh/dzmitry-huba/2/base
2025-12-04T09:33:41.3416725Z  * [new branch]              gh/dzmitry-huba/2/head      -> origin/gh/dzmitry-huba/2/head
2025-12-04T09:33:41.3418559Z  * [new branch]              gh/dzmitry-huba/3/base      -> origin/gh/dzmitry-huba/3/base
2025-12-04T09:33:41.3419918Z  * [new branch]              gh/dzmitry-huba/3/head      -> origin/gh/dzmitry-huba/3/head
2025-12-04T09:33:41.3422340Z  * [new branch]              gh/eellison/808/base        -> origin/gh/eellison/808/base
2025-12-04T09:33:41.3423900Z  * [new branch]              gh/eellison/808/head        -> origin/gh/eellison/808/head
2025-12-04T09:33:41.3425479Z  * [new branch]              gh/eellison/808/orig        -> origin/gh/eellison/808/orig
2025-12-04T09:33:41.3427766Z  * [new branch]              gh/eellison/822/base        -> origin/gh/eellison/822/base
2025-12-04T09:33:41.3429402Z  * [new branch]              gh/eellison/822/head        -> origin/gh/eellison/822/head
2025-12-04T09:33:41.3430747Z  * [new branch]              gh/eellison/822/orig        -> origin/gh/eellison/822/orig
2025-12-04T09:33:41.3432686Z  * [new branch]              gh/eellison/823/base        -> origin/gh/eellison/823/base
2025-12-04T09:33:41.3434128Z  * [new branch]              gh/eellison/823/head        -> origin/gh/eellison/823/head
2025-12-04T09:33:41.3435602Z  * [new branch]              gh/eellison/823/orig        -> origin/gh/eellison/823/orig
2025-12-04T09:33:41.3437476Z  * [new branch]              gh/eellison/862/base        -> origin/gh/eellison/862/base
2025-12-04T09:33:41.3438931Z  * [new branch]              gh/eellison/862/head        -> origin/gh/eellison/862/head
2025-12-04T09:33:41.3440397Z  * [new branch]              gh/eellison/862/orig        -> origin/gh/eellison/862/orig
2025-12-04T09:33:41.3442294Z  * [new branch]              gh/eellison/863/base        -> origin/gh/eellison/863/base
2025-12-04T09:33:41.3443725Z  * [new branch]              gh/eellison/863/head        -> origin/gh/eellison/863/head
2025-12-04T09:33:41.3445342Z  * [new branch]              gh/eellison/863/orig        -> origin/gh/eellison/863/orig
2025-12-04T09:33:41.3447184Z  * [new branch]              gh/eellison/864/base        -> origin/gh/eellison/864/base
2025-12-04T09:33:41.3448726Z  * [new branch]              gh/eellison/864/head        -> origin/gh/eellison/864/head
2025-12-04T09:33:41.3450777Z  * [new branch]              gh/eellison/864/orig        -> origin/gh/eellison/864/orig
2025-12-04T09:33:41.3452564Z  * [new branch]              gh/eellison/865/base        -> origin/gh/eellison/865/base
2025-12-04T09:33:41.3453851Z  * [new branch]              gh/eellison/865/head        -> origin/gh/eellison/865/head
2025-12-04T09:33:41.3455452Z  * [new branch]              gh/eellison/865/orig        -> origin/gh/eellison/865/orig
2025-12-04T09:33:41.3457689Z  * [new branch]              gh/eellison/866/base        -> origin/gh/eellison/866/base
2025-12-04T09:33:41.3458828Z  * [new branch]              gh/eellison/866/head        -> origin/gh/eellison/866/head
2025-12-04T09:33:41.3460494Z  * [new branch]              gh/eellison/866/orig        -> origin/gh/eellison/866/orig
2025-12-04T09:33:41.3462672Z  * [new branch]              gh/eellison/867/base        -> origin/gh/eellison/867/base
2025-12-04T09:33:41.3463984Z  * [new branch]              gh/eellison/867/head        -> origin/gh/eellison/867/head
2025-12-04T09:33:41.3465608Z  * [new branch]              gh/eellison/867/orig        -> origin/gh/eellison/867/orig
2025-12-04T09:33:41.3467803Z  * [new branch]              gh/eellison/868/base        -> origin/gh/eellison/868/base
2025-12-04T09:33:41.3469680Z  * [new branch]              gh/eellison/868/head        -> origin/gh/eellison/868/head
2025-12-04T09:33:41.3471198Z  * [new branch]              gh/eellison/868/orig        -> origin/gh/eellison/868/orig
2025-12-04T09:33:41.3473454Z  * [new branch]              gh/eellison/869/base        -> origin/gh/eellison/869/base
2025-12-04T09:33:41.3474699Z  * [new branch]              gh/eellison/869/head        -> origin/gh/eellison/869/head
2025-12-04T09:33:41.3476249Z  * [new branch]              gh/eellison/869/orig        -> origin/gh/eellison/869/orig
2025-12-04T09:33:41.3478295Z  * [new branch]              gh/eellison/870/base        -> origin/gh/eellison/870/base
2025-12-04T09:33:41.3479595Z  * [new branch]              gh/eellison/870/head        -> origin/gh/eellison/870/head
2025-12-04T09:33:41.3480954Z  * [new branch]              gh/eellison/870/orig        -> origin/gh/eellison/870/orig
2025-12-04T09:33:41.3483206Z  * [new branch]              gh/eellison/871/base        -> origin/gh/eellison/871/base
2025-12-04T09:33:41.3484506Z  * [new branch]              gh/eellison/871/head        -> origin/gh/eellison/871/head
2025-12-04T09:33:41.3486797Z  * [new branch]              gh/eellison/871/orig        -> origin/gh/eellison/871/orig
2025-12-04T09:33:41.3489086Z  * [new branch]              gh/eellison/872/base        -> origin/gh/eellison/872/base
2025-12-04T09:33:41.3490240Z  * [new branch]              gh/eellison/872/head        -> origin/gh/eellison/872/head
2025-12-04T09:33:41.3491772Z  * [new branch]              gh/eellison/872/orig        -> origin/gh/eellison/872/orig
2025-12-04T09:33:41.3493952Z  * [new branch]              gh/eellison/873/base        -> origin/gh/eellison/873/base
2025-12-04T09:33:41.3495209Z  * [new branch]              gh/eellison/873/head        -> origin/gh/eellison/873/head
2025-12-04T09:33:41.3496897Z  * [new branch]              gh/eellison/873/orig        -> origin/gh/eellison/873/orig
2025-12-04T09:33:41.3498967Z  * [new branch]              gh/eellison/874/base        -> origin/gh/eellison/874/base
2025-12-04T09:33:41.3500566Z  * [new branch]              gh/eellison/874/head        -> origin/gh/eellison/874/head
2025-12-04T09:33:41.3501916Z  * [new branch]              gh/eellison/874/orig        -> origin/gh/eellison/874/orig
2025-12-04T09:33:41.3504557Z  * [new branch]              gh/eellison/875/base        -> origin/gh/eellison/875/base
2025-12-04T09:33:41.3506235Z  * [new branch]              gh/eellison/875/head        -> origin/gh/eellison/875/head
2025-12-04T09:33:41.3507827Z  * [new branch]              gh/eellison/875/orig        -> origin/gh/eellison/875/orig
2025-12-04T09:33:41.3510042Z  * [new branch]              gh/eellison/876/base        -> origin/gh/eellison/876/base
2025-12-04T09:33:41.3511583Z  * [new branch]              gh/eellison/876/head        -> origin/gh/eellison/876/head
2025-12-04T09:33:41.3512966Z  * [new branch]              gh/eellison/876/orig        -> origin/gh/eellison/876/orig
2025-12-04T09:33:41.3515232Z  * [new branch]              gh/eellison/877/base        -> origin/gh/eellison/877/base
2025-12-04T09:33:41.3516549Z  * [new branch]              gh/eellison/877/head        -> origin/gh/eellison/877/head
2025-12-04T09:33:41.3518116Z  * [new branch]              gh/eellison/877/orig        -> origin/gh/eellison/877/orig
2025-12-04T09:33:41.3520080Z  * [new branch]              gh/eellison/878/base        -> origin/gh/eellison/878/base
2025-12-04T09:33:41.3521966Z  * [new branch]              gh/eellison/878/head        -> origin/gh/eellison/878/head
2025-12-04T09:33:41.3522793Z  * [new branch]              gh/eellison/878/orig        -> origin/gh/eellison/878/orig
2025-12-04T09:33:41.3524936Z  * [new branch]              gh/eellison/879/base        -> origin/gh/eellison/879/base
2025-12-04T09:33:41.3526416Z  * [new branch]              gh/eellison/879/head        -> origin/gh/eellison/879/head
2025-12-04T09:33:41.3527996Z  * [new branch]              gh/eellison/879/orig        -> origin/gh/eellison/879/orig
2025-12-04T09:33:41.3529839Z  * [new branch]              gh/eellison/880/base        -> origin/gh/eellison/880/base
2025-12-04T09:33:41.3531339Z  * [new branch]              gh/eellison/880/head        -> origin/gh/eellison/880/head
2025-12-04T09:33:41.3532885Z  * [new branch]              gh/eellison/880/orig        -> origin/gh/eellison/880/orig
2025-12-04T09:33:41.3534978Z  * [new branch]              gh/eellison/881/base        -> origin/gh/eellison/881/base
2025-12-04T09:33:41.3536498Z  * [new branch]              gh/eellison/881/head        -> origin/gh/eellison/881/head
2025-12-04T09:33:41.3538027Z  * [new branch]              gh/eellison/881/orig        -> origin/gh/eellison/881/orig
2025-12-04T09:33:41.3539947Z  * [new branch]              gh/eellison/882/base        -> origin/gh/eellison/882/base
2025-12-04T09:33:41.3541420Z  * [new branch]              gh/eellison/882/head        -> origin/gh/eellison/882/head
2025-12-04T09:33:41.3543107Z  * [new branch]              gh/eellison/882/orig        -> origin/gh/eellison/882/orig
2025-12-04T09:33:41.3544996Z  * [new branch]              gh/eellison/883/base        -> origin/gh/eellison/883/base
2025-12-04T09:33:41.3546508Z  * [new branch]              gh/eellison/883/head        -> origin/gh/eellison/883/head
2025-12-04T09:33:41.3548137Z  * [new branch]              gh/eellison/883/orig        -> origin/gh/eellison/883/orig
2025-12-04T09:33:41.3549925Z  * [new branch]              gh/eellison/884/base        -> origin/gh/eellison/884/base
2025-12-04T09:33:41.3551387Z  * [new branch]              gh/eellison/884/head        -> origin/gh/eellison/884/head
2025-12-04T09:33:41.3552762Z  * [new branch]              gh/eellison/884/orig        -> origin/gh/eellison/884/orig
2025-12-04T09:33:41.3555164Z  * [new branch]              gh/etaf/147/base            -> origin/gh/etaf/147/base
2025-12-04T09:33:41.3556748Z  * [new branch]              gh/etaf/147/head            -> origin/gh/etaf/147/head
2025-12-04T09:33:41.3559002Z  * [new branch]              gh/etaf/154/base            -> origin/gh/etaf/154/base
2025-12-04T09:33:41.3560508Z  * [new branch]              gh/etaf/154/head            -> origin/gh/etaf/154/head
2025-12-04T09:33:41.3561970Z  * [new branch]              gh/etaf/154/orig            -> origin/gh/etaf/154/orig
2025-12-04T09:33:41.3564457Z  * [new branch]              gh/etaf/156/base            -> origin/gh/etaf/156/base
2025-12-04T09:33:41.3565908Z  * [new branch]              gh/etaf/156/head            -> origin/gh/etaf/156/head
2025-12-04T09:33:41.3567594Z  * [new branch]              gh/etaf/156/orig            -> origin/gh/etaf/156/orig
2025-12-04T09:33:41.3569782Z  * [new branch]              gh/etaf/157/base            -> origin/gh/etaf/157/base
2025-12-04T09:33:41.3571480Z  * [new branch]              gh/etaf/157/head            -> origin/gh/etaf/157/head
2025-12-04T09:33:41.3573042Z  * [new branch]              gh/etaf/157/orig            -> origin/gh/etaf/157/orig
2025-12-04T09:33:41.3575177Z  * [new branch]              gh/etaf/158/base            -> origin/gh/etaf/158/base
2025-12-04T09:33:41.3576884Z  * [new branch]              gh/etaf/158/head            -> origin/gh/etaf/158/head
2025-12-04T09:33:41.3578281Z  * [new branch]              gh/etaf/158/orig            -> origin/gh/etaf/158/orig
2025-12-04T09:33:41.3580273Z  * [new branch]              gh/etaf/159/base            -> origin/gh/etaf/159/base
2025-12-04T09:33:41.3581799Z  * [new branch]              gh/etaf/159/head            -> origin/gh/etaf/159/head
2025-12-04T09:33:41.3583239Z  * [new branch]              gh/etaf/159/orig            -> origin/gh/etaf/159/orig
2025-12-04T09:33:41.3585314Z  * [new branch]              gh/etaf/160/base            -> origin/gh/etaf/160/base
2025-12-04T09:33:41.3586870Z  * [new branch]              gh/etaf/160/head            -> origin/gh/etaf/160/head
2025-12-04T09:33:41.3588464Z  * [new branch]              gh/etaf/160/orig            -> origin/gh/etaf/160/orig
2025-12-04T09:33:41.3590354Z  * [new branch]              gh/etaf/161/base            -> origin/gh/etaf/161/base
2025-12-04T09:33:41.3591971Z  * [new branch]              gh/etaf/161/head            -> origin/gh/etaf/161/head
2025-12-04T09:33:41.3593434Z  * [new branch]              gh/etaf/161/orig            -> origin/gh/etaf/161/orig
2025-12-04T09:33:41.3595366Z  * [new branch]              gh/etaf/166/base            -> origin/gh/etaf/166/base
2025-12-04T09:33:41.3597065Z  * [new branch]              gh/etaf/166/head            -> origin/gh/etaf/166/head
2025-12-04T09:33:41.3598560Z  * [new branch]              gh/etaf/166/orig            -> origin/gh/etaf/166/orig
2025-12-04T09:33:41.3600407Z  * [new branch]              gh/etaf/167/base            -> origin/gh/etaf/167/base
2025-12-04T09:33:41.3601960Z  * [new branch]              gh/etaf/167/head            -> origin/gh/etaf/167/head
2025-12-04T09:33:41.3603409Z  * [new branch]              gh/etaf/167/orig            -> origin/gh/etaf/167/orig
2025-12-04T09:33:41.3605492Z  * [new branch]              gh/etaf/168/base            -> origin/gh/etaf/168/base
2025-12-04T09:33:41.3607120Z  * [new branch]              gh/etaf/168/head            -> origin/gh/etaf/168/head
2025-12-04T09:33:41.3608654Z  * [new branch]              gh/etaf/168/orig            -> origin/gh/etaf/168/orig
2025-12-04T09:33:41.3610883Z  * [new branch]              gh/etaf/172/base            -> origin/gh/etaf/172/base
2025-12-04T09:33:41.3612258Z  * [new branch]              gh/etaf/172/head            -> origin/gh/etaf/172/head
2025-12-04T09:33:41.3613797Z  * [new branch]              gh/etaf/172/orig            -> origin/gh/etaf/172/orig
2025-12-04T09:33:41.3615986Z  * [new branch]              gh/etaf/173/base            -> origin/gh/etaf/173/base
2025-12-04T09:33:41.3617739Z  * [new branch]              gh/etaf/173/head            -> origin/gh/etaf/173/head
2025-12-04T09:33:41.3619750Z  * [new branch]              gh/etaf/173/orig            -> origin/gh/etaf/173/orig
2025-12-04T09:33:41.3621839Z  * [new branch]              gh/etaf/174/base            -> origin/gh/etaf/174/base
2025-12-04T09:33:41.3623344Z  * [new branch]              gh/etaf/174/head            -> origin/gh/etaf/174/head
2025-12-04T09:33:41.3625305Z  * [new branch]              gh/etaf/175/base            -> origin/gh/etaf/175/base
2025-12-04T09:33:41.3626851Z  * [new branch]              gh/etaf/175/head            -> origin/gh/etaf/175/head
2025-12-04T09:33:41.3628201Z  * [new branch]              gh/etaf/175/orig            -> origin/gh/etaf/175/orig
2025-12-04T09:33:41.3630417Z  * [new branch]              gh/etaf/176/base            -> origin/gh/etaf/176/base
2025-12-04T09:33:41.3631978Z  * [new branch]              gh/etaf/176/head            -> origin/gh/etaf/176/head
2025-12-04T09:33:41.3633482Z  * [new branch]              gh/etaf/176/orig            -> origin/gh/etaf/176/orig
2025-12-04T09:33:41.3635935Z  * [new branch]              gh/etaf/177/base            -> origin/gh/etaf/177/base
2025-12-04T09:33:41.3637677Z  * [new branch]              gh/etaf/177/head            -> origin/gh/etaf/177/head
2025-12-04T09:33:41.3639202Z  * [new branch]              gh/etaf/177/orig            -> origin/gh/etaf/177/orig
2025-12-04T09:33:41.3641398Z  * [new branch]              gh/etaf/178/base            -> origin/gh/etaf/178/base
2025-12-04T09:33:41.3643079Z  * [new branch]              gh/etaf/178/head            -> origin/gh/etaf/178/head
2025-12-04T09:33:41.3644518Z  * [new branch]              gh/etaf/178/orig            -> origin/gh/etaf/178/orig
2025-12-04T09:33:41.3646572Z  * [new branch]              gh/etaf/179/base            -> origin/gh/etaf/179/base
2025-12-04T09:33:41.3648043Z  * [new branch]              gh/etaf/179/head            -> origin/gh/etaf/179/head
2025-12-04T09:33:41.3649491Z  * [new branch]              gh/etaf/179/orig            -> origin/gh/etaf/179/orig
2025-12-04T09:33:41.3651475Z  * [new branch]              gh/etaf/180/base            -> origin/gh/etaf/180/base
2025-12-04T09:33:41.3653000Z  * [new branch]              gh/etaf/180/head            -> origin/gh/etaf/180/head
2025-12-04T09:33:41.3654497Z  * [new branch]              gh/etaf/180/orig            -> origin/gh/etaf/180/orig
2025-12-04T09:33:41.3657520Z  * [new branch]              gh/exclamaforte/1/base      -> origin/gh/exclamaforte/1/base
2025-12-04T09:33:41.3658696Z  * [new branch]              gh/exclamaforte/1/head      -> origin/gh/exclamaforte/1/head
2025-12-04T09:33:41.3660600Z  * [new branch]              gh/exclamaforte/2/base      -> origin/gh/exclamaforte/2/base
2025-12-04T09:33:41.3661772Z  * [new branch]              gh/exclamaforte/2/head      -> origin/gh/exclamaforte/2/head
2025-12-04T09:33:41.3663738Z  * [new branch]              gh/exclamaforte/3/base      -> origin/gh/exclamaforte/3/base
2025-12-04T09:33:41.3665242Z  * [new branch]              gh/exclamaforte/3/head      -> origin/gh/exclamaforte/3/head
2025-12-04T09:33:41.3667200Z  * [new branch]              gh/exclamaforte/4/base      -> origin/gh/exclamaforte/4/base
2025-12-04T09:33:41.3668758Z  * [new branch]              gh/exclamaforte/4/head      -> origin/gh/exclamaforte/4/head
2025-12-04T09:33:41.3671322Z  * [new branch]              gh/ezyang/2374/base         -> origin/gh/ezyang/2374/base
2025-12-04T09:33:41.3672941Z  * [new branch]              gh/ezyang/2374/head         -> origin/gh/ezyang/2374/head
2025-12-04T09:33:41.3674582Z  * [new branch]              gh/ezyang/2374/orig         -> origin/gh/ezyang/2374/orig
2025-12-04T09:33:41.3676483Z  * [new branch]              gh/ezyang/2973/base         -> origin/gh/ezyang/2973/base
2025-12-04T09:33:41.3677949Z  * [new branch]              gh/ezyang/2973/head         -> origin/gh/ezyang/2973/head
2025-12-04T09:33:41.3679502Z  * [new branch]              gh/ezyang/2973/orig         -> origin/gh/ezyang/2973/orig
2025-12-04T09:33:41.3681392Z  * [new branch]              gh/ezyang/2974/base         -> origin/gh/ezyang/2974/base
2025-12-04T09:33:41.3682861Z  * [new branch]              gh/ezyang/2974/head         -> origin/gh/ezyang/2974/head
2025-12-04T09:33:41.3684523Z  * [new branch]              gh/ezyang/2974/orig         -> origin/gh/ezyang/2974/orig
2025-12-04T09:33:41.3686396Z  * [new branch]              gh/ezyang/3131/base         -> origin/gh/ezyang/3131/base
2025-12-04T09:33:41.3688078Z  * [new branch]              gh/ezyang/3131/head         -> origin/gh/ezyang/3131/head
2025-12-04T09:33:41.3689525Z  * [new branch]              gh/ezyang/3131/orig         -> origin/gh/ezyang/3131/orig
2025-12-04T09:33:41.3691450Z  * [new branch]              gh/ezyang/3139/base         -> origin/gh/ezyang/3139/base
2025-12-04T09:33:41.3692891Z  * [new branch]              gh/ezyang/3139/head         -> origin/gh/ezyang/3139/head
2025-12-04T09:33:41.3694373Z  * [new branch]              gh/ezyang/3139/orig         -> origin/gh/ezyang/3139/orig
2025-12-04T09:33:41.3696376Z  * [new branch]              gh/ezyang/3140/base         -> origin/gh/ezyang/3140/base
2025-12-04T09:33:41.3697860Z  * [new branch]              gh/ezyang/3140/head         -> origin/gh/ezyang/3140/head
2025-12-04T09:33:41.3699388Z  * [new branch]              gh/ezyang/3140/orig         -> origin/gh/ezyang/3140/orig
2025-12-04T09:33:41.3701307Z  * [new branch]              gh/ezyang/3143/base         -> origin/gh/ezyang/3143/base
2025-12-04T09:33:41.3702749Z  * [new branch]              gh/ezyang/3143/head         -> origin/gh/ezyang/3143/head
2025-12-04T09:33:41.3704208Z  * [new branch]              gh/ezyang/3143/orig         -> origin/gh/ezyang/3143/orig
2025-12-04T09:33:41.3706203Z  * [new branch]              gh/ezyang/3144/base         -> origin/gh/ezyang/3144/base
2025-12-04T09:33:41.3707977Z  * [new branch]              gh/ezyang/3144/head         -> origin/gh/ezyang/3144/head
2025-12-04T09:33:41.3709362Z  * [new branch]              gh/ezyang/3144/orig         -> origin/gh/ezyang/3144/orig
2025-12-04T09:33:41.3711292Z  * [new branch]              gh/ezyang/3167/base         -> origin/gh/ezyang/3167/base
2025-12-04T09:33:41.3712730Z  * [new branch]              gh/ezyang/3167/head         -> origin/gh/ezyang/3167/head
2025-12-04T09:33:41.3714240Z  * [new branch]              gh/ezyang/3167/orig         -> origin/gh/ezyang/3167/orig
2025-12-04T09:33:41.3716176Z  * [new branch]              gh/ezyang/3173/base         -> origin/gh/ezyang/3173/base
2025-12-04T09:33:41.3717632Z  * [new branch]              gh/ezyang/3173/head         -> origin/gh/ezyang/3173/head
2025-12-04T09:33:41.3719268Z  * [new branch]              gh/ezyang/3173/orig         -> origin/gh/ezyang/3173/orig
2025-12-04T09:33:41.3721136Z  * [new branch]              gh/ezyang/3175/base         -> origin/gh/ezyang/3175/base
2025-12-04T09:33:41.3722591Z  * [new branch]              gh/ezyang/3175/head         -> origin/gh/ezyang/3175/head
2025-12-04T09:33:41.3724047Z  * [new branch]              gh/ezyang/3175/orig         -> origin/gh/ezyang/3175/orig
2025-12-04T09:33:41.3726006Z  * [new branch]              gh/ezyang/3182/base         -> origin/gh/ezyang/3182/base
2025-12-04T09:33:41.3727658Z  * [new branch]              gh/ezyang/3182/head         -> origin/gh/ezyang/3182/head
2025-12-04T09:33:41.3729155Z  * [new branch]              gh/ezyang/3182/orig         -> origin/gh/ezyang/3182/orig
2025-12-04T09:33:41.3731130Z  * [new branch]              gh/ezyang/3185/base         -> origin/gh/ezyang/3185/base
2025-12-04T09:33:41.3732697Z  * [new branch]              gh/ezyang/3185/head         -> origin/gh/ezyang/3185/head
2025-12-04T09:33:41.3734076Z  * [new branch]              gh/ezyang/3185/orig         -> origin/gh/ezyang/3185/orig
2025-12-04T09:33:41.3735998Z  * [new branch]              gh/ezyang/3189/base         -> origin/gh/ezyang/3189/base
2025-12-04T09:33:41.3737611Z  * [new branch]              gh/ezyang/3189/head         -> origin/gh/ezyang/3189/head
2025-12-04T09:33:41.3739081Z  * [new branch]              gh/ezyang/3189/orig         -> origin/gh/ezyang/3189/orig
2025-12-04T09:33:41.3740993Z  * [new branch]              gh/ezyang/3191/base         -> origin/gh/ezyang/3191/base
2025-12-04T09:33:41.3742478Z  * [new branch]              gh/ezyang/3191/head         -> origin/gh/ezyang/3191/head
2025-12-04T09:33:41.3743987Z  * [new branch]              gh/ezyang/3191/orig         -> origin/gh/ezyang/3191/orig
2025-12-04T09:33:41.3746491Z  * [new branch]              gh/ezyang/3192/base         -> origin/gh/ezyang/3192/base
2025-12-04T09:33:41.3748050Z  * [new branch]              gh/ezyang/3192/head         -> origin/gh/ezyang/3192/head
2025-12-04T09:33:41.3749645Z  * [new branch]              gh/ezyang/3192/orig         -> origin/gh/ezyang/3192/orig
2025-12-04T09:33:41.3751714Z  * [new branch]              gh/ezyang/3193/base         -> origin/gh/ezyang/3193/base
2025-12-04T09:33:41.3753248Z  * [new branch]              gh/ezyang/3193/head         -> origin/gh/ezyang/3193/head
2025-12-04T09:33:41.3755503Z  * [new branch]              gh/ezyang/3193/orig         -> origin/gh/ezyang/3193/orig
2025-12-04T09:33:41.3757505Z  * [new branch]              gh/ezyang/3194/base         -> origin/gh/ezyang/3194/base
2025-12-04T09:33:41.3758997Z  * [new branch]              gh/ezyang/3194/head         -> origin/gh/ezyang/3194/head
2025-12-04T09:33:41.3760476Z  * [new branch]              gh/ezyang/3194/orig         -> origin/gh/ezyang/3194/orig
2025-12-04T09:33:41.3762387Z  * [new branch]              gh/ezyang/3195/base         -> origin/gh/ezyang/3195/base
2025-12-04T09:33:41.3764264Z  * [new branch]              gh/ezyang/3195/head         -> origin/gh/ezyang/3195/head
2025-12-04T09:33:41.3765758Z  * [new branch]              gh/ezyang/3195/orig         -> origin/gh/ezyang/3195/orig
2025-12-04T09:33:41.3767760Z  * [new branch]              gh/ezyang/3196/base         -> origin/gh/ezyang/3196/base
2025-12-04T09:33:41.3769355Z  * [new branch]              gh/ezyang/3196/head         -> origin/gh/ezyang/3196/head
2025-12-04T09:33:41.3770903Z  * [new branch]              gh/ezyang/3196/orig         -> origin/gh/ezyang/3196/orig
2025-12-04T09:33:41.3777416Z  * [new branch]              gh/ezyang/3197/base         -> origin/gh/ezyang/3197/base
2025-12-04T09:33:41.3778829Z  * [new branch]              gh/ezyang/3197/head         -> origin/gh/ezyang/3197/head
2025-12-04T09:33:41.3780351Z  * [new branch]              gh/ezyang/3197/orig         -> origin/gh/ezyang/3197/orig
2025-12-04T09:33:41.3782378Z  * [new branch]              gh/ezyang/3198/base         -> origin/gh/ezyang/3198/base
2025-12-04T09:33:41.3783895Z  * [new branch]              gh/ezyang/3198/head         -> origin/gh/ezyang/3198/head
2025-12-04T09:33:41.3785465Z  * [new branch]              gh/ezyang/3198/orig         -> origin/gh/ezyang/3198/orig
2025-12-04T09:33:41.3787462Z  * [new branch]              gh/ezyang/3199/base         -> origin/gh/ezyang/3199/base
2025-12-04T09:33:41.3789017Z  * [new branch]              gh/ezyang/3199/head         -> origin/gh/ezyang/3199/head
2025-12-04T09:33:41.3790496Z  * [new branch]              gh/ezyang/3199/orig         -> origin/gh/ezyang/3199/orig
2025-12-04T09:33:41.3792516Z  * [new branch]              gh/ezyang/3200/base         -> origin/gh/ezyang/3200/base
2025-12-04T09:33:41.3794119Z  * [new branch]              gh/ezyang/3200/head         -> origin/gh/ezyang/3200/head
2025-12-04T09:33:41.3795682Z  * [new branch]              gh/ezyang/3200/orig         -> origin/gh/ezyang/3200/orig
2025-12-04T09:33:41.3797666Z  * [new branch]              gh/ezyang/3201/base         -> origin/gh/ezyang/3201/base
2025-12-04T09:33:41.3799319Z  * [new branch]              gh/ezyang/3201/head         -> origin/gh/ezyang/3201/head
2025-12-04T09:33:41.3800684Z  * [new branch]              gh/ezyang/3201/orig         -> origin/gh/ezyang/3201/orig
2025-12-04T09:33:41.3802641Z  * [new branch]              gh/ezyang/3202/base         -> origin/gh/ezyang/3202/base
2025-12-04T09:33:41.3804082Z  * [new branch]              gh/ezyang/3202/head         -> origin/gh/ezyang/3202/head
2025-12-04T09:33:41.3805598Z  * [new branch]              gh/ezyang/3202/orig         -> origin/gh/ezyang/3202/orig
2025-12-04T09:33:41.3807570Z  * [new branch]              gh/ezyang/3203/base         -> origin/gh/ezyang/3203/base
2025-12-04T09:33:41.3809058Z  * [new branch]              gh/ezyang/3203/head         -> origin/gh/ezyang/3203/head
2025-12-04T09:33:41.3810704Z  * [new branch]              gh/ezyang/3203/orig         -> origin/gh/ezyang/3203/orig
2025-12-04T09:33:41.3812692Z  * [new branch]              gh/ezyang/3204/base         -> origin/gh/ezyang/3204/base
2025-12-04T09:33:41.3814329Z  * [new branch]              gh/ezyang/3204/head         -> origin/gh/ezyang/3204/head
2025-12-04T09:33:41.3815839Z  * [new branch]              gh/ezyang/3204/orig         -> origin/gh/ezyang/3204/orig
2025-12-04T09:33:41.3818026Z  * [new branch]              gh/ezyang/3205/base         -> origin/gh/ezyang/3205/base
2025-12-04T09:33:41.3819481Z  * [new branch]              gh/ezyang/3205/head         -> origin/gh/ezyang/3205/head
2025-12-04T09:33:41.3820978Z  * [new branch]              gh/ezyang/3205/orig         -> origin/gh/ezyang/3205/orig
2025-12-04T09:33:41.3822912Z  * [new branch]              gh/ezyang/3206/base         -> origin/gh/ezyang/3206/base
2025-12-04T09:33:41.3824367Z  * [new branch]              gh/ezyang/3206/head         -> origin/gh/ezyang/3206/head
2025-12-04T09:33:41.3825884Z  * [new branch]              gh/ezyang/3206/orig         -> origin/gh/ezyang/3206/orig
2025-12-04T09:33:41.3827871Z  * [new branch]              gh/ezyang/3207/base         -> origin/gh/ezyang/3207/base
2025-12-04T09:33:41.3829354Z  * [new branch]              gh/ezyang/3207/head         -> origin/gh/ezyang/3207/head
2025-12-04T09:33:41.3830853Z  * [new branch]              gh/ezyang/3207/orig         -> origin/gh/ezyang/3207/orig
2025-12-04T09:33:41.3832876Z  * [new branch]              gh/ezyang/3208/base         -> origin/gh/ezyang/3208/base
2025-12-04T09:33:41.3834531Z  * [new branch]              gh/ezyang/3208/head         -> origin/gh/ezyang/3208/head
2025-12-04T09:33:41.3836023Z  * [new branch]              gh/ezyang/3208/orig         -> origin/gh/ezyang/3208/orig
2025-12-04T09:33:41.3837990Z  * [new branch]              gh/ezyang/3209/base         -> origin/gh/ezyang/3209/base
2025-12-04T09:33:41.3839649Z  * [new branch]              gh/ezyang/3209/head         -> origin/gh/ezyang/3209/head
2025-12-04T09:33:41.3841120Z  * [new branch]              gh/ezyang/3209/orig         -> origin/gh/ezyang/3209/orig
2025-12-04T09:33:41.3843480Z  * [new branch]              gh/fadara01/3/base          -> origin/gh/fadara01/3/base
2025-12-04T09:33:41.3845020Z  * [new branch]              gh/fadara01/3/head          -> origin/gh/fadara01/3/head
2025-12-04T09:33:41.3846517Z  * [new branch]              gh/fadara01/3/orig          -> origin/gh/fadara01/3/orig
2025-12-04T09:33:41.3848616Z  * [new branch]              gh/fadara01/5/base          -> origin/gh/fadara01/5/base
2025-12-04T09:33:41.3850107Z  * [new branch]              gh/fadara01/5/head          -> origin/gh/fadara01/5/head
2025-12-04T09:33:41.3851677Z  * [new branch]              gh/fadara01/5/orig          -> origin/gh/fadara01/5/orig
2025-12-04T09:33:41.3853647Z  * [new branch]              gh/fadara01/6/base          -> origin/gh/fadara01/6/base
2025-12-04T09:33:41.3855142Z  * [new branch]              gh/fadara01/6/head          -> origin/gh/fadara01/6/head
2025-12-04T09:33:41.3856706Z  * [new branch]              gh/fadara01/6/orig          -> origin/gh/fadara01/6/orig
2025-12-04T09:33:41.3858845Z  * [new branch]              gh/fadara01/7/base          -> origin/gh/fadara01/7/base
2025-12-04T09:33:41.3860165Z  * [new branch]              gh/fadara01/7/head          -> origin/gh/fadara01/7/head
2025-12-04T09:33:41.3861729Z  * [new branch]              gh/fadara01/7/orig          -> origin/gh/fadara01/7/orig
2025-12-04T09:33:41.3863681Z  * [new branch]              gh/fadara01/8/base          -> origin/gh/fadara01/8/base
2025-12-04T09:33:41.3865160Z  * [new branch]              gh/fadara01/8/head          -> origin/gh/fadara01/8/head
2025-12-04T09:33:41.3866661Z  * [new branch]              gh/fadara01/8/orig          -> origin/gh/fadara01/8/orig
2025-12-04T09:33:41.3868581Z  * [new branch]              gh/fadara01/9/base          -> origin/gh/fadara01/9/base
2025-12-04T09:33:41.3870101Z  * [new branch]              gh/fadara01/9/head          -> origin/gh/fadara01/9/head
2025-12-04T09:33:41.3871813Z  * [new branch]              gh/fadara01/9/orig          -> origin/gh/fadara01/9/orig
2025-12-04T09:33:41.3874250Z  * [new branch]              gh/fduwjj/182/base          -> origin/gh/fduwjj/182/base
2025-12-04T09:33:41.3875776Z  * [new branch]              gh/fduwjj/182/head          -> origin/gh/fduwjj/182/head
2025-12-04T09:33:41.3877223Z  * [new branch]              gh/fduwjj/182/orig          -> origin/gh/fduwjj/182/orig
2025-12-04T09:33:41.3879274Z  * [new branch]              gh/fduwjj/211/base          -> origin/gh/fduwjj/211/base
2025-12-04T09:33:41.3880847Z  * [new branch]              gh/fduwjj/211/head          -> origin/gh/fduwjj/211/head
2025-12-04T09:33:41.3882377Z  * [new branch]              gh/fduwjj/211/orig          -> origin/gh/fduwjj/211/orig
2025-12-04T09:33:41.3884330Z  * [new branch]              gh/fduwjj/212/base          -> origin/gh/fduwjj/212/base
2025-12-04T09:33:41.3885802Z  * [new branch]              gh/fduwjj/212/head          -> origin/gh/fduwjj/212/head
2025-12-04T09:33:41.3887524Z  * [new branch]              gh/fduwjj/212/orig          -> origin/gh/fduwjj/212/orig
2025-12-04T09:33:41.3889253Z  * [new branch]              gh/fduwjj/213/base          -> origin/gh/fduwjj/213/base
2025-12-04T09:33:41.3890731Z  * [new branch]              gh/fduwjj/213/head          -> origin/gh/fduwjj/213/head
2025-12-04T09:33:41.3892276Z  * [new branch]              gh/fduwjj/213/orig          -> origin/gh/fduwjj/213/orig
2025-12-04T09:33:41.3894465Z  * [new branch]              gh/fduwjj/226/base          -> origin/gh/fduwjj/226/base
2025-12-04T09:33:41.3895850Z  * [new branch]              gh/fduwjj/226/head          -> origin/gh/fduwjj/226/head
2025-12-04T09:33:41.3897389Z  * [new branch]              gh/fduwjj/226/orig          -> origin/gh/fduwjj/226/orig
2025-12-04T09:33:41.3899548Z  * [new branch]              gh/fduwjj/229/base          -> origin/gh/fduwjj/229/base
2025-12-04T09:33:41.3900984Z  * [new branch]              gh/fduwjj/229/head          -> origin/gh/fduwjj/229/head
2025-12-04T09:33:41.3902429Z  * [new branch]              gh/fduwjj/229/orig          -> origin/gh/fduwjj/229/orig
2025-12-04T09:33:41.3904381Z  * [new branch]              gh/fduwjj/233/base          -> origin/gh/fduwjj/233/base
2025-12-04T09:33:41.3905892Z  * [new branch]              gh/fduwjj/233/head          -> origin/gh/fduwjj/233/head
2025-12-04T09:33:41.3907351Z  * [new branch]              gh/fduwjj/233/orig          -> origin/gh/fduwjj/233/orig
2025-12-04T09:33:41.3909330Z  * [new branch]              gh/fduwjj/234/base          -> origin/gh/fduwjj/234/base
2025-12-04T09:33:41.3910843Z  * [new branch]              gh/fduwjj/234/head          -> origin/gh/fduwjj/234/head
2025-12-04T09:33:41.3912305Z  * [new branch]              gh/fduwjj/234/orig          -> origin/gh/fduwjj/234/orig
2025-12-04T09:33:41.3914351Z  * [new branch]              gh/fduwjj/235/base          -> origin/gh/fduwjj/235/base
2025-12-04T09:33:41.3915853Z  * [new branch]              gh/fduwjj/235/head          -> origin/gh/fduwjj/235/head
2025-12-04T09:33:41.3917338Z  * [new branch]              gh/fduwjj/235/orig          -> origin/gh/fduwjj/235/orig
2025-12-04T09:33:41.3919319Z  * [new branch]              gh/fduwjj/236/base          -> origin/gh/fduwjj/236/base
2025-12-04T09:33:41.3920633Z  * [new branch]              gh/fduwjj/236/head          -> origin/gh/fduwjj/236/head
2025-12-04T09:33:41.3922144Z  * [new branch]              gh/fduwjj/236/orig          -> origin/gh/fduwjj/236/orig
2025-12-04T09:33:41.3923880Z  * [new branch]              gh/fduwjj/237/base          -> origin/gh/fduwjj/237/base
2025-12-04T09:33:41.3925344Z  * [new branch]              gh/fduwjj/237/head          -> origin/gh/fduwjj/237/head
2025-12-04T09:33:41.3926787Z  * [new branch]              gh/fduwjj/237/orig          -> origin/gh/fduwjj/237/orig
2025-12-04T09:33:41.3928730Z  * [new branch]              gh/fduwjj/238/base          -> origin/gh/fduwjj/238/base
2025-12-04T09:33:41.3930366Z  * [new branch]              gh/fduwjj/238/head          -> origin/gh/fduwjj/238/head
2025-12-04T09:33:41.3931781Z  * [new branch]              gh/fduwjj/238/orig          -> origin/gh/fduwjj/238/orig
2025-12-04T09:33:41.3933900Z  * [new branch]              gh/fduwjj/239/base          -> origin/gh/fduwjj/239/base
2025-12-04T09:33:41.3935457Z  * [new branch]              gh/fduwjj/239/head          -> origin/gh/fduwjj/239/head
2025-12-04T09:33:41.3936974Z  * [new branch]              gh/fduwjj/239/orig          -> origin/gh/fduwjj/239/orig
2025-12-04T09:33:41.3939943Z  * [new branch]              gh/fegin/332/base           -> origin/gh/fegin/332/base
2025-12-04T09:33:41.3941426Z  * [new branch]              gh/fegin/332/head           -> origin/gh/fegin/332/head
2025-12-04T09:33:41.3942999Z  * [new branch]              gh/fegin/332/orig           -> origin/gh/fegin/332/orig
2025-12-04T09:33:41.3944968Z  * [new branch]              gh/fegin/333/base           -> origin/gh/fegin/333/base
2025-12-04T09:33:41.3946597Z  * [new branch]              gh/fegin/333/head           -> origin/gh/fegin/333/head
2025-12-04T09:33:41.3948036Z  * [new branch]              gh/fegin/333/orig           -> origin/gh/fegin/333/orig
2025-12-04T09:33:41.3949973Z  * [new branch]              gh/fegin/334/base           -> origin/gh/fegin/334/base
2025-12-04T09:33:41.3951396Z  * [new branch]              gh/fegin/334/head           -> origin/gh/fegin/334/head
2025-12-04T09:33:41.3953656Z  * [new branch]              gh/fegin/334/orig           -> origin/gh/fegin/334/orig
2025-12-04T09:33:41.3955625Z  * [new branch]              gh/fegin/335/base           -> origin/gh/fegin/335/base
2025-12-04T09:33:41.3957076Z  * [new branch]              gh/fegin/335/head           -> origin/gh/fegin/335/head
2025-12-04T09:33:41.3958597Z  * [new branch]              gh/fegin/335/orig           -> origin/gh/fegin/335/orig
2025-12-04T09:33:41.3960937Z  * [new branch]              gh/fffrog/160/base          -> origin/gh/fffrog/160/base
2025-12-04T09:33:41.3962391Z  * [new branch]              gh/fffrog/160/head          -> origin/gh/fffrog/160/head
2025-12-04T09:33:41.3964393Z  * [new branch]              gh/fffrog/177/base          -> origin/gh/fffrog/177/base
2025-12-04T09:33:41.3965767Z  * [new branch]              gh/fffrog/177/head          -> origin/gh/fffrog/177/head
2025-12-04T09:33:41.3967303Z  * [new branch]              gh/fffrog/177/orig          -> origin/gh/fffrog/177/orig
2025-12-04T09:33:41.3969231Z  * [new branch]              gh/fffrog/178/base          -> origin/gh/fffrog/178/base
2025-12-04T09:33:41.3970692Z  * [new branch]              gh/fffrog/178/head          -> origin/gh/fffrog/178/head
2025-12-04T09:33:41.3972549Z  * [new branch]              gh/fffrog/178/orig          -> origin/gh/fffrog/178/orig
2025-12-04T09:33:41.3974457Z  * [new branch]              gh/fffrog/181/base          -> origin/gh/fffrog/181/base
2025-12-04T09:33:41.3975946Z  * [new branch]              gh/fffrog/181/head          -> origin/gh/fffrog/181/head
2025-12-04T09:33:41.3977600Z  * [new branch]              gh/fffrog/181/orig          -> origin/gh/fffrog/181/orig
2025-12-04T09:33:41.3979637Z  * [new branch]              gh/fffrog/183/base          -> origin/gh/fffrog/183/base
2025-12-04T09:33:41.3981491Z  * [new branch]              gh/fffrog/183/head          -> origin/gh/fffrog/183/head
2025-12-04T09:33:41.3983036Z  * [new branch]              gh/fffrog/183/orig          -> origin/gh/fffrog/183/orig
2025-12-04T09:33:41.3985467Z  * [new branch]              gh/fxdawnn/10/base          -> origin/gh/fxdawnn/10/base
2025-12-04T09:33:41.3987166Z  * [new branch]              gh/fxdawnn/10/head          -> origin/gh/fxdawnn/10/head
2025-12-04T09:33:41.3988559Z  * [new branch]              gh/fxdawnn/10/orig          -> origin/gh/fxdawnn/10/orig
2025-12-04T09:33:41.3991005Z  * [new branch]              gh/fxdawnn/11/base          -> origin/gh/fxdawnn/11/base
2025-12-04T09:33:41.3992152Z  * [new branch]              gh/fxdawnn/11/head          -> origin/gh/fxdawnn/11/head
2025-12-04T09:33:41.3993768Z  * [new branch]              gh/fxdawnn/11/orig          -> origin/gh/fxdawnn/11/orig
2025-12-04T09:33:41.3995814Z  * [new branch]              gh/fxdawnn/12/base          -> origin/gh/fxdawnn/12/base
2025-12-04T09:33:41.3997164Z  * [new branch]              gh/fxdawnn/12/head          -> origin/gh/fxdawnn/12/head
2025-12-04T09:33:41.3998655Z  * [new branch]              gh/fxdawnn/12/orig          -> origin/gh/fxdawnn/12/orig
2025-12-04T09:33:41.4000582Z  * [new branch]              gh/fxdawnn/13/base          -> origin/gh/fxdawnn/13/base
2025-12-04T09:33:41.4002018Z  * [new branch]              gh/fxdawnn/13/head          -> origin/gh/fxdawnn/13/head
2025-12-04T09:33:41.4003622Z  * [new branch]              gh/fxdawnn/13/orig          -> origin/gh/fxdawnn/13/orig
2025-12-04T09:33:41.4005788Z  * [new branch]              gh/fxdawnn/14/base          -> origin/gh/fxdawnn/14/base
2025-12-04T09:33:41.4007149Z  * [new branch]              gh/fxdawnn/14/head          -> origin/gh/fxdawnn/14/head
2025-12-04T09:33:41.4008594Z  * [new branch]              gh/fxdawnn/14/orig          -> origin/gh/fxdawnn/14/orig
2025-12-04T09:33:41.4010585Z  * [new branch]              gh/fxdawnn/15/base          -> origin/gh/fxdawnn/15/base
2025-12-04T09:33:41.4012171Z  * [new branch]              gh/fxdawnn/15/head          -> origin/gh/fxdawnn/15/head
2025-12-04T09:33:41.4013622Z  * [new branch]              gh/fxdawnn/15/orig          -> origin/gh/fxdawnn/15/orig
2025-12-04T09:33:41.4015587Z  * [new branch]              gh/fxdawnn/6/base           -> origin/gh/fxdawnn/6/base
2025-12-04T09:33:41.4017237Z  * [new branch]              gh/fxdawnn/6/head           -> origin/gh/fxdawnn/6/head
2025-12-04T09:33:41.4018677Z  * [new branch]              gh/fxdawnn/6/orig           -> origin/gh/fxdawnn/6/orig
2025-12-04T09:33:41.4020655Z  * [new branch]              gh/fxdawnn/7/base           -> origin/gh/fxdawnn/7/base
2025-12-04T09:33:41.4022226Z  * [new branch]              gh/fxdawnn/7/head           -> origin/gh/fxdawnn/7/head
2025-12-04T09:33:41.4023648Z  * [new branch]              gh/fxdawnn/7/orig           -> origin/gh/fxdawnn/7/orig
2025-12-04T09:33:41.4025681Z  * [new branch]              gh/fxdawnn/9/base           -> origin/gh/fxdawnn/9/base
2025-12-04T09:33:41.4027070Z  * [new branch]              gh/fxdawnn/9/head           -> origin/gh/fxdawnn/9/head
2025-12-04T09:33:41.4028925Z  * [new branch]              gh/fxdawnn/9/orig           -> origin/gh/fxdawnn/9/orig
2025-12-04T09:33:41.4031426Z  * [new branch]              gh/galv/1/base              -> origin/gh/galv/1/base
2025-12-04T09:33:41.4032916Z  * [new branch]              gh/galv/1/head              -> origin/gh/galv/1/head
2025-12-04T09:33:41.4034451Z  * [new branch]              gh/galv/1/orig              -> origin/gh/galv/1/orig
2025-12-04T09:33:41.4036426Z  * [new branch]              gh/galv/2/base              -> origin/gh/galv/2/base
2025-12-04T09:33:41.4037855Z  * [new branch]              gh/galv/2/head              -> origin/gh/galv/2/head
2025-12-04T09:33:41.4040015Z  * [new branch]              gh/galv/2/orig              -> origin/gh/galv/2/orig
2025-12-04T09:33:41.4042608Z  * [new branch]              gh/galv/3/base              -> origin/gh/galv/3/base
2025-12-04T09:33:41.4043230Z  * [new branch]              gh/galv/3/head              -> origin/gh/galv/3/head
2025-12-04T09:33:41.4044979Z  * [new branch]              gh/galv/3/orig              -> origin/gh/galv/3/orig
2025-12-04T09:33:41.4047302Z  * [new branch]              gh/guangyey/134/base        -> origin/gh/guangyey/134/base
2025-12-04T09:33:41.4048840Z  * [new branch]              gh/guangyey/134/head        -> origin/gh/guangyey/134/head
2025-12-04T09:33:41.4050860Z  * [new branch]              gh/guangyey/134/orig        -> origin/gh/guangyey/134/orig
2025-12-04T09:33:41.4053011Z  * [new branch]              gh/guangyey/163/base        -> origin/gh/guangyey/163/base
2025-12-04T09:33:41.4054439Z  * [new branch]              gh/guangyey/163/head        -> origin/gh/guangyey/163/head
2025-12-04T09:33:41.4055897Z  * [new branch]              gh/guangyey/163/orig        -> origin/gh/guangyey/163/orig
2025-12-04T09:33:41.4057970Z  * [new branch]              gh/guangyey/168/base        -> origin/gh/guangyey/168/base
2025-12-04T09:33:41.4059424Z  * [new branch]              gh/guangyey/168/head        -> origin/gh/guangyey/168/head
2025-12-04T09:33:41.4060905Z  * [new branch]              gh/guangyey/168/orig        -> origin/gh/guangyey/168/orig
2025-12-04T09:33:41.4062834Z  * [new branch]              gh/guangyey/169/base        -> origin/gh/guangyey/169/base
2025-12-04T09:33:41.4064316Z  * [new branch]              gh/guangyey/169/head        -> origin/gh/guangyey/169/head
2025-12-04T09:33:41.4065752Z  * [new branch]              gh/guangyey/169/orig        -> origin/gh/guangyey/169/orig
2025-12-04T09:33:41.4067701Z  * [new branch]              gh/guangyey/170/base        -> origin/gh/guangyey/170/base
2025-12-04T09:33:41.4069201Z  * [new branch]              gh/guangyey/170/head        -> origin/gh/guangyey/170/head
2025-12-04T09:33:41.4070674Z  * [new branch]              gh/guangyey/170/orig        -> origin/gh/guangyey/170/orig
2025-12-04T09:33:41.4072965Z  * [new branch]              gh/guangyey/171/base        -> origin/gh/guangyey/171/base
2025-12-04T09:33:41.4074424Z  * [new branch]              gh/guangyey/171/head        -> origin/gh/guangyey/171/head
2025-12-04T09:33:41.4075921Z  * [new branch]              gh/guangyey/171/orig        -> origin/gh/guangyey/171/orig
2025-12-04T09:33:41.4077827Z  * [new branch]              gh/guangyey/178/base        -> origin/gh/guangyey/178/base
2025-12-04T09:33:41.4079395Z  * [new branch]              gh/guangyey/178/head        -> origin/gh/guangyey/178/head
2025-12-04T09:33:41.4080791Z  * [new branch]              gh/guangyey/178/orig        -> origin/gh/guangyey/178/orig
2025-12-04T09:33:41.4082647Z  * [new branch]              gh/guangyey/182/base        -> origin/gh/guangyey/182/base
2025-12-04T09:33:41.4084170Z  * [new branch]              gh/guangyey/182/head        -> origin/gh/guangyey/182/head
2025-12-04T09:33:41.4085677Z  * [new branch]              gh/guangyey/182/orig        -> origin/gh/guangyey/182/orig
2025-12-04T09:33:41.4087626Z  * [new branch]              gh/guangyey/183/base        -> origin/gh/guangyey/183/base
2025-12-04T09:33:41.4089064Z  * [new branch]              gh/guangyey/183/head        -> origin/gh/guangyey/183/head
2025-12-04T09:33:41.4090602Z  * [new branch]              gh/guangyey/183/orig        -> origin/gh/guangyey/183/orig
2025-12-04T09:33:41.4092668Z  * [new branch]              gh/guangyey/185/base        -> origin/gh/guangyey/185/base
2025-12-04T09:33:41.4094144Z  * [new branch]              gh/guangyey/185/head        -> origin/gh/guangyey/185/head
2025-12-04T09:33:41.4095610Z  * [new branch]              gh/guangyey/185/orig        -> origin/gh/guangyey/185/orig
2025-12-04T09:33:41.4097741Z  * [new branch]              gh/guangyey/186/base        -> origin/gh/guangyey/186/base
2025-12-04T09:33:41.4099244Z  * [new branch]              gh/guangyey/186/head        -> origin/gh/guangyey/186/head
2025-12-04T09:33:41.4100752Z  * [new branch]              gh/guangyey/186/orig        -> origin/gh/guangyey/186/orig
2025-12-04T09:33:41.4103113Z  * [new branch]              gh/guangyey/187/base        -> origin/gh/guangyey/187/base
2025-12-04T09:33:41.4104715Z  * [new branch]              gh/guangyey/187/head        -> origin/gh/guangyey/187/head
2025-12-04T09:33:41.4106161Z  * [new branch]              gh/guangyey/187/orig        -> origin/gh/guangyey/187/orig
2025-12-04T09:33:41.4108165Z  * [new branch]              gh/guangyey/188/base        -> origin/gh/guangyey/188/base
2025-12-04T09:33:41.4109667Z  * [new branch]              gh/guangyey/188/head        -> origin/gh/guangyey/188/head
2025-12-04T09:33:41.4111191Z  * [new branch]              gh/guangyey/188/orig        -> origin/gh/guangyey/188/orig
2025-12-04T09:33:41.4113187Z  * [new branch]              gh/guangyey/190/base        -> origin/gh/guangyey/190/base
2025-12-04T09:33:41.4114618Z  * [new branch]              gh/guangyey/190/head        -> origin/gh/guangyey/190/head
2025-12-04T09:33:41.4116125Z  * [new branch]              gh/guangyey/190/orig        -> origin/gh/guangyey/190/orig
2025-12-04T09:33:41.4117935Z  * [new branch]              gh/guangyey/208/base        -> origin/gh/guangyey/208/base
2025-12-04T09:33:41.4119443Z  * [new branch]              gh/guangyey/208/head        -> origin/gh/guangyey/208/head
2025-12-04T09:33:41.4120926Z  * [new branch]              gh/guangyey/208/orig        -> origin/gh/guangyey/208/orig
2025-12-04T09:33:41.4122820Z  * [new branch]              gh/guangyey/228/base        -> origin/gh/guangyey/228/base
2025-12-04T09:33:41.4124322Z  * [new branch]              gh/guangyey/228/head        -> origin/gh/guangyey/228/head
2025-12-04T09:33:41.4125747Z  * [new branch]              gh/guangyey/228/orig        -> origin/gh/guangyey/228/orig
2025-12-04T09:33:41.4128291Z  * [new branch]              gh/guangyey/230/base        -> origin/gh/guangyey/230/base
2025-12-04T09:33:41.4129732Z  * [new branch]              gh/guangyey/230/head        -> origin/gh/guangyey/230/head
2025-12-04T09:33:41.4131220Z  * [new branch]              gh/guangyey/230/orig        -> origin/gh/guangyey/230/orig
2025-12-04T09:33:41.4133289Z  * [new branch]              gh/guangyey/231/base        -> origin/gh/guangyey/231/base
2025-12-04T09:33:41.4134770Z  * [new branch]              gh/guangyey/231/head        -> origin/gh/guangyey/231/head
2025-12-04T09:33:41.4136334Z  * [new branch]              gh/guangyey/231/orig        -> origin/gh/guangyey/231/orig
2025-12-04T09:33:41.4138429Z  * [new branch]              gh/guangyey/232/base        -> origin/gh/guangyey/232/base
2025-12-04T09:33:41.4140022Z  * [new branch]              gh/guangyey/232/head        -> origin/gh/guangyey/232/head
2025-12-04T09:33:41.4141395Z  * [new branch]              gh/guangyey/232/orig        -> origin/gh/guangyey/232/orig
2025-12-04T09:33:41.4143357Z  * [new branch]              gh/guangyey/233/base        -> origin/gh/guangyey/233/base
2025-12-04T09:33:41.4144834Z  * [new branch]              gh/guangyey/233/head        -> origin/gh/guangyey/233/head
2025-12-04T09:33:41.4146337Z  * [new branch]              gh/guangyey/233/orig        -> origin/gh/guangyey/233/orig
2025-12-04T09:33:41.4148334Z  * [new branch]              gh/guangyey/234/base        -> origin/gh/guangyey/234/base
2025-12-04T09:33:41.4149784Z  * [new branch]              gh/guangyey/234/head        -> origin/gh/guangyey/234/head
2025-12-04T09:33:41.4151288Z  * [new branch]              gh/guangyey/234/orig        -> origin/gh/guangyey/234/orig
2025-12-04T09:33:41.4153294Z  * [new branch]              gh/guangyey/235/base        -> origin/gh/guangyey/235/base
2025-12-04T09:33:41.4154781Z  * [new branch]              gh/guangyey/235/head        -> origin/gh/guangyey/235/head
2025-12-04T09:33:41.4156326Z  * [new branch]              gh/guangyey/235/orig        -> origin/gh/guangyey/235/orig
2025-12-04T09:33:41.4158392Z  * [new branch]              gh/guangyey/236/base        -> origin/gh/guangyey/236/base
2025-12-04T09:33:41.4159981Z  * [new branch]              gh/guangyey/236/head        -> origin/gh/guangyey/236/head
2025-12-04T09:33:41.4161340Z  * [new branch]              gh/guangyey/236/orig        -> origin/gh/guangyey/236/orig
2025-12-04T09:33:41.4163397Z  * [new branch]              gh/guangyey/237/base        -> origin/gh/guangyey/237/base
2025-12-04T09:33:41.4164914Z  * [new branch]              gh/guangyey/237/head        -> origin/gh/guangyey/237/head
2025-12-04T09:33:41.4166461Z  * [new branch]              gh/guangyey/237/orig        -> origin/gh/guangyey/237/orig
2025-12-04T09:33:41.4168429Z  * [new branch]              gh/guangyey/238/base        -> origin/gh/guangyey/238/base
2025-12-04T09:33:41.4169906Z  * [new branch]              gh/guangyey/238/head        -> origin/gh/guangyey/238/head
2025-12-04T09:33:41.4173694Z  * [new branch]              gh/guangyey/239/base        -> origin/gh/guangyey/239/base
2025-12-04T09:33:41.4175215Z  * [new branch]              gh/guangyey/239/head        -> origin/gh/guangyey/239/head
2025-12-04T09:33:41.4176822Z  * [new branch]              gh/guangyey/239/orig        -> origin/gh/guangyey/239/orig
2025-12-04T09:33:41.4178883Z  * [new branch]              gh/guangyey/240/base        -> origin/gh/guangyey/240/base
2025-12-04T09:33:41.4180361Z  * [new branch]              gh/guangyey/240/head        -> origin/gh/guangyey/240/head
2025-12-04T09:33:41.4181920Z  * [new branch]              gh/guangyey/240/orig        -> origin/gh/guangyey/240/orig
2025-12-04T09:33:41.4183942Z  * [new branch]              gh/guangyey/241/base        -> origin/gh/guangyey/241/base
2025-12-04T09:33:41.4185460Z  * [new branch]              gh/guangyey/241/head        -> origin/gh/guangyey/241/head
2025-12-04T09:33:41.4186950Z  * [new branch]              gh/guangyey/241/orig        -> origin/gh/guangyey/241/orig
2025-12-04T09:33:41.4189005Z  * [new branch]              gh/guangyey/242/base        -> origin/gh/guangyey/242/base
2025-12-04T09:33:41.4190459Z  * [new branch]              gh/guangyey/242/head        -> origin/gh/guangyey/242/head
2025-12-04T09:33:41.4191921Z  * [new branch]              gh/guangyey/242/orig        -> origin/gh/guangyey/242/orig
2025-12-04T09:33:41.4194092Z  * [new branch]              gh/guangyey/243/base        -> origin/gh/guangyey/243/base
2025-12-04T09:33:41.4195519Z  * [new branch]              gh/guangyey/243/head        -> origin/gh/guangyey/243/head
2025-12-04T09:33:41.4197032Z  * [new branch]              gh/guangyey/243/orig        -> origin/gh/guangyey/243/orig
2025-12-04T09:33:41.4199222Z  * [new branch]              gh/guangyey/244/base        -> origin/gh/guangyey/244/base
2025-12-04T09:33:41.4200715Z  * [new branch]              gh/guangyey/244/head        -> origin/gh/guangyey/244/head
2025-12-04T09:33:41.4202231Z  * [new branch]              gh/guangyey/244/orig        -> origin/gh/guangyey/244/orig
2025-12-04T09:33:41.4204308Z  * [new branch]              gh/guangyey/245/base        -> origin/gh/guangyey/245/base
2025-12-04T09:33:41.4205862Z  * [new branch]              gh/guangyey/245/head        -> origin/gh/guangyey/245/head
2025-12-04T09:33:41.4207363Z  * [new branch]              gh/guangyey/245/orig        -> origin/gh/guangyey/245/orig
2025-12-04T09:33:41.4209477Z  * [new branch]              gh/guangyey/246/base        -> origin/gh/guangyey/246/base
2025-12-04T09:33:41.4211220Z  * [new branch]              gh/guangyey/246/head        -> origin/gh/guangyey/246/head
2025-12-04T09:33:41.4212566Z  * [new branch]              gh/guangyey/246/orig        -> origin/gh/guangyey/246/orig
2025-12-04T09:33:41.4214716Z  * [new branch]              gh/guangyey/247/base        -> origin/gh/guangyey/247/base
2025-12-04T09:33:41.4216284Z  * [new branch]              gh/guangyey/247/head        -> origin/gh/guangyey/247/head
2025-12-04T09:33:41.4217850Z  * [new branch]              gh/guangyey/247/orig        -> origin/gh/guangyey/247/orig
2025-12-04T09:33:41.4219857Z  * [new branch]              gh/guangyey/248/base        -> origin/gh/guangyey/248/base
2025-12-04T09:33:41.4221463Z  * [new branch]              gh/guangyey/248/head        -> origin/gh/guangyey/248/head
2025-12-04T09:33:41.4222780Z  * [new branch]              gh/guangyey/248/orig        -> origin/gh/guangyey/248/orig
2025-12-04T09:33:41.4224821Z  * [new branch]              gh/guangyey/249/base        -> origin/gh/guangyey/249/base
2025-12-04T09:33:41.4226440Z  * [new branch]              gh/guangyey/249/head        -> origin/gh/guangyey/249/head
2025-12-04T09:33:41.4227906Z  * [new branch]              gh/guangyey/249/orig        -> origin/gh/guangyey/249/orig
2025-12-04T09:33:41.4230002Z  * [new branch]              gh/guangyey/250/base        -> origin/gh/guangyey/250/base
2025-12-04T09:33:41.4231502Z  * [new branch]              gh/guangyey/250/head        -> origin/gh/guangyey/250/head
2025-12-04T09:33:41.4233063Z  * [new branch]              gh/guangyey/250/orig        -> origin/gh/guangyey/250/orig
2025-12-04T09:33:41.4235654Z  * [new branch]              gh/guangyey/251/base        -> origin/gh/guangyey/251/base
2025-12-04T09:33:41.4237183Z  * [new branch]              gh/guangyey/251/head        -> origin/gh/guangyey/251/head
2025-12-04T09:33:41.4238718Z  * [new branch]              gh/guangyey/251/orig        -> origin/gh/guangyey/251/orig
2025-12-04T09:33:41.4240686Z  * [new branch]              gh/guangyey/252/base        -> origin/gh/guangyey/252/base
2025-12-04T09:33:41.4242237Z  * [new branch]              gh/guangyey/252/head        -> origin/gh/guangyey/252/head
2025-12-04T09:33:41.4243727Z  * [new branch]              gh/guangyey/252/orig        -> origin/gh/guangyey/252/orig
2025-12-04T09:33:41.4245723Z  * [new branch]              gh/guangyey/253/base        -> origin/gh/guangyey/253/base
2025-12-04T09:33:41.4247229Z  * [new branch]              gh/guangyey/253/head        -> origin/gh/guangyey/253/head
2025-12-04T09:33:41.4248689Z  * [new branch]              gh/guangyey/253/orig        -> origin/gh/guangyey/253/orig
2025-12-04T09:33:41.4250660Z  * [new branch]              gh/guangyey/254/base        -> origin/gh/guangyey/254/base
2025-12-04T09:33:41.4252232Z  * [new branch]              gh/guangyey/254/head        -> origin/gh/guangyey/254/head
2025-12-04T09:33:41.4254300Z  * [new branch]              gh/guangyey/254/orig        -> origin/gh/guangyey/254/orig
2025-12-04T09:33:41.4256432Z  * [new branch]              gh/guangyey/255/base        -> origin/gh/guangyey/255/base
2025-12-04T09:33:41.4258107Z  * [new branch]              gh/guangyey/255/head        -> origin/gh/guangyey/255/head
2025-12-04T09:33:41.4259612Z  * [new branch]              gh/guangyey/255/orig        -> origin/gh/guangyey/255/orig
2025-12-04T09:33:41.4262294Z  * [new branch]              gh/guilhermeleobas/107/base -> origin/gh/guilhermeleobas/107/base
2025-12-04T09:33:41.4264111Z  * [new branch]              gh/guilhermeleobas/107/head -> origin/gh/guilhermeleobas/107/head
2025-12-04T09:33:41.4265400Z  * [new branch]              gh/guilhermeleobas/107/orig -> origin/gh/guilhermeleobas/107/orig
2025-12-04T09:33:41.4267310Z  * [new branch]              gh/guilhermeleobas/108/base -> origin/gh/guilhermeleobas/108/base
2025-12-04T09:33:41.4268791Z  * [new branch]              gh/guilhermeleobas/108/head -> origin/gh/guilhermeleobas/108/head
2025-12-04T09:33:41.4270728Z  * [new branch]              gh/guilhermeleobas/108/orig -> origin/gh/guilhermeleobas/108/orig
2025-12-04T09:33:41.4272947Z  * [new branch]              gh/guilhermeleobas/150/base -> origin/gh/guilhermeleobas/150/base
2025-12-04T09:33:41.4276127Z  * [new branch]              gh/guilhermeleobas/150/head -> origin/gh/guilhermeleobas/150/head
2025-12-04T09:33:41.4277403Z  * [new branch]              gh/guilhermeleobas/150/orig -> origin/gh/guilhermeleobas/150/orig
2025-12-04T09:33:41.4279615Z  * [new branch]              gh/guilhermeleobas/168/base -> origin/gh/guilhermeleobas/168/base
2025-12-04T09:33:41.4280988Z  * [new branch]              gh/guilhermeleobas/168/head -> origin/gh/guilhermeleobas/168/head
2025-12-04T09:33:41.4282457Z  * [new branch]              gh/guilhermeleobas/168/orig -> origin/gh/guilhermeleobas/168/orig
2025-12-04T09:33:41.4284828Z  * [new branch]              gh/guilhermeleobas/169/base -> origin/gh/guilhermeleobas/169/base
2025-12-04T09:33:41.4285965Z  * [new branch]              gh/guilhermeleobas/169/head -> origin/gh/guilhermeleobas/169/head
2025-12-04T09:33:41.4288642Z  * [new branch]              gh/guilhermeleobas/169/orig -> origin/gh/guilhermeleobas/169/orig
2025-12-04T09:33:41.4289458Z  * [new branch]              gh/guilhermeleobas/170/base -> origin/gh/guilhermeleobas/170/base
2025-12-04T09:33:41.4290814Z  * [new branch]              gh/guilhermeleobas/170/head -> origin/gh/guilhermeleobas/170/head
2025-12-04T09:33:41.4292273Z  * [new branch]              gh/guilhermeleobas/170/orig -> origin/gh/guilhermeleobas/170/orig
2025-12-04T09:33:41.4294645Z  * [new branch]              gh/guilhermeleobas/171/base -> origin/gh/guilhermeleobas/171/base
2025-12-04T09:33:41.4296125Z  * [new branch]              gh/guilhermeleobas/171/head -> origin/gh/guilhermeleobas/171/head
2025-12-04T09:33:41.4297792Z  * [new branch]              gh/guilhermeleobas/171/orig -> origin/gh/guilhermeleobas/171/orig
2025-12-04T09:33:41.4299880Z  * [new branch]              gh/guilhermeleobas/173/base -> origin/gh/guilhermeleobas/173/base
2025-12-04T09:33:41.4301376Z  * [new branch]              gh/guilhermeleobas/173/head -> origin/gh/guilhermeleobas/173/head
2025-12-04T09:33:41.4302885Z  * [new branch]              gh/guilhermeleobas/173/orig -> origin/gh/guilhermeleobas/173/orig
2025-12-04T09:33:41.4304881Z  * [new branch]              gh/guilhermeleobas/193/base -> origin/gh/guilhermeleobas/193/base
2025-12-04T09:33:41.4306383Z  * [new branch]              gh/guilhermeleobas/193/head -> origin/gh/guilhermeleobas/193/head
2025-12-04T09:33:41.4308087Z  * [new branch]              gh/guilhermeleobas/193/orig -> origin/gh/guilhermeleobas/193/orig
2025-12-04T09:33:41.4310567Z  * [new branch]              gh/guilhermeleobas/204/base -> origin/gh/guilhermeleobas/204/base
2025-12-04T09:33:41.4312077Z  * [new branch]              gh/guilhermeleobas/204/head -> origin/gh/guilhermeleobas/204/head
2025-12-04T09:33:41.4313543Z  * [new branch]              gh/guilhermeleobas/204/orig -> origin/gh/guilhermeleobas/204/orig
2025-12-04T09:33:41.4315501Z  * [new branch]              gh/guilhermeleobas/211/base -> origin/gh/guilhermeleobas/211/base
2025-12-04T09:33:41.4316993Z  * [new branch]              gh/guilhermeleobas/211/head -> origin/gh/guilhermeleobas/211/head
2025-12-04T09:33:41.4318493Z  * [new branch]              gh/guilhermeleobas/211/orig -> origin/gh/guilhermeleobas/211/orig
2025-12-04T09:33:41.4320659Z  * [new branch]              gh/guilhermeleobas/226/base -> origin/gh/guilhermeleobas/226/base
2025-12-04T09:33:41.4322097Z  * [new branch]              gh/guilhermeleobas/226/head -> origin/gh/guilhermeleobas/226/head
2025-12-04T09:33:41.4323528Z  * [new branch]              gh/guilhermeleobas/226/orig -> origin/gh/guilhermeleobas/226/orig
2025-12-04T09:33:41.4325474Z  * [new branch]              gh/guilhermeleobas/236/base -> origin/gh/guilhermeleobas/236/base
2025-12-04T09:33:41.4328961Z  * [new branch]              gh/guilhermeleobas/236/head -> origin/gh/guilhermeleobas/236/head
2025-12-04T09:33:41.4329270Z  * [new branch]              gh/guilhermeleobas/236/orig -> origin/gh/guilhermeleobas/236/orig
2025-12-04T09:33:41.4330794Z  * [new branch]              gh/guilhermeleobas/247/base -> origin/gh/guilhermeleobas/247/base
2025-12-04T09:33:41.4331539Z  * [new branch]              gh/guilhermeleobas/247/head -> origin/gh/guilhermeleobas/247/head
2025-12-04T09:33:41.4333249Z  * [new branch]              gh/guilhermeleobas/247/orig -> origin/gh/guilhermeleobas/247/orig
2025-12-04T09:33:41.4335176Z  * [new branch]              gh/guilhermeleobas/248/base -> origin/gh/guilhermeleobas/248/base
2025-12-04T09:33:41.4336736Z  * [new branch]              gh/guilhermeleobas/248/head -> origin/gh/guilhermeleobas/248/head
2025-12-04T09:33:41.4338305Z  * [new branch]              gh/guilhermeleobas/248/orig -> origin/gh/guilhermeleobas/248/orig
2025-12-04T09:33:41.4340484Z  * [new branch]              gh/guilhermeleobas/250/base -> origin/gh/guilhermeleobas/250/base
2025-12-04T09:33:41.4341867Z  * [new branch]              gh/guilhermeleobas/250/head -> origin/gh/guilhermeleobas/250/head
2025-12-04T09:33:41.4343419Z  * [new branch]              gh/guilhermeleobas/250/orig -> origin/gh/guilhermeleobas/250/orig
2025-12-04T09:33:41.4345892Z  * [new branch]              gh/guilhermeleobas/253/base -> origin/gh/guilhermeleobas/253/base
2025-12-04T09:33:41.4347411Z  * [new branch]              gh/guilhermeleobas/253/head -> origin/gh/guilhermeleobas/253/head
2025-12-04T09:33:41.4349016Z  * [new branch]              gh/guilhermeleobas/253/orig -> origin/gh/guilhermeleobas/253/orig
2025-12-04T09:33:41.4351068Z  * [new branch]              gh/guilhermeleobas/254/base -> origin/gh/guilhermeleobas/254/base
2025-12-04T09:33:41.4352538Z  * [new branch]              gh/guilhermeleobas/254/head -> origin/gh/guilhermeleobas/254/head
2025-12-04T09:33:41.4354082Z  * [new branch]              gh/guilhermeleobas/254/orig -> origin/gh/guilhermeleobas/254/orig
2025-12-04T09:33:41.4356058Z  * [new branch]              gh/guilhermeleobas/255/base -> origin/gh/guilhermeleobas/255/base
2025-12-04T09:33:41.4357576Z  * [new branch]              gh/guilhermeleobas/255/head -> origin/gh/guilhermeleobas/255/head
2025-12-04T09:33:41.4359069Z  * [new branch]              gh/guilhermeleobas/255/orig -> origin/gh/guilhermeleobas/255/orig
2025-12-04T09:33:41.4361992Z  * [new branch]              gh/guilhermeleobas/256/base -> origin/gh/guilhermeleobas/256/base
2025-12-04T09:33:41.4363382Z  * [new branch]              gh/guilhermeleobas/256/head -> origin/gh/guilhermeleobas/256/head
2025-12-04T09:33:41.4364891Z  * [new branch]              gh/guilhermeleobas/256/orig -> origin/gh/guilhermeleobas/256/orig
2025-12-04T09:33:41.4366934Z  * [new branch]              gh/guilhermeleobas/257/base -> origin/gh/guilhermeleobas/257/base
2025-12-04T09:33:41.4368415Z  * [new branch]              gh/guilhermeleobas/257/head -> origin/gh/guilhermeleobas/257/head
2025-12-04T09:33:41.4370099Z  * [new branch]              gh/guilhermeleobas/257/orig -> origin/gh/guilhermeleobas/257/orig
2025-12-04T09:33:41.4372629Z  * [new branch]              gh/guilhermeleobas/258/base -> origin/gh/guilhermeleobas/258/base
2025-12-04T09:33:41.4373830Z  * [new branch]              gh/guilhermeleobas/258/head -> origin/gh/guilhermeleobas/258/head
2025-12-04T09:33:41.4375366Z  * [new branch]              gh/guilhermeleobas/258/orig -> origin/gh/guilhermeleobas/258/orig
2025-12-04T09:33:41.4377551Z  * [new branch]              gh/guilhermeleobas/259/base -> origin/gh/guilhermeleobas/259/base
2025-12-04T09:33:41.4379056Z  * [new branch]              gh/guilhermeleobas/259/head -> origin/gh/guilhermeleobas/259/head
2025-12-04T09:33:41.4380551Z  * [new branch]              gh/guilhermeleobas/259/orig -> origin/gh/guilhermeleobas/259/orig
2025-12-04T09:33:41.4382782Z  * [new branch]              gh/guilhermeleobas/260/base -> origin/gh/guilhermeleobas/260/base
2025-12-04T09:33:41.4384256Z  * [new branch]              gh/guilhermeleobas/260/head -> origin/gh/guilhermeleobas/260/head
2025-12-04T09:33:41.4385715Z  * [new branch]              gh/guilhermeleobas/260/orig -> origin/gh/guilhermeleobas/260/orig
2025-12-04T09:33:41.4387751Z  * [new branch]              gh/guilhermeleobas/261/base -> origin/gh/guilhermeleobas/261/base
2025-12-04T09:33:41.4389244Z  * [new branch]              gh/guilhermeleobas/261/head -> origin/gh/guilhermeleobas/261/head
2025-12-04T09:33:41.4390711Z  * [new branch]              gh/guilhermeleobas/261/orig -> origin/gh/guilhermeleobas/261/orig
2025-12-04T09:33:41.4392734Z  * [new branch]              gh/guilhermeleobas/262/base -> origin/gh/guilhermeleobas/262/base
2025-12-04T09:33:41.4394384Z  * [new branch]              gh/guilhermeleobas/262/head -> origin/gh/guilhermeleobas/262/head
2025-12-04T09:33:41.4395803Z  * [new branch]              gh/guilhermeleobas/262/orig -> origin/gh/guilhermeleobas/262/orig
2025-12-04T09:33:41.4398136Z  * [new branch]              gh/guilhermeleobas/263/base -> origin/gh/guilhermeleobas/263/base
2025-12-04T09:33:41.4399478Z  * [new branch]              gh/guilhermeleobas/263/head -> origin/gh/guilhermeleobas/263/head
2025-12-04T09:33:41.4401011Z  * [new branch]              gh/guilhermeleobas/263/orig -> origin/gh/guilhermeleobas/263/orig
2025-12-04T09:33:41.4403143Z  * [new branch]              gh/guilhermeleobas/264/base -> origin/gh/guilhermeleobas/264/base
2025-12-04T09:33:41.4404650Z  * [new branch]              gh/guilhermeleobas/264/head -> origin/gh/guilhermeleobas/264/head
2025-12-04T09:33:41.4406154Z  * [new branch]              gh/guilhermeleobas/264/orig -> origin/gh/guilhermeleobas/264/orig
2025-12-04T09:33:41.4408162Z  * [new branch]              gh/guilhermeleobas/265/base -> origin/gh/guilhermeleobas/265/base
2025-12-04T09:33:41.4409668Z  * [new branch]              gh/guilhermeleobas/265/head -> origin/gh/guilhermeleobas/265/head
2025-12-04T09:33:41.4411209Z  * [new branch]              gh/guilhermeleobas/265/orig -> origin/gh/guilhermeleobas/265/orig
2025-12-04T09:33:41.4413289Z  * [new branch]              gh/guilhermeleobas/266/base -> origin/gh/guilhermeleobas/266/base
2025-12-04T09:33:41.4414773Z  * [new branch]              gh/guilhermeleobas/266/head -> origin/gh/guilhermeleobas/266/head
2025-12-04T09:33:41.4416904Z  * [new branch]              gh/guilhermeleobas/266/orig -> origin/gh/guilhermeleobas/266/orig
2025-12-04T09:33:41.4418488Z  * [new branch]              gh/guilhermeleobas/267/base -> origin/gh/guilhermeleobas/267/base
2025-12-04T09:33:41.4419939Z  * [new branch]              gh/guilhermeleobas/267/head -> origin/gh/guilhermeleobas/267/head
2025-12-04T09:33:41.4421425Z  * [new branch]              gh/guilhermeleobas/267/orig -> origin/gh/guilhermeleobas/267/orig
2025-12-04T09:33:41.4424106Z  * [new branch]              gh/hameerabbasi/1/base      -> origin/gh/hameerabbasi/1/base
2025-12-04T09:33:41.4425626Z  * [new branch]              gh/hameerabbasi/1/head      -> origin/gh/hameerabbasi/1/head
2025-12-04T09:33:41.4427552Z  * [new branch]              gh/hameerabbasi/2/base      -> origin/gh/hameerabbasi/2/base
2025-12-04T09:33:41.4429028Z  * [new branch]              gh/hameerabbasi/2/head      -> origin/gh/hameerabbasi/2/head
2025-12-04T09:33:41.4430652Z  * [new branch]              gh/hameerabbasi/2/orig      -> origin/gh/hameerabbasi/2/orig
2025-12-04T09:33:41.4432507Z  * [new branch]              gh/hameerabbasi/3/base      -> origin/gh/hameerabbasi/3/base
2025-12-04T09:33:41.4433988Z  * [new branch]              gh/hameerabbasi/3/head      -> origin/gh/hameerabbasi/3/head
2025-12-04T09:33:41.4435763Z  * [new branch]              gh/hameerabbasi/3/orig      -> origin/gh/hameerabbasi/3/orig
2025-12-04T09:33:41.4437630Z  * [new branch]              gh/hameerabbasi/4/base      -> origin/gh/hameerabbasi/4/base
2025-12-04T09:33:41.4439193Z  * [new branch]              gh/hameerabbasi/4/head      -> origin/gh/hameerabbasi/4/head
2025-12-04T09:33:41.4440555Z  * [new branch]              gh/hameerabbasi/4/orig      -> origin/gh/hameerabbasi/4/orig
2025-12-04T09:33:41.4443043Z  * [new branch]              gh/huydhn/1/next            -> origin/gh/huydhn/1/next
2025-12-04T09:33:41.4444881Z  * [new branch]              gh/huydhn/2/next            -> origin/gh/huydhn/2/next
2025-12-04T09:33:41.4446789Z  * [new branch]              gh/huydhn/3/next            -> origin/gh/huydhn/3/next
2025-12-04T09:33:41.4448780Z  * [new branch]              gh/huydhn/4/next            -> origin/gh/huydhn/4/next
2025-12-04T09:33:41.4450659Z  * [new branch]              gh/huydhn/5/next            -> origin/gh/huydhn/5/next
2025-12-04T09:33:41.4452540Z  * [new branch]              gh/huydhn/6/next            -> origin/gh/huydhn/6/next
2025-12-04T09:33:41.4454862Z  * [new branch]              gh/int3/97/base             -> origin/gh/int3/97/base
2025-12-04T09:33:41.4456532Z  * [new branch]              gh/int3/97/head             -> origin/gh/int3/97/head
2025-12-04T09:33:41.4459135Z  * [new branch]              gh/isuruf/101/base          -> origin/gh/isuruf/101/base
2025-12-04T09:33:41.4460476Z  * [new branch]              gh/isuruf/101/head          -> origin/gh/isuruf/101/head
2025-12-04T09:33:41.4462419Z  * [new branch]              gh/isuruf/146/base          -> origin/gh/isuruf/146/base
2025-12-04T09:33:41.4463893Z  * [new branch]              gh/isuruf/146/head          -> origin/gh/isuruf/146/head
2025-12-04T09:33:41.4465354Z  * [new branch]              gh/isuruf/146/orig          -> origin/gh/isuruf/146/orig
2025-12-04T09:33:41.4467289Z  * [new branch]              gh/isuruf/158/base          -> origin/gh/isuruf/158/base
2025-12-04T09:33:41.4468752Z  * [new branch]              gh/isuruf/158/head          -> origin/gh/isuruf/158/head
2025-12-04T09:33:41.4470567Z  * [new branch]              gh/isuruf/159/base          -> origin/gh/isuruf/159/base
2025-12-04T09:33:41.4472217Z  * [new branch]              gh/isuruf/159/head          -> origin/gh/isuruf/159/head
2025-12-04T09:33:41.4474403Z  * [new branch]              gh/isuruf/160/base          -> origin/gh/isuruf/160/base
2025-12-04T09:33:41.4475837Z  * [new branch]              gh/isuruf/160/head          -> origin/gh/isuruf/160/head
2025-12-04T09:33:41.4477386Z  * [new branch]              gh/isuruf/160/orig          -> origin/gh/isuruf/160/orig
2025-12-04T09:33:41.4479305Z  * [new branch]              gh/isuruf/81/base           -> origin/gh/isuruf/81/base
2025-12-04T09:33:41.4480757Z  * [new branch]              gh/isuruf/81/head           -> origin/gh/isuruf/81/head
2025-12-04T09:33:41.4482415Z  * [new branch]              gh/isuruf/81/orig           -> origin/gh/isuruf/81/orig
2025-12-04T09:33:41.4484640Z  * [new branch]              gh/jamesjwu/176/base        -> origin/gh/jamesjwu/176/base
2025-12-04T09:33:41.4486187Z  * [new branch]              gh/jamesjwu/176/head        -> origin/gh/jamesjwu/176/head
2025-12-04T09:33:41.4487617Z  * [new branch]              gh/jamesjwu/176/orig        -> origin/gh/jamesjwu/176/orig
2025-12-04T09:33:41.4489589Z  * [new branch]              gh/jamesjwu/187/base        -> origin/gh/jamesjwu/187/base
2025-12-04T09:33:41.4490992Z  * [new branch]              gh/jamesjwu/187/head        -> origin/gh/jamesjwu/187/head
2025-12-04T09:33:41.4492441Z  * [new branch]              gh/jamesjwu/187/orig        -> origin/gh/jamesjwu/187/orig
2025-12-04T09:33:41.4494557Z  * [new branch]              gh/jamesjwu/196/base        -> origin/gh/jamesjwu/196/base
2025-12-04T09:33:41.4496014Z  * [new branch]              gh/jamesjwu/196/head        -> origin/gh/jamesjwu/196/head
2025-12-04T09:33:41.4497640Z  * [new branch]              gh/jamesjwu/196/orig        -> origin/gh/jamesjwu/196/orig
2025-12-04T09:33:41.4499516Z  * [new branch]              gh/jamesjwu/198/base        -> origin/gh/jamesjwu/198/base
2025-12-04T09:33:41.4500988Z  * [new branch]              gh/jamesjwu/198/head        -> origin/gh/jamesjwu/198/head
2025-12-04T09:33:41.4502420Z  * [new branch]              gh/jamesjwu/198/orig        -> origin/gh/jamesjwu/198/orig
2025-12-04T09:33:41.4504384Z  * [new branch]              gh/jamesjwu/207/base        -> origin/gh/jamesjwu/207/base
2025-12-04T09:33:41.4506133Z  * [new branch]              gh/jamesjwu/207/head        -> origin/gh/jamesjwu/207/head
2025-12-04T09:33:41.4507555Z  * [new branch]              gh/jamesjwu/207/orig        -> origin/gh/jamesjwu/207/orig
2025-12-04T09:33:41.4509654Z  * [new branch]              gh/jamesjwu/208/base        -> origin/gh/jamesjwu/208/base
2025-12-04T09:33:41.4511105Z  * [new branch]              gh/jamesjwu/208/head        -> origin/gh/jamesjwu/208/head
2025-12-04T09:33:41.4512553Z  * [new branch]              gh/jamesjwu/208/orig        -> origin/gh/jamesjwu/208/orig
2025-12-04T09:33:41.4514630Z  * [new branch]              gh/jamesjwu/52/base         -> origin/gh/jamesjwu/52/base
2025-12-04T09:33:41.4516110Z  * [new branch]              gh/jamesjwu/52/head         -> origin/gh/jamesjwu/52/head
2025-12-04T09:33:41.4518050Z  * [new branch]              gh/jamesjwu/53/base         -> origin/gh/jamesjwu/53/base
2025-12-04T09:33:41.4519348Z  * [new branch]              gh/jamesjwu/53/head         -> origin/gh/jamesjwu/53/head
2025-12-04T09:33:41.4521113Z  * [new branch]              gh/jamesjwu/54/base         -> origin/gh/jamesjwu/54/base
2025-12-04T09:33:41.4522544Z  * [new branch]              gh/jamesjwu/54/head         -> origin/gh/jamesjwu/54/head
2025-12-04T09:33:41.4524285Z  * [new branch]              gh/jamesjwu/55/base         -> origin/gh/jamesjwu/55/base
2025-12-04T09:33:41.4525687Z  * [new branch]              gh/jamesjwu/55/head         -> origin/gh/jamesjwu/55/head
2025-12-04T09:33:41.4527443Z  * [new branch]              gh/jamesjwu/56/base         -> origin/gh/jamesjwu/56/base
2025-12-04T09:33:41.4528893Z  * [new branch]              gh/jamesjwu/56/head         -> origin/gh/jamesjwu/56/head
2025-12-04T09:33:41.4530789Z  * [new branch]              gh/jamesjwu/57/base         -> origin/gh/jamesjwu/57/base
2025-12-04T09:33:41.4532268Z  * [new branch]              gh/jamesjwu/57/head         -> origin/gh/jamesjwu/57/head
2025-12-04T09:33:41.4533979Z  * [new branch]              gh/jamesjwu/58/base         -> origin/gh/jamesjwu/58/base
2025-12-04T09:33:41.4535520Z  * [new branch]              gh/jamesjwu/58/head         -> origin/gh/jamesjwu/58/head
2025-12-04T09:33:41.4537408Z  * [new branch]              gh/jamesjwu/59/base         -> origin/gh/jamesjwu/59/base
2025-12-04T09:33:41.4538862Z  * [new branch]              gh/jamesjwu/59/head         -> origin/gh/jamesjwu/59/head
2025-12-04T09:33:41.4540678Z  * [new branch]              gh/jamesjwu/60/base         -> origin/gh/jamesjwu/60/base
2025-12-04T09:33:41.4542094Z  * [new branch]              gh/jamesjwu/60/head         -> origin/gh/jamesjwu/60/head
2025-12-04T09:33:41.4543881Z  * [new branch]              gh/jamesjwu/61/base         -> origin/gh/jamesjwu/61/base
2025-12-04T09:33:41.4545271Z  * [new branch]              gh/jamesjwu/61/head         -> origin/gh/jamesjwu/61/head
2025-12-04T09:33:41.4547068Z  * [new branch]              gh/jamesjwu/62/base         -> origin/gh/jamesjwu/62/base
2025-12-04T09:33:41.4548580Z  * [new branch]              gh/jamesjwu/62/head         -> origin/gh/jamesjwu/62/head
2025-12-04T09:33:41.4550385Z  * [new branch]              gh/jamesjwu/63/base         -> origin/gh/jamesjwu/63/base
2025-12-04T09:33:41.4551975Z  * [new branch]              gh/jamesjwu/63/head         -> origin/gh/jamesjwu/63/head
2025-12-04T09:33:41.4554350Z  * [new branch]              gh/jamesjwu/64/base         -> origin/gh/jamesjwu/64/base
2025-12-04T09:33:41.4555829Z  * [new branch]              gh/jamesjwu/64/head         -> origin/gh/jamesjwu/64/head
2025-12-04T09:33:41.4558593Z  * [new branch]              gh/jamesjwu/65/base         -> origin/gh/jamesjwu/65/base
2025-12-04T09:33:41.4559941Z  * [new branch]              gh/jamesjwu/65/head         -> origin/gh/jamesjwu/65/head
2025-12-04T09:33:41.4562947Z  * [new branch]              gh/janeyx99/165/base        -> origin/gh/janeyx99/165/base
2025-12-04T09:33:41.4564151Z  * [new branch]              gh/janeyx99/165/head        -> origin/gh/janeyx99/165/head
2025-12-04T09:33:41.4565488Z  * [new branch]              gh/janeyx99/165/orig        -> origin/gh/janeyx99/165/orig
2025-12-04T09:33:41.4567333Z  * [new branch]              gh/janeyx99/201/base        -> origin/gh/janeyx99/201/base
2025-12-04T09:33:41.4568798Z  * [new branch]              gh/janeyx99/201/head        -> origin/gh/janeyx99/201/head
2025-12-04T09:33:41.4570228Z  * [new branch]              gh/janeyx99/201/orig        -> origin/gh/janeyx99/201/orig
2025-12-04T09:33:41.4573375Z  * [new branch]              gh/janeyx99/225/base        -> origin/gh/janeyx99/225/base
2025-12-04T09:33:41.4574317Z  * [new branch]              gh/janeyx99/225/head        -> origin/gh/janeyx99/225/head
2025-12-04T09:33:41.4575888Z  * [new branch]              gh/janeyx99/225/orig        -> origin/gh/janeyx99/225/orig
2025-12-04T09:33:41.4578022Z  * [new branch]              gh/janeyx99/299/base        -> origin/gh/janeyx99/299/base
2025-12-04T09:33:41.4579628Z  * [new branch]              gh/janeyx99/299/head        -> origin/gh/janeyx99/299/head
2025-12-04T09:33:41.4580913Z  * [new branch]              gh/janeyx99/299/orig        -> origin/gh/janeyx99/299/orig
2025-12-04T09:33:41.4583268Z  * [new branch]              gh/janeyx99/302/base        -> origin/gh/janeyx99/302/base
2025-12-04T09:33:41.4584824Z  * [new branch]              gh/janeyx99/302/head        -> origin/gh/janeyx99/302/head
2025-12-04T09:33:41.4586603Z  * [new branch]              gh/janeyx99/303/base        -> origin/gh/janeyx99/303/base
2025-12-04T09:33:41.4588123Z  * [new branch]              gh/janeyx99/303/head        -> origin/gh/janeyx99/303/head
2025-12-04T09:33:41.4590633Z  * [new branch]              gh/janeyx99/305/base        -> origin/gh/janeyx99/305/base
2025-12-04T09:33:41.4592173Z  * [new branch]              gh/janeyx99/305/head        -> origin/gh/janeyx99/305/head
2025-12-04T09:33:41.4593955Z  * [new branch]              gh/janeyx99/306/base        -> origin/gh/janeyx99/306/base
2025-12-04T09:33:41.4595799Z  * [new branch]              gh/janeyx99/306/head        -> origin/gh/janeyx99/306/head
2025-12-04T09:33:41.4597738Z  * [new branch]              gh/janeyx99/314/base        -> origin/gh/janeyx99/314/base
2025-12-04T09:33:41.4599309Z  * [new branch]              gh/janeyx99/314/head        -> origin/gh/janeyx99/314/head
2025-12-04T09:33:41.4600840Z  * [new branch]              gh/janeyx99/314/orig        -> origin/gh/janeyx99/314/orig
2025-12-04T09:33:41.4602841Z  * [new branch]              gh/janeyx99/315/base        -> origin/gh/janeyx99/315/base
2025-12-04T09:33:41.4604347Z  * [new branch]              gh/janeyx99/315/head        -> origin/gh/janeyx99/315/head
2025-12-04T09:33:41.4605887Z  * [new branch]              gh/janeyx99/315/orig        -> origin/gh/janeyx99/315/orig
2025-12-04T09:33:41.4607961Z  * [new branch]              gh/janeyx99/316/base        -> origin/gh/janeyx99/316/base
2025-12-04T09:33:41.4609480Z  * [new branch]              gh/janeyx99/316/head        -> origin/gh/janeyx99/316/head
2025-12-04T09:33:41.4610975Z  * [new branch]              gh/janeyx99/316/orig        -> origin/gh/janeyx99/316/orig
2025-12-04T09:33:41.4613111Z  * [new branch]              gh/janeyx99/317/base        -> origin/gh/janeyx99/317/base
2025-12-04T09:33:41.4614546Z  * [new branch]              gh/janeyx99/317/head        -> origin/gh/janeyx99/317/head
2025-12-04T09:33:41.4615949Z  * [new branch]              gh/janeyx99/317/orig        -> origin/gh/janeyx99/317/orig
2025-12-04T09:33:41.4618122Z  * [new branch]              gh/janeyx99/325/base        -> origin/gh/janeyx99/325/base
2025-12-04T09:33:41.4619649Z  * [new branch]              gh/janeyx99/325/head        -> origin/gh/janeyx99/325/head
2025-12-04T09:33:41.4621049Z  * [new branch]              gh/janeyx99/325/orig        -> origin/gh/janeyx99/325/orig
2025-12-04T09:33:41.4623031Z  * [new branch]              gh/janeyx99/327/base        -> origin/gh/janeyx99/327/base
2025-12-04T09:33:41.4624502Z  * [new branch]              gh/janeyx99/327/head        -> origin/gh/janeyx99/327/head
2025-12-04T09:33:41.4625977Z  * [new branch]              gh/janeyx99/327/orig        -> origin/gh/janeyx99/327/orig
2025-12-04T09:33:41.4628055Z  * [new branch]              gh/janeyx99/328/base        -> origin/gh/janeyx99/328/base
2025-12-04T09:33:41.4629659Z  * [new branch]              gh/janeyx99/328/head        -> origin/gh/janeyx99/328/head
2025-12-04T09:33:41.4631173Z  * [new branch]              gh/janeyx99/328/orig        -> origin/gh/janeyx99/328/orig
2025-12-04T09:33:41.4632967Z  * [new branch]              gh/janeyx99/329/base        -> origin/gh/janeyx99/329/base
2025-12-04T09:33:41.4634523Z  * [new branch]              gh/janeyx99/329/head        -> origin/gh/janeyx99/329/head
2025-12-04T09:33:41.4635957Z  * [new branch]              gh/janeyx99/329/orig        -> origin/gh/janeyx99/329/orig
2025-12-04T09:33:41.4638509Z  * [new branch]              gh/janeyx99/330/base        -> origin/gh/janeyx99/330/base
2025-12-04T09:33:41.4640144Z  * [new branch]              gh/janeyx99/330/head        -> origin/gh/janeyx99/330/head
2025-12-04T09:33:41.4641877Z  * [new branch]              gh/janeyx99/330/orig        -> origin/gh/janeyx99/330/orig
2025-12-04T09:33:41.4643672Z  * [new branch]              gh/janeyx99/331/base        -> origin/gh/janeyx99/331/base
2025-12-04T09:33:41.4645138Z  * [new branch]              gh/janeyx99/331/head        -> origin/gh/janeyx99/331/head
2025-12-04T09:33:41.4646594Z  * [new branch]              gh/janeyx99/331/orig        -> origin/gh/janeyx99/331/orig
2025-12-04T09:33:41.4648797Z  * [new branch]              gh/janeyx99/332/base        -> origin/gh/janeyx99/332/base
2025-12-04T09:33:41.4650202Z  * [new branch]              gh/janeyx99/332/head        -> origin/gh/janeyx99/332/head
2025-12-04T09:33:41.4651689Z  * [new branch]              gh/janeyx99/332/orig        -> origin/gh/janeyx99/332/orig
2025-12-04T09:33:41.4653515Z  * [new branch]              gh/janeyx99/333/base        -> origin/gh/janeyx99/333/base
2025-12-04T09:33:41.4655359Z  * [new branch]              gh/janeyx99/333/head        -> origin/gh/janeyx99/333/head
2025-12-04T09:33:41.4656535Z  * [new branch]              gh/janeyx99/333/orig        -> origin/gh/janeyx99/333/orig
2025-12-04T09:33:41.4658773Z  * [new branch]              gh/janeyx99/88/base         -> origin/gh/janeyx99/88/base
2025-12-04T09:33:41.4660289Z  * [new branch]              gh/janeyx99/88/head         -> origin/gh/janeyx99/88/head
2025-12-04T09:33:41.4661824Z  * [new branch]              gh/janeyx99/88/orig         -> origin/gh/janeyx99/88/orig
2025-12-04T09:33:41.4664739Z  * [new branch]              gh/jansel/360/base          -> origin/gh/jansel/360/base
2025-12-04T09:33:41.4666178Z  * [new branch]              gh/jansel/360/head          -> origin/gh/jansel/360/head
2025-12-04T09:33:41.4668249Z  * [new branch]              gh/jansel/451/base          -> origin/gh/jansel/451/base
2025-12-04T09:33:41.4669726Z  * [new branch]              gh/jansel/451/head          -> origin/gh/jansel/451/head
2025-12-04T09:33:41.4671411Z  * [new branch]              gh/jansel/451/orig          -> origin/gh/jansel/451/orig
2025-12-04T09:33:41.4676608Z  * [new branch]              gh/jansel/462/base          -> origin/gh/jansel/462/base
2025-12-04T09:33:41.4678020Z  * [new branch]              gh/jansel/462/head          -> origin/gh/jansel/462/head
2025-12-04T09:33:41.4679517Z  * [new branch]              gh/jansel/462/orig          -> origin/gh/jansel/462/orig
2025-12-04T09:33:41.4681999Z  * [new branch]              gh/jansel/533/base          -> origin/gh/jansel/533/base
2025-12-04T09:33:41.4683462Z  * [new branch]              gh/jansel/533/head          -> origin/gh/jansel/533/head
2025-12-04T09:33:41.4684911Z  * [new branch]              gh/jansel/533/orig          -> origin/gh/jansel/533/orig
2025-12-04T09:33:41.4686816Z  * [new branch]              gh/jansel/552/base          -> origin/gh/jansel/552/base
2025-12-04T09:33:41.4688264Z  * [new branch]              gh/jansel/552/head          -> origin/gh/jansel/552/head
2025-12-04T09:33:41.4689755Z  * [new branch]              gh/jansel/552/orig          -> origin/gh/jansel/552/orig
2025-12-04T09:33:41.4691791Z  * [new branch]              gh/jansel/553/base          -> origin/gh/jansel/553/base
2025-12-04T09:33:41.4693221Z  * [new branch]              gh/jansel/553/head          -> origin/gh/jansel/553/head
2025-12-04T09:33:41.4694750Z  * [new branch]              gh/jansel/553/orig          -> origin/gh/jansel/553/orig
2025-12-04T09:33:41.4696733Z  * [new branch]              gh/jansel/554/base          -> origin/gh/jansel/554/base
2025-12-04T09:33:41.4698251Z  * [new branch]              gh/jansel/554/head          -> origin/gh/jansel/554/head
2025-12-04T09:33:41.4699796Z  * [new branch]              gh/jansel/554/orig          -> origin/gh/jansel/554/orig
2025-12-04T09:33:41.4701748Z  * [new branch]              gh/jansel/555/base          -> origin/gh/jansel/555/base
2025-12-04T09:33:41.4703352Z  * [new branch]              gh/jansel/555/head          -> origin/gh/jansel/555/head
2025-12-04T09:33:41.4704684Z  * [new branch]              gh/jansel/555/orig          -> origin/gh/jansel/555/orig
2025-12-04T09:33:41.4706559Z  * [new branch]              gh/jansel/556/base          -> origin/gh/jansel/556/base
2025-12-04T09:33:41.4708007Z  * [new branch]              gh/jansel/556/head          -> origin/gh/jansel/556/head
2025-12-04T09:33:41.4709794Z  * [new branch]              gh/jansel/556/orig          -> origin/gh/jansel/556/orig
2025-12-04T09:33:41.4711540Z  * [new branch]              gh/jansel/557/base          -> origin/gh/jansel/557/base
2025-12-04T09:33:41.4713036Z  * [new branch]              gh/jansel/557/head          -> origin/gh/jansel/557/head
2025-12-04T09:33:41.4714529Z  * [new branch]              gh/jansel/557/orig          -> origin/gh/jansel/557/orig
2025-12-04T09:33:41.4717063Z  * [new branch]              gh/jansel/558/base          -> origin/gh/jansel/558/base
2025-12-04T09:33:41.4718558Z  * [new branch]              gh/jansel/558/head          -> origin/gh/jansel/558/head
2025-12-04T09:33:41.4720001Z  * [new branch]              gh/jansel/558/orig          -> origin/gh/jansel/558/orig
2025-12-04T09:33:41.4721938Z  * [new branch]              gh/jansel/559/base          -> origin/gh/jansel/559/base
2025-12-04T09:33:41.4723437Z  * [new branch]              gh/jansel/559/head          -> origin/gh/jansel/559/head
2025-12-04T09:33:41.4724877Z  * [new branch]              gh/jansel/559/orig          -> origin/gh/jansel/559/orig
2025-12-04T09:33:41.4726801Z  * [new branch]              gh/jansel/560/base          -> origin/gh/jansel/560/base
2025-12-04T09:33:41.4728248Z  * [new branch]              gh/jansel/560/head          -> origin/gh/jansel/560/head
2025-12-04T09:33:41.4729834Z  * [new branch]              gh/jansel/560/orig          -> origin/gh/jansel/560/orig
2025-12-04T09:33:41.4731837Z  * [new branch]              gh/jansel/561/base          -> origin/gh/jansel/561/base
2025-12-04T09:33:41.4733338Z  * [new branch]              gh/jansel/561/head          -> origin/gh/jansel/561/head
2025-12-04T09:33:41.4734772Z  * [new branch]              gh/jansel/561/orig          -> origin/gh/jansel/561/orig
2025-12-04T09:33:41.4737166Z  * [new branch]              gh/jansel/562/base          -> origin/gh/jansel/562/base
2025-12-04T09:33:41.4738307Z  * [new branch]              gh/jansel/562/head          -> origin/gh/jansel/562/head
2025-12-04T09:33:41.4739751Z  * [new branch]              gh/jansel/562/orig          -> origin/gh/jansel/562/orig
2025-12-04T09:33:41.4741726Z  * [new branch]              gh/jansel/563/base          -> origin/gh/jansel/563/base
2025-12-04T09:33:41.4743145Z  * [new branch]              gh/jansel/563/head          -> origin/gh/jansel/563/head
2025-12-04T09:33:41.4744628Z  * [new branch]              gh/jansel/563/orig          -> origin/gh/jansel/563/orig
2025-12-04T09:33:41.4747215Z  * [new branch]              gh/jansel/564/base          -> origin/gh/jansel/564/base
2025-12-04T09:33:41.4748890Z  * [new branch]              gh/jansel/564/head          -> origin/gh/jansel/564/head
2025-12-04T09:33:41.4750244Z  * [new branch]              gh/jansel/564/orig          -> origin/gh/jansel/564/orig
2025-12-04T09:33:41.4752384Z  * [new branch]              gh/jansel/565/base          -> origin/gh/jansel/565/base
2025-12-04T09:33:41.4753898Z  * [new branch]              gh/jansel/565/head          -> origin/gh/jansel/565/head
2025-12-04T09:33:41.4755417Z  * [new branch]              gh/jansel/565/orig          -> origin/gh/jansel/565/orig
2025-12-04T09:33:41.4757452Z  * [new branch]              gh/jansel/566/base          -> origin/gh/jansel/566/base
2025-12-04T09:33:41.4758915Z  * [new branch]              gh/jansel/566/head          -> origin/gh/jansel/566/head
2025-12-04T09:33:41.4760357Z  * [new branch]              gh/jansel/566/orig          -> origin/gh/jansel/566/orig
2025-12-04T09:33:41.4762465Z  * [new branch]              gh/jansel/567/base          -> origin/gh/jansel/567/base
2025-12-04T09:33:41.4763990Z  * [new branch]              gh/jansel/567/head          -> origin/gh/jansel/567/head
2025-12-04T09:33:41.4765417Z  * [new branch]              gh/jansel/567/orig          -> origin/gh/jansel/567/orig
2025-12-04T09:33:41.4767482Z  * [new branch]              gh/jansel/568/base          -> origin/gh/jansel/568/base
2025-12-04T09:33:41.4769042Z  * [new branch]              gh/jansel/568/head          -> origin/gh/jansel/568/head
2025-12-04T09:33:41.4770455Z  * [new branch]              gh/jansel/568/orig          -> origin/gh/jansel/568/orig
2025-12-04T09:33:41.4772752Z  * [new branch]              gh/jansel/569/base          -> origin/gh/jansel/569/base
2025-12-04T09:33:41.4774178Z  * [new branch]              gh/jansel/569/head          -> origin/gh/jansel/569/head
2025-12-04T09:33:41.4775676Z  * [new branch]              gh/jansel/569/orig          -> origin/gh/jansel/569/orig
2025-12-04T09:33:41.4777773Z  * [new branch]              gh/jansel/570/base          -> origin/gh/jansel/570/base
2025-12-04T09:33:41.4779238Z  * [new branch]              gh/jansel/570/head          -> origin/gh/jansel/570/head
2025-12-04T09:33:41.4780722Z  * [new branch]              gh/jansel/570/orig          -> origin/gh/jansel/570/orig
2025-12-04T09:33:41.4782665Z  * [new branch]              gh/jansel/571/base          -> origin/gh/jansel/571/base
2025-12-04T09:33:41.4784190Z  * [new branch]              gh/jansel/571/head          -> origin/gh/jansel/571/head
2025-12-04T09:33:41.4785701Z  * [new branch]              gh/jansel/571/orig          -> origin/gh/jansel/571/orig
2025-12-04T09:33:41.4787600Z  * [new branch]              gh/jansel/572/base          -> origin/gh/jansel/572/base
2025-12-04T09:33:41.4789103Z  * [new branch]              gh/jansel/572/head          -> origin/gh/jansel/572/head
2025-12-04T09:33:41.4790573Z  * [new branch]              gh/jansel/572/orig          -> origin/gh/jansel/572/orig
2025-12-04T09:33:41.4792799Z  * [new branch]              gh/jansel/573/base          -> origin/gh/jansel/573/base
2025-12-04T09:33:41.4794291Z  * [new branch]              gh/jansel/573/head          -> origin/gh/jansel/573/head
2025-12-04T09:33:41.4795818Z  * [new branch]              gh/jansel/573/orig          -> origin/gh/jansel/573/orig
2025-12-04T09:33:41.4797891Z  * [new branch]              gh/jansel/574/base          -> origin/gh/jansel/574/base
2025-12-04T09:33:41.4799375Z  * [new branch]              gh/jansel/574/head          -> origin/gh/jansel/574/head
2025-12-04T09:33:41.4800914Z  * [new branch]              gh/jansel/574/orig          -> origin/gh/jansel/574/orig
2025-12-04T09:33:41.4802903Z  * [new branch]              gh/jansel/575/base          -> origin/gh/jansel/575/base
2025-12-04T09:33:41.4804465Z  * [new branch]              gh/jansel/575/head          -> origin/gh/jansel/575/head
2025-12-04T09:33:41.4805962Z  * [new branch]              gh/jansel/575/orig          -> origin/gh/jansel/575/orig
2025-12-04T09:33:41.4808048Z  * [new branch]              gh/jansel/576/base          -> origin/gh/jansel/576/base
2025-12-04T09:33:41.4809564Z  * [new branch]              gh/jansel/576/head          -> origin/gh/jansel/576/head
2025-12-04T09:33:41.4811526Z  * [new branch]              gh/jansel/576/orig          -> origin/gh/jansel/576/orig
2025-12-04T09:33:41.4814179Z  * [new branch]              gh/jbschlosser/247/base     -> origin/gh/jbschlosser/247/base
2025-12-04T09:33:41.4815706Z  * [new branch]              gh/jbschlosser/247/head     -> origin/gh/jbschlosser/247/head
2025-12-04T09:33:41.4817347Z  * [new branch]              gh/jbschlosser/247/orig     -> origin/gh/jbschlosser/247/orig
2025-12-04T09:33:41.4819973Z  * [new branch]              gh/jbschlosser/250/base     -> origin/gh/jbschlosser/250/base
2025-12-04T09:33:41.4821340Z  * [new branch]              gh/jbschlosser/250/head     -> origin/gh/jbschlosser/250/head
2025-12-04T09:33:41.4822868Z  * [new branch]              gh/jbschlosser/250/orig     -> origin/gh/jbschlosser/250/orig
2025-12-04T09:33:41.4826163Z  * [new branch]              gh/jerryzh168/1/base        -> origin/gh/jerryzh168/1/base
2025-12-04T09:33:41.4827487Z  * [new branch]              gh/jerryzh168/1/head        -> origin/gh/jerryzh168/1/head
2025-12-04T09:33:41.4828937Z  * [new branch]              gh/jerryzh168/1/orig        -> origin/gh/jerryzh168/1/orig
2025-12-04T09:33:41.4831336Z  * [new branch]              gh/jiayisunx/59/base        -> origin/gh/jiayisunx/59/base
2025-12-04T09:33:41.4833064Z  * [new branch]              gh/jiayisunx/59/head        -> origin/gh/jiayisunx/59/head
2025-12-04T09:33:41.4834569Z  * [new branch]              gh/jiayisunx/59/orig        -> origin/gh/jiayisunx/59/orig
2025-12-04T09:33:41.4836434Z  * [new branch]              gh/jiayisunx/61/base        -> origin/gh/jiayisunx/61/base
2025-12-04T09:33:41.4837922Z  * [new branch]              gh/jiayisunx/61/head        -> origin/gh/jiayisunx/61/head
2025-12-04T09:33:41.4839447Z  * [new branch]              gh/jiayisunx/61/orig        -> origin/gh/jiayisunx/61/orig
2025-12-04T09:33:41.4841441Z  * [new branch]              gh/jiayisunx/68/base        -> origin/gh/jiayisunx/68/base
2025-12-04T09:33:41.4842857Z  * [new branch]              gh/jiayisunx/68/head        -> origin/gh/jiayisunx/68/head
2025-12-04T09:33:41.4844380Z  * [new branch]              gh/jiayisunx/68/orig        -> origin/gh/jiayisunx/68/orig
2025-12-04T09:33:41.4846340Z  * [new branch]              gh/jiayisunx/77/base        -> origin/gh/jiayisunx/77/base
2025-12-04T09:33:41.4847840Z  * [new branch]              gh/jiayisunx/77/head        -> origin/gh/jiayisunx/77/head
2025-12-04T09:33:41.4849287Z  * [new branch]              gh/jiayisunx/77/orig        -> origin/gh/jiayisunx/77/orig
2025-12-04T09:33:41.4851833Z  * [new branch]              gh/jiayisunx/78/base        -> origin/gh/jiayisunx/78/base
2025-12-04T09:33:41.4853470Z  * [new branch]              gh/jiayisunx/78/head        -> origin/gh/jiayisunx/78/head
2025-12-04T09:33:41.4854986Z  * [new branch]              gh/jiayisunx/78/orig        -> origin/gh/jiayisunx/78/orig
2025-12-04T09:33:41.4857094Z  * [new branch]              gh/jiayisunx/79/base        -> origin/gh/jiayisunx/79/base
2025-12-04T09:33:41.4858737Z  * [new branch]              gh/jiayisunx/79/head        -> origin/gh/jiayisunx/79/head
2025-12-04T09:33:41.4860751Z  * [new branch]              gh/jiayisunx/79/orig        -> origin/gh/jiayisunx/79/orig
2025-12-04T09:33:41.4862763Z  * [new branch]              gh/jiayisunx/82/base        -> origin/gh/jiayisunx/82/base
2025-12-04T09:33:41.4864266Z  * [new branch]              gh/jiayisunx/82/head        -> origin/gh/jiayisunx/82/head
2025-12-04T09:33:41.4865794Z  * [new branch]              gh/jiayisunx/82/orig        -> origin/gh/jiayisunx/82/orig
2025-12-04T09:33:41.4867693Z  * [new branch]              gh/jiayisunx/83/base        -> origin/gh/jiayisunx/83/base
2025-12-04T09:33:41.4869267Z  * [new branch]              gh/jiayisunx/83/head        -> origin/gh/jiayisunx/83/head
2025-12-04T09:33:41.4870760Z  * [new branch]              gh/jiayisunx/83/orig        -> origin/gh/jiayisunx/83/orig
2025-12-04T09:33:41.4872896Z  * [new branch]              gh/jiayisunx/84/base        -> origin/gh/jiayisunx/84/base
2025-12-04T09:33:41.4874484Z  * [new branch]              gh/jiayisunx/84/head        -> origin/gh/jiayisunx/84/head
2025-12-04T09:33:41.4875984Z  * [new branch]              gh/jiayisunx/84/orig        -> origin/gh/jiayisunx/84/orig
2025-12-04T09:33:41.4877985Z  * [new branch]              gh/jiayisunx/85/base        -> origin/gh/jiayisunx/85/base
2025-12-04T09:33:41.4879484Z  * [new branch]              gh/jiayisunx/85/head        -> origin/gh/jiayisunx/85/head
2025-12-04T09:33:41.4880976Z  * [new branch]              gh/jiayisunx/85/orig        -> origin/gh/jiayisunx/85/orig
2025-12-04T09:33:41.4882825Z  * [new branch]              gh/jiayisunx/86/base        -> origin/gh/jiayisunx/86/base
2025-12-04T09:33:41.4884324Z  * [new branch]              gh/jiayisunx/86/head        -> origin/gh/jiayisunx/86/head
2025-12-04T09:33:41.4886183Z  * [new branch]              gh/jiayisunx/86/orig        -> origin/gh/jiayisunx/86/orig
2025-12-04T09:33:41.4887814Z  * [new branch]              gh/jiayisunx/87/base        -> origin/gh/jiayisunx/87/base
2025-12-04T09:33:41.4889261Z  * [new branch]              gh/jiayisunx/87/head        -> origin/gh/jiayisunx/87/head
2025-12-04T09:33:41.4890719Z  * [new branch]              gh/jiayisunx/87/orig        -> origin/gh/jiayisunx/87/orig
2025-12-04T09:33:41.4892693Z  * [new branch]              gh/jiayisunx/88/base        -> origin/gh/jiayisunx/88/base
2025-12-04T09:33:41.4894278Z  * [new branch]              gh/jiayisunx/88/head        -> origin/gh/jiayisunx/88/head
2025-12-04T09:33:41.4895774Z  * [new branch]              gh/jiayisunx/88/orig        -> origin/gh/jiayisunx/88/orig
2025-12-04T09:33:41.4897869Z  * [new branch]              gh/jiayisunx/89/base        -> origin/gh/jiayisunx/89/base
2025-12-04T09:33:41.4899308Z  * [new branch]              gh/jiayisunx/89/head        -> origin/gh/jiayisunx/89/head
2025-12-04T09:33:41.4900789Z  * [new branch]              gh/jiayisunx/89/orig        -> origin/gh/jiayisunx/89/orig
2025-12-04T09:33:41.4902771Z  * [new branch]              gh/jiayisunx/90/base        -> origin/gh/jiayisunx/90/base
2025-12-04T09:33:41.4904321Z  * [new branch]              gh/jiayisunx/90/head        -> origin/gh/jiayisunx/90/head
2025-12-04T09:33:41.4905791Z  * [new branch]              gh/jiayisunx/90/orig        -> origin/gh/jiayisunx/90/orig
2025-12-04T09:33:41.4908138Z  * [new branch]              gh/jjwu@meta.com/1/base     -> origin/gh/jjwu@meta.com/1/base
2025-12-04T09:33:41.4909580Z  * [new branch]              gh/jjwu@meta.com/1/head     -> origin/gh/jjwu@meta.com/1/head
2025-12-04T09:33:41.4912064Z  * [new branch]              gh/jturney/1/base           -> origin/gh/jturney/1/base
2025-12-04T09:33:41.4913664Z  * [new branch]              gh/jturney/1/head           -> origin/gh/jturney/1/head
2025-12-04T09:33:41.4915158Z  * [new branch]              gh/jturney/1/orig           -> origin/gh/jturney/1/orig
2025-12-04T09:33:41.4917085Z  * [new branch]              gh/jturney/2/base           -> origin/gh/jturney/2/base
2025-12-04T09:33:41.4918547Z  * [new branch]              gh/jturney/2/head           -> origin/gh/jturney/2/head
2025-12-04T09:33:41.4920031Z  * [new branch]              gh/jturney/2/orig           -> origin/gh/jturney/2/orig
2025-12-04T09:33:41.4922651Z  * [new branch]              gh/karthickai/10/base       -> origin/gh/karthickai/10/base
2025-12-04T09:33:41.4924285Z  * [new branch]              gh/karthickai/10/head       -> origin/gh/karthickai/10/head
2025-12-04T09:33:41.4925778Z  * [new branch]              gh/karthickai/10/orig       -> origin/gh/karthickai/10/orig
2025-12-04T09:33:41.4927734Z  * [new branch]              gh/karthickai/11/base       -> origin/gh/karthickai/11/base
2025-12-04T09:33:41.4929388Z  * [new branch]              gh/karthickai/11/head       -> origin/gh/karthickai/11/head
2025-12-04T09:33:41.4930945Z  * [new branch]              gh/karthickai/11/orig       -> origin/gh/karthickai/11/orig
2025-12-04T09:33:41.4933495Z  * [new branch]              gh/karthickai/12/base       -> origin/gh/karthickai/12/base
2025-12-04T09:33:41.4935058Z  * [new branch]              gh/karthickai/12/head       -> origin/gh/karthickai/12/head
2025-12-04T09:33:41.4936669Z  * [new branch]              gh/karthickai/12/orig       -> origin/gh/karthickai/12/orig
2025-12-04T09:33:41.4938705Z  * [new branch]              gh/karthickai/13/base       -> origin/gh/karthickai/13/base
2025-12-04T09:33:41.4940261Z  * [new branch]              gh/karthickai/13/head       -> origin/gh/karthickai/13/head
2025-12-04T09:33:41.4941752Z  * [new branch]              gh/karthickai/13/orig       -> origin/gh/karthickai/13/orig
2025-12-04T09:33:41.4943989Z  * [new branch]              gh/karthickai/14/base       -> origin/gh/karthickai/14/base
2025-12-04T09:33:41.4945636Z  * [new branch]              gh/karthickai/14/head       -> origin/gh/karthickai/14/head
2025-12-04T09:33:41.4947291Z  * [new branch]              gh/karthickai/14/orig       -> origin/gh/karthickai/14/orig
2025-12-04T09:33:41.4949494Z  * [new branch]              gh/karthickai/15/base       -> origin/gh/karthickai/15/base
2025-12-04T09:33:41.4951057Z  * [new branch]              gh/karthickai/15/head       -> origin/gh/karthickai/15/head
2025-12-04T09:33:41.4952572Z  * [new branch]              gh/karthickai/15/orig       -> origin/gh/karthickai/15/orig
2025-12-04T09:33:41.4954571Z  * [new branch]              gh/karthickai/16/base       -> origin/gh/karthickai/16/base
2025-12-04T09:33:41.4956083Z  * [new branch]              gh/karthickai/16/head       -> origin/gh/karthickai/16/head
2025-12-04T09:33:41.4957839Z  * [new branch]              gh/karthickai/16/orig       -> origin/gh/karthickai/16/orig
2025-12-04T09:33:41.4959799Z  * [new branch]              gh/karthickai/17/base       -> origin/gh/karthickai/17/base
2025-12-04T09:33:41.4961862Z  * [new branch]              gh/karthickai/17/head       -> origin/gh/karthickai/17/head
2025-12-04T09:33:41.4963333Z  * [new branch]              gh/karthickai/17/orig       -> origin/gh/karthickai/17/orig
2025-12-04T09:33:41.4965412Z  * [new branch]              gh/karthickai/18/base       -> origin/gh/karthickai/18/base
2025-12-04T09:33:41.4967285Z  * [new branch]              gh/karthickai/18/head       -> origin/gh/karthickai/18/head
2025-12-04T09:33:41.4968981Z  * [new branch]              gh/karthickai/18/orig       -> origin/gh/karthickai/18/orig
2025-12-04T09:33:41.4971213Z  * [new branch]              gh/karthickai/19/base       -> origin/gh/karthickai/19/base
2025-12-04T09:33:41.4972770Z  * [new branch]              gh/karthickai/19/head       -> origin/gh/karthickai/19/head
2025-12-04T09:33:41.4974226Z  * [new branch]              gh/karthickai/19/orig       -> origin/gh/karthickai/19/orig
2025-12-04T09:33:41.4977347Z  * [new branch]              gh/karthickai/20/base       -> origin/gh/karthickai/20/base
2025-12-04T09:33:41.4979449Z  * [new branch]              gh/karthickai/20/head       -> origin/gh/karthickai/20/head
2025-12-04T09:33:41.4980957Z  * [new branch]              gh/karthickai/20/orig       -> origin/gh/karthickai/20/orig
2025-12-04T09:33:41.4983059Z  * [new branch]              gh/karthickai/21/base       -> origin/gh/karthickai/21/base
2025-12-04T09:33:41.4984857Z  * [new branch]              gh/karthickai/21/head       -> origin/gh/karthickai/21/head
2025-12-04T09:33:41.4986361Z  * [new branch]              gh/karthickai/21/orig       -> origin/gh/karthickai/21/orig
2025-12-04T09:33:41.4988568Z  * [new branch]              gh/karthickai/22/base       -> origin/gh/karthickai/22/base
2025-12-04T09:33:41.4990013Z  * [new branch]              gh/karthickai/22/head       -> origin/gh/karthickai/22/head
2025-12-04T09:33:41.4991461Z  * [new branch]              gh/karthickai/22/orig       -> origin/gh/karthickai/22/orig
2025-12-04T09:33:41.4993669Z  * [new branch]              gh/karthickai/23/base       -> origin/gh/karthickai/23/base
2025-12-04T09:33:41.4995350Z  * [new branch]              gh/karthickai/23/head       -> origin/gh/karthickai/23/head
2025-12-04T09:33:41.4996826Z  * [new branch]              gh/karthickai/23/orig       -> origin/gh/karthickai/23/orig
2025-12-04T09:33:41.4998962Z  * [new branch]              gh/karthickai/24/base       -> origin/gh/karthickai/24/base
2025-12-04T09:33:41.5000498Z  * [new branch]              gh/karthickai/24/head       -> origin/gh/karthickai/24/head
2025-12-04T09:33:41.5001985Z  * [new branch]              gh/karthickai/24/orig       -> origin/gh/karthickai/24/orig
2025-12-04T09:33:41.5004529Z  * [new branch]              gh/karthickai/25/base       -> origin/gh/karthickai/25/base
2025-12-04T09:33:41.5006191Z  * [new branch]              gh/karthickai/25/head       -> origin/gh/karthickai/25/head
2025-12-04T09:33:41.5007659Z  * [new branch]              gh/karthickai/25/orig       -> origin/gh/karthickai/25/orig
2025-12-04T09:33:41.5009516Z  * [new branch]              gh/karthickai/26/base       -> origin/gh/karthickai/26/base
2025-12-04T09:33:41.5011338Z  * [new branch]              gh/karthickai/26/head       -> origin/gh/karthickai/26/head
2025-12-04T09:33:41.5012761Z  * [new branch]              gh/karthickai/26/orig       -> origin/gh/karthickai/26/orig
2025-12-04T09:33:41.5016547Z  * [new branch]              gh/karthickai/6/base        -> origin/gh/karthickai/6/base
2025-12-04T09:33:41.5018846Z  * [new branch]              gh/karthickai/6/head        -> origin/gh/karthickai/6/head
2025-12-04T09:33:41.5020349Z  * [new branch]              gh/karthickai/6/orig        -> origin/gh/karthickai/6/orig
2025-12-04T09:33:41.5022839Z  * [new branch]              gh/krocki/1/base            -> origin/gh/krocki/1/base
2025-12-04T09:33:41.5024322Z  * [new branch]              gh/krocki/1/head            -> origin/gh/krocki/1/head
2025-12-04T09:33:41.5025830Z  * [new branch]              gh/krocki/1/orig            -> origin/gh/krocki/1/orig
2025-12-04T09:33:41.5027852Z  * [new branch]              gh/krocki/2/base            -> origin/gh/krocki/2/base
2025-12-04T09:33:41.5029408Z  * [new branch]              gh/krocki/2/head            -> origin/gh/krocki/2/head
2025-12-04T09:33:41.5030854Z  * [new branch]              gh/krocki/2/orig            -> origin/gh/krocki/2/orig
2025-12-04T09:33:41.5033246Z  * [new branch]              gh/kurtamohler/60/base      -> origin/gh/kurtamohler/60/base
2025-12-04T09:33:41.5034740Z  * [new branch]              gh/kurtamohler/60/head      -> origin/gh/kurtamohler/60/head
2025-12-04T09:33:41.5036190Z  * [new branch]              gh/kurtamohler/60/orig      -> origin/gh/kurtamohler/60/orig
2025-12-04T09:33:41.5038151Z  * [new branch]              gh/kurtamohler/61/base      -> origin/gh/kurtamohler/61/base
2025-12-04T09:33:41.5039721Z  * [new branch]              gh/kurtamohler/61/head      -> origin/gh/kurtamohler/61/head
2025-12-04T09:33:41.5041328Z  * [new branch]              gh/kurtamohler/61/orig      -> origin/gh/kurtamohler/61/orig
2025-12-04T09:33:41.5043723Z  * [new branch]              gh/kurtamohler/62/base      -> origin/gh/kurtamohler/62/base
2025-12-04T09:33:41.5045226Z  * [new branch]              gh/kurtamohler/62/head      -> origin/gh/kurtamohler/62/head
2025-12-04T09:33:41.5046722Z  * [new branch]              gh/kurtamohler/62/orig      -> origin/gh/kurtamohler/62/orig
2025-12-04T09:33:41.5049256Z  * [new branch]              gh/kurtamohler/63/base      -> origin/gh/kurtamohler/63/base
2025-12-04T09:33:41.5050780Z  * [new branch]              gh/kurtamohler/63/head      -> origin/gh/kurtamohler/63/head
2025-12-04T09:33:41.5052285Z  * [new branch]              gh/kurtamohler/63/orig      -> origin/gh/kurtamohler/63/orig
2025-12-04T09:33:41.5054440Z  * [new branch]              gh/kurtamohler/64/base      -> origin/gh/kurtamohler/64/base
2025-12-04T09:33:41.5055912Z  * [new branch]              gh/kurtamohler/64/head      -> origin/gh/kurtamohler/64/head
2025-12-04T09:33:41.5057596Z  * [new branch]              gh/kurtamohler/64/orig      -> origin/gh/kurtamohler/64/orig
2025-12-04T09:33:41.5059553Z  * [new branch]              gh/kurtamohler/65/base      -> origin/gh/kurtamohler/65/base
2025-12-04T09:33:41.5061141Z  * [new branch]              gh/kurtamohler/65/head      -> origin/gh/kurtamohler/65/head
2025-12-04T09:33:41.5062650Z  * [new branch]              gh/kurtamohler/65/orig      -> origin/gh/kurtamohler/65/orig
2025-12-04T09:33:41.5064522Z  * [new branch]              gh/kurtamohler/66/base      -> origin/gh/kurtamohler/66/base
2025-12-04T09:33:41.5066059Z  * [new branch]              gh/kurtamohler/66/head      -> origin/gh/kurtamohler/66/head
2025-12-04T09:33:41.5067543Z  * [new branch]              gh/kurtamohler/66/orig      -> origin/gh/kurtamohler/66/orig
2025-12-04T09:33:41.5069454Z  * [new branch]              gh/kurtamohler/67/base      -> origin/gh/kurtamohler/67/base
2025-12-04T09:33:41.5070897Z  * [new branch]              gh/kurtamohler/67/head      -> origin/gh/kurtamohler/67/head
2025-12-04T09:33:41.5075494Z  * [new branch]              gh/kurtamohler/67/orig      -> origin/gh/kurtamohler/67/orig
2025-12-04T09:33:41.5077858Z  * [new branch]              gh/kwen2501/130/base        -> origin/gh/kwen2501/130/base
2025-12-04T09:33:41.5079535Z  * [new branch]              gh/kwen2501/130/head        -> origin/gh/kwen2501/130/head
2025-12-04T09:33:41.5081178Z  * [new branch]              gh/kwen2501/130/orig        -> origin/gh/kwen2501/130/orig
2025-12-04T09:33:41.5083299Z  * [new branch]              gh/kwen2501/170/base        -> origin/gh/kwen2501/170/base
2025-12-04T09:33:41.5084770Z  * [new branch]              gh/kwen2501/170/head        -> origin/gh/kwen2501/170/head
2025-12-04T09:33:41.5086890Z  * [new branch]              gh/kwen2501/187/base        -> origin/gh/kwen2501/187/base
2025-12-04T09:33:41.5088459Z  * [new branch]              gh/kwen2501/187/head        -> origin/gh/kwen2501/187/head
2025-12-04T09:33:41.5090023Z  * [new branch]              gh/kwen2501/187/orig        -> origin/gh/kwen2501/187/orig
2025-12-04T09:33:41.5092494Z  * [new branch]              gh/kwen2501/188/base        -> origin/gh/kwen2501/188/base
2025-12-04T09:33:41.5094014Z  * [new branch]              gh/kwen2501/188/head        -> origin/gh/kwen2501/188/head
2025-12-04T09:33:41.5095587Z  * [new branch]              gh/kwen2501/188/orig        -> origin/gh/kwen2501/188/orig
2025-12-04T09:33:41.5097670Z  * [new branch]              gh/kwen2501/211/base        -> origin/gh/kwen2501/211/base
2025-12-04T09:33:41.5099163Z  * [new branch]              gh/kwen2501/211/head        -> origin/gh/kwen2501/211/head
2025-12-04T09:33:41.5101454Z  * [new branch]              gh/kwen2501/224/base        -> origin/gh/kwen2501/224/base
2025-12-04T09:33:41.5102927Z  * [new branch]              gh/kwen2501/224/head        -> origin/gh/kwen2501/224/head
2025-12-04T09:33:41.5105035Z  * [new branch]              gh/kwen2501/224/orig        -> origin/gh/kwen2501/224/orig
2025-12-04T09:33:41.5107641Z  * [new branch]              gh/kwen2501/228/base        -> origin/gh/kwen2501/228/base
2025-12-04T09:33:41.5109156Z  * [new branch]              gh/kwen2501/228/head        -> origin/gh/kwen2501/228/head
2025-12-04T09:33:41.5110637Z  * [new branch]              gh/kwen2501/228/orig        -> origin/gh/kwen2501/228/orig
2025-12-04T09:33:41.5112791Z  * [new branch]              gh/kwen2501/234/base        -> origin/gh/kwen2501/234/base
2025-12-04T09:33:41.5114290Z  * [new branch]              gh/kwen2501/234/head        -> origin/gh/kwen2501/234/head
2025-12-04T09:33:41.5115756Z  * [new branch]              gh/kwen2501/234/orig        -> origin/gh/kwen2501/234/orig
2025-12-04T09:33:41.5117684Z  * [new branch]              gh/kwen2501/235/base        -> origin/gh/kwen2501/235/base
2025-12-04T09:33:41.5119269Z  * [new branch]              gh/kwen2501/235/head        -> origin/gh/kwen2501/235/head
2025-12-04T09:33:41.5120743Z  * [new branch]              gh/kwen2501/235/orig        -> origin/gh/kwen2501/235/orig
2025-12-04T09:33:41.5122603Z  * [new branch]              gh/kwen2501/236/base        -> origin/gh/kwen2501/236/base
2025-12-04T09:33:41.5124132Z  * [new branch]              gh/kwen2501/236/head        -> origin/gh/kwen2501/236/head
2025-12-04T09:33:41.5125730Z  * [new branch]              gh/kwen2501/236/orig        -> origin/gh/kwen2501/236/orig
2025-12-04T09:33:41.5127700Z  * [new branch]              gh/kwen2501/237/base        -> origin/gh/kwen2501/237/base
2025-12-04T09:33:41.5129121Z  * [new branch]              gh/kwen2501/237/head        -> origin/gh/kwen2501/237/head
2025-12-04T09:33:41.5130628Z  * [new branch]              gh/kwen2501/237/orig        -> origin/gh/kwen2501/237/orig
2025-12-04T09:33:41.5133229Z  * [new branch]              gh/kwen2501/238/base        -> origin/gh/kwen2501/238/base
2025-12-04T09:33:41.5134125Z  * [new branch]              gh/kwen2501/238/head        -> origin/gh/kwen2501/238/head
2025-12-04T09:33:41.5135840Z  * [new branch]              gh/kwen2501/238/orig        -> origin/gh/kwen2501/238/orig
2025-12-04T09:33:41.5138110Z  * [new branch]              gh/kwen2501/240/base        -> origin/gh/kwen2501/240/base
2025-12-04T09:33:41.5139271Z  * [new branch]              gh/kwen2501/240/head        -> origin/gh/kwen2501/240/head
2025-12-04T09:33:41.5140918Z  * [new branch]              gh/kwen2501/240/orig        -> origin/gh/kwen2501/240/orig
2025-12-04T09:33:41.5142922Z  * [new branch]              gh/kwen2501/241/base        -> origin/gh/kwen2501/241/base
2025-12-04T09:33:41.5144260Z  * [new branch]              gh/kwen2501/241/head        -> origin/gh/kwen2501/241/head
2025-12-04T09:33:41.5145994Z  * [new branch]              gh/kwen2501/241/orig        -> origin/gh/kwen2501/241/orig
2025-12-04T09:33:41.5148168Z  * [new branch]              gh/kwen2501/247/base        -> origin/gh/kwen2501/247/base
2025-12-04T09:33:41.5149446Z  * [new branch]              gh/kwen2501/247/head        -> origin/gh/kwen2501/247/head
2025-12-04T09:33:41.5151070Z  * [new branch]              gh/kwen2501/247/orig        -> origin/gh/kwen2501/247/orig
2025-12-04T09:33:41.5153257Z  * [new branch]              gh/kwen2501/252/base        -> origin/gh/kwen2501/252/base
2025-12-04T09:33:41.5154091Z  * [new branch]              gh/kwen2501/252/head        -> origin/gh/kwen2501/252/head
2025-12-04T09:33:41.5156288Z  * [new branch]              gh/kwen2501/252/orig        -> origin/gh/kwen2501/252/orig
2025-12-04T09:33:41.5158855Z  * [new branch]              gh/kwen2501/259/base        -> origin/gh/kwen2501/259/base
2025-12-04T09:33:41.5160492Z  * [new branch]              gh/kwen2501/259/head        -> origin/gh/kwen2501/259/head
2025-12-04T09:33:41.5162042Z  * [new branch]              gh/kwen2501/259/orig        -> origin/gh/kwen2501/259/orig
2025-12-04T09:33:41.5164128Z  * [new branch]              gh/kwen2501/260/base        -> origin/gh/kwen2501/260/base
2025-12-04T09:33:41.5165750Z  * [new branch]              gh/kwen2501/260/head        -> origin/gh/kwen2501/260/head
2025-12-04T09:33:41.5167439Z  * [new branch]              gh/kwen2501/260/orig        -> origin/gh/kwen2501/260/orig
2025-12-04T09:33:41.5169461Z  * [new branch]              gh/kwen2501/268/base        -> origin/gh/kwen2501/268/base
2025-12-04T09:33:41.5171111Z  * [new branch]              gh/kwen2501/268/head        -> origin/gh/kwen2501/268/head
2025-12-04T09:33:41.5172714Z  * [new branch]              gh/kwen2501/268/orig        -> origin/gh/kwen2501/268/orig
2025-12-04T09:33:41.5174717Z  * [new branch]              gh/kwen2501/269/base        -> origin/gh/kwen2501/269/base
2025-12-04T09:33:41.5176453Z  * [new branch]              gh/kwen2501/269/head        -> origin/gh/kwen2501/269/head
2025-12-04T09:33:41.5177987Z  * [new branch]              gh/kwen2501/269/orig        -> origin/gh/kwen2501/269/orig
2025-12-04T09:33:41.5180157Z  * [new branch]              gh/kwen2501/270/base        -> origin/gh/kwen2501/270/base
2025-12-04T09:33:41.5181819Z  * [new branch]              gh/kwen2501/270/head        -> origin/gh/kwen2501/270/head
2025-12-04T09:33:41.5183242Z  * [new branch]              gh/kwen2501/270/orig        -> origin/gh/kwen2501/270/orig
2025-12-04T09:33:41.5185879Z  * [new branch]              gh/kwen2501/271/base        -> origin/gh/kwen2501/271/base
2025-12-04T09:33:41.5187439Z  * [new branch]              gh/kwen2501/271/head        -> origin/gh/kwen2501/271/head
2025-12-04T09:33:41.5189164Z  * [new branch]              gh/kwen2501/271/orig        -> origin/gh/kwen2501/271/orig
2025-12-04T09:33:41.5191255Z  * [new branch]              gh/kwen2501/274/base        -> origin/gh/kwen2501/274/base
2025-12-04T09:33:41.5192926Z  * [new branch]              gh/kwen2501/274/head        -> origin/gh/kwen2501/274/head
2025-12-04T09:33:41.5194444Z  * [new branch]              gh/kwen2501/274/orig        -> origin/gh/kwen2501/274/orig
2025-12-04T09:33:41.5196602Z  * [new branch]              gh/kwen2501/275/base        -> origin/gh/kwen2501/275/base
2025-12-04T09:33:41.5198284Z  * [new branch]              gh/kwen2501/275/head        -> origin/gh/kwen2501/275/head
2025-12-04T09:33:41.5200063Z  * [new branch]              gh/kwen2501/275/orig        -> origin/gh/kwen2501/275/orig
2025-12-04T09:33:41.5201916Z  * [new branch]              gh/kwen2501/276/base        -> origin/gh/kwen2501/276/base
2025-12-04T09:33:41.5203400Z  * [new branch]              gh/kwen2501/276/head        -> origin/gh/kwen2501/276/head
2025-12-04T09:33:41.5204854Z  * [new branch]              gh/kwen2501/276/orig        -> origin/gh/kwen2501/276/orig
2025-12-04T09:33:41.5206923Z  * [new branch]              gh/kwen2501/277/base        -> origin/gh/kwen2501/277/base
2025-12-04T09:33:41.5208369Z  * [new branch]              gh/kwen2501/277/head        -> origin/gh/kwen2501/277/head
2025-12-04T09:33:41.5209930Z  * [new branch]              gh/kwen2501/277/orig        -> origin/gh/kwen2501/277/orig
2025-12-04T09:33:41.5211930Z  * [new branch]              gh/kwen2501/278/base        -> origin/gh/kwen2501/278/base
2025-12-04T09:33:41.5213416Z  * [new branch]              gh/kwen2501/278/head        -> origin/gh/kwen2501/278/head
2025-12-04T09:33:41.5214909Z  * [new branch]              gh/kwen2501/278/orig        -> origin/gh/kwen2501/278/orig
2025-12-04T09:33:41.5217127Z  * [new branch]              gh/kwen2501/279/base        -> origin/gh/kwen2501/279/base
2025-12-04T09:33:41.5218798Z  * [new branch]              gh/kwen2501/279/head        -> origin/gh/kwen2501/279/head
2025-12-04T09:33:41.5220317Z  * [new branch]              gh/kwen2501/279/orig        -> origin/gh/kwen2501/279/orig
2025-12-04T09:33:41.5222433Z  * [new branch]              gh/kwen2501/280/base        -> origin/gh/kwen2501/280/base
2025-12-04T09:33:41.5223945Z  * [new branch]              gh/kwen2501/280/head        -> origin/gh/kwen2501/280/head
2025-12-04T09:33:41.5226037Z  * [new branch]              gh/kwen2501/280/orig        -> origin/gh/kwen2501/280/orig
2025-12-04T09:33:41.5228131Z  * [new branch]              gh/kwen2501/281/base        -> origin/gh/kwen2501/281/base
2025-12-04T09:33:41.5229634Z  * [new branch]              gh/kwen2501/281/head        -> origin/gh/kwen2501/281/head
2025-12-04T09:33:41.5231199Z  * [new branch]              gh/kwen2501/281/orig        -> origin/gh/kwen2501/281/orig
2025-12-04T09:33:41.5233216Z  * [new branch]              gh/kwen2501/282/base        -> origin/gh/kwen2501/282/base
2025-12-04T09:33:41.5234850Z  * [new branch]              gh/kwen2501/282/head        -> origin/gh/kwen2501/282/head
2025-12-04T09:33:41.5236304Z  * [new branch]              gh/kwen2501/282/orig        -> origin/gh/kwen2501/282/orig
2025-12-04T09:33:41.5238314Z  * [new branch]              gh/kwen2501/283/base        -> origin/gh/kwen2501/283/base
2025-12-04T09:33:41.5239913Z  * [new branch]              gh/kwen2501/283/head        -> origin/gh/kwen2501/283/head
2025-12-04T09:33:41.5241433Z  * [new branch]              gh/kwen2501/283/orig        -> origin/gh/kwen2501/283/orig
2025-12-04T09:33:41.5243497Z  * [new branch]              gh/kwen2501/284/base        -> origin/gh/kwen2501/284/base
2025-12-04T09:33:41.5245044Z  * [new branch]              gh/kwen2501/284/head        -> origin/gh/kwen2501/284/head
2025-12-04T09:33:41.5246597Z  * [new branch]              gh/kwen2501/284/orig        -> origin/gh/kwen2501/284/orig
2025-12-04T09:33:41.5248759Z  * [new branch]              gh/kwen2501/285/base        -> origin/gh/kwen2501/285/base
2025-12-04T09:33:41.5250254Z  * [new branch]              gh/kwen2501/285/head        -> origin/gh/kwen2501/285/head
2025-12-04T09:33:41.5251902Z  * [new branch]              gh/kwen2501/285/orig        -> origin/gh/kwen2501/285/orig
2025-12-04T09:33:41.5253902Z  * [new branch]              gh/kwen2501/286/base        -> origin/gh/kwen2501/286/base
2025-12-04T09:33:41.5255522Z  * [new branch]              gh/kwen2501/286/head        -> origin/gh/kwen2501/286/head
2025-12-04T09:33:41.5257077Z  * [new branch]              gh/kwen2501/286/orig        -> origin/gh/kwen2501/286/orig
2025-12-04T09:33:41.5258967Z  * [new branch]              gh/kwen2501/287/base        -> origin/gh/kwen2501/287/base
2025-12-04T09:33:41.5260552Z  * [new branch]              gh/kwen2501/287/head        -> origin/gh/kwen2501/287/head
2025-12-04T09:33:41.5261927Z  * [new branch]              gh/kwen2501/287/orig        -> origin/gh/kwen2501/287/orig
2025-12-04T09:33:41.5264015Z  * [new branch]              gh/kwen2501/288/base        -> origin/gh/kwen2501/288/base
2025-12-04T09:33:41.5265606Z  * [new branch]              gh/kwen2501/288/head        -> origin/gh/kwen2501/288/head
2025-12-04T09:33:41.5267162Z  * [new branch]              gh/kwen2501/288/orig        -> origin/gh/kwen2501/288/orig
2025-12-04T09:33:41.5269539Z  * [new branch]              gh/laithsakka/251/base      -> origin/gh/laithsakka/251/base
2025-12-04T09:33:41.5271368Z  * [new branch]              gh/laithsakka/251/head      -> origin/gh/laithsakka/251/head
2025-12-04T09:33:41.5272878Z  * [new branch]              gh/laithsakka/251/orig      -> origin/gh/laithsakka/251/orig
2025-12-04T09:33:41.5274854Z  * [new branch]              gh/laithsakka/276/base      -> origin/gh/laithsakka/276/base
2025-12-04T09:33:41.5276328Z  * [new branch]              gh/laithsakka/276/head      -> origin/gh/laithsakka/276/head
2025-12-04T09:33:41.5277802Z  * [new branch]              gh/laithsakka/276/orig      -> origin/gh/laithsakka/276/orig
2025-12-04T09:33:41.5279859Z  * [new branch]              gh/laithsakka/28/base       -> origin/gh/laithsakka/28/base
2025-12-04T09:33:41.5281624Z  * [new branch]              gh/laithsakka/29/base       -> origin/gh/laithsakka/29/base
2025-12-04T09:33:41.5283858Z  * [new branch]              gh/laithsakka/30/base       -> origin/gh/laithsakka/30/base
2025-12-04T09:33:41.5285483Z  * [new branch]              gh/laithsakka/30/head       -> origin/gh/laithsakka/30/head
2025-12-04T09:33:41.5287238Z  * [new branch]              gh/laithsakka/31/base       -> origin/gh/laithsakka/31/base
2025-12-04T09:33:41.5288646Z  * [new branch]              gh/laithsakka/31/head       -> origin/gh/laithsakka/31/head
2025-12-04T09:33:41.5290949Z  * [new branch]              gh/laithsakka/313/base      -> origin/gh/laithsakka/313/base
2025-12-04T09:33:41.5292434Z  * [new branch]              gh/laithsakka/313/head      -> origin/gh/laithsakka/313/head
2025-12-04T09:33:41.5293873Z  * [new branch]              gh/laithsakka/313/orig      -> origin/gh/laithsakka/313/orig
2025-12-04T09:33:41.5296218Z  * [new branch]              gh/laithsakka/316/base      -> origin/gh/laithsakka/316/base
2025-12-04T09:33:41.5297811Z  * [new branch]              gh/laithsakka/316/head      -> origin/gh/laithsakka/316/head
2025-12-04T09:33:41.5299317Z  * [new branch]              gh/laithsakka/316/orig      -> origin/gh/laithsakka/316/orig
2025-12-04T09:33:41.5301549Z  * [new branch]              gh/laithsakka/317/base      -> origin/gh/laithsakka/317/base
2025-12-04T09:33:41.5302940Z  * [new branch]              gh/laithsakka/317/head      -> origin/gh/laithsakka/317/head
2025-12-04T09:33:41.5304411Z  * [new branch]              gh/laithsakka/317/orig      -> origin/gh/laithsakka/317/orig
2025-12-04T09:33:41.5306426Z  * [new branch]              gh/laithsakka/319/base      -> origin/gh/laithsakka/319/base
2025-12-04T09:33:41.5307954Z  * [new branch]              gh/laithsakka/319/head      -> origin/gh/laithsakka/319/head
2025-12-04T09:33:41.5309432Z  * [new branch]              gh/laithsakka/319/orig      -> origin/gh/laithsakka/319/orig
2025-12-04T09:33:41.5311895Z  * [new branch]              gh/laithsakka/32/base       -> origin/gh/laithsakka/32/base
2025-12-04T09:33:41.5313309Z  * [new branch]              gh/laithsakka/32/head       -> origin/gh/laithsakka/32/head
2025-12-04T09:33:41.5315434Z  * [new branch]              gh/laithsakka/320/base      -> origin/gh/laithsakka/320/base
2025-12-04T09:33:41.5316849Z  * [new branch]              gh/laithsakka/320/head      -> origin/gh/laithsakka/320/head
2025-12-04T09:33:41.5318284Z  * [new branch]              gh/laithsakka/320/orig      -> origin/gh/laithsakka/320/orig
2025-12-04T09:33:41.5320203Z  * [new branch]              gh/laithsakka/321/base      -> origin/gh/laithsakka/321/base
2025-12-04T09:33:41.5321852Z  * [new branch]              gh/laithsakka/321/head      -> origin/gh/laithsakka/321/head
2025-12-04T09:33:41.5323231Z  * [new branch]              gh/laithsakka/321/orig      -> origin/gh/laithsakka/321/orig
2025-12-04T09:33:41.5325471Z  * [new branch]              gh/laithsakka/322/base      -> origin/gh/laithsakka/322/base
2025-12-04T09:33:41.5326999Z  * [new branch]              gh/laithsakka/322/head      -> origin/gh/laithsakka/322/head
2025-12-04T09:33:41.5328536Z  * [new branch]              gh/laithsakka/322/orig      -> origin/gh/laithsakka/322/orig
2025-12-04T09:33:41.5330747Z  * [new branch]              gh/laithsakka/323/base      -> origin/gh/laithsakka/323/base
2025-12-04T09:33:41.5332928Z  * [new branch]              gh/laithsakka/323/head      -> origin/gh/laithsakka/323/head
2025-12-04T09:33:41.5334541Z  * [new branch]              gh/laithsakka/323/orig      -> origin/gh/laithsakka/323/orig
2025-12-04T09:33:41.5336689Z  * [new branch]              gh/laithsakka/324/base      -> origin/gh/laithsakka/324/base
2025-12-04T09:33:41.5338660Z  * [new branch]              gh/laithsakka/324/head      -> origin/gh/laithsakka/324/head
2025-12-04T09:33:41.5340067Z  * [new branch]              gh/laithsakka/324/orig      -> origin/gh/laithsakka/324/orig
2025-12-04T09:33:41.5342124Z  * [new branch]              gh/laithsakka/325/base      -> origin/gh/laithsakka/325/base
2025-12-04T09:33:41.5343598Z  * [new branch]              gh/laithsakka/325/head      -> origin/gh/laithsakka/325/head
2025-12-04T09:33:41.5345080Z  * [new branch]              gh/laithsakka/325/orig      -> origin/gh/laithsakka/325/orig
2025-12-04T09:33:41.5347547Z  * [new branch]              gh/laithsakka/326/base      -> origin/gh/laithsakka/326/base
2025-12-04T09:33:41.5349014Z  * [new branch]              gh/laithsakka/326/head      -> origin/gh/laithsakka/326/head
2025-12-04T09:33:41.5350544Z  * [new branch]              gh/laithsakka/326/orig      -> origin/gh/laithsakka/326/orig
2025-12-04T09:33:41.5352742Z  * [new branch]              gh/laithsakka/327/base      -> origin/gh/laithsakka/327/base
2025-12-04T09:33:41.5354352Z  * [new branch]              gh/laithsakka/327/head      -> origin/gh/laithsakka/327/head
2025-12-04T09:33:41.5355864Z  * [new branch]              gh/laithsakka/327/orig      -> origin/gh/laithsakka/327/orig
2025-12-04T09:33:41.5357894Z  * [new branch]              gh/laithsakka/328/base      -> origin/gh/laithsakka/328/base
2025-12-04T09:33:41.5359402Z  * [new branch]              gh/laithsakka/328/head      -> origin/gh/laithsakka/328/head
2025-12-04T09:33:41.5360957Z  * [new branch]              gh/laithsakka/328/orig      -> origin/gh/laithsakka/328/orig
2025-12-04T09:33:41.5363288Z  * [new branch]              gh/liangel/4/base           -> origin/gh/liangel/4/base
2025-12-04T09:33:41.5364785Z  * [new branch]              gh/liangel/4/head           -> origin/gh/liangel/4/head
2025-12-04T09:33:41.5366266Z  * [new branch]              gh/liangel/4/orig           -> origin/gh/liangel/4/orig
2025-12-04T09:33:41.5370804Z  * [new branch]              gh/lucaskabela/1/base       -> origin/gh/lucaskabela/1/base
2025-12-04T09:33:41.5375889Z  * [new branch]              gh/lucaskabela/1/head       -> origin/gh/lucaskabela/1/head
2025-12-04T09:33:41.5378467Z  * [new branch]              gh/lw/4/base                -> origin/gh/lw/4/base
2025-12-04T09:33:41.5379943Z  * [new branch]              gh/lw/4/head                -> origin/gh/lw/4/head
2025-12-04T09:33:41.5381453Z  * [new branch]              gh/lw/4/orig                -> origin/gh/lw/4/orig
2025-12-04T09:33:41.5383354Z  * [new branch]              gh/lw/5/base                -> origin/gh/lw/5/base
2025-12-04T09:33:41.5384852Z  * [new branch]              gh/lw/5/head                -> origin/gh/lw/5/head
2025-12-04T09:33:41.5386290Z  * [new branch]              gh/lw/5/orig                -> origin/gh/lw/5/orig
2025-12-04T09:33:41.5388243Z  * [new branch]              gh/lw/6/base                -> origin/gh/lw/6/base
2025-12-04T09:33:41.5389875Z  * [new branch]              gh/lw/6/head                -> origin/gh/lw/6/head
2025-12-04T09:33:41.5391198Z  * [new branch]              gh/lw/6/orig                -> origin/gh/lw/6/orig
2025-12-04T09:33:41.5393534Z  * [new branch]              gh/malfet/14/base           -> origin/gh/malfet/14/base
2025-12-04T09:33:41.5395508Z  * [new branch]              gh/malfet/417/base          -> origin/gh/malfet/417/base
2025-12-04T09:33:41.5396978Z  * [new branch]              gh/malfet/417/head          -> origin/gh/malfet/417/head
2025-12-04T09:33:41.5398451Z  * [new branch]              gh/malfet/417/orig          -> origin/gh/malfet/417/orig
2025-12-04T09:33:41.5400367Z  * [new branch]              gh/malfet/506/base          -> origin/gh/malfet/506/base
2025-12-04T09:33:41.5401977Z  * [new branch]              gh/malfet/506/head          -> origin/gh/malfet/506/head
2025-12-04T09:33:41.5403424Z  * [new branch]              gh/malfet/506/orig          -> origin/gh/malfet/506/orig
2025-12-04T09:33:41.5405381Z  * [new branch]              gh/malfet/517/base          -> origin/gh/malfet/517/base
2025-12-04T09:33:41.5406902Z  * [new branch]              gh/malfet/517/head          -> origin/gh/malfet/517/head
2025-12-04T09:33:41.5408855Z  * [new branch]              gh/malfet/528/base          -> origin/gh/malfet/528/base
2025-12-04T09:33:41.5410337Z  * [new branch]              gh/malfet/528/head          -> origin/gh/malfet/528/head
2025-12-04T09:33:41.5411815Z  * [new branch]              gh/malfet/528/orig          -> origin/gh/malfet/528/orig
2025-12-04T09:33:41.5413818Z  * [new branch]              gh/malfet/537/base          -> origin/gh/malfet/537/base
2025-12-04T09:33:41.5415241Z  * [new branch]              gh/malfet/537/head          -> origin/gh/malfet/537/head
2025-12-04T09:33:41.5416840Z  * [new branch]              gh/malfet/537/orig          -> origin/gh/malfet/537/orig
2025-12-04T09:33:41.5418789Z  * [new branch]              gh/malfet/546/base          -> origin/gh/malfet/546/base
2025-12-04T09:33:41.5420413Z  * [new branch]              gh/malfet/546/head          -> origin/gh/malfet/546/head
2025-12-04T09:33:41.5421754Z  * [new branch]              gh/malfet/546/orig          -> origin/gh/malfet/546/orig
2025-12-04T09:33:41.5424081Z  * [new branch]              gh/malfet/565/base          -> origin/gh/malfet/565/base
2025-12-04T09:33:41.5425631Z  * [new branch]              gh/malfet/565/head          -> origin/gh/malfet/565/head
2025-12-04T09:33:41.5427110Z  * [new branch]              gh/malfet/565/orig          -> origin/gh/malfet/565/orig
2025-12-04T09:33:41.5429131Z  * [new branch]              gh/malfet/575/base          -> origin/gh/malfet/575/base
2025-12-04T09:33:41.5430576Z  * [new branch]              gh/malfet/575/head          -> origin/gh/malfet/575/head
2025-12-04T09:33:41.5432083Z  * [new branch]              gh/malfet/575/orig          -> origin/gh/malfet/575/orig
2025-12-04T09:33:41.5434132Z  * [new branch]              gh/malfet/580/base          -> origin/gh/malfet/580/base
2025-12-04T09:33:41.5435613Z  * [new branch]              gh/malfet/580/head          -> origin/gh/malfet/580/head
2025-12-04T09:33:41.5437115Z  * [new branch]              gh/malfet/580/orig          -> origin/gh/malfet/580/orig
2025-12-04T09:33:41.5439031Z  * [new branch]              gh/malfet/581/base          -> origin/gh/malfet/581/base
2025-12-04T09:33:41.5440499Z  * [new branch]              gh/malfet/581/head          -> origin/gh/malfet/581/head
2025-12-04T09:33:41.5442034Z  * [new branch]              gh/malfet/581/orig          -> origin/gh/malfet/581/orig
2025-12-04T09:33:41.5443904Z  * [new branch]              gh/malfet/583/base          -> origin/gh/malfet/583/base
2025-12-04T09:33:41.5445408Z  * [new branch]              gh/malfet/583/head          -> origin/gh/malfet/583/head
2025-12-04T09:33:41.5446830Z  * [new branch]              gh/malfet/583/orig          -> origin/gh/malfet/583/orig
2025-12-04T09:33:41.5448712Z  * [new branch]              gh/malfet/586/base          -> origin/gh/malfet/586/base
2025-12-04T09:33:41.5450251Z  * [new branch]              gh/malfet/586/head          -> origin/gh/malfet/586/head
2025-12-04T09:33:41.5451708Z  * [new branch]              gh/malfet/586/orig          -> origin/gh/malfet/586/orig
2025-12-04T09:33:41.5453714Z  * [new branch]              gh/malfet/587/base          -> origin/gh/malfet/587/base
2025-12-04T09:33:41.5455142Z  * [new branch]              gh/malfet/587/head          -> origin/gh/malfet/587/head
2025-12-04T09:33:41.5456691Z  * [new branch]              gh/malfet/587/orig          -> origin/gh/malfet/587/orig
2025-12-04T09:33:41.5458683Z  * [new branch]              gh/malfet/588/base          -> origin/gh/malfet/588/base
2025-12-04T09:33:41.5460159Z  * [new branch]              gh/malfet/588/head          -> origin/gh/malfet/588/head
2025-12-04T09:33:41.5461796Z  * [new branch]              gh/malfet/588/orig          -> origin/gh/malfet/588/orig
2025-12-04T09:33:41.5463779Z  * [new branch]              gh/malfet/589/base          -> origin/gh/malfet/589/base
2025-12-04T09:33:41.5465239Z  * [new branch]              gh/malfet/589/head          -> origin/gh/malfet/589/head
2025-12-04T09:33:41.5467314Z  * [new branch]              gh/malfet/589/orig          -> origin/gh/malfet/589/orig
2025-12-04T09:33:41.5469147Z  * [new branch]              gh/malfet/590/base          -> origin/gh/malfet/590/base
2025-12-04T09:33:41.5470633Z  * [new branch]              gh/malfet/590/head          -> origin/gh/malfet/590/head
2025-12-04T09:33:41.5472326Z  * [new branch]              gh/malfet/590/orig          -> origin/gh/malfet/590/orig
2025-12-04T09:33:41.5474844Z  * [new branch]              gh/malfet/591/base          -> origin/gh/malfet/591/base
2025-12-04T09:33:41.5476297Z  * [new branch]              gh/malfet/591/head          -> origin/gh/malfet/591/head
2025-12-04T09:33:41.5477834Z  * [new branch]              gh/malfet/591/orig          -> origin/gh/malfet/591/orig
2025-12-04T09:33:41.5479840Z  * [new branch]              gh/malfet/592/base          -> origin/gh/malfet/592/base
2025-12-04T09:33:41.5481345Z  * [new branch]              gh/malfet/592/head          -> origin/gh/malfet/592/head
2025-12-04T09:33:41.5482842Z  * [new branch]              gh/malfet/592/orig          -> origin/gh/malfet/592/orig
2025-12-04T09:33:41.5484833Z  * [new branch]              gh/malfet/593/base          -> origin/gh/malfet/593/base
2025-12-04T09:33:41.5486266Z  * [new branch]              gh/malfet/593/head          -> origin/gh/malfet/593/head
2025-12-04T09:33:41.5487814Z  * [new branch]              gh/malfet/593/orig          -> origin/gh/malfet/593/orig
2025-12-04T09:33:41.5489793Z  * [new branch]              gh/malfet/594/base          -> origin/gh/malfet/594/base
2025-12-04T09:33:41.5491249Z  * [new branch]              gh/malfet/594/head          -> origin/gh/malfet/594/head
2025-12-04T09:33:41.5492758Z  * [new branch]              gh/malfet/594/orig          -> origin/gh/malfet/594/orig
2025-12-04T09:33:41.5494700Z  * [new branch]              gh/malfet/595/base          -> origin/gh/malfet/595/base
2025-12-04T09:33:41.5496219Z  * [new branch]              gh/malfet/595/head          -> origin/gh/malfet/595/head
2025-12-04T09:33:41.5497850Z  * [new branch]              gh/malfet/595/orig          -> origin/gh/malfet/595/orig
2025-12-04T09:33:41.5499762Z  * [new branch]              gh/malfet/596/base          -> origin/gh/malfet/596/base
2025-12-04T09:33:41.5501277Z  * [new branch]              gh/malfet/596/head          -> origin/gh/malfet/596/head
2025-12-04T09:33:41.5503254Z  * [new branch]              gh/malfet/596/orig          -> origin/gh/malfet/596/orig
2025-12-04T09:33:41.5505286Z  * [new branch]              gh/malfet/597/base          -> origin/gh/malfet/597/base
2025-12-04T09:33:41.5506734Z  * [new branch]              gh/malfet/597/head          -> origin/gh/malfet/597/head
2025-12-04T09:33:41.5508209Z  * [new branch]              gh/malfet/597/orig          -> origin/gh/malfet/597/orig
2025-12-04T09:33:41.5510159Z  * [new branch]              gh/malfet/598/base          -> origin/gh/malfet/598/base
2025-12-04T09:33:41.5511738Z  * [new branch]              gh/malfet/598/head          -> origin/gh/malfet/598/head
2025-12-04T09:33:41.5513140Z  * [new branch]              gh/malfet/598/orig          -> origin/gh/malfet/598/orig
2025-12-04T09:33:41.5515243Z  * [new branch]              gh/malfet/599/base          -> origin/gh/malfet/599/base
2025-12-04T09:33:41.5516698Z  * [new branch]              gh/malfet/599/head          -> origin/gh/malfet/599/head
2025-12-04T09:33:41.5518197Z  * [new branch]              gh/malfet/599/orig          -> origin/gh/malfet/599/orig
2025-12-04T09:33:41.5520134Z  * [new branch]              gh/malfet/600/base          -> origin/gh/malfet/600/base
2025-12-04T09:33:41.5521576Z  * [new branch]              gh/malfet/600/head          -> origin/gh/malfet/600/head
2025-12-04T09:33:41.5523021Z  * [new branch]              gh/malfet/600/orig          -> origin/gh/malfet/600/orig
2025-12-04T09:33:41.5525744Z  * [new branch]              gh/malfet/601/base          -> origin/gh/malfet/601/base
2025-12-04T09:33:41.5527257Z  * [new branch]              gh/malfet/601/head          -> origin/gh/malfet/601/head
2025-12-04T09:33:41.5528760Z  * [new branch]              gh/malfet/601/orig          -> origin/gh/malfet/601/orig
2025-12-04T09:33:41.5530865Z  * [new branch]              gh/malfet/602/base          -> origin/gh/malfet/602/base
2025-12-04T09:33:41.5532351Z  * [new branch]              gh/malfet/602/head          -> origin/gh/malfet/602/head
2025-12-04T09:33:41.5533816Z  * [new branch]              gh/malfet/602/orig          -> origin/gh/malfet/602/orig
2025-12-04T09:33:41.5535833Z  * [new branch]              gh/malfet/603/base          -> origin/gh/malfet/603/base
2025-12-04T09:33:41.5537346Z  * [new branch]              gh/malfet/603/head          -> origin/gh/malfet/603/head
2025-12-04T09:33:41.5538899Z  * [new branch]              gh/malfet/603/orig          -> origin/gh/malfet/603/orig
2025-12-04T09:33:41.5540889Z  * [new branch]              gh/malfet/604/base          -> origin/gh/malfet/604/base
2025-12-04T09:33:41.5542313Z  * [new branch]              gh/malfet/604/head          -> origin/gh/malfet/604/head
2025-12-04T09:33:41.5544260Z  * [new branch]              gh/malfet/604/orig          -> origin/gh/malfet/604/orig
2025-12-04T09:33:41.5546316Z  * [new branch]              gh/malfet/605/base          -> origin/gh/malfet/605/base
2025-12-04T09:33:41.5547790Z  * [new branch]              gh/malfet/605/head          -> origin/gh/malfet/605/head
2025-12-04T09:33:41.5549417Z  * [new branch]              gh/malfet/605/orig          -> origin/gh/malfet/605/orig
2025-12-04T09:33:41.5551441Z  * [new branch]              gh/malfet/606/base          -> origin/gh/malfet/606/base
2025-12-04T09:33:41.5553157Z  * [new branch]              gh/malfet/606/head          -> origin/gh/malfet/606/head
2025-12-04T09:33:41.5554630Z  * [new branch]              gh/malfet/606/orig          -> origin/gh/malfet/606/orig
2025-12-04T09:33:41.5556709Z  * [new branch]              gh/malfet/607/base          -> origin/gh/malfet/607/base
2025-12-04T09:33:41.5558181Z  * [new branch]              gh/malfet/607/head          -> origin/gh/malfet/607/head
2025-12-04T09:33:41.5559774Z  * [new branch]              gh/malfet/607/orig          -> origin/gh/malfet/607/orig
2025-12-04T09:33:41.5561806Z  * [new branch]              gh/malfet/608/base          -> origin/gh/malfet/608/base
2025-12-04T09:33:41.5563264Z  * [new branch]              gh/malfet/608/head          -> origin/gh/malfet/608/head
2025-12-04T09:33:41.5564790Z  * [new branch]              gh/malfet/608/orig          -> origin/gh/malfet/608/orig
2025-12-04T09:33:41.5566792Z  * [new branch]              gh/malfet/609/base          -> origin/gh/malfet/609/base
2025-12-04T09:33:41.5568240Z  * [new branch]              gh/malfet/609/head          -> origin/gh/malfet/609/head
2025-12-04T09:33:41.5569765Z  * [new branch]              gh/malfet/609/orig          -> origin/gh/malfet/609/orig
2025-12-04T09:33:41.5572127Z  * [new branch]              gh/malfet/610/base          -> origin/gh/malfet/610/base
2025-12-04T09:33:41.5573619Z  * [new branch]              gh/malfet/610/head          -> origin/gh/malfet/610/head
2025-12-04T09:33:41.5575200Z  * [new branch]              gh/malfet/610/orig          -> origin/gh/malfet/610/orig
2025-12-04T09:33:41.5577314Z  * [new branch]              gh/malfet/611/base          -> origin/gh/malfet/611/base
2025-12-04T09:33:41.5578726Z  * [new branch]              gh/malfet/611/head          -> origin/gh/malfet/611/head
2025-12-04T09:33:41.5580244Z  * [new branch]              gh/malfet/611/orig          -> origin/gh/malfet/611/orig
2025-12-04T09:33:41.5582065Z  * [new branch]              gh/malfet/612/base          -> origin/gh/malfet/612/base
2025-12-04T09:33:41.5583540Z  * [new branch]              gh/malfet/612/head          -> origin/gh/malfet/612/head
2025-12-04T09:33:41.5585142Z  * [new branch]              gh/malfet/612/orig          -> origin/gh/malfet/612/orig
2025-12-04T09:33:41.5587202Z  * [new branch]              gh/malfet/64/base           -> origin/gh/malfet/64/base
2025-12-04T09:33:41.5588647Z  * [new branch]              gh/malfet/64/head           -> origin/gh/malfet/64/head
2025-12-04T09:33:41.5591035Z  * [new branch]              gh/manuelcandales/11/base   -> origin/gh/manuelcandales/11/base
2025-12-04T09:33:41.5592492Z  * [new branch]              gh/manuelcandales/11/head   -> origin/gh/manuelcandales/11/head
2025-12-04T09:33:41.5593968Z  * [new branch]              gh/manuelcandales/11/orig   -> origin/gh/manuelcandales/11/orig
2025-12-04T09:33:41.5596739Z  * [new branch]              gh/markkm/1/base            -> origin/gh/markkm/1/base
2025-12-04T09:33:41.5599040Z  * [new branch]              gh/masnesral/1/base         -> origin/gh/masnesral/1/base
2025-12-04T09:33:41.5600567Z  * [new branch]              gh/masnesral/1/head         -> origin/gh/masnesral/1/head
2025-12-04T09:33:41.5602054Z  * [new branch]              gh/masnesral/1/orig         -> origin/gh/masnesral/1/orig
2025-12-04T09:33:41.5604394Z  * [new branch]              gh/mhorowitz/0/base         -> origin/gh/mhorowitz/0/base
2025-12-04T09:33:41.5605946Z  * [new branch]              gh/mhorowitz/0/head         -> origin/gh/mhorowitz/0/head
2025-12-04T09:33:41.5607696Z  * [new branch]              gh/mhorowitz/1/base         -> origin/gh/mhorowitz/1/base
2025-12-04T09:33:41.5609194Z  * [new branch]              gh/mhorowitz/1/head         -> origin/gh/mhorowitz/1/head
2025-12-04T09:33:41.5610961Z  * [new branch]              gh/mhorowitz/2/base         -> origin/gh/mhorowitz/2/base
2025-12-04T09:33:41.5612676Z  * [new branch]              gh/mhorowitz/2/head         -> origin/gh/mhorowitz/2/head
2025-12-04T09:33:41.5614416Z  * [new branch]              gh/mhorowitz/3/base         -> origin/gh/mhorowitz/3/base
2025-12-04T09:33:41.5615925Z  * [new branch]              gh/mhorowitz/3/head         -> origin/gh/mhorowitz/3/head
2025-12-04T09:33:41.5617748Z  * [new branch]              gh/mhorowitz/4/base         -> origin/gh/mhorowitz/4/base
2025-12-04T09:33:41.5619180Z  * [new branch]              gh/mhorowitz/4/head         -> origin/gh/mhorowitz/4/head
2025-12-04T09:33:41.5620878Z  * [new branch]              gh/mhorowitz/5/base         -> origin/gh/mhorowitz/5/base
2025-12-04T09:33:41.5622284Z  * [new branch]              gh/mhorowitz/5/head         -> origin/gh/mhorowitz/5/head
2025-12-04T09:33:41.5624035Z  * [new branch]              gh/mhorowitz/6/base         -> origin/gh/mhorowitz/6/base
2025-12-04T09:33:41.5625416Z  * [new branch]              gh/mhorowitz/6/head         -> origin/gh/mhorowitz/6/head
2025-12-04T09:33:41.5627912Z  * [new branch]              gh/mikaylagawarecki/234/base -> origin/gh/mikaylagawarecki/234/base
2025-12-04T09:33:41.5629561Z  * [new branch]              gh/mikaylagawarecki/234/head -> origin/gh/mikaylagawarecki/234/head
2025-12-04T09:33:41.5631384Z  * [new branch]              gh/mikaylagawarecki/235/base -> origin/gh/mikaylagawarecki/235/base
2025-12-04T09:33:41.5632898Z  * [new branch]              gh/mikaylagawarecki/235/head -> origin/gh/mikaylagawarecki/235/head
2025-12-04T09:33:41.5634657Z  * [new branch]              gh/mikaylagawarecki/236/base -> origin/gh/mikaylagawarecki/236/base
2025-12-04T09:33:41.5636070Z  * [new branch]              gh/mikaylagawarecki/236/head -> origin/gh/mikaylagawarecki/236/head
2025-12-04T09:33:41.5638257Z  * [new branch]              gh/mikaylagawarecki/237/base -> origin/gh/mikaylagawarecki/237/base
2025-12-04T09:33:41.5639612Z  * [new branch]              gh/mikaylagawarecki/237/head -> origin/gh/mikaylagawarecki/237/head
2025-12-04T09:33:41.5641604Z  * [new branch]              gh/mikaylagawarecki/238/base -> origin/gh/mikaylagawarecki/238/base
2025-12-04T09:33:41.5643145Z  * [new branch]              gh/mikaylagawarecki/238/head -> origin/gh/mikaylagawarecki/238/head
2025-12-04T09:33:41.5645117Z  * [new branch]              gh/mikaylagawarecki/336/base -> origin/gh/mikaylagawarecki/336/base
2025-12-04T09:33:41.5646607Z  * [new branch]              gh/mikaylagawarecki/336/head -> origin/gh/mikaylagawarecki/336/head
2025-12-04T09:33:41.5648185Z  * [new branch]              gh/mikaylagawarecki/336/orig -> origin/gh/mikaylagawarecki/336/orig
2025-12-04T09:33:41.5650490Z  * [new branch]              gh/mikaylagawarecki/341/base -> origin/gh/mikaylagawarecki/341/base
2025-12-04T09:33:41.5651912Z  * [new branch]              gh/mikaylagawarecki/341/head -> origin/gh/mikaylagawarecki/341/head
2025-12-04T09:33:41.5653410Z  * [new branch]              gh/mikaylagawarecki/341/orig -> origin/gh/mikaylagawarecki/341/orig
2025-12-04T09:33:41.5655648Z  * [new branch]              gh/mikaylagawarecki/342/base -> origin/gh/mikaylagawarecki/342/base
2025-12-04T09:33:41.5657200Z  * [new branch]              gh/mikaylagawarecki/342/head -> origin/gh/mikaylagawarecki/342/head
2025-12-04T09:33:41.5658738Z  * [new branch]              gh/mikaylagawarecki/342/orig -> origin/gh/mikaylagawarecki/342/orig
2025-12-04T09:33:41.5661293Z  * [new branch]              gh/mikaylagawarecki/345/base -> origin/gh/mikaylagawarecki/345/base
2025-12-04T09:33:41.5662738Z  * [new branch]              gh/mikaylagawarecki/345/head -> origin/gh/mikaylagawarecki/345/head
2025-12-04T09:33:41.5664221Z  * [new branch]              gh/mikaylagawarecki/345/orig -> origin/gh/mikaylagawarecki/345/orig
2025-12-04T09:33:41.5666405Z  * [new branch]              gh/mikaylagawarecki/346/base -> origin/gh/mikaylagawarecki/346/base
2025-12-04T09:33:41.5667886Z  * [new branch]              gh/mikaylagawarecki/346/head -> origin/gh/mikaylagawarecki/346/head
2025-12-04T09:33:41.5669457Z  * [new branch]              gh/mikaylagawarecki/346/orig -> origin/gh/mikaylagawarecki/346/orig
2025-12-04T09:33:41.5671783Z  * [new branch]              gh/mikaylagawarecki/347/base -> origin/gh/mikaylagawarecki/347/base
2025-12-04T09:33:41.5675900Z  * [new branch]              gh/mikaylagawarecki/347/head -> origin/gh/mikaylagawarecki/347/head
2025-12-04T09:33:41.5677335Z  * [new branch]              gh/mikaylagawarecki/347/orig -> origin/gh/mikaylagawarecki/347/orig
2025-12-04T09:33:41.5680001Z  * [new branch]              gh/mikaylagawarecki/350/base -> origin/gh/mikaylagawarecki/350/base
2025-12-04T09:33:41.5681459Z  * [new branch]              gh/mikaylagawarecki/350/head -> origin/gh/mikaylagawarecki/350/head
2025-12-04T09:33:41.5683075Z  * [new branch]              gh/mikaylagawarecki/350/orig -> origin/gh/mikaylagawarecki/350/orig
2025-12-04T09:33:41.5686049Z  * [new branch]              gh/mikaylagawarecki/351/base -> origin/gh/mikaylagawarecki/351/base
2025-12-04T09:33:41.5687603Z  * [new branch]              gh/mikaylagawarecki/351/head -> origin/gh/mikaylagawarecki/351/head
2025-12-04T09:33:41.5689142Z  * [new branch]              gh/mikaylagawarecki/351/orig -> origin/gh/mikaylagawarecki/351/orig
2025-12-04T09:33:41.5691338Z  * [new branch]              gh/mikaylagawarecki/352/base -> origin/gh/mikaylagawarecki/352/base
2025-12-04T09:33:41.5692988Z  * [new branch]              gh/mikaylagawarecki/352/head -> origin/gh/mikaylagawarecki/352/head
2025-12-04T09:33:41.5694819Z  * [new branch]              gh/mikaylagawarecki/352/orig -> origin/gh/mikaylagawarecki/352/orig
2025-12-04T09:33:41.5696920Z  * [new branch]              gh/mikaylagawarecki/353/base -> origin/gh/mikaylagawarecki/353/base
2025-12-04T09:33:41.5698840Z  * [new branch]              gh/mikaylagawarecki/353/head -> origin/gh/mikaylagawarecki/353/head
2025-12-04T09:33:41.5700303Z  * [new branch]              gh/mikaylagawarecki/353/orig -> origin/gh/mikaylagawarecki/353/orig
2025-12-04T09:33:41.5702132Z  * [new branch]              gh/mikaylagawarecki/354/base -> origin/gh/mikaylagawarecki/354/base
2025-12-04T09:33:41.5703605Z  * [new branch]              gh/mikaylagawarecki/354/head -> origin/gh/mikaylagawarecki/354/head
2025-12-04T09:33:41.5705620Z  * [new branch]              gh/mikaylagawarecki/354/orig -> origin/gh/mikaylagawarecki/354/orig
2025-12-04T09:33:41.5708556Z  * [new branch]              gh/mikaylagawarecki/356/base -> origin/gh/mikaylagawarecki/356/base
2025-12-04T09:33:41.5710207Z  * [new branch]              gh/mikaylagawarecki/356/head -> origin/gh/mikaylagawarecki/356/head
2025-12-04T09:33:41.5711635Z  * [new branch]              gh/mikaylagawarecki/356/orig -> origin/gh/mikaylagawarecki/356/orig
2025-12-04T09:33:41.5713497Z  * [new branch]              gh/mikaylagawarecki/357/base -> origin/gh/mikaylagawarecki/357/base
2025-12-04T09:33:41.5715064Z  * [new branch]              gh/mikaylagawarecki/357/head -> origin/gh/mikaylagawarecki/357/head
2025-12-04T09:33:41.5716704Z  * [new branch]              gh/mikaylagawarecki/357/orig -> origin/gh/mikaylagawarecki/357/orig
2025-12-04T09:33:41.5718905Z  * [new branch]              gh/mikaylagawarecki/359/base -> origin/gh/mikaylagawarecki/359/base
2025-12-04T09:33:41.5720550Z  * [new branch]              gh/mikaylagawarecki/359/head -> origin/gh/mikaylagawarecki/359/head
2025-12-04T09:33:41.5722087Z  * [new branch]              gh/mikaylagawarecki/359/orig -> origin/gh/mikaylagawarecki/359/orig
2025-12-04T09:33:41.5724073Z  * [new branch]              gh/mikaylagawarecki/360/base -> origin/gh/mikaylagawarecki/360/base
2025-12-04T09:33:41.5725719Z  * [new branch]              gh/mikaylagawarecki/360/head -> origin/gh/mikaylagawarecki/360/head
2025-12-04T09:33:41.5727699Z  * [new branch]              gh/mikaylagawarecki/360/orig -> origin/gh/mikaylagawarecki/360/orig
2025-12-04T09:33:41.5729818Z  * [new branch]              gh/mikaylagawarecki/361/base -> origin/gh/mikaylagawarecki/361/base
2025-12-04T09:33:41.5731358Z  * [new branch]              gh/mikaylagawarecki/361/head -> origin/gh/mikaylagawarecki/361/head
2025-12-04T09:33:41.5732809Z  * [new branch]              gh/mikaylagawarecki/361/orig -> origin/gh/mikaylagawarecki/361/orig
2025-12-04T09:33:41.5734920Z  * [new branch]              gh/mikaylagawarecki/362/base -> origin/gh/mikaylagawarecki/362/base
2025-12-04T09:33:41.5736718Z  * [new branch]              gh/mikaylagawarecki/362/head -> origin/gh/mikaylagawarecki/362/head
2025-12-04T09:33:41.5738382Z  * [new branch]              gh/mikaylagawarecki/362/orig -> origin/gh/mikaylagawarecki/362/orig
2025-12-04T09:33:41.5740778Z  * [new branch]              gh/mikaylagawarecki/363/base -> origin/gh/mikaylagawarecki/363/base
2025-12-04T09:33:41.5742615Z  * [new branch]              gh/mikaylagawarecki/363/head -> origin/gh/mikaylagawarecki/363/head
2025-12-04T09:33:41.5744059Z  * [new branch]              gh/mikaylagawarecki/363/orig -> origin/gh/mikaylagawarecki/363/orig
2025-12-04T09:33:41.5746680Z  * [new branch]              gh/mikaylagawarecki/364/base -> origin/gh/mikaylagawarecki/364/base
2025-12-04T09:33:41.5748157Z  * [new branch]              gh/mikaylagawarecki/364/head -> origin/gh/mikaylagawarecki/364/head
2025-12-04T09:33:41.5749688Z  * [new branch]              gh/mikaylagawarecki/364/orig -> origin/gh/mikaylagawarecki/364/orig
2025-12-04T09:33:41.5752010Z  * [new branch]              gh/mikaylagawarecki/365/base -> origin/gh/mikaylagawarecki/365/base
2025-12-04T09:33:41.5753533Z  * [new branch]              gh/mikaylagawarecki/365/head -> origin/gh/mikaylagawarecki/365/head
2025-12-04T09:33:41.5755180Z  * [new branch]              gh/mikaylagawarecki/365/orig -> origin/gh/mikaylagawarecki/365/orig
2025-12-04T09:33:41.5757347Z  * [new branch]              gh/mikaylagawarecki/366/base -> origin/gh/mikaylagawarecki/366/base
2025-12-04T09:33:41.5758817Z  * [new branch]              gh/mikaylagawarecki/366/head -> origin/gh/mikaylagawarecki/366/head
2025-12-04T09:33:41.5760463Z  * [new branch]              gh/mikaylagawarecki/366/orig -> origin/gh/mikaylagawarecki/366/orig
2025-12-04T09:33:41.5762450Z  * [new branch]              gh/mikaylagawarecki/367/base -> origin/gh/mikaylagawarecki/367/base
2025-12-04T09:33:41.5763960Z  * [new branch]              gh/mikaylagawarecki/367/head -> origin/gh/mikaylagawarecki/367/head
2025-12-04T09:33:41.5765425Z  * [new branch]              gh/mikaylagawarecki/367/orig -> origin/gh/mikaylagawarecki/367/orig
2025-12-04T09:33:41.5768121Z  * [new branch]              gh/mikaylagawarecki/368/base -> origin/gh/mikaylagawarecki/368/base
2025-12-04T09:33:41.5769619Z  * [new branch]              gh/mikaylagawarecki/368/head -> origin/gh/mikaylagawarecki/368/head
2025-12-04T09:33:41.5771337Z  * [new branch]              gh/mikaylagawarecki/368/orig -> origin/gh/mikaylagawarecki/368/orig
2025-12-04T09:33:41.5773397Z  * [new branch]              gh/mikaylagawarecki/369/base -> origin/gh/mikaylagawarecki/369/base
2025-12-04T09:33:41.5774884Z  * [new branch]              gh/mikaylagawarecki/369/head -> origin/gh/mikaylagawarecki/369/head
2025-12-04T09:33:41.5776373Z  * [new branch]              gh/mikaylagawarecki/369/orig -> origin/gh/mikaylagawarecki/369/orig
2025-12-04T09:33:41.5778648Z  * [new branch]              gh/mikaylagawarecki/370/base -> origin/gh/mikaylagawarecki/370/base
2025-12-04T09:33:41.5780122Z  * [new branch]              gh/mikaylagawarecki/370/head -> origin/gh/mikaylagawarecki/370/head
2025-12-04T09:33:41.5781750Z  * [new branch]              gh/mikaylagawarecki/370/orig -> origin/gh/mikaylagawarecki/370/orig
2025-12-04T09:33:41.5783895Z  * [new branch]              gh/mikaylagawarecki/371/base -> origin/gh/mikaylagawarecki/371/base
2025-12-04T09:33:41.5785340Z  * [new branch]              gh/mikaylagawarecki/371/head -> origin/gh/mikaylagawarecki/371/head
2025-12-04T09:33:41.5786793Z  * [new branch]              gh/mikaylagawarecki/371/orig -> origin/gh/mikaylagawarecki/371/orig
2025-12-04T09:33:41.5788849Z  * [new branch]              gh/mikaylagawarecki/372/base -> origin/gh/mikaylagawarecki/372/base
2025-12-04T09:33:41.5790289Z  * [new branch]              gh/mikaylagawarecki/372/head -> origin/gh/mikaylagawarecki/372/head
2025-12-04T09:33:41.5791794Z  * [new branch]              gh/mikaylagawarecki/372/orig -> origin/gh/mikaylagawarecki/372/orig
2025-12-04T09:33:41.5793763Z  * [new branch]              gh/mikaylagawarecki/373/base -> origin/gh/mikaylagawarecki/373/base
2025-12-04T09:33:41.5795222Z  * [new branch]              gh/mikaylagawarecki/373/head -> origin/gh/mikaylagawarecki/373/head
2025-12-04T09:33:41.5796702Z  * [new branch]              gh/mikaylagawarecki/373/orig -> origin/gh/mikaylagawarecki/373/orig
2025-12-04T09:33:41.5798762Z  * [new branch]              gh/mikaylagawarecki/374/base -> origin/gh/mikaylagawarecki/374/base
2025-12-04T09:33:41.5800275Z  * [new branch]              gh/mikaylagawarecki/374/head -> origin/gh/mikaylagawarecki/374/head
2025-12-04T09:33:41.5801838Z  * [new branch]              gh/mikaylagawarecki/374/orig -> origin/gh/mikaylagawarecki/374/orig
2025-12-04T09:33:41.5803840Z  * [new branch]              gh/mikaylagawarecki/375/base -> origin/gh/mikaylagawarecki/375/base
2025-12-04T09:33:41.5805417Z  * [new branch]              gh/mikaylagawarecki/375/head -> origin/gh/mikaylagawarecki/375/head
2025-12-04T09:33:41.5806923Z  * [new branch]              gh/mikaylagawarecki/375/orig -> origin/gh/mikaylagawarecki/375/orig
2025-12-04T09:33:41.5809011Z  * [new branch]              gh/mikaylagawarecki/376/base -> origin/gh/mikaylagawarecki/376/base
2025-12-04T09:33:41.5810631Z  * [new branch]              gh/mikaylagawarecki/376/head -> origin/gh/mikaylagawarecki/376/head
2025-12-04T09:33:41.5812088Z  * [new branch]              gh/mikaylagawarecki/376/orig -> origin/gh/mikaylagawarecki/376/orig
2025-12-04T09:33:41.5814102Z  * [new branch]              gh/mikaylagawarecki/377/base -> origin/gh/mikaylagawarecki/377/base
2025-12-04T09:33:41.5815688Z  * [new branch]              gh/mikaylagawarecki/377/head -> origin/gh/mikaylagawarecki/377/head
2025-12-04T09:33:41.5817283Z  * [new branch]              gh/mikaylagawarecki/377/orig -> origin/gh/mikaylagawarecki/377/orig
2025-12-04T09:33:41.5819749Z  * [new branch]              gh/mikaylagawarecki/378/base -> origin/gh/mikaylagawarecki/378/base
2025-12-04T09:33:41.5821289Z  * [new branch]              gh/mikaylagawarecki/378/head -> origin/gh/mikaylagawarecki/378/head
2025-12-04T09:33:41.5822882Z  * [new branch]              gh/mikaylagawarecki/378/orig -> origin/gh/mikaylagawarecki/378/orig
2025-12-04T09:33:41.5824921Z  * [new branch]              gh/mikaylagawarecki/379/base -> origin/gh/mikaylagawarecki/379/base
2025-12-04T09:33:41.5826384Z  * [new branch]              gh/mikaylagawarecki/379/head -> origin/gh/mikaylagawarecki/379/head
2025-12-04T09:33:41.5827873Z  * [new branch]              gh/mikaylagawarecki/379/orig -> origin/gh/mikaylagawarecki/379/orig
2025-12-04T09:33:41.5829701Z  * [new branch]              gh/mikaylagawarecki/380/base -> origin/gh/mikaylagawarecki/380/base
2025-12-04T09:33:41.5831163Z  * [new branch]              gh/mikaylagawarecki/380/head -> origin/gh/mikaylagawarecki/380/head
2025-12-04T09:33:41.5832633Z  * [new branch]              gh/mikaylagawarecki/380/orig -> origin/gh/mikaylagawarecki/380/orig
2025-12-04T09:33:41.5834467Z  * [new branch]              gh/mikaylagawarecki/381/base -> origin/gh/mikaylagawarecki/381/base
2025-12-04T09:33:41.5835927Z  * [new branch]              gh/mikaylagawarecki/381/head -> origin/gh/mikaylagawarecki/381/head
2025-12-04T09:33:41.5837371Z  * [new branch]              gh/mikaylagawarecki/381/orig -> origin/gh/mikaylagawarecki/381/orig
2025-12-04T09:33:41.5839191Z  * [new branch]              gh/mikaylagawarecki/382/base -> origin/gh/mikaylagawarecki/382/base
2025-12-04T09:33:41.5840812Z  * [new branch]              gh/mikaylagawarecki/382/head -> origin/gh/mikaylagawarecki/382/head
2025-12-04T09:33:41.5842379Z  * [new branch]              gh/mikaylagawarecki/382/orig -> origin/gh/mikaylagawarecki/382/orig
2025-12-04T09:33:41.5844473Z  * [new branch]              gh/mikaylagawarecki/383/base -> origin/gh/mikaylagawarecki/383/base
2025-12-04T09:33:41.5845988Z  * [new branch]              gh/mikaylagawarecki/383/head -> origin/gh/mikaylagawarecki/383/head
2025-12-04T09:33:41.5847486Z  * [new branch]              gh/mikaylagawarecki/383/orig -> origin/gh/mikaylagawarecki/383/orig
2025-12-04T09:33:41.5849554Z  * [new branch]              gh/mikaylagawarecki/384/base -> origin/gh/mikaylagawarecki/384/base
2025-12-04T09:33:41.5851040Z  * [new branch]              gh/mikaylagawarecki/384/head -> origin/gh/mikaylagawarecki/384/head
2025-12-04T09:33:41.5852470Z  * [new branch]              gh/mikaylagawarecki/384/orig -> origin/gh/mikaylagawarecki/384/orig
2025-12-04T09:33:41.5854469Z  * [new branch]              gh/mikaylagawarecki/385/base -> origin/gh/mikaylagawarecki/385/base
2025-12-04T09:33:41.5856079Z  * [new branch]              gh/mikaylagawarecki/385/head -> origin/gh/mikaylagawarecki/385/head
2025-12-04T09:33:41.5857681Z  * [new branch]              gh/mikaylagawarecki/385/orig -> origin/gh/mikaylagawarecki/385/orig
2025-12-04T09:33:41.5859910Z  * [new branch]              gh/mikaylagawarecki/386/base -> origin/gh/mikaylagawarecki/386/base
2025-12-04T09:33:41.5861356Z  * [new branch]              gh/mikaylagawarecki/386/head -> origin/gh/mikaylagawarecki/386/head
2025-12-04T09:33:41.5862995Z  * [new branch]              gh/mikaylagawarecki/386/orig -> origin/gh/mikaylagawarecki/386/orig
2025-12-04T09:33:41.5865097Z  * [new branch]              gh/mikaylagawarecki/387/base -> origin/gh/mikaylagawarecki/387/base
2025-12-04T09:33:41.5866447Z  * [new branch]              gh/mikaylagawarecki/387/head -> origin/gh/mikaylagawarecki/387/head
2025-12-04T09:33:41.5867920Z  * [new branch]              gh/mikaylagawarecki/387/orig -> origin/gh/mikaylagawarecki/387/orig
2025-12-04T09:33:41.5869699Z  * [new branch]              gh/mikaylagawarecki/388/base -> origin/gh/mikaylagawarecki/388/base
2025-12-04T09:33:41.5871338Z  * [new branch]              gh/mikaylagawarecki/388/head -> origin/gh/mikaylagawarecki/388/head
2025-12-04T09:33:41.5872873Z  * [new branch]              gh/mikaylagawarecki/388/orig -> origin/gh/mikaylagawarecki/388/orig
2025-12-04T09:33:41.5874921Z  * [new branch]              gh/mikaylagawarecki/389/base -> origin/gh/mikaylagawarecki/389/base
2025-12-04T09:33:41.5876370Z  * [new branch]              gh/mikaylagawarecki/389/head -> origin/gh/mikaylagawarecki/389/head
2025-12-04T09:33:41.5877816Z  * [new branch]              gh/mikaylagawarecki/389/orig -> origin/gh/mikaylagawarecki/389/orig
2025-12-04T09:33:41.5879950Z  * [new branch]              gh/mikaylagawarecki/390/base -> origin/gh/mikaylagawarecki/390/base
2025-12-04T09:33:41.5881429Z  * [new branch]              gh/mikaylagawarecki/390/head -> origin/gh/mikaylagawarecki/390/head
2025-12-04T09:33:41.5882973Z  * [new branch]              gh/mikaylagawarecki/390/orig -> origin/gh/mikaylagawarecki/390/orig
2025-12-04T09:33:41.5885196Z  * [new branch]              gh/mikaylagawarecki/391/base -> origin/gh/mikaylagawarecki/391/base
2025-12-04T09:33:41.5886742Z  * [new branch]              gh/mikaylagawarecki/391/head -> origin/gh/mikaylagawarecki/391/head
2025-12-04T09:33:41.5888292Z  * [new branch]              gh/mikaylagawarecki/391/orig -> origin/gh/mikaylagawarecki/391/orig
2025-12-04T09:33:41.5890376Z  * [new branch]              gh/mikaylagawarecki/392/base -> origin/gh/mikaylagawarecki/392/base
2025-12-04T09:33:41.5891836Z  * [new branch]              gh/mikaylagawarecki/392/head -> origin/gh/mikaylagawarecki/392/head
2025-12-04T09:33:41.5893310Z  * [new branch]              gh/mikaylagawarecki/392/orig -> origin/gh/mikaylagawarecki/392/orig
2025-12-04T09:33:41.5895655Z  * [new branch]              gh/mlazos/41/base           -> origin/gh/mlazos/41/base
2025-12-04T09:33:41.5897194Z  * [new branch]              gh/mlazos/41/head           -> origin/gh/mlazos/41/head
2025-12-04T09:33:41.5898686Z  * [new branch]              gh/mlazos/41/orig           -> origin/gh/mlazos/41/orig
2025-12-04T09:33:41.5900691Z  * [new branch]              gh/mlazos/42/base           -> origin/gh/mlazos/42/base
2025-12-04T09:33:41.5902190Z  * [new branch]              gh/mlazos/42/head           -> origin/gh/mlazos/42/head
2025-12-04T09:33:41.5903689Z  * [new branch]              gh/mlazos/42/orig           -> origin/gh/mlazos/42/orig
2025-12-04T09:33:41.5905399Z  * [new branch]              gh/mlazos/43/base           -> origin/gh/mlazos/43/base
2025-12-04T09:33:41.5907281Z  * [new branch]              gh/mlazos/43/head           -> origin/gh/mlazos/43/head
2025-12-04T09:33:41.5908731Z  * [new branch]              gh/mlazos/43/orig           -> origin/gh/mlazos/43/orig
2025-12-04T09:33:41.5910506Z  * [new branch]              gh/mlazos/44/base           -> origin/gh/mlazos/44/base
2025-12-04T09:33:41.5911915Z  * [new branch]              gh/mlazos/44/head           -> origin/gh/mlazos/44/head
2025-12-04T09:33:41.5913359Z  * [new branch]              gh/mlazos/44/orig           -> origin/gh/mlazos/44/orig
2025-12-04T09:33:41.5915248Z  * [new branch]              gh/mlazos/47/base           -> origin/gh/mlazos/47/base
2025-12-04T09:33:41.5916708Z  * [new branch]              gh/mlazos/47/head           -> origin/gh/mlazos/47/head
2025-12-04T09:33:41.5918239Z  * [new branch]              gh/mlazos/47/orig           -> origin/gh/mlazos/47/orig
2025-12-04T09:33:41.5920018Z  * [new branch]              gh/mlazos/48/base           -> origin/gh/mlazos/48/base
2025-12-04T09:33:41.5921675Z  * [new branch]              gh/mlazos/48/head           -> origin/gh/mlazos/48/head
2025-12-04T09:33:41.5923031Z  * [new branch]              gh/mlazos/48/orig           -> origin/gh/mlazos/48/orig
2025-12-04T09:33:41.5925152Z  * [new branch]              gh/mlazos/49/base           -> origin/gh/mlazos/49/base
2025-12-04T09:33:41.5926285Z  * [new branch]              gh/mlazos/49/head           -> origin/gh/mlazos/49/head
2025-12-04T09:33:41.5927703Z  * [new branch]              gh/mlazos/49/orig           -> origin/gh/mlazos/49/orig
2025-12-04T09:33:41.5929657Z  * [new branch]              gh/mlazos/50/base           -> origin/gh/mlazos/50/base
2025-12-04T09:33:41.5931287Z  * [new branch]              gh/mlazos/50/head           -> origin/gh/mlazos/50/head
2025-12-04T09:33:41.5932511Z  * [new branch]              gh/mlazos/50/orig           -> origin/gh/mlazos/50/orig
2025-12-04T09:33:41.5934286Z  * [new branch]              gh/mlazos/51/base           -> origin/gh/mlazos/51/base
2025-12-04T09:33:41.5935796Z  * [new branch]              gh/mlazos/51/head           -> origin/gh/mlazos/51/head
2025-12-04T09:33:41.5937484Z  * [new branch]              gh/mlazos/51/orig           -> origin/gh/mlazos/51/orig
2025-12-04T09:33:41.5939347Z  * [new branch]              gh/mlazos/52/base           -> origin/gh/mlazos/52/base
2025-12-04T09:33:41.5940917Z  * [new branch]              gh/mlazos/52/head           -> origin/gh/mlazos/52/head
2025-12-04T09:33:41.5942395Z  * [new branch]              gh/mlazos/52/orig           -> origin/gh/mlazos/52/orig
2025-12-04T09:33:41.5944341Z  * [new branch]              gh/mlazos/53/base           -> origin/gh/mlazos/53/base
2025-12-04T09:33:41.5945835Z  * [new branch]              gh/mlazos/53/head           -> origin/gh/mlazos/53/head
2025-12-04T09:33:41.5947310Z  * [new branch]              gh/mlazos/53/orig           -> origin/gh/mlazos/53/orig
2025-12-04T09:33:41.5949206Z  * [new branch]              gh/mlazos/54/base           -> origin/gh/mlazos/54/base
2025-12-04T09:33:41.5950669Z  * [new branch]              gh/mlazos/54/head           -> origin/gh/mlazos/54/head
2025-12-04T09:33:41.5952143Z  * [new branch]              gh/mlazos/54/orig           -> origin/gh/mlazos/54/orig
2025-12-04T09:33:41.5953983Z  * [new branch]              gh/mlazos/55/base           -> origin/gh/mlazos/55/base
2025-12-04T09:33:41.5955469Z  * [new branch]              gh/mlazos/55/head           -> origin/gh/mlazos/55/head
2025-12-04T09:33:41.5956891Z  * [new branch]              gh/mlazos/55/orig           -> origin/gh/mlazos/55/orig
2025-12-04T09:33:41.5958751Z  * [new branch]              gh/mlazos/56/base           -> origin/gh/mlazos/56/base
2025-12-04T09:33:41.5998851Z  * [new branch]              gh/mlazos/56/head           -> origin/gh/mlazos/56/head
2025-12-04T09:33:41.5999435Z  * [new branch]              gh/mlazos/56/orig           -> origin/gh/mlazos/56/orig
2025-12-04T09:33:41.6000181Z  * [new branch]              gh/mlazos/57/base           -> origin/gh/mlazos/57/base
2025-12-04T09:33:41.6000899Z  * [new branch]              gh/mlazos/57/head           -> origin/gh/mlazos/57/head
2025-12-04T09:33:41.6001518Z  * [new branch]              gh/mlazos/57/orig           -> origin/gh/mlazos/57/orig
2025-12-04T09:33:41.6002143Z  * [new branch]              gh/mlazos/58/base           -> origin/gh/mlazos/58/base
2025-12-04T09:33:41.6002759Z  * [new branch]              gh/mlazos/58/head           -> origin/gh/mlazos/58/head
2025-12-04T09:33:41.6003353Z  * [new branch]              gh/mlazos/58/orig           -> origin/gh/mlazos/58/orig
2025-12-04T09:33:41.6003939Z  * [new branch]              gh/mlazos/59/base           -> origin/gh/mlazos/59/base
2025-12-04T09:33:41.6004536Z  * [new branch]              gh/mlazos/59/head           -> origin/gh/mlazos/59/head
2025-12-04T09:33:41.6005127Z  * [new branch]              gh/mlazos/59/orig           -> origin/gh/mlazos/59/orig
2025-12-04T09:33:41.6005752Z  * [new branch]              gh/mlazos/60/base           -> origin/gh/mlazos/60/base
2025-12-04T09:33:41.6006582Z  * [new branch]              gh/mlazos/60/head           -> origin/gh/mlazos/60/head
2025-12-04T09:33:41.6007215Z  * [new branch]              gh/mlazos/60/orig           -> origin/gh/mlazos/60/orig
2025-12-04T09:33:41.6007846Z  * [new branch]              gh/mlazos/61/base           -> origin/gh/mlazos/61/base
2025-12-04T09:33:41.6008474Z  * [new branch]              gh/mlazos/61/head           -> origin/gh/mlazos/61/head
2025-12-04T09:33:41.6009091Z  * [new branch]              gh/mlazos/61/orig           -> origin/gh/mlazos/61/orig
2025-12-04T09:33:41.6009716Z  * [new branch]              gh/mlazos/62/base           -> origin/gh/mlazos/62/base
2025-12-04T09:33:41.6010329Z  * [new branch]              gh/mlazos/62/head           -> origin/gh/mlazos/62/head
2025-12-04T09:33:41.6010987Z  * [new branch]              gh/mlazos/62/orig           -> origin/gh/mlazos/62/orig
2025-12-04T09:33:41.6011601Z  * [new branch]              gh/mlazos/63/base           -> origin/gh/mlazos/63/base
2025-12-04T09:33:41.6012221Z  * [new branch]              gh/mlazos/63/head           -> origin/gh/mlazos/63/head
2025-12-04T09:33:41.6012835Z  * [new branch]              gh/mlazos/63/orig           -> origin/gh/mlazos/63/orig
2025-12-04T09:33:41.6013435Z  * [new branch]              gh/mlazos/64/base           -> origin/gh/mlazos/64/base
2025-12-04T09:33:41.6014175Z  * [new branch]              gh/mlazos/64/head           -> origin/gh/mlazos/64/head
2025-12-04T09:33:41.6014794Z  * [new branch]              gh/mlazos/64/orig           -> origin/gh/mlazos/64/orig
2025-12-04T09:33:41.6015411Z  * [new branch]              gh/mlazos/65/base           -> origin/gh/mlazos/65/base
2025-12-04T09:33:41.6016023Z  * [new branch]              gh/mlazos/65/head           -> origin/gh/mlazos/65/head
2025-12-04T09:33:41.6016733Z  * [new branch]              gh/mlazos/65/orig           -> origin/gh/mlazos/65/orig
2025-12-04T09:33:41.6017349Z  * [new branch]              gh/mlazos/66/base           -> origin/gh/mlazos/66/base
2025-12-04T09:33:41.6017966Z  * [new branch]              gh/mlazos/66/head           -> origin/gh/mlazos/66/head
2025-12-04T09:33:41.6018570Z  * [new branch]              gh/mlazos/66/orig           -> origin/gh/mlazos/66/orig
2025-12-04T09:33:41.6019187Z  * [new branch]              gh/mlazos/67/base           -> origin/gh/mlazos/67/base
2025-12-04T09:33:41.6019804Z  * [new branch]              gh/mlazos/67/head           -> origin/gh/mlazos/67/head
2025-12-04T09:33:41.6020419Z  * [new branch]              gh/mlazos/67/orig           -> origin/gh/mlazos/67/orig
2025-12-04T09:33:41.6021022Z  * [new branch]              gh/mlazos/68/base           -> origin/gh/mlazos/68/base
2025-12-04T09:33:41.6021639Z  * [new branch]              gh/mlazos/68/head           -> origin/gh/mlazos/68/head
2025-12-04T09:33:41.6022255Z  * [new branch]              gh/mlazos/68/orig           -> origin/gh/mlazos/68/orig
2025-12-04T09:33:41.6022859Z  * [new branch]              gh/mlazos/69/base           -> origin/gh/mlazos/69/base
2025-12-04T09:33:41.6023484Z  * [new branch]              gh/mlazos/69/head           -> origin/gh/mlazos/69/head
2025-12-04T09:33:41.6025224Z  * [new branch]              gh/mlazos/69/orig           -> origin/gh/mlazos/69/orig
2025-12-04T09:33:41.6027405Z  * [new branch]              gh/mlazos/70/base           -> origin/gh/mlazos/70/base
2025-12-04T09:33:41.6029027Z  * [new branch]              gh/mlazos/70/head           -> origin/gh/mlazos/70/head
2025-12-04T09:33:41.6030620Z  * [new branch]              gh/mlazos/70/orig           -> origin/gh/mlazos/70/orig
2025-12-04T09:33:41.6032555Z  * [new branch]              gh/mlazos/71/base           -> origin/gh/mlazos/71/base
2025-12-04T09:33:41.6034005Z  * [new branch]              gh/mlazos/71/head           -> origin/gh/mlazos/71/head
2025-12-04T09:33:41.6035454Z  * [new branch]              gh/mlazos/71/orig           -> origin/gh/mlazos/71/orig
2025-12-04T09:33:41.6037357Z  * [new branch]              gh/mlazos/72/base           -> origin/gh/mlazos/72/base
2025-12-04T09:33:41.6039089Z  * [new branch]              gh/mlazos/72/head           -> origin/gh/mlazos/72/head
2025-12-04T09:33:41.6040351Z  * [new branch]              gh/mlazos/72/orig           -> origin/gh/mlazos/72/orig
2025-12-04T09:33:41.6042471Z  * [new branch]              gh/mlazos/73/base           -> origin/gh/mlazos/73/base
2025-12-04T09:33:41.6043925Z  * [new branch]              gh/mlazos/73/head           -> origin/gh/mlazos/73/head
2025-12-04T09:33:41.6045383Z  * [new branch]              gh/mlazos/73/orig           -> origin/gh/mlazos/73/orig
2025-12-04T09:33:41.6047675Z  * [new branch]              gh/mrmiywj/1/base           -> origin/gh/mrmiywj/1/base
2025-12-04T09:33:41.6049232Z  * [new branch]              gh/mrmiywj/1/head           -> origin/gh/mrmiywj/1/head
2025-12-04T09:33:41.6051634Z  * [new branch]              gh/muchulee8/73/base        -> origin/gh/muchulee8/73/base
2025-12-04T09:33:41.6053270Z  * [new branch]              gh/muchulee8/73/head        -> origin/gh/muchulee8/73/head
2025-12-04T09:33:41.6054854Z  * [new branch]              gh/muchulee8/73/orig        -> origin/gh/muchulee8/73/orig
2025-12-04T09:33:41.6057514Z  * [new branch]              gh/naveenthangudu/1/base    -> origin/gh/naveenthangudu/1/base
2025-12-04T09:33:41.6058910Z  * [new branch]              gh/naveenthangudu/1/head    -> origin/gh/naveenthangudu/1/head
2025-12-04T09:33:41.6060576Z  * [new branch]              gh/naveenthangudu/1/orig    -> origin/gh/naveenthangudu/1/orig
2025-12-04T09:33:41.6062494Z  * [new branch]              gh/naveenthangudu/2/base    -> origin/gh/naveenthangudu/2/base
2025-12-04T09:33:41.6063991Z  * [new branch]              gh/naveenthangudu/2/head    -> origin/gh/naveenthangudu/2/head
2025-12-04T09:33:41.6065498Z  * [new branch]              gh/naveenthangudu/2/orig    -> origin/gh/naveenthangudu/2/orig
2025-12-04T09:33:41.6067287Z  * [new branch]              gh/naveenthangudu/3/base    -> origin/gh/naveenthangudu/3/base
2025-12-04T09:33:41.6068638Z  * [new branch]              gh/naveenthangudu/3/head    -> origin/gh/naveenthangudu/3/head
2025-12-04T09:33:41.6070346Z  * [new branch]              gh/naveenthangudu/3/orig    -> origin/gh/naveenthangudu/3/orig
2025-12-04T09:33:41.6072500Z  * [new branch]              gh/naveenthangudu/4/base    -> origin/gh/naveenthangudu/4/base
2025-12-04T09:33:41.6073823Z  * [new branch]              gh/naveenthangudu/4/head    -> origin/gh/naveenthangudu/4/head
2025-12-04T09:33:41.6075480Z  * [new branch]              gh/naveenthangudu/4/orig    -> origin/gh/naveenthangudu/4/orig
2025-12-04T09:33:41.6077466Z  * [new branch]              gh/naveenthangudu/5/base    -> origin/gh/naveenthangudu/5/base
2025-12-04T09:33:41.6078858Z  * [new branch]              gh/naveenthangudu/5/head    -> origin/gh/naveenthangudu/5/head
2025-12-04T09:33:41.6080685Z  * [new branch]              gh/naveenthangudu/5/orig    -> origin/gh/naveenthangudu/5/orig
2025-12-04T09:33:41.6082500Z  * [new branch]              gh/naveenthangudu/6/base    -> origin/gh/naveenthangudu/6/base
2025-12-04T09:33:41.6083825Z  * [new branch]              gh/naveenthangudu/6/head    -> origin/gh/naveenthangudu/6/head
2025-12-04T09:33:41.6085144Z  * [new branch]              gh/naveenthangudu/6/orig    -> origin/gh/naveenthangudu/6/orig
2025-12-04T09:33:41.6087623Z  * [new branch]              gh/naveenthangudu/7/base    -> origin/gh/naveenthangudu/7/base
2025-12-04T09:33:41.6089010Z  * [new branch]              gh/naveenthangudu/7/head    -> origin/gh/naveenthangudu/7/head
2025-12-04T09:33:41.6090386Z  * [new branch]              gh/naveenthangudu/7/orig    -> origin/gh/naveenthangudu/7/orig
2025-12-04T09:33:41.6092178Z  * [new branch]              gh/naveenthangudu/8/base    -> origin/gh/naveenthangudu/8/base
2025-12-04T09:33:41.6093743Z  * [new branch]              gh/naveenthangudu/8/head    -> origin/gh/naveenthangudu/8/head
2025-12-04T09:33:41.6095243Z  * [new branch]              gh/naveenthangudu/8/orig    -> origin/gh/naveenthangudu/8/orig
2025-12-04T09:33:41.6097466Z  * [new branch]              gh/naveenthangudu/9/base    -> origin/gh/naveenthangudu/9/base
2025-12-04T09:33:41.6098643Z  * [new branch]              gh/naveenthangudu/9/head    -> origin/gh/naveenthangudu/9/head
2025-12-04T09:33:41.6100276Z  * [new branch]              gh/naveenthangudu/9/orig    -> origin/gh/naveenthangudu/9/orig
2025-12-04T09:33:41.6102572Z  * [new branch]              gh/nikitaved/1/base         -> origin/gh/nikitaved/1/base
2025-12-04T09:33:41.6104080Z  * [new branch]              gh/nikitaved/1/head         -> origin/gh/nikitaved/1/head
2025-12-04T09:33:41.6105403Z  * [new branch]              gh/nikitaved/1/orig         -> origin/gh/nikitaved/1/orig
2025-12-04T09:33:41.6107487Z  * [new branch]              gh/nikitaved/10/base        -> origin/gh/nikitaved/10/base
2025-12-04T09:33:41.6108968Z  * [new branch]              gh/nikitaved/10/head        -> origin/gh/nikitaved/10/head
2025-12-04T09:33:41.6110455Z  * [new branch]              gh/nikitaved/10/orig        -> origin/gh/nikitaved/10/orig
2025-12-04T09:33:41.6112339Z  * [new branch]              gh/nikitaved/11/base        -> origin/gh/nikitaved/11/base
2025-12-04T09:33:41.6113862Z  * [new branch]              gh/nikitaved/11/head        -> origin/gh/nikitaved/11/head
2025-12-04T09:33:41.6115515Z  * [new branch]              gh/nikitaved/11/orig        -> origin/gh/nikitaved/11/orig
2025-12-04T09:33:41.6117345Z  * [new branch]              gh/nikitaved/12/base        -> origin/gh/nikitaved/12/base
2025-12-04T09:33:41.6118645Z  * [new branch]              gh/nikitaved/12/head        -> origin/gh/nikitaved/12/head
2025-12-04T09:33:41.6120147Z  * [new branch]              gh/nikitaved/12/orig        -> origin/gh/nikitaved/12/orig
2025-12-04T09:33:41.6122091Z  * [new branch]              gh/nikitaved/13/base        -> origin/gh/nikitaved/13/base
2025-12-04T09:33:41.6123532Z  * [new branch]              gh/nikitaved/13/head        -> origin/gh/nikitaved/13/head
2025-12-04T09:33:41.6124841Z  * [new branch]              gh/nikitaved/13/orig        -> origin/gh/nikitaved/13/orig
2025-12-04T09:33:41.6126943Z  * [new branch]              gh/nikitaved/14/base        -> origin/gh/nikitaved/14/base
2025-12-04T09:33:41.6128425Z  * [new branch]              gh/nikitaved/14/head        -> origin/gh/nikitaved/14/head
2025-12-04T09:33:41.6129874Z  * [new branch]              gh/nikitaved/14/orig        -> origin/gh/nikitaved/14/orig
2025-12-04T09:33:41.6131633Z  * [new branch]              gh/nikitaved/15/base        -> origin/gh/nikitaved/15/base
2025-12-04T09:33:41.6133072Z  * [new branch]              gh/nikitaved/15/head        -> origin/gh/nikitaved/15/head
2025-12-04T09:33:41.6134645Z  * [new branch]              gh/nikitaved/15/orig        -> origin/gh/nikitaved/15/orig
2025-12-04T09:33:41.6136555Z  * [new branch]              gh/nikitaved/16/base        -> origin/gh/nikitaved/16/base
2025-12-04T09:33:41.6138068Z  * [new branch]              gh/nikitaved/16/head        -> origin/gh/nikitaved/16/head
2025-12-04T09:33:41.6139502Z  * [new branch]              gh/nikitaved/16/orig        -> origin/gh/nikitaved/16/orig
2025-12-04T09:33:41.6141487Z  * [new branch]              gh/nikitaved/2/base         -> origin/gh/nikitaved/2/base
2025-12-04T09:33:41.6142927Z  * [new branch]              gh/nikitaved/2/head         -> origin/gh/nikitaved/2/head
2025-12-04T09:33:41.6144424Z  * [new branch]              gh/nikitaved/2/orig         -> origin/gh/nikitaved/2/orig
2025-12-04T09:33:41.6146345Z  * [new branch]              gh/nikitaved/4/base         -> origin/gh/nikitaved/4/base
2025-12-04T09:33:41.6147802Z  * [new branch]              gh/nikitaved/4/head         -> origin/gh/nikitaved/4/head
2025-12-04T09:33:41.6149286Z  * [new branch]              gh/nikitaved/4/orig         -> origin/gh/nikitaved/4/orig
2025-12-04T09:33:41.6151272Z  * [new branch]              gh/nikitaved/5/base         -> origin/gh/nikitaved/5/base
2025-12-04T09:33:41.6152710Z  * [new branch]              gh/nikitaved/5/head         -> origin/gh/nikitaved/5/head
2025-12-04T09:33:41.6154421Z  * [new branch]              gh/nikitaved/5/orig         -> origin/gh/nikitaved/5/orig
2025-12-04T09:33:41.6156127Z  * [new branch]              gh/nikitaved/6/base         -> origin/gh/nikitaved/6/base
2025-12-04T09:33:41.6157641Z  * [new branch]              gh/nikitaved/6/head         -> origin/gh/nikitaved/6/head
2025-12-04T09:33:41.6159615Z  * [new branch]              gh/nikitaved/6/orig         -> origin/gh/nikitaved/6/orig
2025-12-04T09:33:41.6161537Z  * [new branch]              gh/nikitaved/8/base         -> origin/gh/nikitaved/8/base
2025-12-04T09:33:41.6162958Z  * [new branch]              gh/nikitaved/8/head         -> origin/gh/nikitaved/8/head
2025-12-04T09:33:41.6164414Z  * [new branch]              gh/nikitaved/8/orig         -> origin/gh/nikitaved/8/orig
2025-12-04T09:33:41.6166326Z  * [new branch]              gh/nikitaved/9/base         -> origin/gh/nikitaved/9/base
2025-12-04T09:33:41.6167765Z  * [new branch]              gh/nikitaved/9/head         -> origin/gh/nikitaved/9/head
2025-12-04T09:33:41.6169207Z  * [new branch]              gh/nikitaved/9/orig         -> origin/gh/nikitaved/9/orig
2025-12-04T09:33:41.6171564Z  * [new branch]              gh/oulgen/10/base           -> origin/gh/oulgen/10/base
2025-12-04T09:33:41.6175213Z  * [new branch]              gh/oulgen/10/head           -> origin/gh/oulgen/10/head
2025-12-04T09:33:41.6176785Z  * [new branch]              gh/oulgen/10/orig           -> origin/gh/oulgen/10/orig
2025-12-04T09:33:41.6178711Z  * [new branch]              gh/oulgen/11/base           -> origin/gh/oulgen/11/base
2025-12-04T09:33:41.6180095Z  * [new branch]              gh/oulgen/11/head           -> origin/gh/oulgen/11/head
2025-12-04T09:33:41.6181681Z  * [new branch]              gh/oulgen/11/orig           -> origin/gh/oulgen/11/orig
2025-12-04T09:33:41.6183518Z  * [new branch]              gh/oulgen/12/base           -> origin/gh/oulgen/12/base
2025-12-04T09:33:41.6184944Z  * [new branch]              gh/oulgen/12/head           -> origin/gh/oulgen/12/head
2025-12-04T09:33:41.6186397Z  * [new branch]              gh/oulgen/12/orig           -> origin/gh/oulgen/12/orig
2025-12-04T09:33:41.6188758Z  * [new branch]              gh/oulgen/13/base           -> origin/gh/oulgen/13/base
2025-12-04T09:33:41.6190199Z  * [new branch]              gh/oulgen/13/head           -> origin/gh/oulgen/13/head
2025-12-04T09:33:41.6191619Z  * [new branch]              gh/oulgen/13/orig           -> origin/gh/oulgen/13/orig
2025-12-04T09:33:41.6193504Z  * [new branch]              gh/oulgen/14/base           -> origin/gh/oulgen/14/base
2025-12-04T09:33:41.6195526Z  * [new branch]              gh/oulgen/14/head           -> origin/gh/oulgen/14/head
2025-12-04T09:33:41.6197086Z  * [new branch]              gh/oulgen/14/orig           -> origin/gh/oulgen/14/orig
2025-12-04T09:33:41.6199281Z  * [new branch]              gh/oulgen/15/base           -> origin/gh/oulgen/15/base
2025-12-04T09:33:41.6200321Z  * [new branch]              gh/oulgen/15/head           -> origin/gh/oulgen/15/head
2025-12-04T09:33:41.6201860Z  * [new branch]              gh/oulgen/15/orig           -> origin/gh/oulgen/15/orig
2025-12-04T09:33:41.6203775Z  * [new branch]              gh/oulgen/16/base           -> origin/gh/oulgen/16/base
2025-12-04T09:33:41.6205197Z  * [new branch]              gh/oulgen/16/head           -> origin/gh/oulgen/16/head
2025-12-04T09:33:41.6206656Z  * [new branch]              gh/oulgen/16/orig           -> origin/gh/oulgen/16/orig
2025-12-04T09:33:41.6208542Z  * [new branch]              gh/oulgen/17/base           -> origin/gh/oulgen/17/base
2025-12-04T09:33:41.6210083Z  * [new branch]              gh/oulgen/17/head           -> origin/gh/oulgen/17/head
2025-12-04T09:33:41.6211478Z  * [new branch]              gh/oulgen/17/orig           -> origin/gh/oulgen/17/orig
2025-12-04T09:33:41.6213407Z  * [new branch]              gh/oulgen/18/base           -> origin/gh/oulgen/18/base
2025-12-04T09:33:41.6214901Z  * [new branch]              gh/oulgen/18/head           -> origin/gh/oulgen/18/head
2025-12-04T09:33:41.6216740Z  * [new branch]              gh/oulgen/18/orig           -> origin/gh/oulgen/18/orig
2025-12-04T09:33:41.6218357Z  * [new branch]              gh/oulgen/19/base           -> origin/gh/oulgen/19/base
2025-12-04T09:33:41.6219824Z  * [new branch]              gh/oulgen/19/head           -> origin/gh/oulgen/19/head
2025-12-04T09:33:41.6221782Z  * [new branch]              gh/oulgen/19/orig           -> origin/gh/oulgen/19/orig
2025-12-04T09:33:41.6223269Z  * [new branch]              gh/oulgen/20/base           -> origin/gh/oulgen/20/base
2025-12-04T09:33:41.6224712Z  * [new branch]              gh/oulgen/20/head           -> origin/gh/oulgen/20/head
2025-12-04T09:33:41.6226236Z  * [new branch]              gh/oulgen/20/orig           -> origin/gh/oulgen/20/orig
2025-12-04T09:33:41.6228084Z  * [new branch]              gh/oulgen/21/base           -> origin/gh/oulgen/21/base
2025-12-04T09:33:41.6229566Z  * [new branch]              gh/oulgen/21/head           -> origin/gh/oulgen/21/head
2025-12-04T09:33:41.6231007Z  * [new branch]              gh/oulgen/21/orig           -> origin/gh/oulgen/21/orig
2025-12-04T09:33:41.6232955Z  * [new branch]              gh/oulgen/22/base           -> origin/gh/oulgen/22/base
2025-12-04T09:33:41.6234459Z  * [new branch]              gh/oulgen/22/head           -> origin/gh/oulgen/22/head
2025-12-04T09:33:41.6235942Z  * [new branch]              gh/oulgen/22/orig           -> origin/gh/oulgen/22/orig
2025-12-04T09:33:41.6237849Z  * [new branch]              gh/oulgen/23/base           -> origin/gh/oulgen/23/base
2025-12-04T09:33:41.6239300Z  * [new branch]              gh/oulgen/23/head           -> origin/gh/oulgen/23/head
2025-12-04T09:33:41.6240753Z  * [new branch]              gh/oulgen/23/orig           -> origin/gh/oulgen/23/orig
2025-12-04T09:33:41.6242546Z  * [new branch]              gh/oulgen/24/base           -> origin/gh/oulgen/24/base
2025-12-04T09:33:41.6244011Z  * [new branch]              gh/oulgen/24/head           -> origin/gh/oulgen/24/head
2025-12-04T09:33:41.6245417Z  * [new branch]              gh/oulgen/24/orig           -> origin/gh/oulgen/24/orig
2025-12-04T09:33:41.6247283Z  * [new branch]              gh/oulgen/25/base           -> origin/gh/oulgen/25/base
2025-12-04T09:33:41.6248746Z  * [new branch]              gh/oulgen/25/head           -> origin/gh/oulgen/25/head
2025-12-04T09:33:41.6250247Z  * [new branch]              gh/oulgen/25/orig           -> origin/gh/oulgen/25/orig
2025-12-04T09:33:41.6252196Z  * [new branch]              gh/oulgen/26/base           -> origin/gh/oulgen/26/base
2025-12-04T09:33:41.6253747Z  * [new branch]              gh/oulgen/26/head           -> origin/gh/oulgen/26/head
2025-12-04T09:33:41.6255302Z  * [new branch]              gh/oulgen/26/orig           -> origin/gh/oulgen/26/orig
2025-12-04T09:33:41.6257332Z  * [new branch]              gh/oulgen/4/base            -> origin/gh/oulgen/4/base
2025-12-04T09:33:41.6258812Z  * [new branch]              gh/oulgen/4/head            -> origin/gh/oulgen/4/head
2025-12-04T09:33:41.6260252Z  * [new branch]              gh/oulgen/4/orig            -> origin/gh/oulgen/4/orig
2025-12-04T09:33:41.6262988Z  * [new branch]              gh/oulgen/7/base            -> origin/gh/oulgen/7/base
2025-12-04T09:33:41.6264463Z  * [new branch]              gh/oulgen/7/head            -> origin/gh/oulgen/7/head
2025-12-04T09:33:41.6265951Z  * [new branch]              gh/oulgen/7/orig            -> origin/gh/oulgen/7/orig
2025-12-04T09:33:41.6267903Z  * [new branch]              gh/oulgen/8/base            -> origin/gh/oulgen/8/base
2025-12-04T09:33:41.6269438Z  * [new branch]              gh/oulgen/8/head            -> origin/gh/oulgen/8/head
2025-12-04T09:33:41.6270881Z  * [new branch]              gh/oulgen/8/orig            -> origin/gh/oulgen/8/orig
2025-12-04T09:33:41.6273144Z  * [new branch]              gh/oulgen/9/base            -> origin/gh/oulgen/9/base
2025-12-04T09:33:41.6274739Z  * [new branch]              gh/oulgen/9/head            -> origin/gh/oulgen/9/head
2025-12-04T09:33:41.6276787Z  * [new branch]              gh/oulgen/9/orig            -> origin/gh/oulgen/9/orig
2025-12-04T09:33:41.6278719Z  * [new branch]              gh/patvig/mtia-serialization -> origin/gh/patvig/mtia-serialization
2025-12-04T09:33:41.6281220Z  * [new branch]              gh/pearu/108/base           -> origin/gh/pearu/108/base
2025-12-04T09:33:41.6282734Z  * [new branch]              gh/pearu/108/head           -> origin/gh/pearu/108/head
2025-12-04T09:33:41.6284370Z  * [new branch]              gh/pearu/108/orig           -> origin/gh/pearu/108/orig
2025-12-04T09:33:41.6286275Z  * [new branch]              gh/pearu/109/base           -> origin/gh/pearu/109/base
2025-12-04T09:33:41.6287724Z  * [new branch]              gh/pearu/109/head           -> origin/gh/pearu/109/head
2025-12-04T09:33:41.6289138Z  * [new branch]              gh/pearu/109/orig           -> origin/gh/pearu/109/orig
2025-12-04T09:33:41.6291152Z  * [new branch]              gh/pearu/110/base           -> origin/gh/pearu/110/base
2025-12-04T09:33:41.6292589Z  * [new branch]              gh/pearu/110/head           -> origin/gh/pearu/110/head
2025-12-04T09:33:41.6294220Z  * [new branch]              gh/pearu/110/orig           -> origin/gh/pearu/110/orig
2025-12-04T09:33:41.6296134Z  * [new branch]              gh/pearu/111/base           -> origin/gh/pearu/111/base
2025-12-04T09:33:41.6297731Z  * [new branch]              gh/pearu/111/head           -> origin/gh/pearu/111/head
2025-12-04T09:33:41.6299287Z  * [new branch]              gh/pearu/111/orig           -> origin/gh/pearu/111/orig
2025-12-04T09:33:41.6301220Z  * [new branch]              gh/pearu/112/base           -> origin/gh/pearu/112/base
2025-12-04T09:33:41.6302680Z  * [new branch]              gh/pearu/112/head           -> origin/gh/pearu/112/head
2025-12-04T09:33:41.6304127Z  * [new branch]              gh/pearu/112/orig           -> origin/gh/pearu/112/orig
2025-12-04T09:33:41.6305902Z  * [new branch]              gh/pearu/115/base           -> origin/gh/pearu/115/base
2025-12-04T09:33:41.6307335Z  * [new branch]              gh/pearu/115/head           -> origin/gh/pearu/115/head
2025-12-04T09:33:41.6308816Z  * [new branch]              gh/pearu/115/orig           -> origin/gh/pearu/115/orig
2025-12-04T09:33:41.6310601Z  * [new branch]              gh/pearu/116/base           -> origin/gh/pearu/116/base
2025-12-04T09:33:41.6312104Z  * [new branch]              gh/pearu/116/head           -> origin/gh/pearu/116/head
2025-12-04T09:33:41.6313647Z  * [new branch]              gh/pearu/116/orig           -> origin/gh/pearu/116/orig
2025-12-04T09:33:41.6315499Z  * [new branch]              gh/pearu/117/base           -> origin/gh/pearu/117/base
2025-12-04T09:33:41.6316994Z  * [new branch]              gh/pearu/117/head           -> origin/gh/pearu/117/head
2025-12-04T09:33:41.6318492Z  * [new branch]              gh/pearu/117/orig           -> origin/gh/pearu/117/orig
2025-12-04T09:33:41.6320365Z  * [new branch]              gh/pearu/118/base           -> origin/gh/pearu/118/base
2025-12-04T09:33:41.6321825Z  * [new branch]              gh/pearu/118/head           -> origin/gh/pearu/118/head
2025-12-04T09:33:41.6323395Z  * [new branch]              gh/pearu/118/orig           -> origin/gh/pearu/118/orig
2025-12-04T09:33:41.6325273Z  * [new branch]              gh/pearu/119/base           -> origin/gh/pearu/119/base
2025-12-04T09:33:41.6326706Z  * [new branch]              gh/pearu/119/head           -> origin/gh/pearu/119/head
2025-12-04T09:33:41.6328127Z  * [new branch]              gh/pearu/119/orig           -> origin/gh/pearu/119/orig
2025-12-04T09:33:41.6330511Z  * [new branch]              gh/pearu/139/base           -> origin/gh/pearu/139/base
2025-12-04T09:33:41.6331997Z  * [new branch]              gh/pearu/139/head           -> origin/gh/pearu/139/head
2025-12-04T09:33:41.6333513Z  * [new branch]              gh/pearu/139/orig           -> origin/gh/pearu/139/orig
2025-12-04T09:33:41.6335489Z  * [new branch]              gh/pearu/140/base           -> origin/gh/pearu/140/base
2025-12-04T09:33:41.6337104Z  * [new branch]              gh/pearu/140/head           -> origin/gh/pearu/140/head
2025-12-04T09:33:41.6338531Z  * [new branch]              gh/pearu/140/orig           -> origin/gh/pearu/140/orig
2025-12-04T09:33:41.6340473Z  * [new branch]              gh/pearu/142/base           -> origin/gh/pearu/142/base
2025-12-04T09:33:41.6341927Z  * [new branch]              gh/pearu/142/head           -> origin/gh/pearu/142/head
2025-12-04T09:33:41.6343372Z  * [new branch]              gh/pearu/142/orig           -> origin/gh/pearu/142/orig
2025-12-04T09:33:41.6345341Z  * [new branch]              gh/pearu/143/base           -> origin/gh/pearu/143/base
2025-12-04T09:33:41.6346738Z  * [new branch]              gh/pearu/143/head           -> origin/gh/pearu/143/head
2025-12-04T09:33:41.6348257Z  * [new branch]              gh/pearu/143/orig           -> origin/gh/pearu/143/orig
2025-12-04T09:33:41.6350221Z  * [new branch]              gh/pearu/147/base           -> origin/gh/pearu/147/base
2025-12-04T09:33:41.6351763Z  * [new branch]              gh/pearu/147/head           -> origin/gh/pearu/147/head
2025-12-04T09:33:41.6353815Z  * [new branch]              gh/pearu/147/orig           -> origin/gh/pearu/147/orig
2025-12-04T09:33:41.6355758Z  * [new branch]              gh/pearu/149/base           -> origin/gh/pearu/149/base
2025-12-04T09:33:41.6357083Z  * [new branch]              gh/pearu/149/head           -> origin/gh/pearu/149/head
2025-12-04T09:33:41.6358630Z  * [new branch]              gh/pearu/149/orig           -> origin/gh/pearu/149/orig
2025-12-04T09:33:41.6361054Z  * [new branch]              gh/pearu/150/base           -> origin/gh/pearu/150/base
2025-12-04T09:33:41.6362550Z  * [new branch]              gh/pearu/150/head           -> origin/gh/pearu/150/head
2025-12-04T09:33:41.6363994Z  * [new branch]              gh/pearu/150/orig           -> origin/gh/pearu/150/orig
2025-12-04T09:33:41.6366004Z  * [new branch]              gh/pearu/151/base           -> origin/gh/pearu/151/base
2025-12-04T09:33:41.6367506Z  * [new branch]              gh/pearu/151/head           -> origin/gh/pearu/151/head
2025-12-04T09:33:41.6368995Z  * [new branch]              gh/pearu/151/orig           -> origin/gh/pearu/151/orig
2025-12-04T09:33:41.6371418Z  * [new branch]              gh/pearu/152/base           -> origin/gh/pearu/152/base
2025-12-04T09:33:41.6372728Z  * [new branch]              gh/pearu/152/head           -> origin/gh/pearu/152/head
2025-12-04T09:33:41.6374284Z  * [new branch]              gh/pearu/152/orig           -> origin/gh/pearu/152/orig
2025-12-04T09:33:41.6376340Z  * [new branch]              gh/pearu/153/base           -> origin/gh/pearu/153/base
2025-12-04T09:33:41.6377915Z  * [new branch]              gh/pearu/153/head           -> origin/gh/pearu/153/head
2025-12-04T09:33:41.6379329Z  * [new branch]              gh/pearu/153/orig           -> origin/gh/pearu/153/orig
2025-12-04T09:33:41.6381338Z  * [new branch]              gh/pearu/154/base           -> origin/gh/pearu/154/base
2025-12-04T09:33:41.6382798Z  * [new branch]              gh/pearu/154/head           -> origin/gh/pearu/154/head
2025-12-04T09:33:41.6384242Z  * [new branch]              gh/pearu/154/orig           -> origin/gh/pearu/154/orig
2025-12-04T09:33:41.6386309Z  * [new branch]              gh/pearu/155/base           -> origin/gh/pearu/155/base
2025-12-04T09:33:41.6387791Z  * [new branch]              gh/pearu/155/head           -> origin/gh/pearu/155/head
2025-12-04T09:33:41.6389337Z  * [new branch]              gh/pearu/155/orig           -> origin/gh/pearu/155/orig
2025-12-04T09:33:41.6391306Z  * [new branch]              gh/pearu/156/base           -> origin/gh/pearu/156/base
2025-12-04T09:33:41.6392709Z  * [new branch]              gh/pearu/156/head           -> origin/gh/pearu/156/head
2025-12-04T09:33:41.6394292Z  * [new branch]              gh/pearu/156/orig           -> origin/gh/pearu/156/orig
2025-12-04T09:33:41.6396763Z  * [new branch]              gh/pearu/56/base            -> origin/gh/pearu/56/base
2025-12-04T09:33:41.6398666Z  * [new branch]              gh/pearu/56/head            -> origin/gh/pearu/56/head
2025-12-04T09:33:41.6399925Z  * [new branch]              gh/pearu/56/orig            -> origin/gh/pearu/56/orig
2025-12-04T09:33:41.6402763Z  * [new branch]              gh/pearu/97/base            -> origin/gh/pearu/97/base
2025-12-04T09:33:41.6404238Z  * [new branch]              gh/pearu/97/head            -> origin/gh/pearu/97/head
2025-12-04T09:33:41.6405806Z  * [new branch]              gh/pearu/97/orig            -> origin/gh/pearu/97/orig
2025-12-04T09:33:41.6408207Z  * [new branch]              gh/pianpwk/21/base          -> origin/gh/pianpwk/21/base
2025-12-04T09:33:41.6409710Z  * [new branch]              gh/pianpwk/21/head          -> origin/gh/pianpwk/21/head
2025-12-04T09:33:41.6411771Z  * [new branch]              gh/pianpwk/28/base          -> origin/gh/pianpwk/28/base
2025-12-04T09:33:41.6413275Z  * [new branch]              gh/pianpwk/28/head          -> origin/gh/pianpwk/28/head
2025-12-04T09:33:41.6415482Z  * [new branch]              gh/pianpwk/28/orig          -> origin/gh/pianpwk/28/orig
2025-12-04T09:33:41.6417632Z  * [new branch]              gh/pianpwk/29/base          -> origin/gh/pianpwk/29/base
2025-12-04T09:33:41.6419343Z  * [new branch]              gh/pianpwk/29/head          -> origin/gh/pianpwk/29/head
2025-12-04T09:33:41.6421334Z  * [new branch]              gh/pianpwk/29/orig          -> origin/gh/pianpwk/29/orig
2025-12-04T09:33:41.6423551Z  * [new branch]              gh/pianpwk/30/base          -> origin/gh/pianpwk/30/base
2025-12-04T09:33:41.6425040Z  * [new branch]              gh/pianpwk/30/head          -> origin/gh/pianpwk/30/head
2025-12-04T09:33:41.6426609Z  * [new branch]              gh/pianpwk/30/orig          -> origin/gh/pianpwk/30/orig
2025-12-04T09:33:41.6428718Z  * [new branch]              gh/pianpwk/31/base          -> origin/gh/pianpwk/31/base
2025-12-04T09:33:41.6430214Z  * [new branch]              gh/pianpwk/31/head          -> origin/gh/pianpwk/31/head
2025-12-04T09:33:41.6431743Z  * [new branch]              gh/pianpwk/31/orig          -> origin/gh/pianpwk/31/orig
2025-12-04T09:33:41.6433565Z  * [new branch]              gh/pianpwk/32/base          -> origin/gh/pianpwk/32/base
2025-12-04T09:33:41.6435101Z  * [new branch]              gh/pianpwk/32/head          -> origin/gh/pianpwk/32/head
2025-12-04T09:33:41.6436669Z  * [new branch]              gh/pianpwk/32/orig          -> origin/gh/pianpwk/32/orig
2025-12-04T09:33:41.6438444Z  * [new branch]              gh/pianpwk/33/base          -> origin/gh/pianpwk/33/base
2025-12-04T09:33:41.6439859Z  * [new branch]              gh/pianpwk/33/head          -> origin/gh/pianpwk/33/head
2025-12-04T09:33:41.6441328Z  * [new branch]              gh/pianpwk/33/orig          -> origin/gh/pianpwk/33/orig
2025-12-04T09:33:41.6443607Z  * [new branch]              gh/pianpwk/34/base          -> origin/gh/pianpwk/34/base
2025-12-04T09:33:41.6445451Z  * [new branch]              gh/pianpwk/34/head          -> origin/gh/pianpwk/34/head
2025-12-04T09:33:41.6447410Z  * [new branch]              gh/pianpwk/34/orig          -> origin/gh/pianpwk/34/orig
2025-12-04T09:33:41.6449387Z  * [new branch]              gh/pianpwk/35/base          -> origin/gh/pianpwk/35/base
2025-12-04T09:33:41.6450879Z  * [new branch]              gh/pianpwk/35/head          -> origin/gh/pianpwk/35/head
2025-12-04T09:33:41.6452456Z  * [new branch]              gh/pianpwk/35/orig          -> origin/gh/pianpwk/35/orig
2025-12-04T09:33:41.6454873Z  * [new branch]              gh/rec/141/base             -> origin/gh/rec/141/base
2025-12-04T09:33:41.6456545Z  * [new branch]              gh/rec/141/head             -> origin/gh/rec/141/head
2025-12-04T09:33:41.6458551Z  * [new branch]              gh/rec/153/base             -> origin/gh/rec/153/base
2025-12-04T09:33:41.6459964Z  * [new branch]              gh/rec/153/head             -> origin/gh/rec/153/head
2025-12-04T09:33:41.6461449Z  * [new branch]              gh/rec/153/orig             -> origin/gh/rec/153/orig
2025-12-04T09:33:41.6463543Z  * [new branch]              gh/rec/154/base             -> origin/gh/rec/154/base
2025-12-04T09:33:41.6465428Z  * [new branch]              gh/rec/154/head             -> origin/gh/rec/154/head
2025-12-04T09:33:41.6466997Z  * [new branch]              gh/rec/154/orig             -> origin/gh/rec/154/orig
2025-12-04T09:33:41.6469005Z  * [new branch]              gh/rec/164/base             -> origin/gh/rec/164/base
2025-12-04T09:33:41.6470469Z  * [new branch]              gh/rec/164/head             -> origin/gh/rec/164/head
2025-12-04T09:33:41.6472123Z  * [new branch]              gh/rec/164/orig             -> origin/gh/rec/164/orig
2025-12-04T09:33:41.6474710Z  * [new branch]              gh/rec/166/base             -> origin/gh/rec/166/base
2025-12-04T09:33:41.6476212Z  * [new branch]              gh/rec/166/head             -> origin/gh/rec/166/head
2025-12-04T09:33:41.6477784Z  * [new branch]              gh/rec/166/orig             -> origin/gh/rec/166/orig
2025-12-04T09:33:41.6479882Z  * [new branch]              gh/rec/167/base             -> origin/gh/rec/167/base
2025-12-04T09:33:41.6481259Z  * [new branch]              gh/rec/167/head             -> origin/gh/rec/167/head
2025-12-04T09:33:41.6482789Z  * [new branch]              gh/rec/167/orig             -> origin/gh/rec/167/orig
2025-12-04T09:33:41.6484691Z  * [new branch]              gh/rec/168/base             -> origin/gh/rec/168/base
2025-12-04T09:33:41.6486117Z  * [new branch]              gh/rec/168/head             -> origin/gh/rec/168/head
2025-12-04T09:33:41.6487548Z  * [new branch]              gh/rec/168/orig             -> origin/gh/rec/168/orig
2025-12-04T09:33:41.6489513Z  * [new branch]              gh/rec/169/base             -> origin/gh/rec/169/base
2025-12-04T09:33:41.6490944Z  * [new branch]              gh/rec/169/head             -> origin/gh/rec/169/head
2025-12-04T09:33:41.6492404Z  * [new branch]              gh/rec/169/orig             -> origin/gh/rec/169/orig
2025-12-04T09:33:41.6494391Z  * [new branch]              gh/rec/170/base             -> origin/gh/rec/170/base
2025-12-04T09:33:41.6496420Z  * [new branch]              gh/rec/170/head             -> origin/gh/rec/170/head
2025-12-04T09:33:41.6498095Z  * [new branch]              gh/rec/170/orig             -> origin/gh/rec/170/orig
2025-12-04T09:33:41.6500048Z  * [new branch]              gh/rec/171/base             -> origin/gh/rec/171/base
2025-12-04T09:33:41.6501565Z  * [new branch]              gh/rec/171/head             -> origin/gh/rec/171/head
2025-12-04T09:33:41.6503089Z  * [new branch]              gh/rec/171/orig             -> origin/gh/rec/171/orig
2025-12-04T09:33:41.6504992Z  * [new branch]              gh/rec/172/base             -> origin/gh/rec/172/base
2025-12-04T09:33:41.6506459Z  * [new branch]              gh/rec/172/head             -> origin/gh/rec/172/head
2025-12-04T09:33:41.6507858Z  * [new branch]              gh/rec/172/orig             -> origin/gh/rec/172/orig
2025-12-04T09:33:41.6509861Z  * [new branch]              gh/rec/173/base             -> origin/gh/rec/173/base
2025-12-04T09:33:41.6511296Z  * [new branch]              gh/rec/173/head             -> origin/gh/rec/173/head
2025-12-04T09:33:41.6512800Z  * [new branch]              gh/rec/173/orig             -> origin/gh/rec/173/orig
2025-12-04T09:33:41.6514724Z  * [new branch]              gh/rec/174/base             -> origin/gh/rec/174/base
2025-12-04T09:33:41.6516172Z  * [new branch]              gh/rec/174/head             -> origin/gh/rec/174/head
2025-12-04T09:33:41.6517840Z  * [new branch]              gh/rec/174/orig             -> origin/gh/rec/174/orig
2025-12-04T09:33:41.6519752Z  * [new branch]              gh/rec/175/base             -> origin/gh/rec/175/base
2025-12-04T09:33:41.6521156Z  * [new branch]              gh/rec/175/head             -> origin/gh/rec/175/head
2025-12-04T09:33:41.6522659Z  * [new branch]              gh/rec/175/orig             -> origin/gh/rec/175/orig
2025-12-04T09:33:41.6524764Z  * [new branch]              gh/rec/176/base             -> origin/gh/rec/176/base
2025-12-04T09:33:41.6525964Z  * [new branch]              gh/rec/176/head             -> origin/gh/rec/176/head
2025-12-04T09:33:41.6527524Z  * [new branch]              gh/rec/176/orig             -> origin/gh/rec/176/orig
2025-12-04T09:33:41.6530035Z  * [new branch]              gh/rec/177/base             -> origin/gh/rec/177/base
2025-12-04T09:33:41.6531654Z  * [new branch]              gh/rec/177/head             -> origin/gh/rec/177/head
2025-12-04T09:33:41.6533089Z  * [new branch]              gh/rec/177/orig             -> origin/gh/rec/177/orig
2025-12-04T09:33:41.6535715Z  * [new branch]              gh/robert-hardwick/3/base   -> origin/gh/robert-hardwick/3/base
2025-12-04T09:33:41.6537479Z  * [new branch]              gh/robert-hardwick/3/head   -> origin/gh/robert-hardwick/3/head
2025-12-04T09:33:41.6538859Z  * [new branch]              gh/robert-hardwick/3/orig   -> origin/gh/robert-hardwick/3/orig
2025-12-04T09:33:41.6540936Z  * [new branch]              gh/robert-hardwick/4/base   -> origin/gh/robert-hardwick/4/base
2025-12-04T09:33:41.6542447Z  * [new branch]              gh/robert-hardwick/4/head   -> origin/gh/robert-hardwick/4/head
2025-12-04T09:33:41.6543979Z  * [new branch]              gh/robert-hardwick/4/orig   -> origin/gh/robert-hardwick/4/orig
2025-12-04T09:33:41.6545905Z  * [new branch]              gh/robert-hardwick/5/base   -> origin/gh/robert-hardwick/5/base
2025-12-04T09:33:41.6547282Z  * [new branch]              gh/robert-hardwick/5/head   -> origin/gh/robert-hardwick/5/head
2025-12-04T09:33:41.6548882Z  * [new branch]              gh/robert-hardwick/5/orig   -> origin/gh/robert-hardwick/5/orig
2025-12-04T09:33:41.6550817Z  * [new branch]              gh/robert-hardwick/6/base   -> origin/gh/robert-hardwick/6/base
2025-12-04T09:33:41.6552168Z  * [new branch]              gh/robert-hardwick/6/head   -> origin/gh/robert-hardwick/6/head
2025-12-04T09:33:41.6553720Z  * [new branch]              gh/robert-hardwick/6/orig   -> origin/gh/robert-hardwick/6/orig
2025-12-04T09:33:41.6555715Z  * [new branch]              gh/robert-hardwick/7/base   -> origin/gh/robert-hardwick/7/base
2025-12-04T09:33:41.6557279Z  * [new branch]              gh/robert-hardwick/7/head   -> origin/gh/robert-hardwick/7/head
2025-12-04T09:33:41.6558613Z  * [new branch]              gh/robert-hardwick/7/orig   -> origin/gh/robert-hardwick/7/orig
2025-12-04T09:33:41.6561167Z  * [new branch]              gh/robert-hardwick/8/base   -> origin/gh/robert-hardwick/8/base
2025-12-04T09:33:41.6562665Z  * [new branch]              gh/robert-hardwick/8/head   -> origin/gh/robert-hardwick/8/head
2025-12-04T09:33:41.6564216Z  * [new branch]              gh/robert-hardwick/8/orig   -> origin/gh/robert-hardwick/8/orig
2025-12-04T09:33:41.6566265Z  * [new branch]              gh/robert-hardwick/9/base   -> origin/gh/robert-hardwick/9/base
2025-12-04T09:33:41.6567838Z  * [new branch]              gh/robert-hardwick/9/head   -> origin/gh/robert-hardwick/9/head
2025-12-04T09:33:41.6569335Z  * [new branch]              gh/robert-hardwick/9/orig   -> origin/gh/robert-hardwick/9/orig
2025-12-04T09:33:41.6571940Z  * [new branch]              gh/rtimpe/1/base            -> origin/gh/rtimpe/1/base
2025-12-04T09:33:41.6573407Z  * [new branch]              gh/rtimpe/1/head            -> origin/gh/rtimpe/1/head
2025-12-04T09:33:41.6575357Z  * [new branch]              gh/rtimpe/2/base            -> origin/gh/rtimpe/2/base
2025-12-04T09:33:41.6576917Z  * [new branch]              gh/rtimpe/2/head            -> origin/gh/rtimpe/2/head
2025-12-04T09:33:41.6579023Z  * [new branch]              gh/rtimpe/22/base           -> origin/gh/rtimpe/22/base
2025-12-04T09:33:41.6580552Z  * [new branch]              gh/rtimpe/22/head           -> origin/gh/rtimpe/22/head
2025-12-04T09:33:41.6582080Z  * [new branch]              gh/rtimpe/22/orig           -> origin/gh/rtimpe/22/orig
2025-12-04T09:33:41.6583919Z  * [new branch]              gh/rtimpe/23/base           -> origin/gh/rtimpe/23/base
2025-12-04T09:33:41.6585548Z  * [new branch]              gh/rtimpe/23/head           -> origin/gh/rtimpe/23/head
2025-12-04T09:33:41.6586728Z  * [new branch]              gh/rtimpe/23/orig           -> origin/gh/rtimpe/23/orig
2025-12-04T09:33:41.6588720Z  * [new branch]              gh/rtimpe/24/base           -> origin/gh/rtimpe/24/base
2025-12-04T09:33:41.6590196Z  * [new branch]              gh/rtimpe/24/head           -> origin/gh/rtimpe/24/head
2025-12-04T09:33:41.6591621Z  * [new branch]              gh/rtimpe/24/orig           -> origin/gh/rtimpe/24/orig
2025-12-04T09:33:41.6593522Z  * [new branch]              gh/rtimpe/25/base           -> origin/gh/rtimpe/25/base
2025-12-04T09:33:41.6594983Z  * [new branch]              gh/rtimpe/25/head           -> origin/gh/rtimpe/25/head
2025-12-04T09:33:41.6596666Z  * [new branch]              gh/rtimpe/25/orig           -> origin/gh/rtimpe/25/orig
2025-12-04T09:33:41.6599075Z  * [new branch]              gh/rtimpe/26/base           -> origin/gh/rtimpe/26/base
2025-12-04T09:33:41.6600579Z  * [new branch]              gh/rtimpe/26/head           -> origin/gh/rtimpe/26/head
2025-12-04T09:33:41.6602110Z  * [new branch]              gh/rtimpe/26/orig           -> origin/gh/rtimpe/26/orig
2025-12-04T09:33:41.6604028Z  * [new branch]              gh/rtimpe/27/base           -> origin/gh/rtimpe/27/base
2025-12-04T09:33:41.6605433Z  * [new branch]              gh/rtimpe/27/head           -> origin/gh/rtimpe/27/head
2025-12-04T09:33:41.6607013Z  * [new branch]              gh/rtimpe/27/orig           -> origin/gh/rtimpe/27/orig
2025-12-04T09:33:41.6609511Z  * [new branch]              gh/rtimpe/28/base           -> origin/gh/rtimpe/28/base
2025-12-04T09:33:41.6610955Z  * [new branch]              gh/rtimpe/28/head           -> origin/gh/rtimpe/28/head
2025-12-04T09:33:41.6612553Z  * [new branch]              gh/rtimpe/28/orig           -> origin/gh/rtimpe/28/orig
2025-12-04T09:33:41.6614587Z  * [new branch]              gh/rtimpe/29/base           -> origin/gh/rtimpe/29/base
2025-12-04T09:33:41.6616041Z  * [new branch]              gh/rtimpe/29/head           -> origin/gh/rtimpe/29/head
2025-12-04T09:33:41.6617901Z  * [new branch]              gh/rtimpe/29/orig           -> origin/gh/rtimpe/29/orig
2025-12-04T09:33:41.6619803Z  * [new branch]              gh/rtimpe/3/base            -> origin/gh/rtimpe/3/base
2025-12-04T09:33:41.6621112Z  * [new branch]              gh/rtimpe/3/head            -> origin/gh/rtimpe/3/head
2025-12-04T09:33:41.6623156Z  * [new branch]              gh/rtimpe/30/base           -> origin/gh/rtimpe/30/base
2025-12-04T09:33:41.6624646Z  * [new branch]              gh/rtimpe/30/head           -> origin/gh/rtimpe/30/head
2025-12-04T09:33:41.6626130Z  * [new branch]              gh/rtimpe/30/orig           -> origin/gh/rtimpe/30/orig
2025-12-04T09:33:41.6628131Z  * [new branch]              gh/rtimpe/31/base           -> origin/gh/rtimpe/31/base
2025-12-04T09:33:41.6629602Z  * [new branch]              gh/rtimpe/31/head           -> origin/gh/rtimpe/31/head
2025-12-04T09:33:41.6631290Z  * [new branch]              gh/rtimpe/31/orig           -> origin/gh/rtimpe/31/orig
2025-12-04T09:33:41.6633756Z  * [new branch]              gh/rtimpe/32/base           -> origin/gh/rtimpe/32/base
2025-12-04T09:33:41.6635220Z  * [new branch]              gh/rtimpe/32/head           -> origin/gh/rtimpe/32/head
2025-12-04T09:33:41.6636629Z  * [new branch]              gh/rtimpe/32/orig           -> origin/gh/rtimpe/32/orig
2025-12-04T09:33:41.6638740Z  * [new branch]              gh/rtimpe/33/base           -> origin/gh/rtimpe/33/base
2025-12-04T09:33:41.6640163Z  * [new branch]              gh/rtimpe/33/head           -> origin/gh/rtimpe/33/head
2025-12-04T09:33:41.6641700Z  * [new branch]              gh/rtimpe/33/orig           -> origin/gh/rtimpe/33/orig
2025-12-04T09:33:41.6643565Z  * [new branch]              gh/rtimpe/34/base           -> origin/gh/rtimpe/34/base
2025-12-04T09:33:41.6645052Z  * [new branch]              gh/rtimpe/34/head           -> origin/gh/rtimpe/34/head
2025-12-04T09:33:41.6646566Z  * [new branch]              gh/rtimpe/34/orig           -> origin/gh/rtimpe/34/orig
2025-12-04T09:33:41.6648516Z  * [new branch]              gh/rtimpe/35/base           -> origin/gh/rtimpe/35/base
2025-12-04T09:33:41.6650062Z  * [new branch]              gh/rtimpe/35/head           -> origin/gh/rtimpe/35/head
2025-12-04T09:33:41.6651589Z  * [new branch]              gh/rtimpe/35/orig           -> origin/gh/rtimpe/35/orig
2025-12-04T09:33:41.6653554Z  * [new branch]              gh/rtimpe/4/base            -> origin/gh/rtimpe/4/base
2025-12-04T09:33:41.6655071Z  * [new branch]              gh/rtimpe/4/head            -> origin/gh/rtimpe/4/head
2025-12-04T09:33:41.6657968Z  * [new branch]              gh/ruisizhang123/1/base     -> origin/gh/ruisizhang123/1/base
2025-12-04T09:33:41.6659529Z  * [new branch]              gh/ruisizhang123/1/head     -> origin/gh/ruisizhang123/1/head
2025-12-04T09:33:41.6660857Z  * [new branch]              gh/ruisizhang123/1/orig     -> origin/gh/ruisizhang123/1/orig
2025-12-04T09:33:41.6662896Z  * [new branch]              gh/ruisizhang123/4/base     -> origin/gh/ruisizhang123/4/base
2025-12-04T09:33:41.6664434Z  * [new branch]              gh/ruisizhang123/4/head     -> origin/gh/ruisizhang123/4/head
2025-12-04T09:33:41.6665876Z  * [new branch]              gh/ruisizhang123/4/orig     -> origin/gh/ruisizhang123/4/orig
2025-12-04T09:33:41.6667852Z  * [new branch]              gh/ruisizhang123/5/base     -> origin/gh/ruisizhang123/5/base
2025-12-04T09:33:41.6669583Z  * [new branch]              gh/ruisizhang123/5/head     -> origin/gh/ruisizhang123/5/head
2025-12-04T09:33:41.6671207Z  * [new branch]              gh/ruisizhang123/5/orig     -> origin/gh/ruisizhang123/5/orig
2025-12-04T09:33:41.6675409Z  * [new branch]              gh/ruisizhang123/6/base     -> origin/gh/ruisizhang123/6/base
2025-12-04T09:33:41.6676718Z  * [new branch]              gh/ruisizhang123/6/head     -> origin/gh/ruisizhang123/6/head
2025-12-04T09:33:41.6678252Z  * [new branch]              gh/ruisizhang123/6/orig     -> origin/gh/ruisizhang123/6/orig
2025-12-04T09:33:41.6680404Z  * [new branch]              gh/ruisizhang123/7/base     -> origin/gh/ruisizhang123/7/base
2025-12-04T09:33:41.6681898Z  * [new branch]              gh/ruisizhang123/7/head     -> origin/gh/ruisizhang123/7/head
2025-12-04T09:33:41.6683434Z  * [new branch]              gh/ruisizhang123/7/orig     -> origin/gh/ruisizhang123/7/orig
2025-12-04T09:33:41.6685314Z  * [new branch]              gh/ruisizhang123/8/base     -> origin/gh/ruisizhang123/8/base
2025-12-04T09:33:41.6687170Z  * [new branch]              gh/ruisizhang123/8/head     -> origin/gh/ruisizhang123/8/head
2025-12-04T09:33:41.6688690Z  * [new branch]              gh/ruisizhang123/8/orig     -> origin/gh/ruisizhang123/8/orig
2025-12-04T09:33:41.6690628Z  * [new branch]              gh/ruisizhang123/9/base     -> origin/gh/ruisizhang123/9/base
2025-12-04T09:33:41.6692131Z  * [new branch]              gh/ruisizhang123/9/head     -> origin/gh/ruisizhang123/9/head
2025-12-04T09:33:41.6693590Z  * [new branch]              gh/ruisizhang123/9/orig     -> origin/gh/ruisizhang123/9/orig
2025-12-04T09:33:41.6696108Z  * [new branch]              gh/seemethere/52/base       -> origin/gh/seemethere/52/base
2025-12-04T09:33:41.6697770Z  * [new branch]              gh/seemethere/52/head       -> origin/gh/seemethere/52/head
2025-12-04T09:33:41.6699359Z  * [new branch]              gh/seemethere/52/orig       -> origin/gh/seemethere/52/orig
2025-12-04T09:33:41.6701297Z  * [new branch]              gh/seemethere/53/base       -> origin/gh/seemethere/53/base
2025-12-04T09:33:41.6702748Z  * [new branch]              gh/seemethere/53/head       -> origin/gh/seemethere/53/head
2025-12-04T09:33:41.6704259Z  * [new branch]              gh/seemethere/53/orig       -> origin/gh/seemethere/53/orig
2025-12-04T09:33:41.6706286Z  * [new branch]              gh/seemethere/54/base       -> origin/gh/seemethere/54/base
2025-12-04T09:33:41.6707747Z  * [new branch]              gh/seemethere/54/head       -> origin/gh/seemethere/54/head
2025-12-04T09:33:41.6709433Z  * [new branch]              gh/seemethere/54/orig       -> origin/gh/seemethere/54/orig
2025-12-04T09:33:41.6711162Z  * [new branch]              gh/seemethere/55/base       -> origin/gh/seemethere/55/base
2025-12-04T09:33:41.6712398Z  * [new branch]              gh/seemethere/55/head       -> origin/gh/seemethere/55/head
2025-12-04T09:33:41.6713963Z  * [new branch]              gh/seemethere/55/orig       -> origin/gh/seemethere/55/orig
2025-12-04T09:33:41.6715816Z  * [new branch]              gh/seemethere/59/base       -> origin/gh/seemethere/59/base
2025-12-04T09:33:41.6717309Z  * [new branch]              gh/seemethere/59/head       -> origin/gh/seemethere/59/head
2025-12-04T09:33:41.6718945Z  * [new branch]              gh/seemethere/59/orig       -> origin/gh/seemethere/59/orig
2025-12-04T09:33:41.6720843Z  * [new branch]              gh/seemethere/62/base       -> origin/gh/seemethere/62/base
2025-12-04T09:33:41.6722375Z  * [new branch]              gh/seemethere/62/head       -> origin/gh/seemethere/62/head
2025-12-04T09:33:41.6723860Z  * [new branch]              gh/seemethere/62/orig       -> origin/gh/seemethere/62/orig
2025-12-04T09:33:41.6725805Z  * [new branch]              gh/seemethere/63/base       -> origin/gh/seemethere/63/base
2025-12-04T09:33:41.6727178Z  * [new branch]              gh/seemethere/63/head       -> origin/gh/seemethere/63/head
2025-12-04T09:33:41.6728705Z  * [new branch]              gh/seemethere/63/orig       -> origin/gh/seemethere/63/orig
2025-12-04T09:33:41.6730649Z  * [new branch]              gh/seemethere/71/base       -> origin/gh/seemethere/71/base
2025-12-04T09:33:41.6732101Z  * [new branch]              gh/seemethere/71/head       -> origin/gh/seemethere/71/head
2025-12-04T09:33:41.6733589Z  * [new branch]              gh/seemethere/71/orig       -> origin/gh/seemethere/71/orig
2025-12-04T09:33:41.6735554Z  * [new branch]              gh/seemethere/72/base       -> origin/gh/seemethere/72/base
2025-12-04T09:33:41.6737122Z  * [new branch]              gh/seemethere/72/head       -> origin/gh/seemethere/72/head
2025-12-04T09:33:41.6738903Z  * [new branch]              gh/seemethere/72/orig       -> origin/gh/seemethere/72/orig
2025-12-04T09:33:41.6740879Z  * [new branch]              gh/seemethere/73/base       -> origin/gh/seemethere/73/base
2025-12-04T09:33:41.6742333Z  * [new branch]              gh/seemethere/73/head       -> origin/gh/seemethere/73/head
2025-12-04T09:33:41.6743861Z  * [new branch]              gh/seemethere/73/orig       -> origin/gh/seemethere/73/orig
2025-12-04T09:33:41.6745785Z  * [new branch]              gh/seemethere/74/base       -> origin/gh/seemethere/74/base
2025-12-04T09:33:41.6747226Z  * [new branch]              gh/seemethere/74/head       -> origin/gh/seemethere/74/head
2025-12-04T09:33:41.6748751Z  * [new branch]              gh/seemethere/74/orig       -> origin/gh/seemethere/74/orig
2025-12-04T09:33:41.6750744Z  * [new branch]              gh/seemethere/75/base       -> origin/gh/seemethere/75/base
2025-12-04T09:33:41.6752090Z  * [new branch]              gh/seemethere/75/head       -> origin/gh/seemethere/75/head
2025-12-04T09:33:41.6753663Z  * [new branch]              gh/seemethere/75/orig       -> origin/gh/seemethere/75/orig
2025-12-04T09:33:41.6755590Z  * [new branch]              gh/seemethere/76/base       -> origin/gh/seemethere/76/base
2025-12-04T09:33:41.6757033Z  * [new branch]              gh/seemethere/76/head       -> origin/gh/seemethere/76/head
2025-12-04T09:33:41.6758605Z  * [new branch]              gh/seemethere/76/orig       -> origin/gh/seemethere/76/orig
2025-12-04T09:33:41.6761370Z  * [new branch]              gh/shunting314/145/base     -> origin/gh/shunting314/145/base
2025-12-04T09:33:41.6763035Z  * [new branch]              gh/shunting314/145/head     -> origin/gh/shunting314/145/head
2025-12-04T09:33:41.6764587Z  * [new branch]              gh/shunting314/145/orig     -> origin/gh/shunting314/145/orig
2025-12-04T09:33:41.6767075Z  * [new branch]              gh/shunting314/176/base     -> origin/gh/shunting314/176/base
2025-12-04T09:33:41.6768760Z  * [new branch]              gh/shunting314/176/head     -> origin/gh/shunting314/176/head
2025-12-04T09:33:41.6770270Z  * [new branch]              gh/shunting314/176/orig     -> origin/gh/shunting314/176/orig
2025-12-04T09:33:41.6773095Z  * [new branch]              gh/shunting314/249/base     -> origin/gh/shunting314/249/base
2025-12-04T09:33:41.6774720Z  * [new branch]              gh/shunting314/249/head     -> origin/gh/shunting314/249/head
2025-12-04T09:33:41.6776336Z  * [new branch]              gh/shunting314/249/orig     -> origin/gh/shunting314/249/orig
2025-12-04T09:33:41.6778498Z  * [new branch]              gh/shunting314/253/base     -> origin/gh/shunting314/253/base
2025-12-04T09:33:41.6780015Z  * [new branch]              gh/shunting314/253/head     -> origin/gh/shunting314/253/head
2025-12-04T09:33:41.6781355Z  * [new branch]              gh/shunting314/253/orig     -> origin/gh/shunting314/253/orig
2025-12-04T09:33:41.6783459Z  * [new branch]              gh/shunting314/256/base     -> origin/gh/shunting314/256/base
2025-12-04T09:33:41.6784935Z  * [new branch]              gh/shunting314/256/head     -> origin/gh/shunting314/256/head
2025-12-04T09:33:41.6786254Z  * [new branch]              gh/shunting314/256/orig     -> origin/gh/shunting314/256/orig
2025-12-04T09:33:41.6788701Z  * [new branch]              gh/shunting314/257/base     -> origin/gh/shunting314/257/base
2025-12-04T09:33:41.6790279Z  * [new branch]              gh/shunting314/257/head     -> origin/gh/shunting314/257/head
2025-12-04T09:33:41.6791760Z  * [new branch]              gh/shunting314/257/orig     -> origin/gh/shunting314/257/orig
2025-12-04T09:33:41.6794001Z  * [new branch]              gh/shunting314/258/base     -> origin/gh/shunting314/258/base
2025-12-04T09:33:41.6795282Z  * [new branch]              gh/shunting314/258/head     -> origin/gh/shunting314/258/head
2025-12-04T09:33:41.6796882Z  * [new branch]              gh/shunting314/258/orig     -> origin/gh/shunting314/258/orig
2025-12-04T09:33:41.6798727Z  * [new branch]              gh/shunting314/259/base     -> origin/gh/shunting314/259/base
2025-12-04T09:33:41.6800403Z  * [new branch]              gh/shunting314/259/head     -> origin/gh/shunting314/259/head
2025-12-04T09:33:41.6801927Z  * [new branch]              gh/shunting314/259/orig     -> origin/gh/shunting314/259/orig
2025-12-04T09:33:41.6804038Z  * [new branch]              gh/shunting314/260/base     -> origin/gh/shunting314/260/base
2025-12-04T09:33:41.6805689Z  * [new branch]              gh/shunting314/260/head     -> origin/gh/shunting314/260/head
2025-12-04T09:33:41.6807242Z  * [new branch]              gh/shunting314/260/orig     -> origin/gh/shunting314/260/orig
2025-12-04T09:33:41.6809329Z  * [new branch]              gh/shunting314/261/base     -> origin/gh/shunting314/261/base
2025-12-04T09:33:41.6811474Z  * [new branch]              gh/shunting314/261/head     -> origin/gh/shunting314/261/head
2025-12-04T09:33:41.6813084Z  * [new branch]              gh/shunting314/261/orig     -> origin/gh/shunting314/261/orig
2025-12-04T09:33:41.6815151Z  * [new branch]              gh/shunting314/262/base     -> origin/gh/shunting314/262/base
2025-12-04T09:33:41.6816760Z  * [new branch]              gh/shunting314/262/head     -> origin/gh/shunting314/262/head
2025-12-04T09:33:41.6818243Z  * [new branch]              gh/shunting314/262/orig     -> origin/gh/shunting314/262/orig
2025-12-04T09:33:41.6820365Z  * [new branch]              gh/shunting314/263/base     -> origin/gh/shunting314/263/base
2025-12-04T09:33:41.6822102Z  * [new branch]              gh/shunting314/263/head     -> origin/gh/shunting314/263/head
2025-12-04T09:33:41.6823691Z  * [new branch]              gh/shunting314/263/orig     -> origin/gh/shunting314/263/orig
2025-12-04T09:33:41.6825683Z  * [new branch]              gh/shunting314/264/base     -> origin/gh/shunting314/264/base
2025-12-04T09:33:41.6827375Z  * [new branch]              gh/shunting314/264/head     -> origin/gh/shunting314/264/head
2025-12-04T09:33:41.6828750Z  * [new branch]              gh/shunting314/264/orig     -> origin/gh/shunting314/264/orig
2025-12-04T09:33:41.6830709Z  * [new branch]              gh/shunting314/265/base     -> origin/gh/shunting314/265/base
2025-12-04T09:33:41.6832094Z  * [new branch]              gh/shunting314/265/head     -> origin/gh/shunting314/265/head
2025-12-04T09:33:41.6833564Z  * [new branch]              gh/shunting314/265/orig     -> origin/gh/shunting314/265/orig
2025-12-04T09:33:41.6835599Z  * [new branch]              gh/shunting314/266/base     -> origin/gh/shunting314/266/base
2025-12-04T09:33:41.6837858Z  * [new branch]              gh/shunting314/266/head     -> origin/gh/shunting314/266/head
2025-12-04T09:33:41.6839347Z  * [new branch]              gh/shunting314/266/orig     -> origin/gh/shunting314/266/orig
2025-12-04T09:33:41.6842608Z  * [new branch]              gh/shunting314/267/base     -> origin/gh/shunting314/267/base
2025-12-04T09:33:41.6844435Z  * [new branch]              gh/shunting314/267/head     -> origin/gh/shunting314/267/head
2025-12-04T09:33:41.6845871Z  * [new branch]              gh/shunting314/267/orig     -> origin/gh/shunting314/267/orig
2025-12-04T09:33:41.6848489Z  * [new branch]              gh/shunting314/268/base     -> origin/gh/shunting314/268/base
2025-12-04T09:33:41.6850088Z  * [new branch]              gh/shunting314/268/head     -> origin/gh/shunting314/268/head
2025-12-04T09:33:41.6851582Z  * [new branch]              gh/shunting314/268/orig     -> origin/gh/shunting314/268/orig
2025-12-04T09:33:41.6853588Z  * [new branch]              gh/shunting314/269/base     -> origin/gh/shunting314/269/base
2025-12-04T09:33:41.6855071Z  * [new branch]              gh/shunting314/269/head     -> origin/gh/shunting314/269/head
2025-12-04T09:33:41.6857188Z  * [new branch]              gh/shunting314/269/orig     -> origin/gh/shunting314/269/orig
2025-12-04T09:33:41.6859718Z  * [new branch]              gh/silverguo/1/base         -> origin/gh/silverguo/1/base
2025-12-04T09:33:41.6861167Z  * [new branch]              gh/silverguo/1/head         -> origin/gh/silverguo/1/head
2025-12-04T09:33:41.6863063Z  * [new branch]              gh/silverguo/2/base         -> origin/gh/silverguo/2/base
2025-12-04T09:33:41.6864603Z  * [new branch]              gh/silverguo/2/head         -> origin/gh/silverguo/2/head
2025-12-04T09:33:41.6866426Z  * [new branch]              gh/silverguo/3/base         -> origin/gh/silverguo/3/base
2025-12-04T09:33:41.6867864Z  * [new branch]              gh/silverguo/3/head         -> origin/gh/silverguo/3/head
2025-12-04T09:33:41.6869633Z  * [new branch]              gh/silverguo/4/base         -> origin/gh/silverguo/4/base
2025-12-04T09:33:41.6871334Z  * [new branch]              gh/silverguo/4/head         -> origin/gh/silverguo/4/head
2025-12-04T09:33:41.6873842Z  * [new branch]              gh/slayton58/39/base        -> origin/gh/slayton58/39/base
2025-12-04T09:33:41.6875295Z  * [new branch]              gh/slayton58/39/head        -> origin/gh/slayton58/39/head
2025-12-04T09:33:41.6876807Z  * [new branch]              gh/slayton58/39/orig        -> origin/gh/slayton58/39/orig
2025-12-04T09:33:41.6878800Z  * [new branch]              gh/slayton58/42/base        -> origin/gh/slayton58/42/base
2025-12-04T09:33:41.6880227Z  * [new branch]              gh/slayton58/42/head        -> origin/gh/slayton58/42/head
2025-12-04T09:33:41.6881914Z  * [new branch]              gh/slayton58/42/orig        -> origin/gh/slayton58/42/orig
2025-12-04T09:33:41.6884027Z  * [new branch]              gh/slayton58/43/base        -> origin/gh/slayton58/43/base
2025-12-04T09:33:41.6885567Z  * [new branch]              gh/slayton58/43/head        -> origin/gh/slayton58/43/head
2025-12-04T09:33:41.6887070Z  * [new branch]              gh/slayton58/43/orig        -> origin/gh/slayton58/43/orig
2025-12-04T09:33:41.6889180Z  * [new branch]              gh/slayton58/44/base        -> origin/gh/slayton58/44/base
2025-12-04T09:33:41.6890815Z  * [new branch]              gh/slayton58/44/head        -> origin/gh/slayton58/44/head
2025-12-04T09:33:41.6892216Z  * [new branch]              gh/slayton58/44/orig        -> origin/gh/slayton58/44/orig
2025-12-04T09:33:41.6894175Z  * [new branch]              gh/slayton58/45/base        -> origin/gh/slayton58/45/base
2025-12-04T09:33:41.6895620Z  * [new branch]              gh/slayton58/45/head        -> origin/gh/slayton58/45/head
2025-12-04T09:33:41.6897301Z  * [new branch]              gh/slayton58/45/orig        -> origin/gh/slayton58/45/orig
2025-12-04T09:33:41.6899781Z  * [new branch]              gh/slayton58/46/base        -> origin/gh/slayton58/46/base
2025-12-04T09:33:41.6901420Z  * [new branch]              gh/slayton58/46/head        -> origin/gh/slayton58/46/head
2025-12-04T09:33:41.6902884Z  * [new branch]              gh/slayton58/46/orig        -> origin/gh/slayton58/46/orig
2025-12-04T09:33:41.6905074Z  * [new branch]              gh/slayton58/6/base         -> origin/gh/slayton58/6/base
2025-12-04T09:33:41.6906668Z  * [new branch]              gh/slayton58/6/head         -> origin/gh/slayton58/6/head
2025-12-04T09:33:41.6908500Z  * [new branch]              gh/slayton58/7/base         -> origin/gh/slayton58/7/base
2025-12-04T09:33:41.6909920Z  * [new branch]              gh/slayton58/7/head         -> origin/gh/slayton58/7/head
2025-12-04T09:33:41.6912766Z  * [new branch]              gh/soulitzer/269/base       -> origin/gh/soulitzer/269/base
2025-12-04T09:33:41.6914282Z  * [new branch]              gh/soulitzer/269/head       -> origin/gh/soulitzer/269/head
2025-12-04T09:33:41.6916281Z  * [new branch]              gh/soulitzer/269/orig       -> origin/gh/soulitzer/269/orig
2025-12-04T09:33:41.6918407Z  * [new branch]              gh/soulitzer/276/base       -> origin/gh/soulitzer/276/base
2025-12-04T09:33:41.6919917Z  * [new branch]              gh/soulitzer/276/head       -> origin/gh/soulitzer/276/head
2025-12-04T09:33:41.6921415Z  * [new branch]              gh/soulitzer/276/orig       -> origin/gh/soulitzer/276/orig
2025-12-04T09:33:41.6923808Z  * [new branch]              gh/soulitzer/287/base       -> origin/gh/soulitzer/287/base
2025-12-04T09:33:41.6925254Z  * [new branch]              gh/soulitzer/287/head       -> origin/gh/soulitzer/287/head
2025-12-04T09:33:41.6926893Z  * [new branch]              gh/soulitzer/287/orig       -> origin/gh/soulitzer/287/orig
2025-12-04T09:33:41.6929019Z  * [new branch]              gh/soulitzer/296/base       -> origin/gh/soulitzer/296/base
2025-12-04T09:33:41.6930539Z  * [new branch]              gh/soulitzer/296/head       -> origin/gh/soulitzer/296/head
2025-12-04T09:33:41.6931988Z  * [new branch]              gh/soulitzer/296/orig       -> origin/gh/soulitzer/296/orig
2025-12-04T09:33:41.6933984Z  * [new branch]              gh/soulitzer/299/base       -> origin/gh/soulitzer/299/base
2025-12-04T09:33:41.6935624Z  * [new branch]              gh/soulitzer/299/head       -> origin/gh/soulitzer/299/head
2025-12-04T09:33:41.6937221Z  * [new branch]              gh/soulitzer/299/orig       -> origin/gh/soulitzer/299/orig
2025-12-04T09:33:41.6939272Z  * [new branch]              gh/soulitzer/300/base       -> origin/gh/soulitzer/300/base
2025-12-04T09:33:41.6941369Z  * [new branch]              gh/soulitzer/300/head       -> origin/gh/soulitzer/300/head
2025-12-04T09:33:41.6942856Z  * [new branch]              gh/soulitzer/300/orig       -> origin/gh/soulitzer/300/orig
2025-12-04T09:33:41.6945096Z  * [new branch]              gh/soulitzer/301/base       -> origin/gh/soulitzer/301/base
2025-12-04T09:33:41.6946653Z  * [new branch]              gh/soulitzer/301/head       -> origin/gh/soulitzer/301/head
2025-12-04T09:33:41.6948192Z  * [new branch]              gh/soulitzer/301/orig       -> origin/gh/soulitzer/301/orig
2025-12-04T09:33:41.6950146Z  * [new branch]              gh/soulitzer/313/base       -> origin/gh/soulitzer/313/base
2025-12-04T09:33:41.6951629Z  * [new branch]              gh/soulitzer/313/head       -> origin/gh/soulitzer/313/head
2025-12-04T09:33:41.6953243Z  * [new branch]              gh/soulitzer/313/orig       -> origin/gh/soulitzer/313/orig
2025-12-04T09:33:41.6955169Z  * [new branch]              gh/soulitzer/319/base       -> origin/gh/soulitzer/319/base
2025-12-04T09:33:41.6957038Z  * [new branch]              gh/soulitzer/319/head       -> origin/gh/soulitzer/319/head
2025-12-04T09:33:41.6958481Z  * [new branch]              gh/soulitzer/319/orig       -> origin/gh/soulitzer/319/orig
2025-12-04T09:33:41.6960568Z  * [new branch]              gh/soulitzer/320/base       -> origin/gh/soulitzer/320/base
2025-12-04T09:33:41.6962053Z  * [new branch]              gh/soulitzer/320/head       -> origin/gh/soulitzer/320/head
2025-12-04T09:33:41.6964042Z  * [new branch]              gh/soulitzer/320/orig       -> origin/gh/soulitzer/320/orig
2025-12-04T09:33:41.6966246Z  * [new branch]              gh/soulitzer/336/base       -> origin/gh/soulitzer/336/base
2025-12-04T09:33:41.6967677Z  * [new branch]              gh/soulitzer/336/head       -> origin/gh/soulitzer/336/head
2025-12-04T09:33:41.6969095Z  * [new branch]              gh/soulitzer/336/orig       -> origin/gh/soulitzer/336/orig
2025-12-04T09:33:41.6971772Z  * [new branch]              gh/soulitzer/347/base       -> origin/gh/soulitzer/347/base
2025-12-04T09:33:41.6973252Z  * [new branch]              gh/soulitzer/347/head       -> origin/gh/soulitzer/347/head
2025-12-04T09:33:41.6974689Z  * [new branch]              gh/soulitzer/347/orig       -> origin/gh/soulitzer/347/orig
2025-12-04T09:33:41.6977045Z  * [new branch]              gh/soulitzer/349/base       -> origin/gh/soulitzer/349/base
2025-12-04T09:33:41.6978532Z  * [new branch]              gh/soulitzer/349/head       -> origin/gh/soulitzer/349/head
2025-12-04T09:33:41.6980106Z  * [new branch]              gh/soulitzer/349/orig       -> origin/gh/soulitzer/349/orig
2025-12-04T09:33:41.6982011Z  * [new branch]              gh/soulitzer/350/base       -> origin/gh/soulitzer/350/base
2025-12-04T09:33:41.6983399Z  * [new branch]              gh/soulitzer/350/head       -> origin/gh/soulitzer/350/head
2025-12-04T09:33:41.6984832Z  * [new branch]              gh/soulitzer/350/orig       -> origin/gh/soulitzer/350/orig
2025-12-04T09:33:41.6987012Z  * [new branch]              gh/soulitzer/351/base       -> origin/gh/soulitzer/351/base
2025-12-04T09:33:41.6988505Z  * [new branch]              gh/soulitzer/351/head       -> origin/gh/soulitzer/351/head
2025-12-04T09:33:41.6989974Z  * [new branch]              gh/soulitzer/351/orig       -> origin/gh/soulitzer/351/orig
2025-12-04T09:33:41.6991945Z  * [new branch]              gh/soulitzer/353/base       -> origin/gh/soulitzer/353/base
2025-12-04T09:33:41.6993627Z  * [new branch]              gh/soulitzer/353/head       -> origin/gh/soulitzer/353/head
2025-12-04T09:33:41.6995086Z  * [new branch]              gh/soulitzer/353/orig       -> origin/gh/soulitzer/353/orig
2025-12-04T09:33:41.6997903Z  * [new branch]              gh/soulitzer/358/base       -> origin/gh/soulitzer/358/base
2025-12-04T09:33:41.6999531Z  * [new branch]              gh/soulitzer/358/head       -> origin/gh/soulitzer/358/head
2025-12-04T09:33:41.7001104Z  * [new branch]              gh/soulitzer/358/orig       -> origin/gh/soulitzer/358/orig
2025-12-04T09:33:41.7003758Z  * [new branch]              gh/soulitzer/359/base       -> origin/gh/soulitzer/359/base
2025-12-04T09:33:41.7005265Z  * [new branch]              gh/soulitzer/359/head       -> origin/gh/soulitzer/359/head
2025-12-04T09:33:41.7006838Z  * [new branch]              gh/soulitzer/359/orig       -> origin/gh/soulitzer/359/orig
2025-12-04T09:33:41.7008953Z  * [new branch]              gh/soulitzer/374/base       -> origin/gh/soulitzer/374/base
2025-12-04T09:33:41.7010386Z  * [new branch]              gh/soulitzer/374/head       -> origin/gh/soulitzer/374/head
2025-12-04T09:33:41.7011897Z  * [new branch]              gh/soulitzer/374/orig       -> origin/gh/soulitzer/374/orig
2025-12-04T09:33:41.7013897Z  * [new branch]              gh/soulitzer/375/base       -> origin/gh/soulitzer/375/base
2025-12-04T09:33:41.7015378Z  * [new branch]              gh/soulitzer/375/head       -> origin/gh/soulitzer/375/head
2025-12-04T09:33:41.7016857Z  * [new branch]              gh/soulitzer/375/orig       -> origin/gh/soulitzer/375/orig
2025-12-04T09:33:41.7018825Z  * [new branch]              gh/soulitzer/380/base       -> origin/gh/soulitzer/380/base
2025-12-04T09:33:41.7020251Z  * [new branch]              gh/soulitzer/380/head       -> origin/gh/soulitzer/380/head
2025-12-04T09:33:41.7021707Z  * [new branch]              gh/soulitzer/380/orig       -> origin/gh/soulitzer/380/orig
2025-12-04T09:33:41.7023654Z  * [new branch]              gh/soulitzer/385/base       -> origin/gh/soulitzer/385/base
2025-12-04T09:33:41.7025143Z  * [new branch]              gh/soulitzer/385/head       -> origin/gh/soulitzer/385/head
2025-12-04T09:33:41.7026540Z  * [new branch]              gh/soulitzer/385/orig       -> origin/gh/soulitzer/385/orig
2025-12-04T09:33:41.7028653Z  * [new branch]              gh/soulitzer/386/base       -> origin/gh/soulitzer/386/base
2025-12-04T09:33:41.7030142Z  * [new branch]              gh/soulitzer/386/head       -> origin/gh/soulitzer/386/head
2025-12-04T09:33:41.7031658Z  * [new branch]              gh/soulitzer/386/orig       -> origin/gh/soulitzer/386/orig
2025-12-04T09:33:41.7034040Z  * [new branch]              gh/soulitzer/387/base       -> origin/gh/soulitzer/387/base
2025-12-04T09:33:41.7035474Z  * [new branch]              gh/soulitzer/387/head       -> origin/gh/soulitzer/387/head
2025-12-04T09:33:41.7036945Z  * [new branch]              gh/soulitzer/387/orig       -> origin/gh/soulitzer/387/orig
2025-12-04T09:33:41.7038982Z  * [new branch]              gh/soulitzer/388/base       -> origin/gh/soulitzer/388/base
2025-12-04T09:33:41.7040416Z  * [new branch]              gh/soulitzer/388/head       -> origin/gh/soulitzer/388/head
2025-12-04T09:33:41.7041904Z  * [new branch]              gh/soulitzer/388/orig       -> origin/gh/soulitzer/388/orig
2025-12-04T09:33:41.7043975Z  * [new branch]              gh/soulitzer/389/base       -> origin/gh/soulitzer/389/base
2025-12-04T09:33:41.7045389Z  * [new branch]              gh/soulitzer/389/head       -> origin/gh/soulitzer/389/head
2025-12-04T09:33:41.7046821Z  * [new branch]              gh/soulitzer/389/orig       -> origin/gh/soulitzer/389/orig
2025-12-04T09:33:41.7049354Z  * [new branch]              gh/soulitzer/390/base       -> origin/gh/soulitzer/390/base
2025-12-04T09:33:41.7050821Z  * [new branch]              gh/soulitzer/390/head       -> origin/gh/soulitzer/390/head
2025-12-04T09:33:41.7052349Z  * [new branch]              gh/soulitzer/390/orig       -> origin/gh/soulitzer/390/orig
2025-12-04T09:33:41.7054295Z  * [new branch]              gh/soulitzer/391/base       -> origin/gh/soulitzer/391/base
2025-12-04T09:33:41.7055747Z  * [new branch]              gh/soulitzer/391/head       -> origin/gh/soulitzer/391/head
2025-12-04T09:33:41.7057419Z  * [new branch]              gh/soulitzer/391/orig       -> origin/gh/soulitzer/391/orig
2025-12-04T09:33:41.7059384Z  * [new branch]              gh/soulitzer/392/base       -> origin/gh/soulitzer/392/base
2025-12-04T09:33:41.7060871Z  * [new branch]              gh/soulitzer/392/head       -> origin/gh/soulitzer/392/head
2025-12-04T09:33:41.7062346Z  * [new branch]              gh/soulitzer/392/orig       -> origin/gh/soulitzer/392/orig
2025-12-04T09:33:41.7064847Z  * [new branch]              gh/swolchok/728/next        -> origin/gh/swolchok/728/next
2025-12-04T09:33:41.7067112Z  * [new branch]              gh/swolchok/819/base        -> origin/gh/swolchok/819/base
2025-12-04T09:33:41.7068602Z  * [new branch]              gh/swolchok/819/head        -> origin/gh/swolchok/819/head
2025-12-04T09:33:41.7070097Z  * [new branch]              gh/swolchok/819/orig        -> origin/gh/swolchok/819/orig
2025-12-04T09:33:41.7072200Z  * [new branch]              gh/swolchok/824/base        -> origin/gh/swolchok/824/base
2025-12-04T09:33:41.7073806Z  * [new branch]              gh/swolchok/824/head        -> origin/gh/swolchok/824/head
2025-12-04T09:33:41.7075135Z  * [new branch]              gh/swolchok/824/orig        -> origin/gh/swolchok/824/orig
2025-12-04T09:33:41.7077158Z  * [new branch]              gh/swolchok/829/base        -> origin/gh/swolchok/829/base
2025-12-04T09:33:41.7078504Z  * [new branch]              gh/swolchok/829/head        -> origin/gh/swolchok/829/head
2025-12-04T09:33:41.7080030Z  * [new branch]              gh/swolchok/829/orig        -> origin/gh/swolchok/829/orig
2025-12-04T09:33:41.7082066Z  * [new branch]              gh/swolchok/839/base        -> origin/gh/swolchok/839/base
2025-12-04T09:33:41.7083414Z  * [new branch]              gh/swolchok/839/head        -> origin/gh/swolchok/839/head
2025-12-04T09:33:41.7084858Z  * [new branch]              gh/swolchok/839/orig        -> origin/gh/swolchok/839/orig
2025-12-04T09:33:41.7086747Z  * [new branch]              gh/swolchok/841/base        -> origin/gh/swolchok/841/base
2025-12-04T09:33:41.7088310Z  * [new branch]              gh/swolchok/841/head        -> origin/gh/swolchok/841/head
2025-12-04T09:33:41.7089855Z  * [new branch]              gh/swolchok/841/orig        -> origin/gh/swolchok/841/orig
2025-12-04T09:33:41.7091800Z  * [new branch]              gh/swolchok/842/base        -> origin/gh/swolchok/842/base
2025-12-04T09:33:41.7093333Z  * [new branch]              gh/swolchok/842/head        -> origin/gh/swolchok/842/head
2025-12-04T09:33:41.7095131Z  * [new branch]              gh/swolchok/842/orig        -> origin/gh/swolchok/842/orig
2025-12-04T09:33:41.7096794Z  * [new branch]              gh/swolchok/845/base        -> origin/gh/swolchok/845/base
2025-12-04T09:33:41.7098345Z  * [new branch]              gh/swolchok/845/head        -> origin/gh/swolchok/845/head
2025-12-04T09:33:41.7099862Z  * [new branch]              gh/swolchok/845/orig        -> origin/gh/swolchok/845/orig
2025-12-04T09:33:41.7102091Z  * [new branch]              gh/swolchok/848/base        -> origin/gh/swolchok/848/base
2025-12-04T09:33:41.7103603Z  * [new branch]              gh/swolchok/848/head        -> origin/gh/swolchok/848/head
2025-12-04T09:33:41.7105212Z  * [new branch]              gh/swolchok/848/orig        -> origin/gh/swolchok/848/orig
2025-12-04T09:33:41.7107070Z  * [new branch]              gh/swolchok/856/base        -> origin/gh/swolchok/856/base
2025-12-04T09:33:41.7108658Z  * [new branch]              gh/swolchok/856/head        -> origin/gh/swolchok/856/head
2025-12-04T09:33:41.7110125Z  * [new branch]              gh/swolchok/856/orig        -> origin/gh/swolchok/856/orig
2025-12-04T09:33:41.7112269Z  * [new branch]              gh/swolchok/860/base        -> origin/gh/swolchok/860/base
2025-12-04T09:33:41.7113728Z  * [new branch]              gh/swolchok/860/head        -> origin/gh/swolchok/860/head
2025-12-04T09:33:41.7115604Z  * [new branch]              gh/swolchok/860/orig        -> origin/gh/swolchok/860/orig
2025-12-04T09:33:41.7117871Z  * [new branch]              gh/swolchok/861/base        -> origin/gh/swolchok/861/base
2025-12-04T09:33:41.7119413Z  * [new branch]              gh/swolchok/861/head        -> origin/gh/swolchok/861/head
2025-12-04T09:33:41.7120890Z  * [new branch]              gh/swolchok/861/orig        -> origin/gh/swolchok/861/orig
2025-12-04T09:33:41.7122969Z  * [new branch]              gh/swolchok/862/base        -> origin/gh/swolchok/862/base
2025-12-04T09:33:41.7124334Z  * [new branch]              gh/swolchok/862/head        -> origin/gh/swolchok/862/head
2025-12-04T09:33:41.7125867Z  * [new branch]              gh/swolchok/862/orig        -> origin/gh/swolchok/862/orig
2025-12-04T09:33:41.7127974Z  * [new branch]              gh/swolchok/863/base        -> origin/gh/swolchok/863/base
2025-12-04T09:33:41.7129462Z  * [new branch]              gh/swolchok/863/head        -> origin/gh/swolchok/863/head
2025-12-04T09:33:41.7131614Z  * [new branch]              gh/swolchok/863/orig        -> origin/gh/swolchok/863/orig
2025-12-04T09:33:41.7133758Z  * [new branch]              gh/swolchok/864/base        -> origin/gh/swolchok/864/base
2025-12-04T09:33:41.7135582Z  * [new branch]              gh/swolchok/864/head        -> origin/gh/swolchok/864/head
2025-12-04T09:33:41.7137251Z  * [new branch]              gh/swolchok/864/orig        -> origin/gh/swolchok/864/orig
2025-12-04T09:33:41.7139181Z  * [new branch]              gh/swolchok/865/base        -> origin/gh/swolchok/865/base
2025-12-04T09:33:41.7140927Z  * [new branch]              gh/swolchok/865/head        -> origin/gh/swolchok/865/head
2025-12-04T09:33:41.7142366Z  * [new branch]              gh/swolchok/865/orig        -> origin/gh/swolchok/865/orig
2025-12-04T09:33:41.7145047Z  * [new branch]              gh/swolchok/866/base        -> origin/gh/swolchok/866/base
2025-12-04T09:33:41.7146544Z  * [new branch]              gh/swolchok/866/head        -> origin/gh/swolchok/866/head
2025-12-04T09:33:41.7148069Z  * [new branch]              gh/swolchok/866/orig        -> origin/gh/swolchok/866/orig
2025-12-04T09:33:41.7150004Z  * [new branch]              gh/swolchok/867/base        -> origin/gh/swolchok/867/base
2025-12-04T09:33:41.7151638Z  * [new branch]              gh/swolchok/867/head        -> origin/gh/swolchok/867/head
2025-12-04T09:33:41.7153717Z  * [new branch]              gh/swolchok/867/orig        -> origin/gh/swolchok/867/orig
2025-12-04T09:33:41.7155688Z  * [new branch]              gh/swolchok/868/base        -> origin/gh/swolchok/868/base
2025-12-04T09:33:41.7157238Z  * [new branch]              gh/swolchok/868/head        -> origin/gh/swolchok/868/head
2025-12-04T09:33:41.7158725Z  * [new branch]              gh/swolchok/868/orig        -> origin/gh/swolchok/868/orig
2025-12-04T09:33:41.7160809Z  * [new branch]              gh/swolchok/869/base        -> origin/gh/swolchok/869/base
2025-12-04T09:33:41.7162344Z  * [new branch]              gh/swolchok/869/head        -> origin/gh/swolchok/869/head
2025-12-04T09:33:41.7163923Z  * [new branch]              gh/swolchok/869/orig        -> origin/gh/swolchok/869/orig
2025-12-04T09:33:41.7166053Z  * [new branch]              gh/swolchok/870/base        -> origin/gh/swolchok/870/base
2025-12-04T09:33:41.7167439Z  * [new branch]              gh/swolchok/870/head        -> origin/gh/swolchok/870/head
2025-12-04T09:33:41.7168971Z  * [new branch]              gh/swolchok/870/orig        -> origin/gh/swolchok/870/orig
2025-12-04T09:33:41.7171210Z  * [new branch]              gh/swolchok/871/base        -> origin/gh/swolchok/871/base
2025-12-04T09:33:41.7176645Z  * [new branch]              gh/swolchok/871/head        -> origin/gh/swolchok/871/head
2025-12-04T09:33:41.7178267Z  * [new branch]              gh/swolchok/871/orig        -> origin/gh/swolchok/871/orig
2025-12-04T09:33:41.7181011Z  * [new branch]              gh/teja-rao/4/base          -> origin/gh/teja-rao/4/base
2025-12-04T09:33:41.7182587Z  * [new branch]              gh/teja-rao/4/head          -> origin/gh/teja-rao/4/head
2025-12-04T09:33:41.7184081Z  * [new branch]              gh/teja-rao/4/orig          -> origin/gh/teja-rao/4/orig
2025-12-04T09:33:41.7186594Z  * [new branch]              gh/tianyu-l/2/base          -> origin/gh/tianyu-l/2/base
2025-12-04T09:33:41.7188046Z  * [new branch]              gh/tianyu-l/2/head          -> origin/gh/tianyu-l/2/head
2025-12-04T09:33:41.7189551Z  * [new branch]              gh/tianyu-l/2/orig          -> origin/gh/tianyu-l/2/orig
2025-12-04T09:33:41.7191491Z  * [new branch]              gh/tianyu-l/3/base          -> origin/gh/tianyu-l/3/base
2025-12-04T09:33:41.7193028Z  * [new branch]              gh/tianyu-l/3/orig          -> origin/gh/tianyu-l/3/orig
2025-12-04T09:33:41.7195213Z  * [new branch]              gh/tianyu-l/4/base          -> origin/gh/tianyu-l/4/base
2025-12-04T09:33:41.7196678Z  * [new branch]              gh/tianyu-l/4/head          -> origin/gh/tianyu-l/4/head
2025-12-04T09:33:41.7198231Z  * [new branch]              gh/tianyu-l/4/orig          -> origin/gh/tianyu-l/4/orig
2025-12-04T09:33:41.7201312Z  * [new branch]              gh/tugsbayasgalan/10/base   -> origin/gh/tugsbayasgalan/10/base
2025-12-04T09:33:41.7202790Z  * [new branch]              gh/tugsbayasgalan/10/head   -> origin/gh/tugsbayasgalan/10/head
2025-12-04T09:33:41.7204233Z  * [new branch]              gh/tugsbayasgalan/10/orig   -> origin/gh/tugsbayasgalan/10/orig
2025-12-04T09:33:41.7206208Z  * [new branch]              gh/tugsbayasgalan/13/base   -> origin/gh/tugsbayasgalan/13/base
2025-12-04T09:33:41.7207899Z  * [new branch]              gh/tugsbayasgalan/13/head   -> origin/gh/tugsbayasgalan/13/head
2025-12-04T09:33:41.7209354Z  * [new branch]              gh/tugsbayasgalan/13/orig   -> origin/gh/tugsbayasgalan/13/orig
2025-12-04T09:33:41.7211592Z  * [new branch]              gh/tugsbayasgalan/17/base   -> origin/gh/tugsbayasgalan/17/base
2025-12-04T09:33:41.7212980Z  * [new branch]              gh/tugsbayasgalan/17/head   -> origin/gh/tugsbayasgalan/17/head
2025-12-04T09:33:41.7214563Z  * [new branch]              gh/tugsbayasgalan/17/orig   -> origin/gh/tugsbayasgalan/17/orig
2025-12-04T09:33:41.7216841Z  * [new branch]              gh/tugsbayasgalan/2/base    -> origin/gh/tugsbayasgalan/2/base
2025-12-04T09:33:41.7218343Z  * [new branch]              gh/tugsbayasgalan/2/head    -> origin/gh/tugsbayasgalan/2/head
2025-12-04T09:33:41.7219831Z  * [new branch]              gh/tugsbayasgalan/2/orig    -> origin/gh/tugsbayasgalan/2/orig
2025-12-04T09:33:41.7222242Z  * [new branch]              gh/tugsbayasgalan/28/base   -> origin/gh/tugsbayasgalan/28/base
2025-12-04T09:33:41.7223789Z  * [new branch]              gh/tugsbayasgalan/28/head   -> origin/gh/tugsbayasgalan/28/head
2025-12-04T09:33:41.7225247Z  * [new branch]              gh/tugsbayasgalan/28/orig   -> origin/gh/tugsbayasgalan/28/orig
2025-12-04T09:33:41.7227350Z  * [new branch]              gh/tugsbayasgalan/32/base   -> origin/gh/tugsbayasgalan/32/base
2025-12-04T09:33:41.7229380Z  * [new branch]              gh/tugsbayasgalan/32/head   -> origin/gh/tugsbayasgalan/32/head
2025-12-04T09:33:41.7230889Z  * [new branch]              gh/tugsbayasgalan/32/orig   -> origin/gh/tugsbayasgalan/32/orig
2025-12-04T09:33:41.7233045Z  * [new branch]              gh/tugsbayasgalan/35/base   -> origin/gh/tugsbayasgalan/35/base
2025-12-04T09:33:41.7234696Z  * [new branch]              gh/tugsbayasgalan/35/head   -> origin/gh/tugsbayasgalan/35/head
2025-12-04T09:33:41.7236190Z  * [new branch]              gh/tugsbayasgalan/35/orig   -> origin/gh/tugsbayasgalan/35/orig
2025-12-04T09:33:41.7238315Z  * [new branch]              gh/tugsbayasgalan/36/base   -> origin/gh/tugsbayasgalan/36/base
2025-12-04T09:33:41.7239824Z  * [new branch]              gh/tugsbayasgalan/36/head   -> origin/gh/tugsbayasgalan/36/head
2025-12-04T09:33:41.7241369Z  * [new branch]              gh/tugsbayasgalan/36/orig   -> origin/gh/tugsbayasgalan/36/orig
2025-12-04T09:33:41.7243381Z  * [new branch]              gh/tugsbayasgalan/37/base   -> origin/gh/tugsbayasgalan/37/base
2025-12-04T09:33:41.7244865Z  * [new branch]              gh/tugsbayasgalan/37/head   -> origin/gh/tugsbayasgalan/37/head
2025-12-04T09:33:41.7246327Z  * [new branch]              gh/tugsbayasgalan/37/orig   -> origin/gh/tugsbayasgalan/37/orig
2025-12-04T09:33:41.7248365Z  * [new branch]              gh/tugsbayasgalan/43/base   -> origin/gh/tugsbayasgalan/43/base
2025-12-04T09:33:41.7249888Z  * [new branch]              gh/tugsbayasgalan/43/head   -> origin/gh/tugsbayasgalan/43/head
2025-12-04T09:33:41.7251420Z  * [new branch]              gh/tugsbayasgalan/43/orig   -> origin/gh/tugsbayasgalan/43/orig
2025-12-04T09:33:41.7253848Z  * [new branch]              gh/tugsbayasgalan/48/base   -> origin/gh/tugsbayasgalan/48/base
2025-12-04T09:33:41.7255351Z  * [new branch]              gh/tugsbayasgalan/48/head   -> origin/gh/tugsbayasgalan/48/head
2025-12-04T09:33:41.7257060Z  * [new branch]              gh/tugsbayasgalan/48/orig   -> origin/gh/tugsbayasgalan/48/orig
2025-12-04T09:33:41.7259216Z  * [new branch]              gh/tugsbayasgalan/51/base   -> origin/gh/tugsbayasgalan/51/base
2025-12-04T09:33:41.7260844Z  * [new branch]              gh/tugsbayasgalan/51/head   -> origin/gh/tugsbayasgalan/51/head
2025-12-04T09:33:41.7262306Z  * [new branch]              gh/tugsbayasgalan/51/orig   -> origin/gh/tugsbayasgalan/51/orig
2025-12-04T09:33:41.7264094Z  * [new branch]              gh/tugsbayasgalan/52/base   -> origin/gh/tugsbayasgalan/52/base
2025-12-04T09:33:41.7265655Z  * [new branch]              gh/tugsbayasgalan/52/head   -> origin/gh/tugsbayasgalan/52/head
2025-12-04T09:33:41.7267200Z  * [new branch]              gh/tugsbayasgalan/52/orig   -> origin/gh/tugsbayasgalan/52/orig
2025-12-04T09:33:41.7269245Z  * [new branch]              gh/tugsbayasgalan/53/base   -> origin/gh/tugsbayasgalan/53/base
2025-12-04T09:33:41.7270705Z  * [new branch]              gh/tugsbayasgalan/53/head   -> origin/gh/tugsbayasgalan/53/head
2025-12-04T09:33:41.7272433Z  * [new branch]              gh/tugsbayasgalan/53/orig   -> origin/gh/tugsbayasgalan/53/orig
2025-12-04T09:33:41.7274592Z  * [new branch]              gh/tugsbayasgalan/55/base   -> origin/gh/tugsbayasgalan/55/base
2025-12-04T09:33:41.7276287Z  * [new branch]              gh/tugsbayasgalan/55/head   -> origin/gh/tugsbayasgalan/55/head
2025-12-04T09:33:41.7277852Z  * [new branch]              gh/tugsbayasgalan/55/orig   -> origin/gh/tugsbayasgalan/55/orig
2025-12-04T09:33:41.7280108Z  * [new branch]              gh/tugsbayasgalan/59/base   -> origin/gh/tugsbayasgalan/59/base
2025-12-04T09:33:41.7281758Z  * [new branch]              gh/tugsbayasgalan/59/head   -> origin/gh/tugsbayasgalan/59/head
2025-12-04T09:33:41.7283238Z  * [new branch]              gh/tugsbayasgalan/59/orig   -> origin/gh/tugsbayasgalan/59/orig
2025-12-04T09:33:41.7285178Z  * [new branch]              gh/tugsbayasgalan/6/base    -> origin/gh/tugsbayasgalan/6/base
2025-12-04T09:33:41.7286625Z  * [new branch]              gh/tugsbayasgalan/6/head    -> origin/gh/tugsbayasgalan/6/head
2025-12-04T09:33:41.7288238Z  * [new branch]              gh/tugsbayasgalan/6/orig    -> origin/gh/tugsbayasgalan/6/orig
2025-12-04T09:33:41.7290124Z  * [new branch]              gh/tugsbayasgalan/60/base   -> origin/gh/tugsbayasgalan/60/base
2025-12-04T09:33:41.7291680Z  * [new branch]              gh/tugsbayasgalan/60/head   -> origin/gh/tugsbayasgalan/60/head
2025-12-04T09:33:41.7293177Z  * [new branch]              gh/tugsbayasgalan/60/orig   -> origin/gh/tugsbayasgalan/60/orig
2025-12-04T09:33:41.7295764Z  * [new branch]              gh/tugsbayasgalan/61/base   -> origin/gh/tugsbayasgalan/61/base
2025-12-04T09:33:41.7297305Z  * [new branch]              gh/tugsbayasgalan/61/head   -> origin/gh/tugsbayasgalan/61/head
2025-12-04T09:33:41.7298792Z  * [new branch]              gh/tugsbayasgalan/61/orig   -> origin/gh/tugsbayasgalan/61/orig
2025-12-04T09:33:41.7301043Z  * [new branch]              gh/tugsbayasgalan/63/base   -> origin/gh/tugsbayasgalan/63/base
2025-12-04T09:33:41.7302561Z  * [new branch]              gh/tugsbayasgalan/63/head   -> origin/gh/tugsbayasgalan/63/head
2025-12-04T09:33:41.7304052Z  * [new branch]              gh/tugsbayasgalan/63/orig   -> origin/gh/tugsbayasgalan/63/orig
2025-12-04T09:33:41.7306140Z  * [new branch]              gh/tugsbayasgalan/67/base   -> origin/gh/tugsbayasgalan/67/base
2025-12-04T09:33:41.7307618Z  * [new branch]              gh/tugsbayasgalan/67/head   -> origin/gh/tugsbayasgalan/67/head
2025-12-04T09:33:41.7309096Z  * [new branch]              gh/tugsbayasgalan/67/orig   -> origin/gh/tugsbayasgalan/67/orig
2025-12-04T09:33:41.7311351Z  * [new branch]              gh/tugsbayasgalan/68/base   -> origin/gh/tugsbayasgalan/68/base
2025-12-04T09:33:41.7312848Z  * [new branch]              gh/tugsbayasgalan/68/head   -> origin/gh/tugsbayasgalan/68/head
2025-12-04T09:33:41.7314351Z  * [new branch]              gh/tugsbayasgalan/68/orig   -> origin/gh/tugsbayasgalan/68/orig
2025-12-04T09:33:41.7316381Z  * [new branch]              gh/tugsbayasgalan/7/base    -> origin/gh/tugsbayasgalan/7/base
2025-12-04T09:33:41.7317940Z  * [new branch]              gh/tugsbayasgalan/7/head    -> origin/gh/tugsbayasgalan/7/head
2025-12-04T09:33:41.7319526Z  * [new branch]              gh/tugsbayasgalan/7/orig    -> origin/gh/tugsbayasgalan/7/orig
2025-12-04T09:33:41.7321975Z  * [new branch]              gh/tugsbayasgalan/70/base   -> origin/gh/tugsbayasgalan/70/base
2025-12-04T09:33:41.7323703Z  * [new branch]              gh/tugsbayasgalan/70/head   -> origin/gh/tugsbayasgalan/70/head
2025-12-04T09:33:41.7325262Z  * [new branch]              gh/tugsbayasgalan/70/orig   -> origin/gh/tugsbayasgalan/70/orig
2025-12-04T09:33:41.7327610Z  * [new branch]              gh/tugsbayasgalan/71/base   -> origin/gh/tugsbayasgalan/71/base
2025-12-04T09:33:41.7329282Z  * [new branch]              gh/tugsbayasgalan/71/head   -> origin/gh/tugsbayasgalan/71/head
2025-12-04T09:33:41.7330883Z  * [new branch]              gh/tugsbayasgalan/71/orig   -> origin/gh/tugsbayasgalan/71/orig
2025-12-04T09:33:41.7333165Z  * [new branch]              gh/tugsbayasgalan/72/base   -> origin/gh/tugsbayasgalan/72/base
2025-12-04T09:33:41.7334721Z  * [new branch]              gh/tugsbayasgalan/72/head   -> origin/gh/tugsbayasgalan/72/head
2025-12-04T09:33:41.7336232Z  * [new branch]              gh/tugsbayasgalan/72/orig   -> origin/gh/tugsbayasgalan/72/orig
2025-12-04T09:33:41.7338505Z  * [new branch]              gh/tugsbayasgalan/73/base   -> origin/gh/tugsbayasgalan/73/base
2025-12-04T09:33:41.7340116Z  * [new branch]              gh/tugsbayasgalan/73/head   -> origin/gh/tugsbayasgalan/73/head
2025-12-04T09:33:41.7341645Z  * [new branch]              gh/tugsbayasgalan/73/orig   -> origin/gh/tugsbayasgalan/73/orig
2025-12-04T09:33:41.7344573Z  * [new branch]              gh/tugsbayasgalan/74/base   -> origin/gh/tugsbayasgalan/74/base
2025-12-04T09:33:41.7346205Z  * [new branch]              gh/tugsbayasgalan/74/head   -> origin/gh/tugsbayasgalan/74/head
2025-12-04T09:33:41.7347698Z  * [new branch]              gh/tugsbayasgalan/74/orig   -> origin/gh/tugsbayasgalan/74/orig
2025-12-04T09:33:41.7349819Z  * [new branch]              gh/tugsbayasgalan/75/base   -> origin/gh/tugsbayasgalan/75/base
2025-12-04T09:33:41.7351304Z  * [new branch]              gh/tugsbayasgalan/75/head   -> origin/gh/tugsbayasgalan/75/head
2025-12-04T09:33:41.7352916Z  * [new branch]              gh/tugsbayasgalan/75/orig   -> origin/gh/tugsbayasgalan/75/orig
2025-12-04T09:33:41.7354751Z  * [new branch]              gh/tugsbayasgalan/76/base   -> origin/gh/tugsbayasgalan/76/base
2025-12-04T09:33:41.7356389Z  * [new branch]              gh/tugsbayasgalan/76/head   -> origin/gh/tugsbayasgalan/76/head
2025-12-04T09:33:41.7357817Z  * [new branch]              gh/tugsbayasgalan/76/orig   -> origin/gh/tugsbayasgalan/76/orig
2025-12-04T09:33:41.7360702Z  * [new branch]              gh/tugsbayasgalan/77/base   -> origin/gh/tugsbayasgalan/77/base
2025-12-04T09:33:41.7362115Z  * [new branch]              gh/tugsbayasgalan/77/head   -> origin/gh/tugsbayasgalan/77/head
2025-12-04T09:33:41.7363678Z  * [new branch]              gh/tugsbayasgalan/77/orig   -> origin/gh/tugsbayasgalan/77/orig
2025-12-04T09:33:41.7365858Z  * [new branch]              gh/tugsbayasgalan/78/base   -> origin/gh/tugsbayasgalan/78/base
2025-12-04T09:33:41.7367504Z  * [new branch]              gh/tugsbayasgalan/78/head   -> origin/gh/tugsbayasgalan/78/head
2025-12-04T09:33:41.7369029Z  * [new branch]              gh/tugsbayasgalan/78/orig   -> origin/gh/tugsbayasgalan/78/orig
2025-12-04T09:33:41.7371285Z  * [new branch]              gh/tugsbayasgalan/79/base   -> origin/gh/tugsbayasgalan/79/base
2025-12-04T09:33:41.7372944Z  * [new branch]              gh/tugsbayasgalan/79/head   -> origin/gh/tugsbayasgalan/79/head
2025-12-04T09:33:41.7374424Z  * [new branch]              gh/tugsbayasgalan/79/orig   -> origin/gh/tugsbayasgalan/79/orig
2025-12-04T09:33:41.7376603Z  * [new branch]              gh/tugsbayasgalan/8/base    -> origin/gh/tugsbayasgalan/8/base
2025-12-04T09:33:41.7378029Z  * [new branch]              gh/tugsbayasgalan/8/head    -> origin/gh/tugsbayasgalan/8/head
2025-12-04T09:33:41.7379643Z  * [new branch]              gh/tugsbayasgalan/8/orig    -> origin/gh/tugsbayasgalan/8/orig
2025-12-04T09:33:41.7381466Z  * [new branch]              gh/tugsbayasgalan/80/base   -> origin/gh/tugsbayasgalan/80/base
2025-12-04T09:33:41.7382892Z  * [new branch]              gh/tugsbayasgalan/80/head   -> origin/gh/tugsbayasgalan/80/head
2025-12-04T09:33:41.7384399Z  * [new branch]              gh/tugsbayasgalan/80/orig   -> origin/gh/tugsbayasgalan/80/orig
2025-12-04T09:33:41.7386637Z  * [new branch]              gh/tugsbayasgalan/81/base   -> origin/gh/tugsbayasgalan/81/base
2025-12-04T09:33:41.7388011Z  * [new branch]              gh/tugsbayasgalan/81/head   -> origin/gh/tugsbayasgalan/81/head
2025-12-04T09:33:41.7389513Z  * [new branch]              gh/tugsbayasgalan/81/orig   -> origin/gh/tugsbayasgalan/81/orig
2025-12-04T09:33:41.7392373Z  * [new branch]              gh/tugsbayasgalan/82/base   -> origin/gh/tugsbayasgalan/82/base
2025-12-04T09:33:41.7394153Z  * [new branch]              gh/tugsbayasgalan/82/head   -> origin/gh/tugsbayasgalan/82/head
2025-12-04T09:33:41.7395732Z  * [new branch]              gh/tugsbayasgalan/82/orig   -> origin/gh/tugsbayasgalan/82/orig
2025-12-04T09:33:41.7397628Z  * [new branch]              gh/tugsbayasgalan/83/base   -> origin/gh/tugsbayasgalan/83/base
2025-12-04T09:33:41.7399201Z  * [new branch]              gh/tugsbayasgalan/83/head   -> origin/gh/tugsbayasgalan/83/head
2025-12-04T09:33:41.7400700Z  * [new branch]              gh/tugsbayasgalan/83/orig   -> origin/gh/tugsbayasgalan/83/orig
2025-12-04T09:33:41.7402549Z  * [new branch]              gh/tugsbayasgalan/84/base   -> origin/gh/tugsbayasgalan/84/base
2025-12-04T09:33:41.7404120Z  * [new branch]              gh/tugsbayasgalan/84/head   -> origin/gh/tugsbayasgalan/84/head
2025-12-04T09:33:41.7405626Z  * [new branch]              gh/tugsbayasgalan/84/orig   -> origin/gh/tugsbayasgalan/84/orig
2025-12-04T09:33:41.7407611Z  * [new branch]              gh/tugsbayasgalan/85/base   -> origin/gh/tugsbayasgalan/85/base
2025-12-04T09:33:41.7409329Z  * [new branch]              gh/tugsbayasgalan/85/head   -> origin/gh/tugsbayasgalan/85/head
2025-12-04T09:33:41.7410823Z  * [new branch]              gh/tugsbayasgalan/85/orig   -> origin/gh/tugsbayasgalan/85/orig
2025-12-04T09:33:41.7412915Z  * [new branch]              gh/tugsbayasgalan/86/base   -> origin/gh/tugsbayasgalan/86/base
2025-12-04T09:33:41.7414554Z  * [new branch]              gh/tugsbayasgalan/86/head   -> origin/gh/tugsbayasgalan/86/head
2025-12-04T09:33:41.7415995Z  * [new branch]              gh/tugsbayasgalan/86/orig   -> origin/gh/tugsbayasgalan/86/orig
2025-12-04T09:33:41.7418636Z  * [new branch]              gh/tugsbayasgalan/87/base   -> origin/gh/tugsbayasgalan/87/base
2025-12-04T09:33:41.7420118Z  * [new branch]              gh/tugsbayasgalan/87/head   -> origin/gh/tugsbayasgalan/87/head
2025-12-04T09:33:41.7421689Z  * [new branch]              gh/tugsbayasgalan/87/orig   -> origin/gh/tugsbayasgalan/87/orig
2025-12-04T09:33:41.7423881Z  * [new branch]              gh/tugsbayasgalan/88/base   -> origin/gh/tugsbayasgalan/88/base
2025-12-04T09:33:41.7425342Z  * [new branch]              gh/tugsbayasgalan/88/head   -> origin/gh/tugsbayasgalan/88/head
2025-12-04T09:33:41.7426965Z  * [new branch]              gh/tugsbayasgalan/88/orig   -> origin/gh/tugsbayasgalan/88/orig
2025-12-04T09:33:41.7429196Z  * [new branch]              gh/tugsbayasgalan/89/base   -> origin/gh/tugsbayasgalan/89/base
2025-12-04T09:33:41.7430638Z  * [new branch]              gh/tugsbayasgalan/89/head   -> origin/gh/tugsbayasgalan/89/head
2025-12-04T09:33:41.7432164Z  * [new branch]              gh/tugsbayasgalan/89/orig   -> origin/gh/tugsbayasgalan/89/orig
2025-12-04T09:33:41.7434264Z  * [new branch]              gh/tugsbayasgalan/9/base    -> origin/gh/tugsbayasgalan/9/base
2025-12-04T09:33:41.7435648Z  * [new branch]              gh/tugsbayasgalan/9/head    -> origin/gh/tugsbayasgalan/9/head
2025-12-04T09:33:41.7437118Z  * [new branch]              gh/tugsbayasgalan/9/orig    -> origin/gh/tugsbayasgalan/9/orig
2025-12-04T09:33:41.7440464Z  * [new branch]              gh/tugsbayasgalan/90/base   -> origin/gh/tugsbayasgalan/90/base
2025-12-04T09:33:41.7441789Z  * [new branch]              gh/tugsbayasgalan/90/head   -> origin/gh/tugsbayasgalan/90/head
2025-12-04T09:33:41.7443379Z  * [new branch]              gh/tugsbayasgalan/90/orig   -> origin/gh/tugsbayasgalan/90/orig
2025-12-04T09:33:41.7445812Z  * [new branch]              gh/tugsbayasgalan/91/base   -> origin/gh/tugsbayasgalan/91/base
2025-12-04T09:33:41.7447265Z  * [new branch]              gh/tugsbayasgalan/91/head   -> origin/gh/tugsbayasgalan/91/head
2025-12-04T09:33:41.7448782Z  * [new branch]              gh/tugsbayasgalan/91/orig   -> origin/gh/tugsbayasgalan/91/orig
2025-12-04T09:33:41.7451087Z  * [new branch]              gh/tugsbayasgalan/92/base   -> origin/gh/tugsbayasgalan/92/base
2025-12-04T09:33:41.7452677Z  * [new branch]              gh/tugsbayasgalan/92/head   -> origin/gh/tugsbayasgalan/92/head
2025-12-04T09:33:41.7454148Z  * [new branch]              gh/tugsbayasgalan/92/orig   -> origin/gh/tugsbayasgalan/92/orig
2025-12-04T09:33:41.7456500Z  * [new branch]              gh/tugsbayasgalan/93/base   -> origin/gh/tugsbayasgalan/93/base
2025-12-04T09:33:41.7458101Z  * [new branch]              gh/tugsbayasgalan/93/head   -> origin/gh/tugsbayasgalan/93/head
2025-12-04T09:33:41.7459675Z  * [new branch]              gh/tugsbayasgalan/93/orig   -> origin/gh/tugsbayasgalan/93/orig
2025-12-04T09:33:41.7462344Z  * [new branch]              gh/v0i0/14/base             -> origin/gh/v0i0/14/base
2025-12-04T09:33:41.7463725Z  * [new branch]              gh/v0i0/14/head             -> origin/gh/v0i0/14/head
2025-12-04T09:33:41.7465245Z  * [new branch]              gh/v0i0/14/orig             -> origin/gh/v0i0/14/orig
2025-12-04T09:33:41.7467032Z  * [new branch]              gh/v0i0/15/base             -> origin/gh/v0i0/15/base
2025-12-04T09:33:41.7468686Z  * [new branch]              gh/v0i0/15/head             -> origin/gh/v0i0/15/head
2025-12-04T09:33:41.7470306Z  * [new branch]              gh/v0i0/15/orig             -> origin/gh/v0i0/15/orig
2025-12-04T09:33:41.7472572Z  * [new branch]              gh/v0i0/16/base             -> origin/gh/v0i0/16/base
2025-12-04T09:33:41.7474112Z  * [new branch]              gh/v0i0/16/head             -> origin/gh/v0i0/16/head
2025-12-04T09:33:41.7475565Z  * [new branch]              gh/v0i0/16/orig             -> origin/gh/v0i0/16/orig
2025-12-04T09:33:41.7477612Z  * [new branch]              gh/v0i0/17/base             -> origin/gh/v0i0/17/base
2025-12-04T09:33:41.7479125Z  * [new branch]              gh/v0i0/17/head             -> origin/gh/v0i0/17/head
2025-12-04T09:33:41.7480631Z  * [new branch]              gh/v0i0/17/orig             -> origin/gh/v0i0/17/orig
2025-12-04T09:33:41.7482679Z  * [new branch]              gh/v0i0/18/base             -> origin/gh/v0i0/18/base
2025-12-04T09:33:41.7484282Z  * [new branch]              gh/v0i0/18/head             -> origin/gh/v0i0/18/head
2025-12-04T09:33:41.7485767Z  * [new branch]              gh/v0i0/18/orig             -> origin/gh/v0i0/18/orig
2025-12-04T09:33:41.7487810Z  * [new branch]              gh/v0i0/19/base             -> origin/gh/v0i0/19/base
2025-12-04T09:33:41.7489336Z  * [new branch]              gh/v0i0/19/head             -> origin/gh/v0i0/19/head
2025-12-04T09:33:41.7490926Z  * [new branch]              gh/v0i0/19/orig             -> origin/gh/v0i0/19/orig
2025-12-04T09:33:41.7494189Z  * [new branch]              gh/vishal9-team/1/base      -> origin/gh/vishal9-team/1/base
2025-12-04T09:33:41.7495767Z  * [new branch]              gh/vishal9-team/1/head      -> origin/gh/vishal9-team/1/head
2025-12-04T09:33:41.7497738Z  * [new branch]              gh/vishal9-team/2/base      -> origin/gh/vishal9-team/2/base
2025-12-04T09:33:41.7499282Z  * [new branch]              gh/vishal9-team/2/head      -> origin/gh/vishal9-team/2/head
2025-12-04T09:33:41.7501120Z  * [new branch]              gh/vishal9-team/2/orig      -> origin/gh/vishal9-team/2/orig
2025-12-04T09:33:41.7503354Z  * [new branch]              gh/vishal9-team/3/base      -> origin/gh/vishal9-team/3/base
2025-12-04T09:33:41.7504811Z  * [new branch]              gh/vishal9-team/3/head      -> origin/gh/vishal9-team/3/head
2025-12-04T09:33:41.7506404Z  * [new branch]              gh/vishal9-team/3/orig      -> origin/gh/vishal9-team/3/orig
2025-12-04T09:33:41.7508170Z  * [new branch]              gh/vishal9-team/4/base      -> origin/gh/vishal9-team/4/base
2025-12-04T09:33:41.7509660Z  * [new branch]              gh/vishal9-team/4/head      -> origin/gh/vishal9-team/4/head
2025-12-04T09:33:41.7511280Z  * [new branch]              gh/vishal9-team/4/orig      -> origin/gh/vishal9-team/4/orig
2025-12-04T09:33:41.7513677Z  * [new branch]              gh/vkuzo/1/next             -> origin/gh/vkuzo/1/next
2025-12-04T09:33:41.7515593Z  * [new branch]              gh/vkuzo/2/next             -> origin/gh/vkuzo/2/next
2025-12-04T09:33:41.7517488Z  * [new branch]              gh/vkuzo/3/next             -> origin/gh/vkuzo/3/next
2025-12-04T09:33:41.7519998Z  * [new branch]              gh/wconstab/424/base        -> origin/gh/wconstab/424/base
2025-12-04T09:33:41.7521667Z  * [new branch]              gh/wconstab/424/head        -> origin/gh/wconstab/424/head
2025-12-04T09:33:41.7523178Z  * [new branch]              gh/wconstab/424/orig        -> origin/gh/wconstab/424/orig
2025-12-04T09:33:41.7525265Z  * [new branch]              gh/wconstab/435/base        -> origin/gh/wconstab/435/base
2025-12-04T09:33:41.7526825Z  * [new branch]              gh/wconstab/435/head        -> origin/gh/wconstab/435/head
2025-12-04T09:33:41.7528493Z  * [new branch]              gh/wconstab/435/orig        -> origin/gh/wconstab/435/orig
2025-12-04T09:33:41.7530980Z  * [new branch]              gh/wconstab/444/base        -> origin/gh/wconstab/444/base
2025-12-04T09:33:41.7532553Z  * [new branch]              gh/wconstab/444/head        -> origin/gh/wconstab/444/head
2025-12-04T09:33:41.7534147Z  * [new branch]              gh/wconstab/444/orig        -> origin/gh/wconstab/444/orig
2025-12-04T09:33:41.7536185Z  * [new branch]              gh/wconstab/447/base        -> origin/gh/wconstab/447/base
2025-12-04T09:33:41.7537805Z  * [new branch]              gh/wconstab/447/head        -> origin/gh/wconstab/447/head
2025-12-04T09:33:41.7539286Z  * [new branch]              gh/wconstab/447/orig        -> origin/gh/wconstab/447/orig
2025-12-04T09:33:41.7541337Z  * [new branch]              gh/wconstab/448/base        -> origin/gh/wconstab/448/base
2025-12-04T09:33:41.7542820Z  * [new branch]              gh/wconstab/448/head        -> origin/gh/wconstab/448/head
2025-12-04T09:33:41.7544395Z  * [new branch]              gh/wconstab/448/orig        -> origin/gh/wconstab/448/orig
2025-12-04T09:33:41.7546802Z  * [new branch]              gh/wconstab/449/base        -> origin/gh/wconstab/449/base
2025-12-04T09:33:41.7548388Z  * [new branch]              gh/wconstab/449/head        -> origin/gh/wconstab/449/head
2025-12-04T09:33:41.7550081Z  * [new branch]              gh/wconstab/449/orig        -> origin/gh/wconstab/449/orig
2025-12-04T09:33:41.7551891Z  * [new branch]              gh/wconstab/450/base        -> origin/gh/wconstab/450/base
2025-12-04T09:33:41.7553524Z  * [new branch]              gh/wconstab/450/head        -> origin/gh/wconstab/450/head
2025-12-04T09:33:41.7555033Z  * [new branch]              gh/wconstab/450/orig        -> origin/gh/wconstab/450/orig
2025-12-04T09:33:41.7556832Z  * [new branch]              gh/wconstab/451/base        -> origin/gh/wconstab/451/base
2025-12-04T09:33:41.7558577Z  * [new branch]              gh/wconstab/451/head        -> origin/gh/wconstab/451/head
2025-12-04T09:33:41.7559992Z  * [new branch]              gh/wconstab/451/orig        -> origin/gh/wconstab/451/orig
2025-12-04T09:33:41.7562232Z  * [new branch]              gh/wconstab/452/base        -> origin/gh/wconstab/452/base
2025-12-04T09:33:41.7563665Z  * [new branch]              gh/wconstab/452/head        -> origin/gh/wconstab/452/head
2025-12-04T09:33:41.7565386Z  * [new branch]              gh/wconstab/452/orig        -> origin/gh/wconstab/452/orig
2025-12-04T09:33:41.7567088Z  * [new branch]              gh/wconstab/453/base        -> origin/gh/wconstab/453/base
2025-12-04T09:33:41.7568696Z  * [new branch]              gh/wconstab/453/head        -> origin/gh/wconstab/453/head
2025-12-04T09:33:41.7570350Z  * [new branch]              gh/wconstab/453/orig        -> origin/gh/wconstab/453/orig
2025-12-04T09:33:41.7572371Z  * [new branch]              gh/wconstab/454/base        -> origin/gh/wconstab/454/base
2025-12-04T09:33:41.7573902Z  * [new branch]              gh/wconstab/454/head        -> origin/gh/wconstab/454/head
2025-12-04T09:33:41.7575431Z  * [new branch]              gh/wconstab/454/orig        -> origin/gh/wconstab/454/orig
2025-12-04T09:33:41.7577620Z  * [new branch]              gh/wconstab/455/base        -> origin/gh/wconstab/455/base
2025-12-04T09:33:41.7579079Z  * [new branch]              gh/wconstab/455/head        -> origin/gh/wconstab/455/head
2025-12-04T09:33:41.7580552Z  * [new branch]              gh/wconstab/455/orig        -> origin/gh/wconstab/455/orig
2025-12-04T09:33:41.7582928Z  * [new branch]              gh/wconstab/456/base        -> origin/gh/wconstab/456/base
2025-12-04T09:33:41.7584753Z  * [new branch]              gh/wconstab/456/head        -> origin/gh/wconstab/456/head
2025-12-04T09:33:41.7586407Z  * [new branch]              gh/wconstab/456/orig        -> origin/gh/wconstab/456/orig
2025-12-04T09:33:41.7588457Z  * [new branch]              gh/wconstab/457/base        -> origin/gh/wconstab/457/base
2025-12-04T09:33:41.7590071Z  * [new branch]              gh/wconstab/457/head        -> origin/gh/wconstab/457/head
2025-12-04T09:33:41.7593622Z  * [new branch]              gh/wconstab/457/orig        -> origin/gh/wconstab/457/orig
2025-12-04T09:33:41.7594437Z  * [new branch]              gh/wconstab/458/base        -> origin/gh/wconstab/458/base
2025-12-04T09:33:41.7595342Z  * [new branch]              gh/wconstab/458/head        -> origin/gh/wconstab/458/head
2025-12-04T09:33:41.7596713Z  * [new branch]              gh/wconstab/458/orig        -> origin/gh/wconstab/458/orig
2025-12-04T09:33:41.7598524Z  * [new branch]              gh/wconstab/459/base        -> origin/gh/wconstab/459/base
2025-12-04T09:33:41.7600133Z  * [new branch]              gh/wconstab/459/head        -> origin/gh/wconstab/459/head
2025-12-04T09:33:41.7601512Z  * [new branch]              gh/wconstab/459/orig        -> origin/gh/wconstab/459/orig
2025-12-04T09:33:41.7604338Z  * [new branch]              gh/wconstab/460/base        -> origin/gh/wconstab/460/base
2025-12-04T09:33:41.7606156Z  * [new branch]              gh/wconstab/460/head        -> origin/gh/wconstab/460/head
2025-12-04T09:33:41.7607771Z  * [new branch]              gh/wconstab/460/orig        -> origin/gh/wconstab/460/orig
2025-12-04T09:33:41.7609924Z  * [new branch]              gh/wconstab/461/base        -> origin/gh/wconstab/461/base
2025-12-04T09:33:41.7611640Z  * [new branch]              gh/wconstab/461/head        -> origin/gh/wconstab/461/head
2025-12-04T09:33:41.7613253Z  * [new branch]              gh/wconstab/461/orig        -> origin/gh/wconstab/461/orig
2025-12-04T09:33:41.7615128Z  * [new branch]              gh/wconstab/462/base        -> origin/gh/wconstab/462/base
2025-12-04T09:33:41.7616806Z  * [new branch]              gh/wconstab/462/head        -> origin/gh/wconstab/462/head
2025-12-04T09:33:41.7618511Z  * [new branch]              gh/wconstab/462/orig        -> origin/gh/wconstab/462/orig
2025-12-04T09:33:41.7620575Z  * [new branch]              gh/wconstab/463/base        -> origin/gh/wconstab/463/base
2025-12-04T09:33:41.7622189Z  * [new branch]              gh/wconstab/463/head        -> origin/gh/wconstab/463/head
2025-12-04T09:33:41.7623757Z  * [new branch]              gh/wconstab/463/orig        -> origin/gh/wconstab/463/orig
2025-12-04T09:33:41.7625831Z  * [new branch]              gh/wconstab/464/base        -> origin/gh/wconstab/464/base
2025-12-04T09:33:41.7627505Z  * [new branch]              gh/wconstab/464/head        -> origin/gh/wconstab/464/head
2025-12-04T09:33:41.7628810Z  * [new branch]              gh/wconstab/464/orig        -> origin/gh/wconstab/464/orig
2025-12-04T09:33:41.7630733Z  * [new branch]              gh/wconstab/465/base        -> origin/gh/wconstab/465/base
2025-12-04T09:33:41.7632318Z  * [new branch]              gh/wconstab/465/head        -> origin/gh/wconstab/465/head
2025-12-04T09:33:41.7633811Z  * [new branch]              gh/wconstab/465/orig        -> origin/gh/wconstab/465/orig
2025-12-04T09:33:41.7636049Z  * [new branch]              gh/wconstab/466/base        -> origin/gh/wconstab/466/base
2025-12-04T09:33:41.7637438Z  * [new branch]              gh/wconstab/466/head        -> origin/gh/wconstab/466/head
2025-12-04T09:33:41.7638808Z  * [new branch]              gh/wconstab/466/orig        -> origin/gh/wconstab/466/orig
2025-12-04T09:33:41.7641309Z  * [new branch]              gh/wconstab/467/base        -> origin/gh/wconstab/467/base
2025-12-04T09:33:41.7642983Z  * [new branch]              gh/wconstab/467/head        -> origin/gh/wconstab/467/head
2025-12-04T09:33:41.7644540Z  * [new branch]              gh/wconstab/467/orig        -> origin/gh/wconstab/467/orig
2025-12-04T09:33:41.7646462Z  * [new branch]              gh/wconstab/468/base        -> origin/gh/wconstab/468/base
2025-12-04T09:33:41.7647875Z  * [new branch]              gh/wconstab/468/head        -> origin/gh/wconstab/468/head
2025-12-04T09:33:41.7649410Z  * [new branch]              gh/wconstab/468/orig        -> origin/gh/wconstab/468/orig
2025-12-04T09:33:41.7652030Z  * [new branch]              gh/weifengpy/39/base        -> origin/gh/weifengpy/39/base
2025-12-04T09:33:41.7653570Z  * [new branch]              gh/weifengpy/39/head        -> origin/gh/weifengpy/39/head
2025-12-04T09:33:41.7655107Z  * [new branch]              gh/weifengpy/39/orig        -> origin/gh/weifengpy/39/orig
2025-12-04T09:33:41.7657586Z  * [new branch]              gh/weifengpy/40/base        -> origin/gh/weifengpy/40/base
2025-12-04T09:33:41.7659039Z  * [new branch]              gh/weifengpy/40/head        -> origin/gh/weifengpy/40/head
2025-12-04T09:33:41.7660533Z  * [new branch]              gh/weifengpy/40/orig        -> origin/gh/weifengpy/40/orig
2025-12-04T09:33:41.7662767Z  * [new branch]              gh/weifengpy/41/base        -> origin/gh/weifengpy/41/base
2025-12-04T09:33:41.7664325Z  * [new branch]              gh/weifengpy/41/head        -> origin/gh/weifengpy/41/head
2025-12-04T09:33:41.7665960Z  * [new branch]              gh/weifengpy/41/orig        -> origin/gh/weifengpy/41/orig
2025-12-04T09:33:41.7668524Z  * [new branch]              gh/williamwen42/250/base    -> origin/gh/williamwen42/250/base
2025-12-04T09:33:41.7669982Z  * [new branch]              gh/williamwen42/250/head    -> origin/gh/williamwen42/250/head
2025-12-04T09:33:41.7671648Z  * [new branch]              gh/williamwen42/250/orig    -> origin/gh/williamwen42/250/orig
2025-12-04T09:33:41.7677228Z  * [new branch]              gh/williamwen42/279/base    -> origin/gh/williamwen42/279/base
2025-12-04T09:33:41.7678869Z  * [new branch]              gh/williamwen42/279/head    -> origin/gh/williamwen42/279/head
2025-12-04T09:33:41.7680454Z  * [new branch]              gh/williamwen42/279/orig    -> origin/gh/williamwen42/279/orig
2025-12-04T09:33:41.7682466Z  * [new branch]              gh/williamwen42/282/base    -> origin/gh/williamwen42/282/base
2025-12-04T09:33:41.7683993Z  * [new branch]              gh/williamwen42/282/head    -> origin/gh/williamwen42/282/head
2025-12-04T09:33:41.7685450Z  * [new branch]              gh/williamwen42/282/orig    -> origin/gh/williamwen42/282/orig
2025-12-04T09:33:41.7687588Z  * [new branch]              gh/williamwen42/287/base    -> origin/gh/williamwen42/287/base
2025-12-04T09:33:41.7689024Z  * [new branch]              gh/williamwen42/287/head    -> origin/gh/williamwen42/287/head
2025-12-04T09:33:41.7690640Z  * [new branch]              gh/williamwen42/287/orig    -> origin/gh/williamwen42/287/orig
2025-12-04T09:33:41.7692824Z  * [new branch]              gh/williamwen42/288/base    -> origin/gh/williamwen42/288/base
2025-12-04T09:33:41.7694164Z  * [new branch]              gh/williamwen42/288/head    -> origin/gh/williamwen42/288/head
2025-12-04T09:33:41.7695668Z  * [new branch]              gh/williamwen42/288/orig    -> origin/gh/williamwen42/288/orig
2025-12-04T09:33:41.7698138Z  * [new branch]              gh/williamwen42/296/base    -> origin/gh/williamwen42/296/base
2025-12-04T09:33:41.7699760Z  * [new branch]              gh/williamwen42/296/head    -> origin/gh/williamwen42/296/head
2025-12-04T09:33:41.7701249Z  * [new branch]              gh/williamwen42/296/orig    -> origin/gh/williamwen42/296/orig
2025-12-04T09:33:41.7703114Z  * [new branch]              gh/williamwen42/297/base    -> origin/gh/williamwen42/297/base
2025-12-04T09:33:41.7704681Z  * [new branch]              gh/williamwen42/297/head    -> origin/gh/williamwen42/297/head
2025-12-04T09:33:41.7706213Z  * [new branch]              gh/williamwen42/297/orig    -> origin/gh/williamwen42/297/orig
2025-12-04T09:33:41.7708259Z  * [new branch]              gh/williamwen42/306/base    -> origin/gh/williamwen42/306/base
2025-12-04T09:33:41.7709807Z  * [new branch]              gh/williamwen42/306/head    -> origin/gh/williamwen42/306/head
2025-12-04T09:33:41.7711369Z  * [new branch]              gh/williamwen42/306/orig    -> origin/gh/williamwen42/306/orig
2025-12-04T09:33:41.7713427Z  * [new branch]              gh/williamwen42/309/base    -> origin/gh/williamwen42/309/base
2025-12-04T09:33:41.7714992Z  * [new branch]              gh/williamwen42/309/head    -> origin/gh/williamwen42/309/head
2025-12-04T09:33:41.7716532Z  * [new branch]              gh/williamwen42/309/orig    -> origin/gh/williamwen42/309/orig
2025-12-04T09:33:41.7718616Z  * [new branch]              gh/williamwen42/310/base    -> origin/gh/williamwen42/310/base
2025-12-04T09:33:41.7720180Z  * [new branch]              gh/williamwen42/310/head    -> origin/gh/williamwen42/310/head
2025-12-04T09:33:41.7721746Z  * [new branch]              gh/williamwen42/310/orig    -> origin/gh/williamwen42/310/orig
2025-12-04T09:33:41.7725357Z  * [new branch]              gh/williamwen42/311/base    -> origin/gh/williamwen42/311/base
2025-12-04T09:33:41.7726854Z  * [new branch]              gh/williamwen42/311/head    -> origin/gh/williamwen42/311/head
2025-12-04T09:33:41.7728375Z  * [new branch]              gh/williamwen42/311/orig    -> origin/gh/williamwen42/311/orig
2025-12-04T09:33:41.7730191Z  * [new branch]              gh/williamwen42/319/base    -> origin/gh/williamwen42/319/base
2025-12-04T09:33:41.7731649Z  * [new branch]              gh/williamwen42/319/head    -> origin/gh/williamwen42/319/head
2025-12-04T09:33:41.7733100Z  * [new branch]              gh/williamwen42/319/orig    -> origin/gh/williamwen42/319/orig
2025-12-04T09:33:41.7735127Z  * [new branch]              gh/williamwen42/325/base    -> origin/gh/williamwen42/325/base
2025-12-04T09:33:41.7736893Z  * [new branch]              gh/williamwen42/325/head    -> origin/gh/williamwen42/325/head
2025-12-04T09:33:41.7738401Z  * [new branch]              gh/williamwen42/325/orig    -> origin/gh/williamwen42/325/orig
2025-12-04T09:33:41.7740417Z  * [new branch]              gh/williamwen42/326/base    -> origin/gh/williamwen42/326/base
2025-12-04T09:33:41.7742032Z  * [new branch]              gh/williamwen42/326/head    -> origin/gh/williamwen42/326/head
2025-12-04T09:33:41.7743512Z  * [new branch]              gh/williamwen42/326/orig    -> origin/gh/williamwen42/326/orig
2025-12-04T09:33:41.7745567Z  * [new branch]              gh/williamwen42/327/base    -> origin/gh/williamwen42/327/base
2025-12-04T09:33:41.7747015Z  * [new branch]              gh/williamwen42/327/head    -> origin/gh/williamwen42/327/head
2025-12-04T09:33:41.7748526Z  * [new branch]              gh/williamwen42/327/orig    -> origin/gh/williamwen42/327/orig
2025-12-04T09:33:41.7750552Z  * [new branch]              gh/williamwen42/328/base    -> origin/gh/williamwen42/328/base
2025-12-04T09:33:41.7752225Z  * [new branch]              gh/williamwen42/328/head    -> origin/gh/williamwen42/328/head
2025-12-04T09:33:41.7753703Z  * [new branch]              gh/williamwen42/328/orig    -> origin/gh/williamwen42/328/orig
2025-12-04T09:33:41.7756300Z  * [new branch]              gh/williamwen42/329/base    -> origin/gh/williamwen42/329/base
2025-12-04T09:33:41.7757940Z  * [new branch]              gh/williamwen42/329/head    -> origin/gh/williamwen42/329/head
2025-12-04T09:33:41.7759514Z  * [new branch]              gh/williamwen42/329/orig    -> origin/gh/williamwen42/329/orig
2025-12-04T09:33:41.7761687Z  * [new branch]              gh/williamwen42/330/base    -> origin/gh/williamwen42/330/base
2025-12-04T09:33:41.7763291Z  * [new branch]              gh/williamwen42/330/head    -> origin/gh/williamwen42/330/head
2025-12-04T09:33:41.7764763Z  * [new branch]              gh/williamwen42/330/orig    -> origin/gh/williamwen42/330/orig
2025-12-04T09:33:41.7766736Z  * [new branch]              gh/williamwen42/331/base    -> origin/gh/williamwen42/331/base
2025-12-04T09:33:41.7768199Z  * [new branch]              gh/williamwen42/331/head    -> origin/gh/williamwen42/331/head
2025-12-04T09:33:41.7769732Z  * [new branch]              gh/williamwen42/331/orig    -> origin/gh/williamwen42/331/orig
2025-12-04T09:33:41.7771783Z  * [new branch]              gh/williamwen42/332/base    -> origin/gh/williamwen42/332/base
2025-12-04T09:33:41.7773360Z  * [new branch]              gh/williamwen42/332/head    -> origin/gh/williamwen42/332/head
2025-12-04T09:33:41.7775264Z  * [new branch]              gh/williamwen42/332/orig    -> origin/gh/williamwen42/332/orig
2025-12-04T09:33:41.7777755Z  * [new branch]              gh/williamwen42/333/base    -> origin/gh/williamwen42/333/base
2025-12-04T09:33:41.7779199Z  * [new branch]              gh/williamwen42/333/head    -> origin/gh/williamwen42/333/head
2025-12-04T09:33:41.7780734Z  * [new branch]              gh/williamwen42/333/orig    -> origin/gh/williamwen42/333/orig
2025-12-04T09:33:41.7782907Z  * [new branch]              gh/williamwen42/334/base    -> origin/gh/williamwen42/334/base
2025-12-04T09:33:41.7784379Z  * [new branch]              gh/williamwen42/334/head    -> origin/gh/williamwen42/334/head
2025-12-04T09:33:41.7785991Z  * [new branch]              gh/williamwen42/334/orig    -> origin/gh/williamwen42/334/orig
2025-12-04T09:33:41.7792313Z  * [new branch]              gh/williamwen42/335/base    -> origin/gh/williamwen42/335/base
2025-12-04T09:33:41.7793826Z  * [new branch]              gh/williamwen42/335/head    -> origin/gh/williamwen42/335/head
2025-12-04T09:33:41.7795368Z  * [new branch]              gh/williamwen42/335/orig    -> origin/gh/williamwen42/335/orig
2025-12-04T09:33:41.7797530Z  * [new branch]              gh/williamwen42/336/base    -> origin/gh/williamwen42/336/base
2025-12-04T09:33:41.7799039Z  * [new branch]              gh/williamwen42/336/head    -> origin/gh/williamwen42/336/head
2025-12-04T09:33:41.7800374Z  * [new branch]              gh/williamwen42/336/orig    -> origin/gh/williamwen42/336/orig
2025-12-04T09:33:41.7802465Z  * [new branch]              gh/williamwen42/337/base    -> origin/gh/williamwen42/337/base
2025-12-04T09:33:41.7803935Z  * [new branch]              gh/williamwen42/337/head    -> origin/gh/williamwen42/337/head
2025-12-04T09:33:41.7805464Z  * [new branch]              gh/williamwen42/337/orig    -> origin/gh/williamwen42/337/orig
2025-12-04T09:33:41.7807652Z  * [new branch]              gh/williamwen42/338/base    -> origin/gh/williamwen42/338/base
2025-12-04T09:33:41.7809164Z  * [new branch]              gh/williamwen42/338/head    -> origin/gh/williamwen42/338/head
2025-12-04T09:33:41.7810662Z  * [new branch]              gh/williamwen42/338/orig    -> origin/gh/williamwen42/338/orig
2025-12-04T09:33:41.7812682Z  * [new branch]              gh/williamwen42/339/base    -> origin/gh/williamwen42/339/base
2025-12-04T09:33:41.7814293Z  * [new branch]              gh/williamwen42/339/head    -> origin/gh/williamwen42/339/head
2025-12-04T09:33:41.7815577Z  * [new branch]              gh/williamwen42/339/orig    -> origin/gh/williamwen42/339/orig
2025-12-04T09:33:41.7827317Z  * [new branch]              gh/williamwen42/340/base    -> origin/gh/williamwen42/340/base
2025-12-04T09:33:41.7827906Z  * [new branch]              gh/williamwen42/340/head    -> origin/gh/williamwen42/340/head
2025-12-04T09:33:41.7828224Z  * [new branch]              gh/williamwen42/340/orig    -> origin/gh/williamwen42/340/orig
2025-12-04T09:33:41.7828516Z  * [new branch]              gh/williamwen42/341/base    -> origin/gh/williamwen42/341/base
2025-12-04T09:33:41.7828794Z  * [new branch]              gh/williamwen42/341/head    -> origin/gh/williamwen42/341/head
2025-12-04T09:33:41.7829070Z  * [new branch]              gh/williamwen42/341/orig    -> origin/gh/williamwen42/341/orig
2025-12-04T09:33:41.7829362Z  * [new branch]              gh/williamwen42/342/base    -> origin/gh/williamwen42/342/base
2025-12-04T09:33:41.7829660Z  * [new branch]              gh/williamwen42/342/head    -> origin/gh/williamwen42/342/head
2025-12-04T09:33:41.7830071Z  * [new branch]              gh/williamwen42/342/orig    -> origin/gh/williamwen42/342/orig
2025-12-04T09:33:41.7832751Z  * [new branch]              gh/williamwen42/343/base    -> origin/gh/williamwen42/343/base
2025-12-04T09:33:41.7834324Z  * [new branch]              gh/williamwen42/343/head    -> origin/gh/williamwen42/343/head
2025-12-04T09:33:41.7835766Z  * [new branch]              gh/williamwen42/343/orig    -> origin/gh/williamwen42/343/orig
2025-12-04T09:33:41.7837855Z  * [new branch]              gh/williamwen42/344/base    -> origin/gh/williamwen42/344/base
2025-12-04T09:33:41.7839350Z  * [new branch]              gh/williamwen42/344/head    -> origin/gh/williamwen42/344/head
2025-12-04T09:33:41.7840807Z  * [new branch]              gh/williamwen42/344/orig    -> origin/gh/williamwen42/344/orig
2025-12-04T09:33:41.7843606Z  * [new branch]              gh/williamwen42/345/base    -> origin/gh/williamwen42/345/base
2025-12-04T09:33:41.7845083Z  * [new branch]              gh/williamwen42/345/head    -> origin/gh/williamwen42/345/head
2025-12-04T09:33:41.7846573Z  * [new branch]              gh/williamwen42/345/orig    -> origin/gh/williamwen42/345/orig
2025-12-04T09:33:41.7848777Z  * [new branch]              gh/williamwen42/346/base    -> origin/gh/williamwen42/346/base
2025-12-04T09:33:41.7850352Z  * [new branch]              gh/williamwen42/346/head    -> origin/gh/williamwen42/346/head
2025-12-04T09:33:41.7851919Z  * [new branch]              gh/williamwen42/346/orig    -> origin/gh/williamwen42/346/orig
2025-12-04T09:33:41.7854003Z  * [new branch]              gh/williamwen42/347/base    -> origin/gh/williamwen42/347/base
2025-12-04T09:33:41.7855401Z  * [new branch]              gh/williamwen42/347/head    -> origin/gh/williamwen42/347/head
2025-12-04T09:33:41.7856971Z  * [new branch]              gh/williamwen42/347/orig    -> origin/gh/williamwen42/347/orig
2025-12-04T09:33:41.7859068Z  * [new branch]              gh/williamwen42/348/base    -> origin/gh/williamwen42/348/base
2025-12-04T09:33:41.7860439Z  * [new branch]              gh/williamwen42/348/head    -> origin/gh/williamwen42/348/head
2025-12-04T09:33:41.7861931Z  * [new branch]              gh/williamwen42/348/orig    -> origin/gh/williamwen42/348/orig
2025-12-04T09:33:41.7863763Z  * [new branch]              gh/williamwen42/349/base    -> origin/gh/williamwen42/349/base
2025-12-04T09:33:41.7865297Z  * [new branch]              gh/williamwen42/349/head    -> origin/gh/williamwen42/349/head
2025-12-04T09:33:41.7866776Z  * [new branch]              gh/williamwen42/349/orig    -> origin/gh/williamwen42/349/orig
2025-12-04T09:33:41.7868970Z  * [new branch]              gh/williamwen42/350/base    -> origin/gh/williamwen42/350/base
2025-12-04T09:33:41.7870451Z  * [new branch]              gh/williamwen42/350/head    -> origin/gh/williamwen42/350/head
2025-12-04T09:33:41.7872238Z  * [new branch]              gh/williamwen42/350/orig    -> origin/gh/williamwen42/350/orig
2025-12-04T09:33:41.7874214Z  * [new branch]              gh/williamwen42/351/base    -> origin/gh/williamwen42/351/base
2025-12-04T09:33:41.7875849Z  * [new branch]              gh/williamwen42/351/head    -> origin/gh/williamwen42/351/head
2025-12-04T09:33:41.7877336Z  * [new branch]              gh/williamwen42/351/orig    -> origin/gh/williamwen42/351/orig
2025-12-04T09:33:41.7879547Z  * [new branch]              gh/williamwen42/352/base    -> origin/gh/williamwen42/352/base
2025-12-04T09:33:41.7880946Z  * [new branch]              gh/williamwen42/352/head    -> origin/gh/williamwen42/352/head
2025-12-04T09:33:41.7882435Z  * [new branch]              gh/williamwen42/352/orig    -> origin/gh/williamwen42/352/orig
2025-12-04T09:33:41.7884599Z  * [new branch]              gh/williamwen42/353/base    -> origin/gh/williamwen42/353/base
2025-12-04T09:33:41.7886150Z  * [new branch]              gh/williamwen42/353/head    -> origin/gh/williamwen42/353/head
2025-12-04T09:33:41.7887690Z  * [new branch]              gh/williamwen42/353/orig    -> origin/gh/williamwen42/353/orig
2025-12-04T09:33:41.7889696Z  * [new branch]              gh/williamwen42/354/base    -> origin/gh/williamwen42/354/base
2025-12-04T09:33:41.7891405Z  * [new branch]              gh/williamwen42/354/head    -> origin/gh/williamwen42/354/head
2025-12-04T09:33:41.7892906Z  * [new branch]              gh/williamwen42/354/orig    -> origin/gh/williamwen42/354/orig
2025-12-04T09:33:41.7894924Z  * [new branch]              gh/williamwen42/355/base    -> origin/gh/williamwen42/355/base
2025-12-04T09:33:41.7896488Z  * [new branch]              gh/williamwen42/355/head    -> origin/gh/williamwen42/355/head
2025-12-04T09:33:41.7898009Z  * [new branch]              gh/williamwen42/355/orig    -> origin/gh/williamwen42/355/orig
2025-12-04T09:33:41.7900533Z  * [new branch]              gh/williamwen42/356/base    -> origin/gh/williamwen42/356/base
2025-12-04T09:33:41.7901998Z  * [new branch]              gh/williamwen42/356/head    -> origin/gh/williamwen42/356/head
2025-12-04T09:33:41.7903448Z  * [new branch]              gh/williamwen42/356/orig    -> origin/gh/williamwen42/356/orig
2025-12-04T09:33:41.7905468Z  * [new branch]              gh/williamwen42/357/base    -> origin/gh/williamwen42/357/base
2025-12-04T09:33:41.7907078Z  * [new branch]              gh/williamwen42/357/head    -> origin/gh/williamwen42/357/head
2025-12-04T09:33:41.7908583Z  * [new branch]              gh/williamwen42/357/orig    -> origin/gh/williamwen42/357/orig
2025-12-04T09:33:41.7910710Z  * [new branch]              gh/williamwen42/358/base    -> origin/gh/williamwen42/358/base
2025-12-04T09:33:41.7912146Z  * [new branch]              gh/williamwen42/358/head    -> origin/gh/williamwen42/358/head
2025-12-04T09:33:41.7913748Z  * [new branch]              gh/williamwen42/358/orig    -> origin/gh/williamwen42/358/orig
2025-12-04T09:33:41.7916161Z  * [new branch]              gh/xmfan/169/base           -> origin/gh/xmfan/169/base
2025-12-04T09:33:41.7917682Z  * [new branch]              gh/xmfan/169/head           -> origin/gh/xmfan/169/head
2025-12-04T09:33:41.7919569Z  * [new branch]              gh/xmfan/170/base           -> origin/gh/xmfan/170/base
2025-12-04T09:33:41.7920905Z  * [new branch]              gh/xmfan/170/head           -> origin/gh/xmfan/170/head
2025-12-04T09:33:41.7922819Z  * [new branch]              gh/xmfan/274/base           -> origin/gh/xmfan/274/base
2025-12-04T09:33:41.7924276Z  * [new branch]              gh/xmfan/274/head           -> origin/gh/xmfan/274/head
2025-12-04T09:33:41.7925745Z  * [new branch]              gh/xmfan/274/orig           -> origin/gh/xmfan/274/orig
2025-12-04T09:33:41.7927672Z  * [new branch]              gh/xmfan/277/base           -> origin/gh/xmfan/277/base
2025-12-04T09:33:41.7929257Z  * [new branch]              gh/xmfan/277/head           -> origin/gh/xmfan/277/head
2025-12-04T09:33:41.7930764Z  * [new branch]              gh/xmfan/277/orig           -> origin/gh/xmfan/277/orig
2025-12-04T09:33:41.7932766Z  * [new branch]              gh/xmfan/301/base           -> origin/gh/xmfan/301/base
2025-12-04T09:33:41.7934134Z  * [new branch]              gh/xmfan/301/head           -> origin/gh/xmfan/301/head
2025-12-04T09:33:41.7935556Z  * [new branch]              gh/xmfan/301/orig           -> origin/gh/xmfan/301/orig
2025-12-04T09:33:41.7938221Z  * [new branch]              gh/xmfan/304/base           -> origin/gh/xmfan/304/base
2025-12-04T09:33:41.7939709Z  * [new branch]              gh/xmfan/304/head           -> origin/gh/xmfan/304/head
2025-12-04T09:33:41.7941609Z  * [new branch]              gh/xmfan/304/orig           -> origin/gh/xmfan/304/orig
2025-12-04T09:33:41.7943553Z  * [new branch]              gh/xmfan/309/base           -> origin/gh/xmfan/309/base
2025-12-04T09:33:41.7945030Z  * [new branch]              gh/xmfan/309/head           -> origin/gh/xmfan/309/head
2025-12-04T09:33:41.7946616Z  * [new branch]              gh/xmfan/309/orig           -> origin/gh/xmfan/309/orig
2025-12-04T09:33:41.7949011Z  * [new branch]              gh/xmfan/310/base           -> origin/gh/xmfan/310/base
2025-12-04T09:33:41.7950619Z  * [new branch]              gh/xmfan/310/head           -> origin/gh/xmfan/310/head
2025-12-04T09:33:41.7952137Z  * [new branch]              gh/xmfan/310/orig           -> origin/gh/xmfan/310/orig
2025-12-04T09:33:41.7954095Z  * [new branch]              gh/xmfan/311/base           -> origin/gh/xmfan/311/base
2025-12-04T09:33:41.7955522Z  * [new branch]              gh/xmfan/311/head           -> origin/gh/xmfan/311/head
2025-12-04T09:33:41.7956976Z  * [new branch]              gh/xmfan/311/orig           -> origin/gh/xmfan/311/orig
2025-12-04T09:33:41.7958924Z  * [new branch]              gh/xmfan/312/base           -> origin/gh/xmfan/312/base
2025-12-04T09:33:41.7960486Z  * [new branch]              gh/xmfan/312/head           -> origin/gh/xmfan/312/head
2025-12-04T09:33:41.7961820Z  * [new branch]              gh/xmfan/312/orig           -> origin/gh/xmfan/312/orig
2025-12-04T09:33:41.7964301Z  * [new branch]              gh/xmfan/313/base           -> origin/gh/xmfan/313/base
2025-12-04T09:33:41.7965742Z  * [new branch]              gh/xmfan/313/head           -> origin/gh/xmfan/313/head
2025-12-04T09:33:41.7967245Z  * [new branch]              gh/xmfan/313/orig           -> origin/gh/xmfan/313/orig
2025-12-04T09:33:41.7969724Z  * [new branch]              gh/xuanzhang816/27/base     -> origin/gh/xuanzhang816/27/base
2025-12-04T09:33:41.7971577Z  * [new branch]              gh/xuanzhang816/27/head     -> origin/gh/xuanzhang816/27/head
2025-12-04T09:33:41.7973042Z  * [new branch]              gh/xuanzhang816/27/orig     -> origin/gh/xuanzhang816/27/orig
2025-12-04T09:33:41.7975220Z  * [new branch]              gh/xuanzhang816/32/base     -> origin/gh/xuanzhang816/32/base
2025-12-04T09:33:41.7976766Z  * [new branch]              gh/xuanzhang816/32/head     -> origin/gh/xuanzhang816/32/head
2025-12-04T09:33:41.7978702Z  * [new branch]              gh/xuanzhang816/32/orig     -> origin/gh/xuanzhang816/32/orig
2025-12-04T09:33:41.7980710Z  * [new branch]              gh/xuanzhang816/33/base     -> origin/gh/xuanzhang816/33/base
2025-12-04T09:33:41.7982116Z  * [new branch]              gh/xuanzhang816/33/head     -> origin/gh/xuanzhang816/33/head
2025-12-04T09:33:41.7983679Z  * [new branch]              gh/xuanzhang816/33/orig     -> origin/gh/xuanzhang816/33/orig
2025-12-04T09:33:41.7986075Z  * [new branch]              gh/xuanzhang816/34/base     -> origin/gh/xuanzhang816/34/base
2025-12-04T09:33:41.7987569Z  * [new branch]              gh/xuanzhang816/34/head     -> origin/gh/xuanzhang816/34/head
2025-12-04T09:33:41.7989052Z  * [new branch]              gh/xuanzhang816/34/orig     -> origin/gh/xuanzhang816/34/orig
2025-12-04T09:33:41.7991442Z  * [new branch]              gh/xuanzhang816/35/base     -> origin/gh/xuanzhang816/35/base
2025-12-04T09:33:41.7992913Z  * [new branch]              gh/xuanzhang816/35/head     -> origin/gh/xuanzhang816/35/head
2025-12-04T09:33:41.7994619Z  * [new branch]              gh/xuanzhang816/35/orig     -> origin/gh/xuanzhang816/35/orig
2025-12-04T09:33:41.7996959Z  * [new branch]              gh/yanbing-j/11/base        -> origin/gh/yanbing-j/11/base
2025-12-04T09:33:41.7998462Z  * [new branch]              gh/yanbing-j/11/head        -> origin/gh/yanbing-j/11/head
2025-12-04T09:33:41.7999917Z  * [new branch]              gh/yanbing-j/11/orig        -> origin/gh/yanbing-j/11/orig
2025-12-04T09:33:41.8001881Z  * [new branch]              gh/yanbing-j/12/base        -> origin/gh/yanbing-j/12/base
2025-12-04T09:33:41.8003824Z  * [new branch]              gh/yanbing-j/12/head        -> origin/gh/yanbing-j/12/head
2025-12-04T09:33:41.8005360Z  * [new branch]              gh/yanbing-j/12/orig        -> origin/gh/yanbing-j/12/orig
2025-12-04T09:33:41.8007364Z  * [new branch]              gh/yanbing-j/13/base        -> origin/gh/yanbing-j/13/base
2025-12-04T09:33:41.8008893Z  * [new branch]              gh/yanbing-j/13/head        -> origin/gh/yanbing-j/13/head
2025-12-04T09:33:41.8010352Z  * [new branch]              gh/yanbing-j/13/orig        -> origin/gh/yanbing-j/13/orig
2025-12-04T09:33:41.8012383Z  * [new branch]              gh/yanbing-j/14/base        -> origin/gh/yanbing-j/14/base
2025-12-04T09:33:41.8014642Z  * [new branch]              gh/yanbing-j/14/head        -> origin/gh/yanbing-j/14/head
2025-12-04T09:33:41.8015569Z  * [new branch]              gh/yanbing-j/14/orig        -> origin/gh/yanbing-j/14/orig
2025-12-04T09:33:41.8017654Z  * [new branch]              gh/yanbing-j/15/base        -> origin/gh/yanbing-j/15/base
2025-12-04T09:33:41.8018909Z  * [new branch]              gh/yanbing-j/15/head        -> origin/gh/yanbing-j/15/head
2025-12-04T09:33:41.8020577Z  * [new branch]              gh/yanbing-j/15/orig        -> origin/gh/yanbing-j/15/orig
2025-12-04T09:33:41.8022481Z  * [new branch]              gh/yanbing-j/18/base        -> origin/gh/yanbing-j/18/base
2025-12-04T09:33:41.8023816Z  * [new branch]              gh/yanbing-j/18/head        -> origin/gh/yanbing-j/18/head
2025-12-04T09:33:41.8025125Z  * [new branch]              gh/yanbing-j/18/orig        -> origin/gh/yanbing-j/18/orig
2025-12-04T09:33:41.8027214Z  * [new branch]              gh/yanbing-j/19/base        -> origin/gh/yanbing-j/19/base
2025-12-04T09:33:41.8028563Z  * [new branch]              gh/yanbing-j/19/head        -> origin/gh/yanbing-j/19/head
2025-12-04T09:33:41.8030067Z  * [new branch]              gh/yanbing-j/19/orig        -> origin/gh/yanbing-j/19/orig
2025-12-04T09:33:41.8032261Z  * [new branch]              gh/yanbing-j/20/base        -> origin/gh/yanbing-j/20/base
2025-12-04T09:33:41.8033506Z  * [new branch]              gh/yanbing-j/20/head        -> origin/gh/yanbing-j/20/head
2025-12-04T09:33:41.8035200Z  * [new branch]              gh/yanbing-j/20/orig        -> origin/gh/yanbing-j/20/orig
2025-12-04T09:33:41.8037247Z  * [new branch]              gh/yanbing-j/21/base        -> origin/gh/yanbing-j/21/base
2025-12-04T09:33:41.8038629Z  * [new branch]              gh/yanbing-j/21/head        -> origin/gh/yanbing-j/21/head
2025-12-04T09:33:41.8040686Z  * [new branch]              gh/yanbing-j/22/base        -> origin/gh/yanbing-j/22/base
2025-12-04T09:33:41.8041983Z  * [new branch]              gh/yanbing-j/22/head        -> origin/gh/yanbing-j/22/head
2025-12-04T09:33:41.8044113Z  * [new branch]              gh/yanbing-j/22/orig        -> origin/gh/yanbing-j/22/orig
2025-12-04T09:33:41.8046096Z  * [new branch]              gh/yanbing-j/23/base        -> origin/gh/yanbing-j/23/base
2025-12-04T09:33:41.8047444Z  * [new branch]              gh/yanbing-j/23/head        -> origin/gh/yanbing-j/23/head
2025-12-04T09:33:41.8049024Z  * [new branch]              gh/yanbing-j/23/orig        -> origin/gh/yanbing-j/23/orig
2025-12-04T09:33:41.8051095Z  * [new branch]              gh/yanbing-j/24/base        -> origin/gh/yanbing-j/24/base
2025-12-04T09:33:41.8052364Z  * [new branch]              gh/yanbing-j/24/head        -> origin/gh/yanbing-j/24/head
2025-12-04T09:33:41.8054169Z  * [new branch]              gh/yanbing-j/24/orig        -> origin/gh/yanbing-j/24/orig
2025-12-04T09:33:41.8056022Z  * [new branch]              gh/yanbing-j/25/base        -> origin/gh/yanbing-j/25/base
2025-12-04T09:33:41.8057431Z  * [new branch]              gh/yanbing-j/25/head        -> origin/gh/yanbing-j/25/head
2025-12-04T09:33:41.8058959Z  * [new branch]              gh/yanbing-j/25/orig        -> origin/gh/yanbing-j/25/orig
2025-12-04T09:33:41.8061144Z  * [new branch]              gh/yanbing-j/26/base        -> origin/gh/yanbing-j/26/base
2025-12-04T09:33:41.8062436Z  * [new branch]              gh/yanbing-j/26/head        -> origin/gh/yanbing-j/26/head
2025-12-04T09:33:41.8064027Z  * [new branch]              gh/yanbing-j/26/orig        -> origin/gh/yanbing-j/26/orig
2025-12-04T09:33:41.8066560Z  * [new branch]              gh/yang-yu-hang/1/base      -> origin/gh/yang-yu-hang/1/base
2025-12-04T09:33:41.8068265Z  * [new branch]              gh/yang-yu-hang/1/head      -> origin/gh/yang-yu-hang/1/head
2025-12-04T09:33:41.8069926Z  * [new branch]              gh/yang-yu-hang/1/orig      -> origin/gh/yang-yu-hang/1/orig
2025-12-04T09:33:41.8072237Z  * [new branch]              gh/yang-yu-hang/2/base      -> origin/gh/yang-yu-hang/2/base
2025-12-04T09:33:41.8074077Z  * [new branch]              gh/yang-yu-hang/2/head      -> origin/gh/yang-yu-hang/2/head
2025-12-04T09:33:41.8076615Z  * [new branch]              gh/yang-yu-hang/2/orig      -> origin/gh/yang-yu-hang/2/orig
2025-12-04T09:33:41.8078449Z  * [new branch]              gh/yang-yu-hang/3/base      -> origin/gh/yang-yu-hang/3/base
2025-12-04T09:33:41.8079977Z  * [new branch]              gh/yang-yu-hang/3/head      -> origin/gh/yang-yu-hang/3/head
2025-12-04T09:33:41.8081573Z  * [new branch]              gh/yang-yu-hang/3/orig      -> origin/gh/yang-yu-hang/3/orig
2025-12-04T09:33:41.8084448Z  * [new branch]              gh/yangw-dev/12/base        -> origin/gh/yangw-dev/12/base
2025-12-04T09:33:41.8085785Z  * [new branch]              gh/yangw-dev/12/head        -> origin/gh/yangw-dev/12/head
2025-12-04T09:33:41.8087978Z  * [new branch]              gh/yangw-dev/12/orig        -> origin/gh/yangw-dev/12/orig
2025-12-04T09:33:41.8089981Z  * [new branch]              gh/yangw-dev/13/base        -> origin/gh/yangw-dev/13/base
2025-12-04T09:33:41.8091357Z  * [new branch]              gh/yangw-dev/13/head        -> origin/gh/yangw-dev/13/head
2025-12-04T09:33:41.8093061Z  * [new branch]              gh/yangw-dev/13/orig        -> origin/gh/yangw-dev/13/orig
2025-12-04T09:33:41.8094994Z  * [new branch]              gh/yangw-dev/14/base        -> origin/gh/yangw-dev/14/base
2025-12-04T09:33:41.8096660Z  * [new branch]              gh/yangw-dev/14/head        -> origin/gh/yangw-dev/14/head
2025-12-04T09:33:41.8097968Z  * [new branch]              gh/yangw-dev/14/orig        -> origin/gh/yangw-dev/14/orig
2025-12-04T09:33:41.8100141Z  * [new branch]              gh/yangw-dev/15/base        -> origin/gh/yangw-dev/15/base
2025-12-04T09:33:41.8101417Z  * [new branch]              gh/yangw-dev/15/head        -> origin/gh/yangw-dev/15/head
2025-12-04T09:33:41.8102939Z  * [new branch]              gh/yangw-dev/15/orig        -> origin/gh/yangw-dev/15/orig
2025-12-04T09:33:41.8104910Z  * [new branch]              gh/yangw-dev/19/base        -> origin/gh/yangw-dev/19/base
2025-12-04T09:33:41.8106185Z  * [new branch]              gh/yangw-dev/19/head        -> origin/gh/yangw-dev/19/head
2025-12-04T09:33:41.8108223Z  * [new branch]              gh/yangw-dev/19/orig        -> origin/gh/yangw-dev/19/orig
2025-12-04T09:33:41.8110205Z  * [new branch]              gh/yangw-dev/26/base        -> origin/gh/yangw-dev/26/base
2025-12-04T09:33:41.8111764Z  * [new branch]              gh/yangw-dev/26/head        -> origin/gh/yangw-dev/26/head
2025-12-04T09:33:41.8113272Z  * [new branch]              gh/yangw-dev/26/orig        -> origin/gh/yangw-dev/26/orig
2025-12-04T09:33:41.8115234Z  * [new branch]              gh/yangw-dev/27/base        -> origin/gh/yangw-dev/27/base
2025-12-04T09:33:41.8116824Z  * [new branch]              gh/yangw-dev/27/head        -> origin/gh/yangw-dev/27/head
2025-12-04T09:33:41.8117996Z  * [new branch]              gh/yangw-dev/27/orig        -> origin/gh/yangw-dev/27/orig
2025-12-04T09:33:41.8120691Z  * [new branch]              gh/ydwu4/292/base           -> origin/gh/ydwu4/292/base
2025-12-04T09:33:41.8121861Z  * [new branch]              gh/ydwu4/292/head           -> origin/gh/ydwu4/292/head
2025-12-04T09:33:41.8123521Z  * [new branch]              gh/ydwu4/292/orig           -> origin/gh/ydwu4/292/orig
2025-12-04T09:33:41.8125498Z  * [new branch]              gh/ydwu4/294/base           -> origin/gh/ydwu4/294/base
2025-12-04T09:33:41.8126980Z  * [new branch]              gh/ydwu4/294/head           -> origin/gh/ydwu4/294/head
2025-12-04T09:33:41.8128493Z  * [new branch]              gh/ydwu4/294/orig           -> origin/gh/ydwu4/294/orig
2025-12-04T09:33:41.8130699Z  * [new branch]              gh/ydwu4/295/base           -> origin/gh/ydwu4/295/base
2025-12-04T09:33:41.8132421Z  * [new branch]              gh/ydwu4/295/head           -> origin/gh/ydwu4/295/head
2025-12-04T09:33:41.8133700Z  * [new branch]              gh/ydwu4/295/orig           -> origin/gh/ydwu4/295/orig
2025-12-04T09:33:41.8135775Z  * [new branch]              gh/ydwu4/296/base           -> origin/gh/ydwu4/296/base
2025-12-04T09:33:41.8137039Z  * [new branch]              gh/ydwu4/296/head           -> origin/gh/ydwu4/296/head
2025-12-04T09:33:41.8138650Z  * [new branch]              gh/ydwu4/296/orig           -> origin/gh/ydwu4/296/orig
2025-12-04T09:33:41.8140695Z  * [new branch]              gh/ydwu4/306/base           -> origin/gh/ydwu4/306/base
2025-12-04T09:33:41.8142280Z  * [new branch]              gh/ydwu4/306/head           -> origin/gh/ydwu4/306/head
2025-12-04T09:33:41.8144246Z  * [new branch]              gh/ydwu4/306/orig           -> origin/gh/ydwu4/306/orig
2025-12-04T09:33:41.8146285Z  * [new branch]              gh/ydwu4/312/base           -> origin/gh/ydwu4/312/base
2025-12-04T09:33:41.8147603Z  * [new branch]              gh/ydwu4/312/head           -> origin/gh/ydwu4/312/head
2025-12-04T09:33:41.8149167Z  * [new branch]              gh/ydwu4/312/orig           -> origin/gh/ydwu4/312/orig
2025-12-04T09:33:41.8151161Z  * [new branch]              gh/ydwu4/322/base           -> origin/gh/ydwu4/322/base
2025-12-04T09:33:41.8152711Z  * [new branch]              gh/ydwu4/322/head           -> origin/gh/ydwu4/322/head
2025-12-04T09:33:41.8154028Z  * [new branch]              gh/ydwu4/322/orig           -> origin/gh/ydwu4/322/orig
2025-12-04T09:33:41.8156131Z  * [new branch]              gh/ydwu4/327/base           -> origin/gh/ydwu4/327/base
2025-12-04T09:33:41.8157585Z  * [new branch]              gh/ydwu4/327/head           -> origin/gh/ydwu4/327/head
2025-12-04T09:33:41.8159169Z  * [new branch]              gh/ydwu4/327/orig           -> origin/gh/ydwu4/327/orig
2025-12-04T09:33:41.8161257Z  * [new branch]              gh/ydwu4/328/base           -> origin/gh/ydwu4/328/base
2025-12-04T09:33:41.8162493Z  * [new branch]              gh/ydwu4/328/head           -> origin/gh/ydwu4/328/head
2025-12-04T09:33:41.8164044Z  * [new branch]              gh/ydwu4/328/orig           -> origin/gh/ydwu4/328/orig
2025-12-04T09:33:41.8165825Z  * [new branch]              gh/ydwu4/329/base           -> origin/gh/ydwu4/329/base
2025-12-04T09:33:41.8167143Z  * [new branch]              gh/ydwu4/329/head           -> origin/gh/ydwu4/329/head
2025-12-04T09:33:41.8168626Z  * [new branch]              gh/ydwu4/329/orig           -> origin/gh/ydwu4/329/orig
2025-12-04T09:33:41.8170813Z  * [new branch]              gh/ydwu4/330/base           -> origin/gh/ydwu4/330/base
2025-12-04T09:33:41.8175589Z  * [new branch]              gh/ydwu4/330/head           -> origin/gh/ydwu4/330/head
2025-12-04T09:33:41.8176987Z  * [new branch]              gh/ydwu4/330/orig           -> origin/gh/ydwu4/330/orig
2025-12-04T09:33:41.8178954Z  * [new branch]              gh/ydwu4/331/base           -> origin/gh/ydwu4/331/base
2025-12-04T09:33:41.8180636Z  * [new branch]              gh/ydwu4/331/head           -> origin/gh/ydwu4/331/head
2025-12-04T09:33:41.8181845Z  * [new branch]              gh/ydwu4/331/orig           -> origin/gh/ydwu4/331/orig
2025-12-04T09:33:41.8183728Z  * [new branch]              gh/ydwu4/332/base           -> origin/gh/ydwu4/332/base
2025-12-04T09:33:41.8185313Z  * [new branch]              gh/ydwu4/332/head           -> origin/gh/ydwu4/332/head
2025-12-04T09:33:41.8186728Z  * [new branch]              gh/ydwu4/332/orig           -> origin/gh/ydwu4/332/orig
2025-12-04T09:33:41.8188571Z  * [new branch]              gh/ydwu4/333/base           -> origin/gh/ydwu4/333/base
2025-12-04T09:33:41.8189816Z  * [new branch]              gh/ydwu4/333/head           -> origin/gh/ydwu4/333/head
2025-12-04T09:33:41.8191458Z  * [new branch]              gh/ydwu4/333/orig           -> origin/gh/ydwu4/333/orig
2025-12-04T09:33:41.8193223Z  * [new branch]              gh/ydwu4/334/base           -> origin/gh/ydwu4/334/base
2025-12-04T09:33:41.8194828Z  * [new branch]              gh/ydwu4/334/head           -> origin/gh/ydwu4/334/head
2025-12-04T09:33:41.8196370Z  * [new branch]              gh/ydwu4/334/orig           -> origin/gh/ydwu4/334/orig
2025-12-04T09:33:41.8198139Z  * [new branch]              gh/ydwu4/335/base           -> origin/gh/ydwu4/335/base
2025-12-04T09:33:41.8199522Z  * [new branch]              gh/ydwu4/335/head           -> origin/gh/ydwu4/335/head
2025-12-04T09:33:41.8201101Z  * [new branch]              gh/ydwu4/335/orig           -> origin/gh/ydwu4/335/orig
2025-12-04T09:33:41.8203602Z  * [new branch]              gh/ydwu4/337/base           -> origin/gh/ydwu4/337/base
2025-12-04T09:33:41.8204898Z  * [new branch]              gh/ydwu4/337/head           -> origin/gh/ydwu4/337/head
2025-12-04T09:33:41.8206473Z  * [new branch]              gh/ydwu4/337/orig           -> origin/gh/ydwu4/337/orig
2025-12-04T09:33:41.8208519Z  * [new branch]              gh/ydwu4/339/base           -> origin/gh/ydwu4/339/base
2025-12-04T09:33:41.8210048Z  * [new branch]              gh/ydwu4/339/head           -> origin/gh/ydwu4/339/head
2025-12-04T09:33:41.8211338Z  * [new branch]              gh/ydwu4/339/orig           -> origin/gh/ydwu4/339/orig
2025-12-04T09:33:41.8214286Z  * [new branch]              gh/yf225/133/base           -> origin/gh/yf225/133/base
2025-12-04T09:33:41.8215543Z  * [new branch]              gh/yf225/133/head           -> origin/gh/yf225/133/head
2025-12-04T09:33:41.8217709Z  * [new branch]              gh/yf225/93/base            -> origin/gh/yf225/93/base
2025-12-04T09:33:41.8218957Z  * [new branch]              gh/yf225/93/head            -> origin/gh/yf225/93/head
2025-12-04T09:33:41.8222370Z  * [new branch]              gh/yifuwang/152/base        -> origin/gh/yifuwang/152/base
2025-12-04T09:33:41.8224272Z  * [new branch]              gh/yifuwang/152/head        -> origin/gh/yifuwang/152/head
2025-12-04T09:33:41.8225810Z  * [new branch]              gh/yifuwang/152/orig        -> origin/gh/yifuwang/152/orig
2025-12-04T09:33:41.8227770Z  * [new branch]              gh/yifuwang/195/base        -> origin/gh/yifuwang/195/base
2025-12-04T09:33:41.8229360Z  * [new branch]              gh/yifuwang/195/head        -> origin/gh/yifuwang/195/head
2025-12-04T09:33:41.8230659Z  * [new branch]              gh/yifuwang/195/orig        -> origin/gh/yifuwang/195/orig
2025-12-04T09:33:41.8233309Z  * [new branch]              gh/yiming0416/1/base        -> origin/gh/yiming0416/1/base
2025-12-04T09:33:41.8234632Z  * [new branch]              gh/yiming0416/1/head        -> origin/gh/yiming0416/1/head
2025-12-04T09:33:41.8236780Z  * [new branch]              gh/yiming0416/2/base        -> origin/gh/yiming0416/2/base
2025-12-04T09:33:41.8237761Z  * [new branch]              gh/yiming0416/2/head        -> origin/gh/yiming0416/2/head
2025-12-04T09:33:41.8240423Z  * [new branch]              gh/yushangdi/1/base         -> origin/gh/yushangdi/1/base
2025-12-04T09:33:41.8242498Z  * [new branch]              gh/yushangdi/1/head         -> origin/gh/yushangdi/1/head
2025-12-04T09:33:41.8244288Z  * [new branch]              gh/yushangdi/10/base        -> origin/gh/yushangdi/10/base
2025-12-04T09:33:41.8245608Z  * [new branch]              gh/yushangdi/10/head        -> origin/gh/yushangdi/10/head
2025-12-04T09:33:41.8247255Z  * [new branch]              gh/yushangdi/10/orig        -> origin/gh/yushangdi/10/orig
2025-12-04T09:33:41.8249199Z  * [new branch]              gh/yushangdi/11/base        -> origin/gh/yushangdi/11/base
2025-12-04T09:33:41.8250472Z  * [new branch]              gh/yushangdi/11/head        -> origin/gh/yushangdi/11/head
2025-12-04T09:33:41.8252289Z  * [new branch]              gh/yushangdi/11/orig        -> origin/gh/yushangdi/11/orig
2025-12-04T09:33:41.8254111Z  * [new branch]              gh/yushangdi/2/base         -> origin/gh/yushangdi/2/base
2025-12-04T09:33:41.8255370Z  * [new branch]              gh/yushangdi/2/head         -> origin/gh/yushangdi/2/head
2025-12-04T09:33:41.8257685Z  * [new branch]              gh/yushangdi/7/base         -> origin/gh/yushangdi/7/base
2025-12-04T09:33:41.8258974Z  * [new branch]              gh/yushangdi/7/head         -> origin/gh/yushangdi/7/head
2025-12-04T09:33:41.8260586Z  * [new branch]              gh/yushangdi/7/orig         -> origin/gh/yushangdi/7/orig
2025-12-04T09:33:41.8262978Z  * [new branch]              gh/yushangdi/8/base         -> origin/gh/yushangdi/8/base
2025-12-04T09:33:41.8264711Z  * [new branch]              gh/yushangdi/8/head         -> origin/gh/yushangdi/8/head
2025-12-04T09:33:41.8266323Z  * [new branch]              gh/yushangdi/8/orig         -> origin/gh/yushangdi/8/orig
2025-12-04T09:33:41.8268119Z  * [new branch]              gh/yushangdi/9/base         -> origin/gh/yushangdi/9/base
2025-12-04T09:33:41.8269432Z  * [new branch]              gh/yushangdi/9/head         -> origin/gh/yushangdi/9/head
2025-12-04T09:33:41.8271279Z  * [new branch]              gh/yushangdi/9/orig         -> origin/gh/yushangdi/9/orig
2025-12-04T09:33:41.8274013Z  * [new branch]              gh/zklaus/19/base           -> origin/gh/zklaus/19/base
2025-12-04T09:33:41.8275253Z  * [new branch]              gh/zklaus/19/head           -> origin/gh/zklaus/19/head
2025-12-04T09:33:41.8276854Z  * [new branch]              gh/zklaus/19/orig           -> origin/gh/zklaus/19/orig
2025-12-04T09:33:41.8278906Z  * [new branch]              gh/zklaus/20/base           -> origin/gh/zklaus/20/base
2025-12-04T09:33:41.8280988Z  * [new branch]              gh/zklaus/20/head           -> origin/gh/zklaus/20/head
2025-12-04T09:33:41.8282509Z  * [new branch]              gh/zklaus/20/orig           -> origin/gh/zklaus/20/orig
2025-12-04T09:33:41.8284576Z  * [new branch]              gh/zklaus/21/base           -> origin/gh/zklaus/21/base
2025-12-04T09:33:41.8285906Z  * [new branch]              gh/zklaus/21/head           -> origin/gh/zklaus/21/head
2025-12-04T09:33:41.8287554Z  * [new branch]              gh/zklaus/21/orig           -> origin/gh/zklaus/21/orig
2025-12-04T09:33:41.8289580Z  * [new branch]              gh/zklaus/22/base           -> origin/gh/zklaus/22/base
2025-12-04T09:33:41.8290825Z  * [new branch]              gh/zklaus/22/head           -> origin/gh/zklaus/22/head
2025-12-04T09:33:41.8292999Z  * [new branch]              gh/zklaus/22/orig           -> origin/gh/zklaus/22/orig
2025-12-04T09:33:41.8295105Z  * [new branch]              gh/zklaus/23/base           -> origin/gh/zklaus/23/base
2025-12-04T09:33:41.8296433Z  * [new branch]              gh/zklaus/23/head           -> origin/gh/zklaus/23/head
2025-12-04T09:33:41.8298090Z  * [new branch]              gh/zklaus/23/orig           -> origin/gh/zklaus/23/orig
2025-12-04T09:33:41.8299907Z  * [new branch]              gh/zklaus/24/base           -> origin/gh/zklaus/24/base
2025-12-04T09:33:41.8301176Z  * [new branch]              gh/zklaus/24/head           -> origin/gh/zklaus/24/head
2025-12-04T09:33:41.8303322Z  * [new branch]              gh/zklaus/24/orig           -> origin/gh/zklaus/24/orig
2025-12-04T09:33:41.8306187Z  * [new branch]              gh/zou3519/1197/base        -> origin/gh/zou3519/1197/base
2025-12-04T09:33:41.8307249Z  * [new branch]              gh/zou3519/1197/head        -> origin/gh/zou3519/1197/head
2025-12-04T09:33:41.8308882Z  * [new branch]              gh/zou3519/1197/orig        -> origin/gh/zou3519/1197/orig
2025-12-04T09:33:41.8311401Z  * [new branch]              gh/zou3519/1199/base        -> origin/gh/zou3519/1199/base
2025-12-04T09:33:41.8312898Z  * [new branch]              gh/zou3519/1199/head        -> origin/gh/zou3519/1199/head
2025-12-04T09:33:41.8314992Z  * [new branch]              gh/zou3519/1199/orig        -> origin/gh/zou3519/1199/orig
2025-12-04T09:33:41.8316946Z  * [new branch]              gh/zou3519/1200/base        -> origin/gh/zou3519/1200/base
2025-12-04T09:33:41.8318226Z  * [new branch]              gh/zou3519/1200/head        -> origin/gh/zou3519/1200/head
2025-12-04T09:33:41.8319873Z  * [new branch]              gh/zou3519/1200/orig        -> origin/gh/zou3519/1200/orig
2025-12-04T09:33:41.8321933Z  * [new branch]              gh/zou3519/1201/base        -> origin/gh/zou3519/1201/base
2025-12-04T09:33:41.8323139Z  * [new branch]              gh/zou3519/1201/head        -> origin/gh/zou3519/1201/head
2025-12-04T09:33:41.8324758Z  * [new branch]              gh/zou3519/1201/orig        -> origin/gh/zou3519/1201/orig
2025-12-04T09:33:41.8326588Z  * [new branch]              gh/zou3519/1202/base        -> origin/gh/zou3519/1202/base
2025-12-04T09:33:41.8327891Z  * [new branch]              gh/zou3519/1202/head        -> origin/gh/zou3519/1202/head
2025-12-04T09:33:41.8329556Z  * [new branch]              gh/zou3519/1202/orig        -> origin/gh/zou3519/1202/orig
2025-12-04T09:33:41.8332135Z  * [new branch]              gh/zpcore/1/base            -> origin/gh/zpcore/1/base
2025-12-04T09:33:41.8333209Z  * [new branch]              gh/zpcore/1/head            -> origin/gh/zpcore/1/head
2025-12-04T09:33:41.8335316Z  * [new branch]              gh/zpcore/11/base           -> origin/gh/zpcore/11/base
2025-12-04T09:33:41.8336908Z  * [new branch]              gh/zpcore/11/head           -> origin/gh/zpcore/11/head
2025-12-04T09:33:41.8338462Z  * [new branch]              gh/zpcore/11/orig           -> origin/gh/zpcore/11/orig
2025-12-04T09:33:41.8340906Z  * [new branch]              gh/zpcore/12/base           -> origin/gh/zpcore/12/base
2025-12-04T09:33:41.8342370Z  * [new branch]              gh/zpcore/12/head           -> origin/gh/zpcore/12/head
2025-12-04T09:33:41.8343883Z  * [new branch]              gh/zpcore/12/orig           -> origin/gh/zpcore/12/orig
2025-12-04T09:33:41.8345938Z  * [new branch]              gh/zpcore/13/base           -> origin/gh/zpcore/13/base
2025-12-04T09:33:41.8347404Z  * [new branch]              gh/zpcore/13/head           -> origin/gh/zpcore/13/head
2025-12-04T09:33:41.8348880Z  * [new branch]              gh/zpcore/13/orig           -> origin/gh/zpcore/13/orig
2025-12-04T09:33:41.8351296Z  * [new branch]              gh/zpcore/14/base           -> origin/gh/zpcore/14/base
2025-12-04T09:33:41.8352896Z  * [new branch]              gh/zpcore/14/head           -> origin/gh/zpcore/14/head
2025-12-04T09:33:41.8354377Z  * [new branch]              gh/zpcore/14/orig           -> origin/gh/zpcore/14/orig
2025-12-04T09:33:41.8356602Z  * [new branch]              gh/zpcore/15/base           -> origin/gh/zpcore/15/base
2025-12-04T09:33:41.8358055Z  * [new branch]              gh/zpcore/15/head           -> origin/gh/zpcore/15/head
2025-12-04T09:33:41.8359566Z  * [new branch]              gh/zpcore/15/orig           -> origin/gh/zpcore/15/orig
2025-12-04T09:33:41.8361553Z  * [new branch]              gh/zpcore/2/base            -> origin/gh/zpcore/2/base
2025-12-04T09:33:41.8363015Z  * [new branch]              gh/zpcore/2/head            -> origin/gh/zpcore/2/head
2025-12-04T09:33:41.8365598Z  * [new branch]              gh/zpcore/21/base           -> origin/gh/zpcore/21/base
2025-12-04T09:33:41.8367372Z  * [new branch]              gh/zpcore/21/head           -> origin/gh/zpcore/21/head
2025-12-04T09:33:41.8368849Z  * [new branch]              gh/zpcore/21/orig           -> origin/gh/zpcore/21/orig
2025-12-04T09:33:41.8371241Z  * [new branch]              gh/zpcore/22/base           -> origin/gh/zpcore/22/base
2025-12-04T09:33:41.8372793Z  * [new branch]              gh/zpcore/22/head           -> origin/gh/zpcore/22/head
2025-12-04T09:33:41.8374392Z  * [new branch]              gh/zpcore/22/orig           -> origin/gh/zpcore/22/orig
2025-12-04T09:33:41.8376489Z  * [new branch]              gh/zpcore/23/base           -> origin/gh/zpcore/23/base
2025-12-04T09:33:41.8378094Z  * [new branch]              gh/zpcore/23/head           -> origin/gh/zpcore/23/head
2025-12-04T09:33:41.8380079Z  * [new branch]              gh/zpcore/23/orig           -> origin/gh/zpcore/23/orig
2025-12-04T09:33:41.8381842Z  * [new branch]              gh/zpcore/24/base           -> origin/gh/zpcore/24/base
2025-12-04T09:33:41.8383326Z  * [new branch]              gh/zpcore/24/head           -> origin/gh/zpcore/24/head
2025-12-04T09:33:41.8384774Z  * [new branch]              gh/zpcore/24/orig           -> origin/gh/zpcore/24/orig
2025-12-04T09:33:41.8387077Z  * [new branch]              gh/zpcore/25/base           -> origin/gh/zpcore/25/base
2025-12-04T09:33:41.8388609Z  * [new branch]              gh/zpcore/25/head           -> origin/gh/zpcore/25/head
2025-12-04T09:33:41.8390085Z  * [new branch]              gh/zpcore/25/orig           -> origin/gh/zpcore/25/orig
2025-12-04T09:33:41.8392155Z  * [new branch]              gh/zpcore/26/base           -> origin/gh/zpcore/26/base
2025-12-04T09:33:41.8393787Z  * [new branch]              gh/zpcore/26/head           -> origin/gh/zpcore/26/head
2025-12-04T09:33:41.8395352Z  * [new branch]              gh/zpcore/26/orig           -> origin/gh/zpcore/26/orig
2025-12-04T09:33:41.8397608Z  * [new branch]              gh/zpcore/27/base           -> origin/gh/zpcore/27/base
2025-12-04T09:33:41.8399122Z  * [new branch]              gh/zpcore/27/head           -> origin/gh/zpcore/27/head
2025-12-04T09:33:41.8400569Z  * [new branch]              gh/zpcore/27/orig           -> origin/gh/zpcore/27/orig
2025-12-04T09:33:41.8403071Z  * [new branch]              gh/zpcore/28/base           -> origin/gh/zpcore/28/base
2025-12-04T09:33:41.8405108Z  * [new branch]              gh/zpcore/28/head           -> origin/gh/zpcore/28/head
2025-12-04T09:33:41.8406663Z  * [new branch]              gh/zpcore/28/orig           -> origin/gh/zpcore/28/orig
2025-12-04T09:33:41.8408970Z  * [new branch]              gh/zpcore/3/base            -> origin/gh/zpcore/3/base
2025-12-04T09:33:41.8410403Z  * [new branch]              gh/zpcore/3/head            -> origin/gh/zpcore/3/head
2025-12-04T09:33:41.8412189Z  * [new branch]              gh/zpcore/4/base            -> origin/gh/zpcore/4/base
2025-12-04T09:33:41.8413637Z  * [new branch]              gh/zpcore/4/head            -> origin/gh/zpcore/4/head
2025-12-04T09:33:41.8415564Z  * [new branch]              gh/zpcore/5/base            -> origin/gh/zpcore/5/base
2025-12-04T09:33:41.8417039Z  * [new branch]              gh/zpcore/5/head            -> origin/gh/zpcore/5/head
2025-12-04T09:33:41.8418823Z  * [new branch]              gh/zpcore/6/base            -> origin/gh/zpcore/6/base
2025-12-04T09:33:41.8420236Z  * [new branch]              gh/zpcore/6/head            -> origin/gh/zpcore/6/head
2025-12-04T09:33:41.8422509Z  * [new branch]              gh/zpcore/7/base            -> origin/gh/zpcore/7/base
2025-12-04T09:33:41.8423963Z  * [new branch]              gh/zpcore/7/head            -> origin/gh/zpcore/7/head
2025-12-04T09:33:41.8425772Z  * [new branch]              gh/zpcore/8/base            -> origin/gh/zpcore/8/base
2025-12-04T09:33:41.8427264Z  * [new branch]              gh/zpcore/8/head            -> origin/gh/zpcore/8/head
2025-12-04T09:33:41.8429036Z  * [new branch]              google-main                 -> origin/google-main
2025-12-04T09:33:41.8431217Z  * [new branch]              guangyey/external_stream    -> origin/guangyey/external_stream
2025-12-04T09:33:41.8432491Z  * [new branch]              guangyey/test_2025          -> origin/guangyey/test_2025
2025-12-04T09:33:41.8435364Z  * [new branch]              guilhermeleobas/cherry-pick-55d87d9dfd9 -> origin/guilhermeleobas/cherry-pick-55d87d9dfd9
2025-12-04T09:33:41.8437303Z  * [new branch]              hameerabbasi/complex_tensor_subclass -> origin/hameerabbasi/complex_tensor_subclass
2025-12-04T09:33:41.8439010Z  * [new branch]              hameerabbasi/fix-ctensor-gradcheck-tests -> origin/hameerabbasi/fix-ctensor-gradcheck-tests
2025-12-04T09:33:41.8440399Z  * [new branch]              hameerabbasi/gradcheck-allclose -> origin/hameerabbasi/gradcheck-allclose
2025-12-04T09:33:41.8441808Z  * [new branch]              hc_baseline                 -> origin/hc_baseline
2025-12-04T09:33:41.8443368Z  * [new branch]              hhh_rand                    -> origin/hhh_rand
2025-12-04T09:33:41.8445241Z  * [new branch]              huba/f1                     -> origin/huba/f1
2025-12-04T09:33:41.8447849Z  * [new branch]              increase-timeout-linux-jammy-cuda12_8-py3_10-gcc11-test -> origin/increase-timeout-linux-jammy-cuda12_8-py3_10-gcc11-test
2025-12-04T09:33:41.8448981Z  * [new branch]              inlining                    -> origin/inlining
2025-12-04T09:33:41.8450555Z  * [new branch]              inlining-ezyang             -> origin/inlining-ezyang
2025-12-04T09:33:41.8452121Z  * [new branch]              install-torchao-0.13.0      -> origin/install-torchao-0.13.0
2025-12-04T09:33:41.8453968Z  * [new branch]              instrument-trunk-pull-linux-with-job-test-filters -> origin/instrument-trunk-pull-linux-with-job-test-filters
2025-12-04T09:33:41.8455149Z  * [new branch]              invoke-subgraph             -> origin/invoke-subgraph
2025-12-04T09:33:41.8456954Z  * [new branch]              issue#58739                 -> origin/issue#58739
2025-12-04T09:33:41.8459063Z  * [new branch]              jainapurva-patch-1          -> origin/jainapurva-patch-1
2025-12-04T09:33:41.8460913Z  * [new branch]              jathu/o3                    -> origin/jathu/o3
2025-12-04T09:33:41.8462140Z  * [new branch]              jathu/sve                   -> origin/jathu/sve
2025-12-04T09:33:41.8465123Z  * [new branch]              jcaip/test-cusparselt-version-0.6.2 -> origin/jcaip/test-cusparselt-version-0.6.2
2025-12-04T09:33:41.8466150Z  * [new branch]              jcaip/update-cusparselt-0.6.2 -> origin/jcaip/update-cusparselt-0.6.2
2025-12-04T09:33:41.8468298Z  * [new branch]              jiannanWang/memorysnapshot_filter -> origin/jiannanWang/memorysnapshot_filter
2025-12-04T09:33:41.8469808Z  * [new branch]              jiannanWang/profilerstepwarning -> origin/jiannanWang/profilerstepwarning
2025-12-04T09:33:41.8471610Z  * [new branch]              jithunnair-amd-patch-1      -> origin/jithunnair-amd-patch-1
2025-12-04T09:33:41.8473221Z  * [new branch]              jithunnair-amd-patch-10     -> origin/jithunnair-amd-patch-10
2025-12-04T09:33:41.8474816Z  * [new branch]              jithunnair-amd-patch-2      -> origin/jithunnair-amd-patch-2
2025-12-04T09:33:41.8476433Z  * [new branch]              jithunnair-amd-patch-3      -> origin/jithunnair-amd-patch-3
2025-12-04T09:33:41.8478063Z  * [new branch]              jithunnair-amd-patch-4      -> origin/jithunnair-amd-patch-4
2025-12-04T09:33:41.8479524Z  * [new branch]              jithunnair-amd-patch-5      -> origin/jithunnair-amd-patch-5
2025-12-04T09:33:41.8481252Z  * [new branch]              jithunnair-amd-patch-6      -> origin/jithunnair-amd-patch-6
2025-12-04T09:33:41.8482739Z  * [new branch]              jithunnair-amd-patch-7      -> origin/jithunnair-amd-patch-7
2025-12-04T09:33:41.8484489Z  * [new branch]              jithunnair-amd-patch-8      -> origin/jithunnair-amd-patch-8
2025-12-04T09:33:41.8485988Z  * [new branch]              jithunnair-amd-patch-9      -> origin/jithunnair-amd-patch-9
2025-12-04T09:33:41.8488097Z  * [new branch]              justinchu/native-qdq        -> origin/justinchu/native-qdq
2025-12-04T09:33:41.8490205Z  * [new branch]              kainan666/xlf_debug         -> origin/kainan666/xlf_debug
2025-12-04T09:33:41.8491514Z  * [new branch]              kainan_test                 -> origin/kainan_test
2025-12-04T09:33:41.8493507Z  * [new branch]              larryliu0820-patch-1        -> origin/larryliu0820-patch-1
2025-12-04T09:33:41.8495698Z  * [new branch]              leslie/test_group_gemm_epilogues -> origin/leslie/test_group_gemm_epilogues
2025-12-04T09:33:41.8498333Z  * [new branch]              lessw2020/fix_cutlass_cache_error -> origin/lessw2020/fix_cutlass_cache_error
2025-12-04T09:33:41.8500218Z  * [new branch]              liaoxuan/shm_all_reduce     -> origin/liaoxuan/shm_all_reduce
2025-12-04T09:33:41.8501769Z  * [new branch]              liaoxuan/test_fa_disable_softmax -> origin/liaoxuan/test_fa_disable_softmax
2025-12-04T09:33:41.8503212Z  * [new branch]              liaoxuan/test_int8_sdpa     -> origin/liaoxuan/test_int8_sdpa
2025-12-04T09:33:41.8504645Z  * [new branch]              llama4-stable               -> origin/llama4-stable
2025-12-04T09:33:41.8507095Z  * [new branch]              lts/release/1.8             -> origin/lts/release/1.8
2025-12-04T09:33:41.8509189Z  * [new branch]              lucaskabela/#94773          -> origin/lucaskabela/#94773
2025-12-04T09:33:41.8510679Z  * [new branch]              lucaskabela/fix_164876      -> origin/lucaskabela/fix_164876
2025-12-04T09:33:41.8512085Z  * [new branch]              lucaskabela/flop_counter    -> origin/lucaskabela/flop_counter
2025-12-04T09:33:41.8513501Z  * [new branch]              lucaskabela/func_under_decomp -> origin/lucaskabela/func_under_decomp
2025-12-04T09:33:41.8514942Z  * [new branch]              lucaskabela/functional_in_dynamo -> origin/lucaskabela/functional_in_dynamo
2025-12-04T09:33:41.8516484Z  * [new branch]              lucaskabela/install_params_as_graph_attr -> origin/lucaskabela/install_params_as_graph_attr
2025-12-04T09:33:41.8518215Z  * [new branch]              lucaskabela/parameters_as_graph_attr -> origin/lucaskabela/parameters_as_graph_attr
2025-12-04T09:33:41.8520184Z  * [new branch]              lucaskabela/remove_aot_dispatcher_metadata -> origin/lucaskabela/remove_aot_dispatcher_metadata
2025-12-04T09:33:41.8521575Z  * [new branch]              lucaskabela/rnn_decomp      -> origin/lucaskabela/rnn_decomp
2025-12-04T09:33:41.8523157Z  * [new branch]              lucaskabela/typing_backends -> origin/lucaskabela/typing_backends
2025-12-04T09:33:41.8524677Z  * [new branch]              lucaskabela/typing_ctx_manager -> origin/lucaskabela/typing_ctx_manager
2025-12-04T09:33:41.8526092Z  * [new branch]              lucaskabela/typing_nn_module -> origin/lucaskabela/typing_nn_module
2025-12-04T09:33:41.8527583Z  * [new branch]              lucaskabela/typing_user_defined -> origin/lucaskabela/typing_user_defined
2025-12-04T09:33:41.8528986Z  * [new branch]              lucaskabela/typing_variables -> origin/lucaskabela/typing_variables
2025-12-04T09:33:41.8530592Z  * [new branch]              lucaskabela/typing_variables_dicts -> origin/lucaskabela/typing_variables_dicts
2025-12-04T09:33:41.8532044Z  * [new branch]              lucaskabela/typing_variables_functions -> origin/lucaskabela/typing_variables_functions
2025-12-04T09:33:41.8533433Z  * [new branch]              lucaskabela/typing_variables_lists -> origin/lucaskabela/typing_variables_lists
2025-12-04T09:33:41.8535496Z  * [new branch]              lw/torch_box_by_ref         -> origin/lw/torch_box_by_ref
2025-12-04T09:33:41.8537197Z  * [new branch]              main                        -> origin/main
2025-12-04T09:33:41.8538920Z  * [new branch]              malfet-patch-1              -> origin/malfet-patch-1
2025-12-04T09:33:41.8540615Z  * [new branch]              malfet-patch-2              -> origin/malfet-patch-2
2025-12-04T09:33:41.8542252Z  * [new branch]              malfet-patch-3              -> origin/malfet-patch-3
2025-12-04T09:33:41.8544067Z  * [new branch]              malfet-patch-4              -> origin/malfet-patch-4
2025-12-04T09:33:41.8545572Z  * [new branch]              malfet-patch-5              -> origin/malfet-patch-5
2025-12-04T09:33:41.8547185Z  * [new branch]              malfet-patch-6              -> origin/malfet-patch-6
2025-12-04T09:33:41.8548805Z  * [new branch]              malfet-patch-7              -> origin/malfet-patch-7
2025-12-04T09:33:41.8550540Z  * [new branch]              malfet-patch-8              -> origin/malfet-patch-8
2025-12-04T09:33:41.8552576Z  * [new branch]              malfet/add-3.14-ci          -> origin/malfet/add-3.14-ci
2025-12-04T09:33:41.8554319Z  * [new branch]              malfet/be-do-not-make-typos-in-build-artifacts -> origin/malfet/be-do-not-make-typos-in-build-artifacts
2025-12-04T09:33:41.8555796Z  * [new branch]              malfet/be-move-more-settings-to-checkout-pytorch -> origin/malfet/be-move-more-settings-to-checkout-pytorch
2025-12-04T09:33:41.8557521Z  * [new branch]              malfet/be-remove-misisng-neon-headers -> origin/malfet/be-remove-misisng-neon-headers
2025-12-04T09:33:41.8559152Z  * [new branch]              malfet/mps-implement-col2im -> origin/malfet/mps-implement-col2im
2025-12-04T09:33:41.8561224Z  * [new branch]              manuel/aoti_metal_shimify-thread_safe -> origin/manuel/aoti_metal_shimify-thread_safe
2025-12-04T09:33:41.8562575Z  * [new branch]              manuel/inductor_link_openmp -> origin/manuel/inductor_link_openmp
2025-12-04T09:33:41.8564557Z  * [new branch]              masnesral/metaconda         -> origin/masnesral/metaconda
2025-12-04T09:33:41.8566190Z  * [new branch]              mem_profiler_flaky_fix      -> origin/mem_profiler_flaky_fix
2025-12-04T09:33:41.8568269Z  * [new branch]              mem_profiler_stack_trace    -> origin/mem_profiler_stack_trace
2025-12-04T09:33:41.8569924Z  * [new branch]              memory_profiler_stack       -> origin/memory_profiler_stack
2025-12-04T09:33:41.8571603Z  * [new branch]              metascroy-patch-1           -> origin/metascroy-patch-1
2025-12-04T09:33:41.8573274Z  * [new branch]              mingw_posix                 -> origin/mingw_posix
2025-12-04T09:33:41.8575318Z  * [new branch]              mlazos/S429861-debug        -> origin/mlazos/S429861-debug
2025-12-04T09:33:41.8576774Z  * [new branch]              mlazos/aa                   -> origin/mlazos/aa
2025-12-04T09:33:41.8578237Z  * [new branch]              mlazos/acts                 -> origin/mlazos/acts
2025-12-04T09:33:41.8579938Z  * [new branch]              mlazos/arg-renames          -> origin/mlazos/arg-renames
2025-12-04T09:33:41.8581367Z  * [new branch]              mlazos/bad-cudagraphs       -> origin/mlazos/bad-cudagraphs
2025-12-04T09:33:41.8582859Z  * [new branch]              mlazos/baseline-graph-breaks -> origin/mlazos/baseline-graph-breaks
2025-12-04T09:33:41.8584230Z  * [new branch]              mlazos/beta-tensor          -> origin/mlazos/beta-tensor
2025-12-04T09:33:41.8585678Z  * [new branch]              mlazos/buffers              -> origin/mlazos/buffers
2025-12-04T09:33:41.8586916Z  * [new branch]              mlazos/buffers2             -> origin/mlazos/buffers2
2025-12-04T09:33:41.8588668Z  * [new branch]              mlazos/buffers3             -> origin/mlazos/buffers3
2025-12-04T09:33:41.8590374Z  * [new branch]              mlazos/bwd                  -> origin/mlazos/bwd
2025-12-04T09:33:41.8591773Z  * [new branch]              mlazos/combo-test           -> origin/mlazos/combo-test
2025-12-04T09:33:41.8593418Z  * [new branch]              mlazos/ctx-cleanup          -> origin/mlazos/ctx-cleanup
2025-12-04T09:33:41.8594892Z  * [new branch]              mlazos/cuda-cmd-log         -> origin/mlazos/cuda-cmd-log
2025-12-04T09:33:41.8596517Z  * [new branch]              mlazos/cudagraph-tests      -> origin/mlazos/cudagraph-tests
2025-12-04T09:33:41.8598049Z  * [new branch]              mlazos/cudagraphs-measurement -> origin/mlazos/cudagraphs-measurement
2025-12-04T09:33:41.8599592Z  * [new branch]              mlazos/cutlass-test         -> origin/mlazos/cutlass-test
2025-12-04T09:33:41.8601784Z  * [new branch]              mlazos/cutlass-topo-bug     -> origin/mlazos/cutlass-topo-bug
2025-12-04T09:33:41.8603100Z  * [new branch]              mlazos/dataclass-proxy      -> origin/mlazos/dataclass-proxy
2025-12-04T09:33:41.8604555Z  * [new branch]              mlazos/dc-attrs             -> origin/mlazos/dc-attrs
2025-12-04T09:33:41.8606005Z  * [new branch]              mlazos/dc-helion            -> origin/mlazos/dc-helion
2025-12-04T09:33:41.8607480Z  * [new branch]              mlazos/dict-fix             -> origin/mlazos/dict-fix
2025-12-04T09:33:41.8609463Z  * [new branch]              mlazos/disable-tf           -> origin/mlazos/disable-tf
2025-12-04T09:33:41.8610927Z  * [new branch]              mlazos/dupe-fix             -> origin/mlazos/dupe-fix
2025-12-04T09:33:41.8612520Z  * [new branch]              mlazos/dyn-batch            -> origin/mlazos/dyn-batch
2025-12-04T09:33:41.8613969Z  * [new branch]              mlazos/evt                  -> origin/mlazos/evt
2025-12-04T09:33:41.8615489Z  * [new branch]              mlazos/extract-examples     -> origin/mlazos/extract-examples
2025-12-04T09:33:41.8617002Z  * [new branch]              mlazos/foreach-op           -> origin/mlazos/foreach-op
2025-12-04T09:33:41.8618567Z  * [new branch]              mlazos/fp8                  -> origin/mlazos/fp8
2025-12-04T09:33:41.8620142Z  * [new branch]              mlazos/fp8-bias             -> origin/mlazos/fp8-bias
2025-12-04T09:33:41.8621619Z  * [new branch]              mlazos/fp8-bias-fusion      -> origin/mlazos/fp8-bias-fusion
2025-12-04T09:33:41.8623052Z  * [new branch]              mlazos/fp8-fixes            -> origin/mlazos/fp8-fixes
2025-12-04T09:33:41.8624509Z  * [new branch]              mlazos/freezing             -> origin/mlazos/freezing
2025-12-04T09:33:41.8626121Z  * [new branch]              mlazos/h-comp               -> origin/mlazos/h-comp
2025-12-04T09:33:41.8627688Z  * [new branch]              mlazos/h-comp2              -> origin/mlazos/h-comp2
2025-12-04T09:33:41.8629171Z  * [new branch]              mlazos/hash-hop             -> origin/mlazos/hash-hop
2025-12-04T09:33:41.8630718Z  * [new branch]              mlazos/hc                   -> origin/mlazos/hc
2025-12-04T09:33:41.8632284Z  * [new branch]              mlazos/hc-cycles            -> origin/mlazos/hc-cycles
2025-12-04T09:33:41.8633781Z  * [new branch]              mlazos/hc-fixes             -> origin/mlazos/hc-fixes
2025-12-04T09:33:41.8635233Z  * [new branch]              mlazos/hc-fixes3            -> origin/mlazos/hc-fixes3
2025-12-04T09:33:41.8636701Z  * [new branch]              mlazos/hc-fixes4            -> origin/mlazos/hc-fixes4
2025-12-04T09:33:41.8638274Z  * [new branch]              mlazos/hc-hf                -> origin/mlazos/hc-hf
2025-12-04T09:33:41.8639686Z  * [new branch]              mlazos/hc-mut               -> origin/mlazos/hc-mut
2025-12-04T09:33:41.8641166Z  * [new branch]              mlazos/hc10                 -> origin/mlazos/hc10
2025-12-04T09:33:41.8642733Z  * [new branch]              mlazos/hc11                 -> origin/mlazos/hc11
2025-12-04T09:33:41.8644214Z  * [new branch]              mlazos/hc12                 -> origin/mlazos/hc12
2025-12-04T09:33:41.8645586Z  * [new branch]              mlazos/hc13                 -> origin/mlazos/hc13
2025-12-04T09:33:41.8647025Z  * [new branch]              mlazos/hc14                 -> origin/mlazos/hc14
2025-12-04T09:33:41.8648442Z  * [new branch]              mlazos/hc15                 -> origin/mlazos/hc15
2025-12-04T09:33:41.8649961Z  * [new branch]              mlazos/hc2                  -> origin/mlazos/hc2
2025-12-04T09:33:41.8651384Z  * [new branch]              mlazos/hc4                  -> origin/mlazos/hc4
2025-12-04T09:33:41.8652810Z  * [new branch]              mlazos/hc5                  -> origin/mlazos/hc5
2025-12-04T09:33:41.8654257Z  * [new branch]              mlazos/hc6                  -> origin/mlazos/hc6
2025-12-04T09:33:41.8655795Z  * [new branch]              mlazos/hc7                  -> origin/mlazos/hc7
2025-12-04T09:33:41.8657280Z  * [new branch]              mlazos/hc8                  -> origin/mlazos/hc8
2025-12-04T09:33:41.8658674Z  * [new branch]              mlazos/hc9                  -> origin/mlazos/hc9
2025-12-04T09:33:41.8660178Z  * [new branch]              mlazos/hc_baseline2         -> origin/mlazos/hc_baseline2
2025-12-04T09:33:41.8661564Z  * [new branch]              mlazos/inductor-streams     -> origin/mlazos/inductor-streams
2025-12-04T09:33:41.8662796Z  * [new branch]              mlazos/main                 -> origin/mlazos/main
2025-12-04T09:33:41.8664313Z  * [new branch]              mlazos/mcg2                 -> origin/mlazos/mcg2
2025-12-04T09:33:41.8665947Z  * [new branch]              mlazos/meta-guards          -> origin/mlazos/meta-guards
2025-12-04T09:33:41.8668080Z  * [new branch]              mlazos/mlazos/foreach-map-adam -> origin/mlazos/mlazos/foreach-map-adam
2025-12-04T09:33:41.8669620Z  * [new branch]              mlazos/mlazos/tf-mode-backup -> origin/mlazos/mlazos/tf-mode-backup
2025-12-04T09:33:41.8671210Z  * [new branch]              mlazos/mod-fix              -> origin/mlazos/mod-fix
2025-12-04T09:33:41.8676007Z  * [new branch]              mlazos/mode-fix             -> origin/mlazos/mode-fix
2025-12-04T09:33:41.8677536Z  * [new branch]              mlazos/offsets              -> origin/mlazos/offsets
2025-12-04T09:33:41.8678960Z  * [new branch]              mlazos/overguarding         -> origin/mlazos/overguarding
2025-12-04T09:33:41.8680415Z  * [new branch]              mlazos/proxy-ctors          -> origin/mlazos/proxy-ctors
2025-12-04T09:33:41.8681897Z  * [new branch]              mlazos/quant-fix            -> origin/mlazos/quant-fix
2025-12-04T09:33:41.8683361Z  * [new branch]              mlazos/resnet-fix           -> origin/mlazos/resnet-fix
2025-12-04T09:33:41.8684896Z  * [new branch]              mlazos/rm-buf-names         -> origin/mlazos/rm-buf-names
2025-12-04T09:33:41.8686368Z  * [new branch]              mlazos/rm-code              -> origin/mlazos/rm-code
2025-12-04T09:33:41.8687878Z  * [new branch]              mlazos/rm-spam              -> origin/mlazos/rm-spam
2025-12-04T09:33:41.8689461Z  * [new branch]              mlazos/rtp                  -> origin/mlazos/rtp
2025-12-04T09:33:41.8690960Z  * [new branch]              mlazos/static-idx-dbg       -> origin/mlazos/static-idx-dbg
2025-12-04T09:33:41.8692603Z  * [new branch]              mlazos/static-inputs-log    -> origin/mlazos/static-inputs-log
2025-12-04T09:33:41.8693875Z  * [new branch]              mlazos/stests               -> origin/mlazos/stests
2025-12-04T09:33:41.8695400Z  * [new branch]              mlazos/stream-ops           -> origin/mlazos/stream-ops
2025-12-04T09:33:41.8696986Z  * [new branch]              mlazos/td-fix2              -> origin/mlazos/td-fix2
2025-12-04T09:33:41.8698697Z  * [new branch]              mlazos/tensor-hasattr2      -> origin/mlazos/tensor-hasattr2
2025-12-04T09:33:41.8699983Z  * [new branch]              mlazos/test                 -> origin/mlazos/test
2025-12-04T09:33:41.8701518Z  * [new branch]              mlazos/tf-mode              -> origin/mlazos/tf-mode
2025-12-04T09:33:41.8703108Z  * [new branch]              mlazos/tf-mode-backup2      -> origin/mlazos/tf-mode-backup2
2025-12-04T09:33:41.8705086Z  * [new branch]              mlazos/tf-mode-reland       -> origin/mlazos/tf-mode-reland
2025-12-04T09:33:41.8706692Z  * [new branch]              mlazos/tf-mode-reland2      -> origin/mlazos/tf-mode-reland2
2025-12-04T09:33:41.8708195Z  * [new branch]              mlazos/tf-mode-reland3      -> origin/mlazos/tf-mode-reland3
2025-12-04T09:33:41.8709659Z  * [new branch]              mlazos/triton-no-epi        -> origin/mlazos/triton-no-epi
2025-12-04T09:33:41.8711201Z  * [new branch]              mlazos/tune-proto           -> origin/mlazos/tune-proto
2025-12-04T09:33:41.8713108Z  * [new branch]              mlazos/tuple-fixes          -> origin/mlazos/tuple-fixes
2025-12-04T09:33:41.8714757Z  * [new branch]              mlazos/tuple-fixes2         -> origin/mlazos/tuple-fixes2
2025-12-04T09:33:41.8716172Z  * [new branch]              mlazos/tuple-handling       -> origin/mlazos/tuple-handling
2025-12-04T09:33:41.8717862Z  * [new branch]              mlazos/user-stream-base     -> origin/mlazos/user-stream-base
2025-12-04T09:33:41.8719400Z  * [new branch]              mlazos/user-streams         -> origin/mlazos/user-streams
2025-12-04T09:33:41.8720870Z  * [new branch]              mlazos/user-streams-backup  -> origin/mlazos/user-streams-backup
2025-12-04T09:33:41.8722433Z  * [new branch]              mlazos/user-streams-backup2 -> origin/mlazos/user-streams-backup2
2025-12-04T09:33:41.8723874Z  * [new branch]              mlazos/vary-beta            -> origin/mlazos/vary-beta
2025-12-04T09:33:41.8725436Z  * [new branch]              mlazos/vary-beta2           -> origin/mlazos/vary-beta2
2025-12-04T09:33:41.8726925Z  * [new branch]              mlazos/weird-perf1          -> origin/mlazos/weird-perf1
2025-12-04T09:33:41.8728638Z  * [new branch]              mm_out_dtype_compile        -> origin/mm_out_dtype_compile
2025-12-04T09:33:41.8730219Z  * [new branch]              module-shim                 -> origin/module-shim
2025-12-04T09:33:41.8731852Z  * [new branch]              move_config                 -> origin/move_config
2025-12-04T09:33:41.8733930Z  * [new branch]              msaroufim/reduce            -> origin/msaroufim/reduce
2025-12-04T09:33:41.8735837Z  * [new branch]              mtia/basic-cmake            -> origin/mtia/basic-cmake
2025-12-04T09:33:41.8738104Z  * [new branch]              mwizak/fix-triton-block-shape -> origin/mwizak/fix-triton-block-shape
2025-12-04T09:33:41.8739854Z  * [new branch]              my_varlen_backup            -> origin/my_varlen_backup
2025-12-04T09:33:41.8741326Z  * [new branch]              nativert_num_outputs        -> origin/nativert_num_outputs
2025-12-04T09:33:41.8742857Z  * [new branch]              new-codegen                 -> origin/new-codegen
2025-12-04T09:33:41.8744492Z  * [new branch]              newtest-base                -> origin/newtest-base
2025-12-04T09:33:41.8746508Z  * [new branch]              ngimel/addmm_dtype          -> origin/ngimel/addmm_dtype
2025-12-04T09:33:41.8747880Z  * [new branch]              ngimel/div_inv              -> origin/ngimel/div_inv
2025-12-04T09:33:41.8749310Z  * [new branch]              ngimel/error_index_list     -> origin/ngimel/error_index_list
2025-12-04T09:33:41.8750726Z  * [new branch]              ngimel/gather_grid          -> origin/ngimel/gather_grid
2025-12-04T09:33:41.8752250Z  * [new branch]              ngimel/gather_grid_release  -> origin/ngimel/gather_grid_release
2025-12-04T09:33:41.8753543Z  * [new branch]              ngimel/gg_new               -> origin/ngimel/gg_new
2025-12-04T09:33:41.8754983Z  * [new branch]              ngimel/hostalloc            -> origin/ngimel/hostalloc
2025-12-04T09:33:41.8756398Z  * [new branch]              ngimel/storage_id           -> origin/ngimel/storage_id
2025-12-04T09:33:41.8757970Z  * [new branch]              nightly                     -> origin/nightly
2025-12-04T09:33:41.8760836Z  * [new branch]              nikitaved/addmm_1_rowcol_lt_path_check -> origin/nikitaved/addmm_1_rowcol_lt_path_check
2025-12-04T09:33:41.8762549Z  * [new branch]              nikitaved/addmm_epilogue_fusions_2d_bias -> origin/nikitaved/addmm_epilogue_fusions_2d_bias
2025-12-04T09:33:41.8764051Z  * [new branch]              nikitaved/addmm_epilogue_fusions_inductor -> origin/nikitaved/addmm_epilogue_fusions_inductor
2025-12-04T09:33:41.8765843Z  * [new branch]              nikitaved/addmm_epilogue_fusions_scratch -> origin/nikitaved/addmm_epilogue_fusions_scratch
2025-12-04T09:33:41.8767664Z  * [new branch]              nikitaved/grad_addmm_epilogue_fusions -> origin/nikitaved/grad_addmm_epilogue_fusions
2025-12-04T09:33:41.8769576Z  * [new branch]              nikitaved/simpler_can_use_32bit_index -> origin/nikitaved/simpler_can_use_32bit_index
2025-12-04T09:33:41.8771283Z  * [new branch]              nikitaved/test              -> origin/nikitaved/test
2025-12-04T09:33:41.8773276Z  * [new branch]              nmacchioni-perf-test-async-autotune -> origin/nmacchioni-perf-test-async-autotune
2025-12-04T09:33:41.8774719Z  * [new branch]              no_distributed_log_spew     -> origin/no_distributed_log_spew
2025-12-04T09:33:41.8776372Z  * [new branch]              nofun-hack                  -> origin/nofun-hack
2025-12-04T09:33:41.8777944Z  * [new branch]              norm_bench                  -> origin/norm_bench
2025-12-04T09:33:41.8779982Z  * [new branch]              nullplay/fuse_matmul        -> origin/nullplay/fuse_matmul
2025-12-04T09:33:41.8781539Z  * [new branch]              nullplay_fuse_matmul        -> origin/nullplay_fuse_matmul
2025-12-04T09:33:41.8783081Z  * [new branch]              optimizer_test              -> origin/optimizer_test
2025-12-04T09:33:41.8785712Z  * [new branch]              orig/release/1.10           -> origin/orig/release/1.10
2025-12-04T09:33:41.8787325Z  * [new branch]              orig/release/1.11           -> origin/orig/release/1.11
2025-12-04T09:33:41.8788854Z  * [new branch]              orig/release/1.12           -> origin/orig/release/1.12
2025-12-04T09:33:41.8790611Z  * [new branch]              orig/release/1.13           -> origin/orig/release/1.13
2025-12-04T09:33:41.8792222Z  * [new branch]              orig/release/1.6            -> origin/orig/release/1.6
2025-12-04T09:33:41.8794418Z  * [new branch]              orig/release/1.7            -> origin/orig/release/1.7
2025-12-04T09:33:41.8795990Z  * [new branch]              orig/release/1.8            -> origin/orig/release/1.8
2025-12-04T09:33:41.8797579Z  * [new branch]              orig/release/1.9            -> origin/orig/release/1.9
2025-12-04T09:33:41.8799153Z  * [new branch]              orig/release/2.0            -> origin/orig/release/2.0
2025-12-04T09:33:41.8800600Z  * [new branch]              orig/release/2.1            -> origin/orig/release/2.1
2025-12-04T09:33:41.8802164Z  * [new branch]              orig/release/2.2            -> origin/orig/release/2.2
2025-12-04T09:33:41.8803602Z  * [new branch]              orig/release/2.3            -> origin/orig/release/2.3
2025-12-04T09:33:41.8805069Z  * [new branch]              orig/release/2.4            -> origin/orig/release/2.4
2025-12-04T09:33:41.8806478Z  * [new branch]              orig/release/2.5            -> origin/orig/release/2.5
2025-12-04T09:33:41.8807984Z  * [new branch]              orig/release/2.6            -> origin/orig/release/2.6
2025-12-04T09:33:41.8809785Z  * [new branch]              orig/release/2.7            -> origin/orig/release/2.7
2025-12-04T09:33:41.8811946Z  * [new branch]              orig/release/2.8            -> origin/orig/release/2.8
2025-12-04T09:33:41.8813369Z  * [new branch]              orig/release/2.9            -> origin/orig/release/2.9
2025-12-04T09:33:41.8816822Z  * [new branch]              origin/gh/fxdawnn/1/base    -> origin/origin/gh/fxdawnn/1/base
2025-12-04T09:33:41.8818226Z  * [new branch]              origin/gh/fxdawnn/1/orig    -> origin/origin/gh/fxdawnn/1/orig
2025-12-04T09:33:41.8820760Z  * [new branch]              origin/gh/zpcore/14/orig    -> origin/origin/gh/zpcore/14/orig
2025-12-04T09:33:41.8822444Z  * [new branch]              oulgen-patch-1              -> origin/oulgen-patch-1
2025-12-04T09:33:41.8824109Z  * [new branch]              oulgen-patch-2              -> origin/oulgen-patch-2
2025-12-04T09:33:41.8825724Z  * [new branch]              oulgen-patch-3              -> origin/oulgen-patch-3
2025-12-04T09:33:41.8827554Z  * [new branch]              oulgen-patch-4              -> origin/oulgen-patch-4
2025-12-04T09:33:41.8829534Z  * [new branch]              padded-tensor               -> origin/padded-tensor
2025-12-04T09:33:41.8831244Z  * [new branch]              pca2                        -> origin/pca2
2025-12-04T09:33:41.8832962Z  * [new branch]              per_channel_backup          -> origin/per_channel_backup
2025-12-04T09:33:41.8834577Z  * [new branch]              perf_ops                    -> origin/perf_ops
2025-12-04T09:33:41.8836084Z  * [new branch]              perf_ops_2_9                -> origin/perf_ops_2_9
2025-12-04T09:33:41.8837792Z  * [new branch]              pianpwk-patch-1             -> origin/pianpwk-patch-1
2025-12-04T09:33:41.8839849Z  * [new branch]              pianpwk/__draft_debug_mode  -> origin/pianpwk/__draft_debug_mode
2025-12-04T09:33:41.8841447Z  * [new branch]              pianpwk/_debug_mode_for_triton_draft -> origin/pianpwk/_debug_mode_for_triton_draft
2025-12-04T09:33:41.8842800Z  * [new branch]              pianpwk/_debug_nn_module_compile -> origin/pianpwk/_debug_nn_module_compile
2025-12-04T09:33:41.8844167Z  * [new branch]              pianpwk/_draft_triton_11_3  -> origin/pianpwk/_draft_triton_11_3
2025-12-04T09:33:41.8845608Z  * [new branch]              pianpwk/_manual_bucket_draft -> origin/pianpwk/_manual_bucket_draft
2025-12-04T09:33:41.8847395Z  * [new branch]              pianpwk/_profile_w_dispatch_keys -> origin/pianpwk/_profile_w_dispatch_keys
2025-12-04T09:33:41.8849194Z  * [new branch]              pianpwk/_super_draft_debug_mode -> origin/pianpwk/_super_draft_debug_mode
2025-12-04T09:33:41.8850857Z  * [new branch]              pianpwk/_unbacked_local_shard_size -> origin/pianpwk/_unbacked_local_shard_size
2025-12-04T09:33:41.8852296Z  * [new branch]              pianpwk/anomaly_tb          -> origin/pianpwk/anomaly_tb
2025-12-04T09:33:41.8853723Z  * [new branch]              pianpwk/auto_fx_annotate    -> origin/pianpwk/auto_fx_annotate
2025-12-04T09:33:41.8855434Z  * [new branch]              pianpwk/backed_size_oblivious_export -> origin/pianpwk/backed_size_oblivious_export
2025-12-04T09:33:41.8856935Z  * [new branch]              pianpwk/bert_dynamic_perf   -> origin/pianpwk/bert_dynamic_perf
2025-12-04T09:33:41.8858583Z  * [new branch]              pianpwk/debug_fwd_stack_traces -> origin/pianpwk/debug_fwd_stack_traces
2025-12-04T09:33:41.8860110Z  * [new branch]              pianpwk/debug_hash_tensor   -> origin/pianpwk/debug_hash_tensor
2025-12-04T09:33:41.8861666Z  * [new branch]              pianpwk/debug_mode_annotate -> origin/pianpwk/debug_mode_annotate
2025-12-04T09:33:41.8863050Z  * [new branch]              pianpwk/debug_mode_defaults -> origin/pianpwk/debug_mode_defaults
2025-12-04T09:33:41.8864557Z  * [new branch]              pianpwk/debug_mode_hacks    -> origin/pianpwk/debug_mode_hacks
2025-12-04T09:33:41.8866072Z  * [new branch]              pianpwk/debug_mode_opcall_refactor -> origin/pianpwk/debug_mode_opcall_refactor
2025-12-04T09:33:41.8867486Z  * [new branch]              pianpwk/debug_mode_show_ids -> origin/pianpwk/debug_mode_show_ids
2025-12-04T09:33:41.8869581Z  * [new branch]              pianpwk/debug_mode_triton   -> origin/pianpwk/debug_mode_triton
2025-12-04T09:33:41.8871431Z  * [new branch]              pianpwk/debug_show_stack_trace -> origin/pianpwk/debug_show_stack_trace
2025-12-04T09:33:41.8872956Z  * [new branch]              pianpwk/debug_wait_on_collective -> origin/pianpwk/debug_wait_on_collective
2025-12-04T09:33:41.8874471Z  * [new branch]              pianpwk/debugmode_compile_tf -> origin/pianpwk/debugmode_compile_tf
2025-12-04T09:33:41.8876108Z  * [new branch]              pianpwk/dispatch_key_debugging_for_debug -> origin/pianpwk/dispatch_key_debugging_for_debug
2025-12-04T09:33:41.8877551Z  * [new branch]              pianpwk/draft_debug_mode_tfcompile -> origin/pianpwk/draft_debug_mode_tfcompile
2025-12-04T09:33:41.8878945Z  * [new branch]              pianpwk/draft_multikernel_nn -> origin/pianpwk/draft_multikernel_nn
2025-12-04T09:33:41.8880646Z  * [new branch]              pianpwk/draft_multikernel_status_10_5 -> origin/pianpwk/draft_multikernel_status_10_5
2025-12-04T09:33:41.8882196Z  * [new branch]              pianpwk/dtensor_custom_chunk -> origin/pianpwk/dtensor_custom_chunk
2025-12-04T09:33:41.8883890Z  * [new branch]              pianpwk/dtensor_unbacked_keypath -> origin/pianpwk/dtensor_unbacked_keypath
2025-12-04T09:33:41.8885466Z  * [new branch]              pianpwk/event_list_tree     -> origin/pianpwk/event_list_tree
2025-12-04T09:33:41.8886930Z  * [new branch]              pianpwk/false_numel_refs    -> origin/pianpwk/false_numel_refs
2025-12-04T09:33:41.8888419Z  * [new branch]              pianpwk/maybe_guard_rel     -> origin/pianpwk/maybe_guard_rel
2025-12-04T09:33:41.8889949Z  * [new branch]              pianpwk/multikernel_hints_draft -> origin/pianpwk/multikernel_hints_draft
2025-12-04T09:33:41.8891579Z  * [new branch]              pianpwk/no_size_oblivious_slice_scat -> origin/pianpwk/no_size_oblivious_slice_scat
2025-12-04T09:33:41.8893071Z  * [new branch]              pianpwk/oblivious_reshape_view_better -> origin/pianpwk/oblivious_reshape_view_better
2025-12-04T09:33:41.8894405Z  * [new branch]              pianpwk/pre_forward_hook    -> origin/pianpwk/pre_forward_hook
2025-12-04T09:33:41.8895916Z  * [new branch]              pianpwk/skip_python_keys_alternate -> origin/pianpwk/skip_python_keys_alternate
2025-12-04T09:33:41.8897512Z  * [new branch]              pianpwk/skip_python_keys_in_guards -> origin/pianpwk/skip_python_keys_in_guards
2025-12-04T09:33:41.8898897Z  * [new branch]              pianpwk/sym_tokens_draft    -> origin/pianpwk/sym_tokens_draft
2025-12-04T09:33:41.8900378Z  * [new branch]              pianpwk/symint_one_hot      -> origin/pianpwk/symint_one_hot
2025-12-04T09:33:41.8902135Z  * [new branch]              pianpwk/test_pointwise_guard_or_false -> origin/pianpwk/test_pointwise_guard_or_false
2025-12-04T09:33:41.8903510Z  * [new branch]              pianpwk/totally_draft_sym_wrap -> origin/pianpwk/totally_draft_sym_wrap
2025-12-04T09:33:41.8905548Z  * [new branch]              pianpwk/try_dumb_stuff      -> origin/pianpwk/try_dumb_stuff
2025-12-04T09:33:41.8907062Z  * [new branch]              pianpwk/try_dumb_stuff_2    -> origin/pianpwk/try_dumb_stuff_2
2025-12-04T09:33:41.8908569Z  * [new branch]              pianpwk/unbacked_dtensor_mm -> origin/pianpwk/unbacked_dtensor_mm
2025-12-04T09:33:41.8910105Z  * [new branch]              pianpwk/unbacked_tracing_12_2 -> origin/pianpwk/unbacked_tracing_12_2
2025-12-04T09:33:41.8911502Z  * [new branch]              pianpwk/user_symints        -> origin/pianpwk/user_symints
2025-12-04T09:33:41.8913079Z  * [new branch]              pianpwk/wan21_reshape       -> origin/pianpwk/wan21_reshape
2025-12-04T09:33:41.8915174Z  * [new branch]              piz/fix_partial_backward_1112 -> origin/piz/fix_partial_backward_1112
2025-12-04T09:33:41.8916549Z  * [new branch]              piz/prop_cache_clean        -> origin/piz/prop_cache_clean
2025-12-04T09:33:41.8918128Z  * [new branch]              pool-separate               -> origin/pool-separate
2025-12-04T09:33:41.8919684Z  * [new branch]              pr-156087                   -> origin/pr-156087
2025-12-04T09:33:41.8922795Z  * [new branch]              pr/131860                   -> origin/pr/131860
2025-12-04T09:33:41.8923682Z  * [new branch]              predispatch_to              -> origin/predispatch_to
2025-12-04T09:33:41.8925092Z  * [new branch]              protect-c17                 -> origin/protect-c17
2025-12-04T09:33:41.8926526Z  * [new branch]              pt-opt-cuda3                -> origin/pt-opt-cuda3
2025-12-04T09:33:41.8928824Z  * [new branch]              python_compiled_autograd    -> origin/python_compiled_autograd
2025-12-04T09:33:41.8931084Z  * [new branch]              q1l1/fix_device_moved_constant_type_unknown -> origin/q1l1/fix_device_moved_constant_type_unknown
2025-12-04T09:33:41.8932577Z  * [new branch]              q1l1/fix_wrong_default_type_for_kernel_call_args -> origin/q1l1/fix_wrong_default_type_for_kernel_call_args
2025-12-04T09:33:41.8934997Z  * [new branch]              qchip/export-D54134695      -> origin/qchip/export-D54134695
2025-12-04T09:33:41.8936738Z  * [new branch]              quote-pytest_cache          -> origin/quote-pytest_cache
2025-12-04T09:33:41.8938817Z  * [new branch]              reland-accgrad-stream-warn  -> origin/reland-accgrad-stream-warn
2025-12-04T09:33:41.8940980Z  * [new branch]              release/1.10                -> origin/release/1.10
2025-12-04T09:33:41.8942505Z  * [new branch]              release/1.11                -> origin/release/1.11
2025-12-04T09:33:41.8944024Z  * [new branch]              release/1.12                -> origin/release/1.12
2025-12-04T09:33:41.8945543Z  * [new branch]              release/1.13                -> origin/release/1.13
2025-12-04T09:33:41.8947011Z  * [new branch]              release/1.4                 -> origin/release/1.4
2025-12-04T09:33:41.8948251Z  * [new branch]              release/1.4.1               -> origin/release/1.4.1
2025-12-04T09:33:41.8949750Z  * [new branch]              release/1.5                 -> origin/release/1.5
2025-12-04T09:33:41.8951407Z  * [new branch]              release/1.6                 -> origin/release/1.6
2025-12-04T09:33:41.8952939Z  * [new branch]              release/1.7                 -> origin/release/1.7
2025-12-04T09:33:41.8954640Z  * [new branch]              release/1.8                 -> origin/release/1.8
2025-12-04T09:33:41.8956026Z  * [new branch]              release/1.9                 -> origin/release/1.9
2025-12-04T09:33:41.8957545Z  * [new branch]              release/2.0                 -> origin/release/2.0
2025-12-04T09:33:41.8959292Z  * [new branch]              release/2.1                 -> origin/release/2.1
2025-12-04T09:33:41.8960897Z  * [new branch]              release/2.2                 -> origin/release/2.2
2025-12-04T09:33:41.8962702Z  * [new branch]              release/2.3                 -> origin/release/2.3
2025-12-04T09:33:41.8964807Z  * [new branch]              release/2.4                 -> origin/release/2.4
2025-12-04T09:33:41.8966759Z  * [new branch]              release/2.5                 -> origin/release/2.5
2025-12-04T09:33:41.8968507Z  * [new branch]              release/2.6                 -> origin/release/2.6
2025-12-04T09:33:41.8970092Z  * [new branch]              release/2.7                 -> origin/release/2.7
2025-12-04T09:33:41.8971889Z  * [new branch]              release/2.8                 -> origin/release/2.8
2025-12-04T09:33:41.8973541Z  * [new branch]              release/2.9                 -> origin/release/2.9
2025-12-04T09:33:41.8975147Z  * [new branch]              release_notes               -> origin/release_notes
2025-12-04T09:33:41.8976865Z  * [new branch]              remove_pyinterpreter        -> origin/remove_pyinterpreter
2025-12-04T09:33:41.8978942Z  * [new branch]              replace-pytorch-labs-20250812-195836 -> origin/replace-pytorch-labs-20250812-195836
2025-12-04T09:33:41.8980301Z  * [new branch]              replace-pytorch-labs-20250812-200248 -> origin/replace-pytorch-labs-20250812-200248
2025-12-04T09:33:41.8981696Z  * [new branch]              replace-pytorch-labs-20250812-200324 -> origin/replace-pytorch-labs-20250812-200324
2025-12-04T09:33:41.8983181Z  * [new branch]              replace-pytorch-labs-20250812-204020 -> origin/replace-pytorch-labs-20250812-204020
2025-12-04T09:33:41.8986132Z  * [new branch]              revert-131069-gh/krzysztofjordan/1/head -> origin/revert-131069-gh/krzysztofjordan/1/head
2025-12-04T09:33:41.8989017Z  * [new branch]              revert-131469-gh/andrewor14/51/head -> origin/revert-131469-gh/andrewor14/51/head
2025-12-04T09:33:41.8992334Z  * [new branch]              revert-152361-gh/fadara01/1/head -> origin/revert-152361-gh/fadara01/1/head
2025-12-04T09:33:41.8995767Z  * [new branch]              revert-156870-gh/skarjala/3/head -> origin/revert-156870-gh/skarjala/3/head
2025-12-04T09:33:41.8997785Z  * [new branch]              revert-157914-cherry-pick-157503-by-pytorch_bot_bot_ -> origin/revert-157914-cherry-pick-157503-by-pytorch_bot_bot_
2025-12-04T09:33:41.8999201Z  * [new branch]              revert-hoo-invoke-subgraph  -> origin/revert-hoo-invoke-subgraph
2025-12-04T09:33:41.9000794Z  * [new branch]              revert_always_build_distributed -> origin/revert_always_build_distributed
2025-12-04T09:33:41.9002265Z  * [new branch]              rms_norm_patch              -> origin/rms_norm_patch
2025-12-04T09:33:41.9004519Z  * [new branch]              ruisi/fix_all_to_all_estimation -> origin/ruisi/fix_all_to_all_estimation
2025-12-04T09:33:41.9006248Z  * [new branch]              ruisi/fix_comm_estimation   -> origin/ruisi/fix_comm_estimation
2025-12-04T09:33:41.9007768Z  * [new branch]              ruisi/fix_dynamic_shape_estimation -> origin/ruisi/fix_dynamic_shape_estimation
2025-12-04T09:33:41.9009204Z  * [new branch]              ruisi/fix_llama3_autobucketing -> origin/ruisi/fix_llama3_autobucketing
2025-12-04T09:33:41.9010968Z  * [new branch]              ruisi/fix_manual_bucketing_ep_pass -> origin/ruisi/fix_manual_bucketing_ep_pass
2025-12-04T09:33:41.9012796Z  * [new branch]              ruisi/manual_bucket_pass    -> origin/ruisi/manual_bucket_pass
2025-12-04T09:33:41.9015176Z  * [new branch]              ryanguo99/cleanup-dynamo-expected-failures -> origin/ryanguo99/cleanup-dynamo-expected-failures
2025-12-04T09:33:41.9016475Z  * [new branch]              ryanguo99/fix-closure-var   -> origin/ryanguo99/fix-closure-var
2025-12-04T09:33:41.9018620Z  * [new branch]              rzou/faketensor_bench       -> origin/rzou/faketensor_bench
2025-12-04T09:33:41.9020027Z  * [new branch]              rzou/njt                    -> origin/rzou/njt
2025-12-04T09:33:41.9021517Z  * [new branch]              rzou/pca                    -> origin/rzou/pca
2025-12-04T09:33:41.9022950Z  * [new branch]              rzou/realprop               -> origin/rzou/realprop
2025-12-04T09:33:41.9024577Z  * [new branch]              samplevllm                  -> origin/samplevllm
2025-12-04T09:33:41.9027173Z  * [new branch]              sanchitintel/weird_thing_with_test_cpu_select_algorithm -> origin/sanchitintel/weird_thing_with_test_cpu_select_algorithm
2025-12-04T09:33:41.9028568Z  * [new branch]              sapling-pr-archive-SS-JIA   -> origin/sapling-pr-archive-SS-JIA
2025-12-04T09:33:41.9030308Z  * [new branch]              sapling-pr-archive-tushar00jain -> origin/sapling-pr-archive-tushar00jain
2025-12-04T09:33:41.9032270Z  * [new branch]              save                        -> origin/save
2025-12-04T09:33:41.9033879Z  * [new branch]              scaled_mm                   -> origin/scaled_mm
2025-12-04T09:33:41.9035468Z  * [new branch]              scan_attempt                -> origin/scan_attempt
2025-12-04T09:33:41.9037627Z  * [new branch]              sdym/2.5.1                  -> origin/sdym/2.5.1
2025-12-04T09:33:41.9039306Z  * [new branch]              sekyondaMeta-dynamoconfig-fix -> origin/sekyondaMeta-dynamoconfig-fix
2025-12-04T09:33:41.9041164Z  * [new branch]              shengf/fx-xform-perf        -> origin/shengf/fx-xform-perf
2025-12-04T09:33:41.9042827Z  * [new branch]              shoumikhin-patch-1          -> origin/shoumikhin-patch-1
2025-12-04T09:33:41.9044435Z  * [new branch]              solve-accuracy-fix          -> origin/solve-accuracy-fix
2025-12-04T09:33:41.9045970Z  * [new branch]              some_rocm_inductor_skips    -> origin/some_rocm_inductor_skips
2025-12-04T09:33:41.9047989Z  * [new branch]              soulitzer/stash-tls-ac      -> origin/soulitzer/stash-tls-ac
2025-12-04T09:33:41.9049627Z  * [new branch]              sparse-mm-bf16-support      -> origin/sparse-mm-bf16-support
2025-12-04T09:33:41.9051700Z  * [new branch]              starterTaskUpdate           -> origin/starterTaskUpdate
2025-12-04T09:33:41.9053283Z  * [new branch]              suo                         -> origin/suo
2025-12-04T09:33:41.9054868Z  * [new branch]              sve-poc                     -> origin/sve-poc
2025-12-04T09:33:41.9056626Z  * [new branch]              switch-bn                   -> origin/switch-bn
2025-12-04T09:33:41.9058325Z  * [new branch]              sy_annotation_in_autograd_hop -> origin/sy_annotation_in_autograd_hop
2025-12-04T09:33:41.9059822Z  * [new branch]              sy_aot_eager_record         -> origin/sy_aot_eager_record
2025-12-04T09:33:41.9061507Z  * [new branch]              sy_custom_bucketing         -> origin/sy_custom_bucketing
2025-12-04T09:33:41.9063301Z  * [new branch]              sy_debug_mode_test          -> origin/sy_debug_mode_test
2025-12-04T09:33:41.9064772Z  * [new branch]              sy_deserialize              -> origin/sy_deserialize
2025-12-04T09:33:41.9066253Z  * [new branch]              sy_dump_gm_code             -> origin/sy_dump_gm_code
2025-12-04T09:33:41.9067781Z  * [new branch]              sy_exp                      -> origin/sy_exp
2025-12-04T09:33:41.9069376Z  * [new branch]              sy_export_annotation        -> origin/sy_export_annotation
2025-12-04T09:33:41.9071139Z  * [new branch]              sy_invoke_subgraph          -> origin/sy_invoke_subgraph
2025-12-04T09:33:41.9072712Z  * [new branch]              sy_kernel_bw_name           -> origin/sy_kernel_bw_name
2025-12-04T09:33:41.9074243Z  * [new branch]              sy_multi_arch               -> origin/sy_multi_arch
2025-12-04T09:33:41.9075858Z  * [new branch]              sy_nn_module_stack          -> origin/sy_nn_module_stack
2025-12-04T09:33:41.9077458Z  * [new branch]              sy_original_dtensor         -> origin/sy_original_dtensor
2025-12-04T09:33:41.9079018Z  * [new branch]              sy_profiler_cia             -> origin/sy_profiler_cia
2025-12-04T09:33:41.9080551Z  * [new branch]              symm_mem_sync               -> origin/symm_mem_sync
2025-12-04T09:33:41.9082196Z  * [new branch]              sympy-bottleneck-repro      -> origin/sympy-bottleneck-repro
2025-12-04T09:33:41.9083800Z  * [new branch]              tensordict_integration      -> origin/tensordict_integration
2025-12-04T09:33:41.9085548Z  * [new branch]              test-move-conda-builds      -> origin/test-move-conda-builds
2025-12-04T09:33:41.9087117Z  * [new branch]              test-old                    -> origin/test-old
2025-12-04T09:33:41.9089619Z  * [new branch]              test/bmm_heur               -> origin/test/bmm_heur
2025-12-04T09:33:41.9091780Z  * [new branch]              tianren/customOp_autotune_fix -> origin/tianren/customOp_autotune_fix
2025-12-04T09:33:41.9093349Z  * [new branch]              tianren/customOp_enable_max_autotune -> origin/tianren/customOp_enable_max_autotune
2025-12-04T09:33:41.9094692Z  * [new branch]              tianren/customOp_fusion     -> origin/tianren/customOp_fusion
2025-12-04T09:33:41.9096380Z  * [new branch]              tianren/customop_collectiveop_benchmark -> origin/tianren/customop_collectiveop_benchmark
2025-12-04T09:33:41.9098276Z  * [new branch]              tianren/customop_collectiveop_benchmark_fix -> origin/tianren/customop_collectiveop_benchmark_fix
2025-12-04T09:33:41.9100410Z  * [new branch]              tianren/customop_dynamic_config -> origin/tianren/customop_dynamic_config
2025-12-04T09:33:41.9101931Z  * [new branch]              tianren/dynamic_range_input -> origin/tianren/dynamic_range_input
2025-12-04T09:33:41.9103554Z  * [new branch]              tianren/dynamic_range_input_fix -> origin/tianren/dynamic_range_input_fix
2025-12-04T09:33:41.9105035Z  * [new branch]              tianren/dynamic_range_input_merge -> origin/tianren/dynamic_range_input_merge
2025-12-04T09:33:41.9106489Z  * [new branch]              tianren/flex_paged_attn_fix_temp -> origin/tianren/flex_paged_attn_fix_temp
2025-12-04T09:33:41.9108038Z  * [new branch]              tianren/fx_codegen_dump     -> origin/tianren/fx_codegen_dump
2025-12-04T09:33:41.9109508Z  * [new branch]              tianren/symmetric_memory    -> origin/tianren/symmetric_memory
2025-12-04T09:33:41.9111114Z  * [new branch]              tianren/test                -> origin/tianren/test
2025-12-04T09:33:41.9112829Z  * [new branch]              tidy_performance_cyy        -> origin/tidy_performance_cyy
2025-12-04T09:33:41.9114309Z  * [new branch]              tmp                         -> origin/tmp
2025-12-04T09:33:41.9115915Z  * [new branch]              torchtitan_ep               -> origin/torchtitan_ep
2025-12-04T09:33:41.9117530Z  * [new branch]              torchtitan_integration      -> origin/torchtitan_integration
2025-12-04T09:33:41.9119302Z  * [new branch]              trace_fsdp_torchtune_lora   -> origin/trace_fsdp_torchtune_lora
2025-12-04T09:33:41.9120686Z  * [new branch]              traceable_fsdp_unit_tests   -> origin/traceable_fsdp_unit_tests
2025-12-04T09:33:41.9122305Z  * [new branch]              tree_loop_vec_base          -> origin/tree_loop_vec_base
2025-12-04T09:33:41.9123929Z  * [new branch]              triton_kernel               -> origin/triton_kernel
2025-12-04T09:33:41.9125465Z  * [new branch]              tt_pkg_1908                 -> origin/tt_pkg_1908
2025-12-04T09:33:41.9127021Z  * [new branch]              type_dec                    -> origin/type_dec
2025-12-04T09:33:41.9128658Z  * [new branch]              udate-sphinx-dependancies   -> origin/udate-sphinx-dependancies
2025-12-04T09:33:41.9130842Z  * [new branch]              update-audio-commit-hash/17630256502-1803-1 -> origin/update-audio-commit-hash/17630256502-1803-1
2025-12-04T09:33:41.9132302Z  * [new branch]              update-audio-commit-hash/19087141161-1916-1 -> origin/update-audio-commit-hash/19087141161-1916-1
2025-12-04T09:33:41.9133755Z  * [new branch]              update-audio-commit-hash/19250643381-1929-1 -> origin/update-audio-commit-hash/19250643381-1929-1
2025-12-04T09:33:41.9135346Z  * [new branch]              update-audio-commit-hash/19397724337-1935-1 -> origin/update-audio-commit-hash/19397724337-1935-1
2025-12-04T09:33:41.9136824Z  * [new branch]              update-audio-commit-hash/19555670148-1941-1 -> origin/update-audio-commit-hash/19555670148-1941-1
2025-12-04T09:33:41.9138727Z  * [new branch]              update-audio-commit-hash/19750627930-1946-1 -> origin/update-audio-commit-hash/19750627930-1946-1
2025-12-04T09:33:41.9140887Z  * [new branch]              update-triton-commit-hash/13663274526-1487-2 -> origin/update-triton-commit-hash/13663274526-1487-2
2025-12-04T09:33:41.9142871Z  * [new branch]              update-vision-commit-hash/19087141161-1916-1 -> origin/update-vision-commit-hash/19087141161-1916-1
2025-12-04T09:33:41.9144360Z  * [new branch]              update-vision-commit-hash/19184897099-1925-1 -> origin/update-vision-commit-hash/19184897099-1925-1
2025-12-04T09:33:41.9145671Z  * [new branch]              update-vision-commit-hash/19250643381-1929-1 -> origin/update-vision-commit-hash/19250643381-1929-1
2025-12-04T09:33:41.9147178Z  * [new branch]              update-vision-commit-hash/19381328640-1934-1 -> origin/update-vision-commit-hash/19381328640-1934-1
2025-12-04T09:33:41.9148569Z  * [new branch]              update-vision-commit-hash/19485237164-1938-1 -> origin/update-vision-commit-hash/19485237164-1938-1
2025-12-04T09:33:41.9150754Z  * [new branch]              update-vllm-commit-hash/18451675449-1879-1 -> origin/update-vllm-commit-hash/18451675449-1879-1
2025-12-04T09:33:41.9152164Z  * [new branch]              update-vllm-dockerfile      -> origin/update-vllm-dockerfile
2025-12-04T09:33:41.9154372Z  * [new branch]              update-xla-commit-hash/19224287370-211-1 -> origin/update-xla-commit-hash/19224287370-211-1
2025-12-04T09:33:41.9155939Z  * [new branch]              update-xla-commit-hash/19422028566-212-1 -> origin/update-xla-commit-hash/19422028566-212-1
2025-12-04T09:33:41.9157330Z  * [new branch]              update-xla-commit-hash/19626841311-213-1 -> origin/update-xla-commit-hash/19626841311-213-1
2025-12-04T09:33:41.9159104Z  * [new branch]              update_docs_torch_multinomial_issue#125388 -> origin/update_docs_torch_multinomial_issue#125388
2025-12-04T09:33:41.9160474Z  * [new branch]              update_operator_readme      -> origin/update_operator_readme
2025-12-04T09:33:41.9162090Z  * [new branch]              update_slow_tests_1722488736 -> origin/update_slow_tests_1722488736
2025-12-04T09:33:41.9163689Z  * [new branch]              update_slow_tests_1722879173 -> origin/update_slow_tests_1722879173
2025-12-04T09:33:41.9165282Z  * [new branch]              update_slow_tests_1762155677 -> origin/update_slow_tests_1762155677
2025-12-04T09:33:41.9166981Z  * [new branch]              update_slow_tests_1763365283 -> origin/update_slow_tests_1763365283
2025-12-04T09:33:41.9168466Z  * [new branch]              update_submodule_FBGEMM     -> origin/update_submodule_FBGEMM
2025-12-04T09:33:41.9170076Z  * [new branch]              update_submodule_kineto     -> origin/update_submodule_kineto
2025-12-04T09:33:41.9171838Z  * [new branch]              update_submodule_tensorpipe -> origin/update_submodule_tensorpipe
2025-12-04T09:33:41.9173422Z  * [new branch]              upload-tests-for-autorevert -> origin/upload-tests-for-autorevert
2025-12-04T09:33:41.9175049Z  * [new branch]              v0.1.2                      -> origin/v0.1.2
2025-12-04T09:33:41.9176791Z  * [new branch]              v1.0.1                      -> origin/v1.0.1
2025-12-04T09:33:41.9178474Z  * [new branch]              v1.0.3                      -> origin/v1.0.3
2025-12-04T09:33:41.9180342Z  * [new branch]              v1.1.0                      -> origin/v1.1.0
2025-12-04T09:33:41.9182215Z  * [new branch]              v1.2.0                      -> origin/v1.2.0
2025-12-04T09:33:41.9183816Z  * [new branch]              v1.3.0                      -> origin/v1.3.0
2025-12-04T09:33:41.9185460Z  * [new branch]              v1.3.1                      -> origin/v1.3.1
2025-12-04T09:33:41.9187066Z  * [new branch]              validate_fn                 -> origin/validate_fn
2025-12-04T09:33:41.9188841Z  * [new branch]              validations_2.6             -> origin/validations_2.6
2025-12-04T09:33:41.9191293Z  * [new branch]              validations_2.8             -> origin/validations_2.8
2025-12-04T09:33:41.9192902Z  * [new branch]              varlen-api                  -> origin/varlen-api
2025-12-04T09:33:41.9194523Z  * [new branch]              varlen-api-backup           -> origin/varlen-api-backup
2025-12-04T09:33:41.9196074Z  * [new branch]              varlen_batch_invariance     -> origin/varlen_batch_invariance
2025-12-04T09:33:41.9197924Z  * [new branch]              viable/strict               -> origin/viable/strict
2025-12-04T09:33:41.9200731Z  * [new branch]              vishal9-team/dtensor_parallelism_toy -> origin/vishal9-team/dtensor_parallelism_toy
2025-12-04T09:33:41.9202153Z  * [new branch]              vllmbuildci                 -> origin/vllmbuildci
2025-12-04T09:33:41.9203798Z  * [new branch]              vllmpin                     -> origin/vllmpin
2025-12-04T09:33:41.9205581Z  * [new branch]              vscode-recommend-pyrefly    -> origin/vscode-recommend-pyrefly
2025-12-04T09:33:41.9207468Z  * [new branch]              wdvr-patch-1                -> origin/wdvr-patch-1
2025-12-04T09:33:41.9209553Z  * [new branch]              wdvr/iss_145259             -> origin/wdvr/iss_145259
2025-12-04T09:33:41.9211588Z  * [new branch]              whc/pei                     -> origin/whc/pei
2025-12-04T09:33:41.9213029Z  * [new branch]              whc/pp_fix                  -> origin/whc/pp_fix
2025-12-04T09:33:41.9214611Z  * [new branch]              whc/sharding                -> origin/whc/sharding
2025-12-04T09:33:41.9216025Z  * [new branch]              whc/sharding2               -> origin/whc/sharding2
2025-12-04T09:33:41.9217623Z  * [new branch]              whc/uneven                  -> origin/whc/uneven
2025-12-04T09:33:41.9219452Z  * [new branch]              whc/uneven-merge            -> origin/whc/uneven-merge
2025-12-04T09:33:41.9221060Z  * [new branch]              win_warnings                -> origin/win_warnings
2025-12-04T09:33:41.9222592Z  * [new branch]              windows_libtorch_free       -> origin/windows_libtorch_free
2025-12-04T09:33:41.9224693Z  * [new branch]              xmfan-war                   -> origin/xmfan-war
2025-12-04T09:33:41.9226709Z  * [new branch]              xmfan/ca_0516               -> origin/xmfan/ca_0516
2025-12-04T09:33:41.9228225Z  * [new branch]              xmfan/ca_1051b93192         -> origin/xmfan/ca_1051b93192
2025-12-04T09:33:41.9230075Z  * [new branch]              xmfan/ca_1a722f62c248391fc4a542e8851a5559aa356ae8 -> origin/xmfan/ca_1a722f62c248391fc4a542e8851a5559aa356ae8
2025-12-04T09:33:41.9230968Z  * [new branch]              xmfan/ca_5a2be192d1         -> origin/xmfan/ca_5a2be192d1
2025-12-04T09:33:41.9232401Z  * [new branch]              xmfan/ca_9d59b516e9         -> origin/xmfan/ca_9d59b516e9
2025-12-04T09:33:41.9233801Z  * [new branch]              xmfan/ca_apr8               -> origin/xmfan/ca_apr8
2025-12-04T09:33:41.9235191Z  * [new branch]              xmfan/ca_base               -> origin/xmfan/ca_base
2025-12-04T09:33:41.9236987Z  * [new branch]              xmfan/ca_dynamic            -> origin/xmfan/ca_dynamic
2025-12-04T09:33:41.9238812Z  * [new branch]              xmfan/ca_fix_dyn            -> origin/xmfan/ca_fix_dyn
2025-12-04T09:33:41.9240362Z  * [new branch]              xmfan/ca_fix_lowering       -> origin/xmfan/ca_fix_lowering
2025-12-04T09:33:41.9241842Z  * [new branch]              xmfan/ca_fix_polyfills      -> origin/xmfan/ca_fix_polyfills
2025-12-04T09:33:41.9243132Z  * [new branch]              xmfan/ca_jan3               -> origin/xmfan/ca_jan3
2025-12-04T09:33:41.9244574Z  * [new branch]              xmfan/ca_jun18              -> origin/xmfan/ca_jun18
2025-12-04T09:33:41.9246087Z  * [new branch]              xmfan/ca_jun24              -> origin/xmfan/ca_jun24
2025-12-04T09:33:41.9247542Z  * [new branch]              xmfan/ca_nested             -> origin/xmfan/ca_nested
2025-12-04T09:33:41.9249017Z  * [new branch]              xmfan/ca_overhead           -> origin/xmfan/ca_overhead
2025-12-04T09:33:41.9250554Z  * [new branch]              xmfan/ca_overhead_0eba7e5451 -> origin/xmfan/ca_overhead_0eba7e5451
2025-12-04T09:33:41.9251948Z  * [new branch]              xmfan/cacu_jun18            -> origin/xmfan/cacu_jun18
2025-12-04T09:33:41.9253549Z  * [new branch]              xmfan/cacu_jun19            -> origin/xmfan/cacu_jun19
2025-12-04T09:33:41.9255478Z  * [new branch]              xmfan/cacu_jun4             -> origin/xmfan/cacu_jun4
2025-12-04T09:33:41.9257107Z  * [new branch]              xmfan/disable_duck_shape    -> origin/xmfan/disable_duck_shape
2025-12-04T09:33:41.9258693Z  * [new branch]              xmfan/fca_cpp_node_passthrough -> origin/xmfan/fca_cpp_node_passthrough
2025-12-04T09:33:41.9260430Z  * [new branch]              xmfan/post_3945954741e2d37023c5d6954f9483008e0892f9 -> origin/xmfan/post_3945954741e2d37023c5d6954f9483008e0892f9
2025-12-04T09:33:41.9261964Z  * [new branch]              xmfan/pre_3945954741e2d37023c5d6954f9483008e0892f9 -> origin/xmfan/pre_3945954741e2d37023c5d6954f9483008e0892f9
2025-12-04T09:33:41.9263097Z  * [new branch]              xmfan/single_step           -> origin/xmfan/single_step
2025-12-04T09:33:41.9264496Z  * [new branch]              xmfan/sth_0829              -> origin/xmfan/sth_0829
2025-12-04T09:33:41.9266598Z  * [new branch]              xmfan/test                  -> origin/xmfan/test
2025-12-04T09:33:41.9269296Z  * [new branch]              yguo/debug-0226-constexpr   -> origin/yguo/debug-0226-constexpr
2025-12-04T09:33:41.9270690Z  * [new branch]              yguo/new_latest_changes     -> origin/yguo/new_latest_changes
2025-12-04T09:33:41.9275672Z  * [new branch]              yguo/patch_constexpr_changes -> origin/yguo/patch_constexpr_changes
2025-12-04T09:33:41.9277618Z  * [new branch]              yiming/bootcamp             -> origin/yiming/bootcamp
2025-12-04T09:33:41.9279182Z  * [new branch]              yiming/run_with_start_end_rng_hop -> origin/yiming/run_with_start_end_rng_hop
2025-12-04T09:33:41.9280857Z  * [new branch]              yolo-llama3                 -> origin/yolo-llama3
2025-12-04T09:33:41.9282864Z  * [new branch]              zainr/canary-test           -> origin/zainr/canary-test
2025-12-04T09:33:41.9284599Z  * [new branch]              zainr/cleanup-gh-runners    -> origin/zainr/cleanup-gh-runners
2025-12-04T09:33:41.9285998Z  * [new branch]              zainr/pull-migration-c      -> origin/zainr/pull-migration-c
2025-12-04T09:33:41.9287848Z  * [new branch]              zainr/test2                 -> origin/zainr/test2
2025-12-04T09:33:41.9289808Z  * [new branch]              zasdfgbnm-patch-3           -> origin/zasdfgbnm-patch-3
2025-12-04T09:33:41.9291220Z  * [new branch]              zb2p                        -> origin/zb2p
2025-12-04T09:33:41.9292820Z  * [new branch]              zeros-and-scatter-part2     -> origin/zeros-and-scatter-part2
2025-12-04T09:33:41.9295267Z  * [new branch]              zhxchen17/ci/vllm_lora_oom  -> origin/zhxchen17/ci/vllm_lora_oom
2025-12-04T09:33:41.9296861Z  * [new branch]              zhxchen17/ci/vllm_multimodal_oom -> origin/zhxchen17/ci/vllm_multimodal_oom
2025-12-04T09:33:41.9298363Z  * [new branch]              zhxchen17/ci/vllm_pin       -> origin/zhxchen17/ci/vllm_pin
2025-12-04T09:33:41.9300502Z  * [new branch]              zhxchen17/dynamo/unsafe_drop_all_guards -> origin/zhxchen17/dynamo/unsafe_drop_all_guards
2025-12-04T09:33:41.9302535Z  * [new branch]              zhxchen17/export/call_override -> origin/zhxchen17/export/call_override
2025-12-04T09:33:41.9303932Z  * [new branch]              zhxchen17/export/codemod1   -> origin/zhxchen17/export/codemod1
2025-12-04T09:33:41.9305545Z  * [new branch]              zhxchen17/export/ctx_return -> origin/zhxchen17/export/ctx_return
2025-12-04T09:33:41.9307149Z  * [new branch]              zhxchen17/export/disable_side_effect_warn -> origin/zhxchen17/export/disable_side_effect_warn
2025-12-04T09:33:41.9308659Z  * [new branch]              zhxchen17/export/pytree_check -> origin/zhxchen17/export/pytree_check
2025-12-04T09:33:41.9310567Z  * [new branch]              zhxchen17/precompile/aoti   -> origin/zhxchen17/precompile/aoti
2025-12-04T09:33:41.9312166Z  * [new branch]              zhxchen17/precompile/globals -> origin/zhxchen17/precompile/globals
2025-12-04T09:33:41.9313655Z  * [new branch]              zhxchen17/precompile/inductor_guards -> origin/zhxchen17/precompile/inductor_guards
2025-12-04T09:33:41.9315459Z  * [new branch]              zhxchen17/scratch/0         -> origin/zhxchen17/scratch/0
2025-12-04T09:33:41.9317176Z  * [new branch]              zhxchen17/torch_export_api_update -> origin/zhxchen17/torch_export_api_update
2025-12-04T09:33:41.9319225Z  * [new branch]              zhxhcen17/moodycamel        -> origin/zhxhcen17/moodycamel
2025-12-04T09:33:41.9321545Z  * [new branch]              zxiiro/build-times          -> origin/zxiiro/build-times
2025-12-04T09:33:41.9323116Z  * [new branch]              zxiiro/c7i.2xlarge          -> origin/zxiiro/c7i.2xlarge
2025-12-04T09:33:41.9324679Z  * [new branch]              zxiiro/c7i.2xlarge.h100     -> origin/zxiiro/c7i.2xlarge.h100
2025-12-04T09:33:41.9326196Z  * [new branch]              zxiiro/main                 -> origin/zxiiro/main
2025-12-04T09:33:41.9327720Z  * [new branch]              zxiiro/risc64               -> origin/zxiiro/risc64
2025-12-04T09:33:41.9329400Z  * [new branch]              zxiiro/test-multicloud-arc  -> origin/zxiiro/test-multicloud-arc
2025-12-04T09:33:41.9331023Z  * [new tag]                 bc2caa7fdf006894eff7af936babde69ab5a40f8-huydhn-debug -> bc2caa7fdf006894eff7af936babde69ab5a40f8-huydhn-debug
2025-12-04T09:33:41.9332197Z  * [new tag]                 ci/binaries/77164           -> ci/binaries/77164
2025-12-04T09:33:41.9333568Z  * [new tag]                 ciflow/b200/115316          -> ciflow/b200/115316
2025-12-04T09:33:41.9334564Z  * [new tag]                 ciflow/b200/160685          -> ciflow/b200/160685
2025-12-04T09:33:41.9335621Z  * [new tag]                 ciflow/b200/161607          -> ciflow/b200/161607
2025-12-04T09:33:41.9336779Z  * [new tag]                 ciflow/b200/161938          -> ciflow/b200/161938
2025-12-04T09:33:41.9338027Z  * [new tag]                 ciflow/b200/167207          -> ciflow/b200/167207
2025-12-04T09:33:41.9338869Z  * [new tag]                 ciflow/b200/167989          -> ciflow/b200/167989
2025-12-04T09:33:41.9340098Z  * [new tag]                 ciflow/b200/168096          -> ciflow/b200/168096
2025-12-04T09:33:41.9341275Z  * [new tag]                 ciflow/b200/168175          -> ciflow/b200/168175
2025-12-04T09:33:41.9342379Z  * [new tag]                 ciflow/b200/168195          -> ciflow/b200/168195
2025-12-04T09:33:41.9343449Z  * [new tag]                 ciflow/b200/169200          -> ciflow/b200/169200
2025-12-04T09:33:41.9344615Z  * [new tag]                 ciflow/b200/169216          -> ciflow/b200/169216
2025-12-04T09:33:41.9346133Z  * [new tag]                 ciflow/b200/169380          -> ciflow/b200/169380
2025-12-04T09:33:41.9347619Z  * [new tag]                 ciflow/b200/169412          -> ciflow/b200/169412
2025-12-04T09:33:41.9348855Z  * [new tag]                 ciflow/b200/169470          -> ciflow/b200/169470
2025-12-04T09:33:41.9350106Z  * [new tag]                 ciflow/b200/169471          -> ciflow/b200/169471
2025-12-04T09:33:41.9351105Z  * [new tag]                 ciflow/b200/169472          -> ciflow/b200/169472
2025-12-04T09:33:41.9352460Z  * [new tag]                 ciflow/b200/169514          -> ciflow/b200/169514
2025-12-04T09:33:41.9353518Z  * [new tag]                 ciflow/b200/169517          -> ciflow/b200/169517
2025-12-04T09:33:41.9354948Z  * [new tag]                 ciflow/binaries/165922      -> ciflow/binaries/165922
2025-12-04T09:33:41.9356293Z  * [new tag]                 ciflow/binaries/169510      -> ciflow/binaries/169510
2025-12-04T09:33:41.9357740Z  * [new tag]                 ciflow/binaries_wheel/157994 -> ciflow/binaries_wheel/157994
2025-12-04T09:33:41.9358897Z  * [new tag]                 ciflow/binaries_wheel/166829 -> ciflow/binaries_wheel/166829
2025-12-04T09:33:41.9359897Z  * [new tag]                 ciflow/binaries_wheel/167972 -> ciflow/binaries_wheel/167972
2025-12-04T09:33:41.9361085Z  * [new tag]                 ciflow/binaries_wheel/167981 -> ciflow/binaries_wheel/167981
2025-12-04T09:33:41.9362252Z  * [new tag]                 ciflow/dynamo/167695        -> ciflow/dynamo/167695
2025-12-04T09:33:41.9363251Z  * [new tag]                 ciflow/dynamo/168096        -> ciflow/dynamo/168096
2025-12-04T09:33:41.9364423Z  * [new tag]                 ciflow/dynamo/169525        -> ciflow/dynamo/169525
2025-12-04T09:33:41.9365830Z  * [new tag]                 ciflow/h100-cutlass-backend/161938 -> ciflow/h100-cutlass-backend/161938
2025-12-04T09:33:41.9366744Z  * [new tag]                 ciflow/h100-cutlass-backend/161940 -> ciflow/h100-cutlass-backend/161940
2025-12-04T09:33:41.9368126Z  * [new tag]                 ciflow/h100-distributed/168923 -> ciflow/h100-distributed/168923
2025-12-04T09:33:41.9369332Z  * [new tag]                 ciflow/h100-symm-mem/167552 -> ciflow/h100-symm-mem/167552
2025-12-04T09:33:41.9370198Z  * [new tag]                 ciflow/h100-symm-mem/168129 -> ciflow/h100-symm-mem/168129
2025-12-04T09:33:41.9371401Z  * [new tag]                 ciflow/h100-symm-mem/168917 -> ciflow/h100-symm-mem/168917
2025-12-04T09:33:41.9372893Z  * [new tag]                 ciflow/h100-symm-mem/169156 -> ciflow/h100-symm-mem/169156
2025-12-04T09:33:41.9373757Z  * [new tag]                 ciflow/h100-symm-mem/169200 -> ciflow/h100-symm-mem/169200
2025-12-04T09:33:41.9374883Z  * [new tag]                 ciflow/h100-symm-mem/169216 -> ciflow/h100-symm-mem/169216
2025-12-04T09:33:41.9375720Z  * [new tag]                 ciflow/h100-symm-mem/169338 -> ciflow/h100-symm-mem/169338
2025-12-04T09:33:41.9377101Z  * [new tag]                 ciflow/h100-symm-mem/169355 -> ciflow/h100-symm-mem/169355
2025-12-04T09:33:41.9377968Z  * [new tag]                 ciflow/h100-symm-mem/169543 -> ciflow/h100-symm-mem/169543
2025-12-04T09:33:41.9379324Z  * [new tag]                 ciflow/h100/115316          -> ciflow/h100/115316
2025-12-04T09:33:41.9380361Z  * [new tag]                 ciflow/h100/160685          -> ciflow/h100/160685
2025-12-04T09:33:41.9381370Z  * [new tag]                 ciflow/h100/160729          -> ciflow/h100/160729
2025-12-04T09:33:41.9382337Z  * [new tag]                 ciflow/h100/161607          -> ciflow/h100/161607
2025-12-04T09:33:41.9383363Z  * [new tag]                 ciflow/h100/161938          -> ciflow/h100/161938
2025-12-04T09:33:41.9384527Z  * [new tag]                 ciflow/h100/167207          -> ciflow/h100/167207
2025-12-04T09:33:41.9385255Z  * [new tag]                 ciflow/h100/167989          -> ciflow/h100/167989
2025-12-04T09:33:41.9386340Z  * [new tag]                 ciflow/h100/168096          -> ciflow/h100/168096
2025-12-04T09:33:41.9387366Z  * [new tag]                 ciflow/h100/168175          -> ciflow/h100/168175
2025-12-04T09:33:41.9388200Z  * [new tag]                 ciflow/h100/168195          -> ciflow/h100/168195
2025-12-04T09:33:41.9389308Z  * [new tag]                 ciflow/h100/168980          -> ciflow/h100/168980
2025-12-04T09:33:41.9390655Z  * [new tag]                 ciflow/h100/169200          -> ciflow/h100/169200
2025-12-04T09:33:41.9392056Z  * [new tag]                 ciflow/h100/169216          -> ciflow/h100/169216
2025-12-04T09:33:41.9393299Z  * [new tag]                 ciflow/h100/169380          -> ciflow/h100/169380
2025-12-04T09:33:41.9394362Z  * [new tag]                 ciflow/h100/169412          -> ciflow/h100/169412
2025-12-04T09:33:41.9395450Z  * [new tag]                 ciflow/h100/169470          -> ciflow/h100/169470
2025-12-04T09:33:41.9396495Z  * [new tag]                 ciflow/h100/169471          -> ciflow/h100/169471
2025-12-04T09:33:41.9397571Z  * [new tag]                 ciflow/h100/169472          -> ciflow/h100/169472
2025-12-04T09:33:41.9398488Z  * [new tag]                 ciflow/h100/169514          -> ciflow/h100/169514
2025-12-04T09:33:41.9399851Z  * [new tag]                 ciflow/inductor-cu126/168096 -> ciflow/inductor-cu126/168096
2025-12-04T09:33:41.9401580Z  * [new tag]                 ciflow/inductor-micro-benchmark-cpu-x86/168096 -> ciflow/inductor-micro-benchmark-cpu-x86/168096
2025-12-04T09:33:41.9402593Z  * [new tag]                 ciflow/inductor-micro-benchmark/166165 -> ciflow/inductor-micro-benchmark/166165
2025-12-04T09:33:41.9403637Z  * [new tag]                 ciflow/inductor-micro-benchmark/168096 -> ciflow/inductor-micro-benchmark/168096
2025-12-04T09:33:41.9404921Z  * [new tag]                 ciflow/inductor-perf-compare/168096 -> ciflow/inductor-perf-compare/168096
2025-12-04T09:33:41.9406752Z  * [new tag]                 ciflow/inductor-perf-test-nightly-rocm-mi300/168073 -> ciflow/inductor-perf-test-nightly-rocm-mi300/168073
2025-12-04T09:33:41.9407609Z  * [new tag]                 ciflow/inductor-perf-test-nightly-rocm-mi300/168096 -> ciflow/inductor-perf-test-nightly-rocm-mi300/168096
2025-12-04T09:33:41.9408786Z  * [new tag]                 ciflow/inductor-perf-test-nightly-rocm-mi300/169024 -> ciflow/inductor-perf-test-nightly-rocm-mi300/169024
2025-12-04T09:33:41.9410211Z  * [new tag]                 ciflow/inductor-perf-test-nightly-rocm-mi355/169024 -> ciflow/inductor-perf-test-nightly-rocm-mi355/169024
2025-12-04T09:33:41.9411176Z  * [new tag]                 ciflow/inductor-perf-test-nightly/168096 -> ciflow/inductor-perf-test-nightly/168096
2025-12-04T09:33:41.9412402Z  * [new tag]                 ciflow/inductor-periodic/168096 -> ciflow/inductor-periodic/168096
2025-12-04T09:33:41.9413302Z  * [new tag]                 ciflow/inductor-periodic/169024 -> ciflow/inductor-periodic/169024
2025-12-04T09:33:41.9414546Z  * [new tag]                 ciflow/inductor-periodic/169425 -> ciflow/inductor-periodic/169425
2025-12-04T09:33:41.9415925Z  * [new tag]                 ciflow/inductor-rocm-mi200/165545 -> ciflow/inductor-rocm-mi200/165545
2025-12-04T09:33:41.9417206Z  * [new tag]                 ciflow/inductor-rocm-mi200/165997 -> ciflow/inductor-rocm-mi200/165997
2025-12-04T09:33:41.9418102Z  * [new tag]                 ciflow/inductor-rocm-mi200/168096 -> ciflow/inductor-rocm-mi200/168096
2025-12-04T09:33:41.9419378Z  * [new tag]                 ciflow/inductor-rocm-mi200/169063 -> ciflow/inductor-rocm-mi200/169063
2025-12-04T09:33:41.9420265Z  * [new tag]                 ciflow/inductor-rocm-mi200/169425 -> ciflow/inductor-rocm-mi200/169425
2025-12-04T09:33:41.9421688Z  * [new tag]                 ciflow/inductor-rocm-mi300/165545 -> ciflow/inductor-rocm-mi300/165545
2025-12-04T09:33:41.9422474Z  * [new tag]                 ciflow/inductor-rocm-mi300/168096 -> ciflow/inductor-rocm-mi300/168096
2025-12-04T09:33:41.9423422Z  * [new tag]                 ciflow/inductor-rocm-mi300/169063 -> ciflow/inductor-rocm-mi300/169063
2025-12-04T09:33:41.9424369Z  * [new tag]                 ciflow/inductor-rocm-mi300/169425 -> ciflow/inductor-rocm-mi300/169425
2025-12-04T09:33:41.9425803Z  * [new tag]                 ciflow/inductor-rocm/162052 -> ciflow/inductor-rocm/162052
2025-12-04T09:33:41.9426684Z  * [new tag]                 ciflow/inductor-rocm/168971 -> ciflow/inductor-rocm/168971
2025-12-04T09:33:41.9428046Z  * [new tag]                 ciflow/inductor-windows/168096 -> ciflow/inductor-windows/168096
2025-12-04T09:33:41.9429221Z  * [new tag]                 ciflow/inductor/144542      -> ciflow/inductor/144542
2025-12-04T09:33:41.9430794Z  * [new tag]                 ciflow/inductor/146506      -> ciflow/inductor/146506
2025-12-04T09:33:41.9431693Z  * [new tag]                 ciflow/inductor/147990      -> ciflow/inductor/147990
2025-12-04T09:33:41.9433013Z  * [new tag]                 ciflow/inductor/148294      -> ciflow/inductor/148294
2025-12-04T09:33:41.9434103Z  * [new tag]                 ciflow/inductor/148492      -> ciflow/inductor/148492
2025-12-04T09:33:41.9434986Z  * [new tag]                 ciflow/inductor/157149      -> ciflow/inductor/157149
2025-12-04T09:33:41.9436079Z  * [new tag]                 ciflow/inductor/157994      -> ciflow/inductor/157994
2025-12-04T09:33:41.9437209Z  * [new tag]                 ciflow/inductor/160685      -> ciflow/inductor/160685
2025-12-04T09:33:41.9438099Z  * [new tag]                 ciflow/inductor/160686      -> ciflow/inductor/160686
2025-12-04T09:33:41.9439186Z  * [new tag]                 ciflow/inductor/160687      -> ciflow/inductor/160687
2025-12-04T09:33:41.9440062Z  * [new tag]                 ciflow/inductor/160688      -> ciflow/inductor/160688
2025-12-04T09:33:41.9441535Z  * [new tag]                 ciflow/inductor/160706      -> ciflow/inductor/160706
2025-12-04T09:33:41.9442913Z  * [new tag]                 ciflow/inductor/160729      -> ciflow/inductor/160729
2025-12-04T09:33:41.9444159Z  * [new tag]                 ciflow/inductor/161938      -> ciflow/inductor/161938
2025-12-04T09:33:41.9445202Z  * [new tag]                 ciflow/inductor/161939      -> ciflow/inductor/161939
2025-12-04T09:33:41.9446277Z  * [new tag]                 ciflow/inductor/161940      -> ciflow/inductor/161940
2025-12-04T09:33:41.9447364Z  * [new tag]                 ciflow/inductor/162052      -> ciflow/inductor/162052
2025-12-04T09:33:41.9448367Z  * [new tag]                 ciflow/inductor/162275      -> ciflow/inductor/162275
2025-12-04T09:33:41.9449436Z  * [new tag]                 ciflow/inductor/162795      -> ciflow/inductor/162795
2025-12-04T09:33:41.9450747Z  * [new tag]                 ciflow/inductor/163245      -> ciflow/inductor/163245
2025-12-04T09:33:41.9451818Z  * [new tag]                 ciflow/inductor/163335      -> ciflow/inductor/163335
2025-12-04T09:33:41.9452899Z  * [new tag]                 ciflow/inductor/163503      -> ciflow/inductor/163503
2025-12-04T09:33:41.9453796Z  * [new tag]                 ciflow/inductor/163942      -> ciflow/inductor/163942
2025-12-04T09:33:41.9455088Z  * [new tag]                 ciflow/inductor/165270      -> ciflow/inductor/165270
2025-12-04T09:33:41.9456149Z  * [new tag]                 ciflow/inductor/165274      -> ciflow/inductor/165274
2025-12-04T09:33:41.9457363Z  * [new tag]                 ciflow/inductor/165322      -> ciflow/inductor/165322
2025-12-04T09:33:41.9458401Z  * [new tag]                 ciflow/inductor/165597      -> ciflow/inductor/165597
2025-12-04T09:33:41.9459495Z  * [new tag]                 ciflow/inductor/166063      -> ciflow/inductor/166063
2025-12-04T09:33:41.9460526Z  * [new tag]                 ciflow/inductor/166075      -> ciflow/inductor/166075
2025-12-04T09:33:41.9461688Z  * [new tag]                 ciflow/inductor/166165      -> ciflow/inductor/166165
2025-12-04T09:33:41.9462966Z  * [new tag]                 ciflow/inductor/166254      -> ciflow/inductor/166254
2025-12-04T09:33:41.9464034Z  * [new tag]                 ciflow/inductor/166483      -> ciflow/inductor/166483
2025-12-04T09:33:41.9465125Z  * [new tag]                 ciflow/inductor/166494      -> ciflow/inductor/166494
2025-12-04T09:33:41.9466241Z  * [new tag]                 ciflow/inductor/166545      -> ciflow/inductor/166545
2025-12-04T09:33:41.9467116Z  * [new tag]                 ciflow/inductor/166788      -> ciflow/inductor/166788
2025-12-04T09:33:41.9468452Z  * [new tag]                 ciflow/inductor/166846      -> ciflow/inductor/166846
2025-12-04T09:33:41.9469503Z  * [new tag]                 ciflow/inductor/167300      -> ciflow/inductor/167300
2025-12-04T09:33:41.9470570Z  * [new tag]                 ciflow/inductor/167407      -> ciflow/inductor/167407
2025-12-04T09:33:41.9471945Z  * [new tag]                 ciflow/inductor/167536      -> ciflow/inductor/167536
2025-12-04T09:33:41.9473092Z  * [new tag]                 ciflow/inductor/167552      -> ciflow/inductor/167552
2025-12-04T09:33:41.9474117Z  * [new tag]                 ciflow/inductor/167555      -> ciflow/inductor/167555
2025-12-04T09:33:41.9475417Z  * [new tag]                 ciflow/inductor/167583      -> ciflow/inductor/167583
2025-12-04T09:33:41.9476237Z  * [new tag]                 ciflow/inductor/167599      -> ciflow/inductor/167599
2025-12-04T09:33:41.9477401Z  * [new tag]                 ciflow/inductor/167647      -> ciflow/inductor/167647
2025-12-04T09:33:41.9478434Z  * [new tag]                 ciflow/inductor/167677      -> ciflow/inductor/167677
2025-12-04T09:33:41.9479477Z  * [new tag]                 ciflow/inductor/167680      -> ciflow/inductor/167680
2025-12-04T09:33:41.9480525Z  * [new tag]                 ciflow/inductor/167695      -> ciflow/inductor/167695
2025-12-04T09:33:41.9481585Z  * [new tag]                 ciflow/inductor/167742      -> ciflow/inductor/167742
2025-12-04T09:33:41.9482638Z  * [new tag]                 ciflow/inductor/167768      -> ciflow/inductor/167768
2025-12-04T09:33:41.9483953Z  * [new tag]                 ciflow/inductor/167773      -> ciflow/inductor/167773
2025-12-04T09:33:41.9485097Z  * [new tag]                 ciflow/inductor/167781      -> ciflow/inductor/167781
2025-12-04T09:33:41.9486083Z  * [new tag]                 ciflow/inductor/167880      -> ciflow/inductor/167880
2025-12-04T09:33:41.9487166Z  * [new tag]                 ciflow/inductor/167887      -> ciflow/inductor/167887
2025-12-04T09:33:41.9488259Z  * [new tag]                 ciflow/inductor/167972      -> ciflow/inductor/167972
2025-12-04T09:33:41.9489315Z  * [new tag]                 ciflow/inductor/167989      -> ciflow/inductor/167989
2025-12-04T09:33:41.9490380Z  * [new tag]                 ciflow/inductor/168002      -> ciflow/inductor/168002
2025-12-04T09:33:41.9491407Z  * [new tag]                 ciflow/inductor/168050      -> ciflow/inductor/168050
2025-12-04T09:33:41.9492487Z  * [new tag]                 ciflow/inductor/168051      -> ciflow/inductor/168051
2025-12-04T09:33:41.9493482Z  * [new tag]                 ciflow/inductor/168052      -> ciflow/inductor/168052
2025-12-04T09:33:41.9494572Z  * [new tag]                 ciflow/inductor/168073      -> ciflow/inductor/168073
2025-12-04T09:33:41.9495653Z  * [new tag]                 ciflow/inductor/168096      -> ciflow/inductor/168096
2025-12-04T09:33:41.9496805Z  * [new tag]                 ciflow/inductor/168114      -> ciflow/inductor/168114
2025-12-04T09:33:41.9497894Z  * [new tag]                 ciflow/inductor/168115      -> ciflow/inductor/168115
2025-12-04T09:33:41.9498881Z  * [new tag]                 ciflow/inductor/168127      -> ciflow/inductor/168127
2025-12-04T09:33:41.9500062Z  * [new tag]                 ciflow/inductor/168129      -> ciflow/inductor/168129
2025-12-04T09:33:41.9501683Z  * [new tag]                 ciflow/inductor/168157      -> ciflow/inductor/168157
2025-12-04T09:33:41.9502910Z  * [new tag]                 ciflow/inductor/168175      -> ciflow/inductor/168175
2025-12-04T09:33:41.9503723Z  * [new tag]                 ciflow/inductor/168185      -> ciflow/inductor/168185
2025-12-04T09:33:41.9504831Z  * [new tag]                 ciflow/inductor/168195      -> ciflow/inductor/168195
2025-12-04T09:33:41.9505868Z  * [new tag]                 ciflow/inductor/168209      -> ciflow/inductor/168209
2025-12-04T09:33:41.9506906Z  * [new tag]                 ciflow/inductor/168266      -> ciflow/inductor/168266
2025-12-04T09:33:41.9508129Z  * [new tag]                 ciflow/inductor/168316      -> ciflow/inductor/168316
2025-12-04T09:33:41.9509410Z  * [new tag]                 ciflow/inductor/168326      -> ciflow/inductor/168326
2025-12-04T09:33:41.9510459Z  * [new tag]                 ciflow/inductor/168368      -> ciflow/inductor/168368
2025-12-04T09:33:41.9511530Z  * [new tag]                 ciflow/inductor/168894      -> ciflow/inductor/168894
2025-12-04T09:33:41.9512575Z  * [new tag]                 ciflow/inductor/168934      -> ciflow/inductor/168934
2025-12-04T09:33:41.9513681Z  * [new tag]                 ciflow/inductor/168939      -> ciflow/inductor/168939
2025-12-04T09:33:41.9514703Z  * [new tag]                 ciflow/inductor/168946      -> ciflow/inductor/168946
2025-12-04T09:33:41.9515605Z  * [new tag]                 ciflow/inductor/168950      -> ciflow/inductor/168950
2025-12-04T09:33:41.9516758Z  * [new tag]                 ciflow/inductor/168951      -> ciflow/inductor/168951
2025-12-04T09:33:41.9517788Z  * [new tag]                 ciflow/inductor/168952      -> ciflow/inductor/168952
2025-12-04T09:33:41.9518848Z  * [new tag]                 ciflow/inductor/168955      -> ciflow/inductor/168955
2025-12-04T09:33:41.9519831Z  * [new tag]                 ciflow/inductor/168971      -> ciflow/inductor/168971
2025-12-04T09:33:41.9520919Z  * [new tag]                 ciflow/inductor/168979      -> ciflow/inductor/168979
2025-12-04T09:33:41.9521970Z  * [new tag]                 ciflow/inductor/168980      -> ciflow/inductor/168980
2025-12-04T09:33:41.9523188Z  * [new tag]                 ciflow/inductor/168983      -> ciflow/inductor/168983
2025-12-04T09:33:41.9524252Z  * [new tag]                 ciflow/inductor/169006      -> ciflow/inductor/169006
2025-12-04T09:33:41.9525336Z  * [new tag]                 ciflow/inductor/169023      -> ciflow/inductor/169023
2025-12-04T09:33:41.9526422Z  * [new tag]                 ciflow/inductor/169024      -> ciflow/inductor/169024
2025-12-04T09:33:41.9527297Z  * [new tag]                 ciflow/inductor/169025      -> ciflow/inductor/169025
2025-12-04T09:33:41.9528471Z  * [new tag]                 ciflow/inductor/169066      -> ciflow/inductor/169066
2025-12-04T09:33:41.9529516Z  * [new tag]                 ciflow/inductor/169091      -> ciflow/inductor/169091
2025-12-04T09:33:41.9530593Z  * [new tag]                 ciflow/inductor/169102      -> ciflow/inductor/169102
2025-12-04T09:33:41.9531492Z  * [new tag]                 ciflow/inductor/169103      -> ciflow/inductor/169103
2025-12-04T09:33:41.9532617Z  * [new tag]                 ciflow/inductor/169121      -> ciflow/inductor/169121
2025-12-04T09:33:41.9533730Z  * [new tag]                 ciflow/inductor/169134      -> ciflow/inductor/169134
2025-12-04T09:33:41.9534717Z  * [new tag]                 ciflow/inductor/169135      -> ciflow/inductor/169135
2025-12-04T09:33:41.9535766Z  * [new tag]                 ciflow/inductor/169141      -> ciflow/inductor/169141
2025-12-04T09:33:41.9536866Z  * [new tag]                 ciflow/inductor/169151      -> ciflow/inductor/169151
2025-12-04T09:33:41.9538112Z  * [new tag]                 ciflow/inductor/169161      -> ciflow/inductor/169161
2025-12-04T09:33:41.9539198Z  * [new tag]                 ciflow/inductor/169167      -> ciflow/inductor/169167
2025-12-04T09:33:41.9540434Z  * [new tag]                 ciflow/inductor/169177      -> ciflow/inductor/169177
2025-12-04T09:33:41.9541732Z  * [new tag]                 ciflow/inductor/169185      -> ciflow/inductor/169185
2025-12-04T09:33:41.9542822Z  * [new tag]                 ciflow/inductor/169196      -> ciflow/inductor/169196
2025-12-04T09:33:41.9543913Z  * [new tag]                 ciflow/inductor/169200      -> ciflow/inductor/169200
2025-12-04T09:33:41.9544975Z  * [new tag]                 ciflow/inductor/169204      -> ciflow/inductor/169204
2025-12-04T09:33:41.9545995Z  * [new tag]                 ciflow/inductor/169216      -> ciflow/inductor/169216
2025-12-04T09:33:41.9547083Z  * [new tag]                 ciflow/inductor/169219      -> ciflow/inductor/169219
2025-12-04T09:33:41.9548125Z  * [new tag]                 ciflow/inductor/169220      -> ciflow/inductor/169220
2025-12-04T09:33:41.9549409Z  * [new tag]                 ciflow/inductor/169230      -> ciflow/inductor/169230
2025-12-04T09:33:41.9550437Z  * [new tag]                 ciflow/inductor/169242      -> ciflow/inductor/169242
2025-12-04T09:33:41.9551465Z  * [new tag]                 ciflow/inductor/169245      -> ciflow/inductor/169245
2025-12-04T09:33:41.9552691Z  * [new tag]                 ciflow/inductor/169260      -> ciflow/inductor/169260
2025-12-04T09:33:41.9553720Z  * [new tag]                 ciflow/inductor/169282      -> ciflow/inductor/169282
2025-12-04T09:33:41.9554915Z  * [new tag]                 ciflow/inductor/169286      -> ciflow/inductor/169286
2025-12-04T09:33:41.9555700Z  * [new tag]                 ciflow/inductor/169299      -> ciflow/inductor/169299
2025-12-04T09:33:41.9556988Z  * [new tag]                 ciflow/inductor/169304      -> ciflow/inductor/169304
2025-12-04T09:33:41.9558552Z  * [new tag]                 ciflow/inductor/169305      -> ciflow/inductor/169305
2025-12-04T09:33:41.9559628Z  * [new tag]                 ciflow/inductor/169308      -> ciflow/inductor/169308
2025-12-04T09:33:41.9560705Z  * [new tag]                 ciflow/inductor/169319      -> ciflow/inductor/169319
2025-12-04T09:33:41.9561782Z  * [new tag]                 ciflow/inductor/169326      -> ciflow/inductor/169326
2025-12-04T09:33:41.9562851Z  * [new tag]                 ciflow/inductor/169332      -> ciflow/inductor/169332
2025-12-04T09:33:41.9563728Z  * [new tag]                 ciflow/inductor/169333      -> ciflow/inductor/169333
2025-12-04T09:33:41.9565147Z  * [new tag]                 ciflow/inductor/169336      -> ciflow/inductor/169336
2025-12-04T09:33:41.9566279Z  * [new tag]                 ciflow/inductor/169340      -> ciflow/inductor/169340
2025-12-04T09:33:41.9567365Z  * [new tag]                 ciflow/inductor/169341      -> ciflow/inductor/169341
2025-12-04T09:33:41.9568466Z  * [new tag]                 ciflow/inductor/169343      -> ciflow/inductor/169343
2025-12-04T09:33:41.9569336Z  * [new tag]                 ciflow/inductor/169346      -> ciflow/inductor/169346
2025-12-04T09:33:41.9570662Z  * [new tag]                 ciflow/inductor/169348      -> ciflow/inductor/169348
2025-12-04T09:33:41.9573023Z  * [new tag]                 ciflow/inductor/169350      -> ciflow/inductor/169350
2025-12-04T09:33:41.9573575Z  * [new tag]                 ciflow/inductor/169355      -> ciflow/inductor/169355
2025-12-04T09:33:41.9574701Z  * [new tag]                 ciflow/inductor/169370      -> ciflow/inductor/169370
2025-12-04T09:33:41.9576278Z  * [new tag]                 ciflow/inductor/169375      -> ciflow/inductor/169375
2025-12-04T09:33:41.9577452Z  * [new tag]                 ciflow/inductor/169389      -> ciflow/inductor/169389
2025-12-04T09:33:41.9578523Z  * [new tag]                 ciflow/inductor/169391      -> ciflow/inductor/169391
2025-12-04T09:33:41.9579686Z  * [new tag]                 ciflow/inductor/169393      -> ciflow/inductor/169393
2025-12-04T09:33:41.9580733Z  * [new tag]                 ciflow/inductor/169399      -> ciflow/inductor/169399
2025-12-04T09:33:41.9582022Z  * [new tag]                 ciflow/inductor/169400      -> ciflow/inductor/169400
2025-12-04T09:33:41.9583065Z  * [new tag]                 ciflow/inductor/169415      -> ciflow/inductor/169415
2025-12-04T09:33:41.9584378Z  * [new tag]                 ciflow/inductor/169417      -> ciflow/inductor/169417
2025-12-04T09:33:41.9585073Z  * [new tag]                 ciflow/inductor/169418      -> ciflow/inductor/169418
2025-12-04T09:33:41.9586521Z  * [new tag]                 ciflow/inductor/169430      -> ciflow/inductor/169430
2025-12-04T09:33:41.9587570Z  * [new tag]                 ciflow/inductor/169432      -> ciflow/inductor/169432
2025-12-04T09:33:41.9588584Z  * [new tag]                 ciflow/inductor/169436      -> ciflow/inductor/169436
2025-12-04T09:33:41.9589819Z  * [new tag]                 ciflow/inductor/169437      -> ciflow/inductor/169437
2025-12-04T09:33:41.9590875Z  * [new tag]                 ciflow/inductor/169438      -> ciflow/inductor/169438
2025-12-04T09:33:41.9591947Z  * [new tag]                 ciflow/inductor/169441      -> ciflow/inductor/169441
2025-12-04T09:33:41.9593024Z  * [new tag]                 ciflow/inductor/169446      -> ciflow/inductor/169446
2025-12-04T09:33:41.9594246Z  * [new tag]                 ciflow/inductor/169447      -> ciflow/inductor/169447
2025-12-04T09:33:41.9595144Z  * [new tag]                 ciflow/inductor/169452      -> ciflow/inductor/169452
2025-12-04T09:33:41.9596491Z  * [new tag]                 ciflow/inductor/169455      -> ciflow/inductor/169455
2025-12-04T09:33:41.9597436Z  * [new tag]                 ciflow/inductor/169459      -> ciflow/inductor/169459
2025-12-04T09:33:41.9598673Z  * [new tag]                 ciflow/inductor/169463      -> ciflow/inductor/169463
2025-12-04T09:33:41.9599905Z  * [new tag]                 ciflow/inductor/169476      -> ciflow/inductor/169476
2025-12-04T09:33:41.9600946Z  * [new tag]                 ciflow/inductor/169485      -> ciflow/inductor/169485
2025-12-04T09:33:41.9602043Z  * [new tag]                 ciflow/inductor/169493      -> ciflow/inductor/169493
2025-12-04T09:33:41.9603202Z  * [new tag]                 ciflow/inductor/169496      -> ciflow/inductor/169496
2025-12-04T09:33:41.9604098Z  * [new tag]                 ciflow/inductor/169497      -> ciflow/inductor/169497
2025-12-04T09:33:41.9605239Z  * [new tag]                 ciflow/inductor/169503      -> ciflow/inductor/169503
2025-12-04T09:33:41.9606287Z  * [new tag]                 ciflow/inductor/169504      -> ciflow/inductor/169504
2025-12-04T09:33:41.9607685Z  * [new tag]                 ciflow/inductor/169505      -> ciflow/inductor/169505
2025-12-04T09:33:41.9609238Z  * [new tag]                 ciflow/inductor/169508      -> ciflow/inductor/169508
2025-12-04T09:33:41.9610308Z  * [new tag]                 ciflow/inductor/169509      -> ciflow/inductor/169509
2025-12-04T09:33:41.9611445Z  * [new tag]                 ciflow/inductor/169513      -> ciflow/inductor/169513
2025-12-04T09:33:41.9612531Z  * [new tag]                 ciflow/inductor/169514      -> ciflow/inductor/169514
2025-12-04T09:33:41.9613588Z  * [new tag]                 ciflow/inductor/169515      -> ciflow/inductor/169515
2025-12-04T09:33:41.9614657Z  * [new tag]                 ciflow/inductor/169517      -> ciflow/inductor/169517
2025-12-04T09:33:41.9615861Z  * [new tag]                 ciflow/inductor/169519      -> ciflow/inductor/169519
2025-12-04T09:33:41.9616844Z  * [new tag]                 ciflow/inductor/169520      -> ciflow/inductor/169520
2025-12-04T09:33:41.9618066Z  * [new tag]                 ciflow/inductor/169521      -> ciflow/inductor/169521
2025-12-04T09:33:41.9619090Z  * [new tag]                 ciflow/inductor/169524      -> ciflow/inductor/169524
2025-12-04T09:33:41.9620157Z  * [new tag]                 ciflow/inductor/169527      -> ciflow/inductor/169527
2025-12-04T09:33:41.9621205Z  * [new tag]                 ciflow/inductor/169528      -> ciflow/inductor/169528
2025-12-04T09:33:41.9622421Z  * [new tag]                 ciflow/inductor/169532      -> ciflow/inductor/169532
2025-12-04T09:33:41.9623506Z  * [new tag]                 ciflow/inductor/169535      -> ciflow/inductor/169535
2025-12-04T09:33:41.9624556Z  * [new tag]                 ciflow/inductor/169536      -> ciflow/inductor/169536
2025-12-04T09:33:41.9625746Z  * [new tag]                 ciflow/inductor/169547      -> ciflow/inductor/169547
2025-12-04T09:33:41.9626565Z  * [new tag]                 ciflow/inductor/169548      -> ciflow/inductor/169548
2025-12-04T09:33:41.9627711Z  * [new tag]                 ciflow/inductor/169549      -> ciflow/inductor/169549
2025-12-04T09:33:41.9628794Z  * [new tag]                 ciflow/inductor/169551      -> ciflow/inductor/169551
2025-12-04T09:33:41.9629832Z  * [new tag]                 ciflow/inductor/169552      -> ciflow/inductor/169552
2025-12-04T09:33:41.9630914Z  * [new tag]                 ciflow/inductor/169553      -> ciflow/inductor/169553
2025-12-04T09:33:41.9631946Z  * [new tag]                 ciflow/inductor/169557      -> ciflow/inductor/169557
2025-12-04T09:33:41.9633427Z  * [new tag]                 ciflow/inductor/3b9a386     -> ciflow/inductor/3b9a386
2025-12-04T09:33:41.9634748Z  * [new tag]                 ciflow/inductor/3d4b92b     -> ciflow/inductor/3d4b92b
2025-12-04T09:33:41.9635947Z  * [new tag]                 ciflow/inductor/d224ac7     -> ciflow/inductor/d224ac7
2025-12-04T09:33:41.9637321Z  * [new tag]                 ciflow/linux-aarch64/157994 -> ciflow/linux-aarch64/157994
2025-12-04T09:33:41.9638185Z  * [new tag]                 ciflow/linux-aarch64/166075 -> ciflow/linux-aarch64/166075
2025-12-04T09:33:41.9639372Z  * [new tag]                 ciflow/linux-aarch64/166876 -> ciflow/linux-aarch64/166876
2025-12-04T09:33:41.9640207Z  * [new tag]                 ciflow/linux-aarch64/167981 -> ciflow/linux-aarch64/167981
2025-12-04T09:33:41.9641549Z  * [new tag]                 ciflow/mps/166254           -> ciflow/mps/166254
2025-12-04T09:33:41.9642376Z  * [new tag]                 ciflow/mps/169017           -> ciflow/mps/169017
2025-12-04T09:33:41.9643628Z  * [new tag]                 ciflow/mps/169372           -> ciflow/mps/169372
2025-12-04T09:33:41.9644500Z  * [new tag]                 ciflow/mps/169478           -> ciflow/mps/169478
2025-12-04T09:33:41.9645919Z  * [new tag]                 ciflow/op-benchmark/157994  -> ciflow/op-benchmark/157994
2025-12-04T09:33:41.9646775Z  * [new tag]                 ciflow/op-benchmark/166075  -> ciflow/op-benchmark/166075
2025-12-04T09:33:41.9647989Z  * [new tag]                 ciflow/op-benchmark/169544  -> ciflow/op-benchmark/169544
2025-12-04T09:33:41.9649348Z  * [new tag]                 ciflow/periodic-rocm-mi200/165997 -> ciflow/periodic-rocm-mi200/165997
2025-12-04T09:33:41.9650593Z  * [new tag]                 ciflow/periodic-rocm-mi200/166517 -> ciflow/periodic-rocm-mi200/166517
2025-12-04T09:33:41.9651451Z  * [new tag]                 ciflow/periodic-rocm-mi200/169063 -> ciflow/periodic-rocm-mi200/169063
2025-12-04T09:33:41.9652613Z  * [new tag]                 ciflow/periodic-rocm-mi200/169425 -> ciflow/periodic-rocm-mi200/169425
2025-12-04T09:33:41.9653858Z  * [new tag]                 ciflow/periodic-rocm-mi300/166517 -> ciflow/periodic-rocm-mi300/166517
2025-12-04T09:33:41.9654841Z  * [new tag]                 ciflow/periodic-rocm-mi300/169063 -> ciflow/periodic-rocm-mi300/169063
2025-12-04T09:33:41.9655666Z  * [new tag]                 ciflow/periodic-rocm-mi300/169425 -> ciflow/periodic-rocm-mi300/169425
2025-12-04T09:33:41.9657389Z  * [new tag]                 ciflow/periodic/054a2fd     -> ciflow/periodic/054a2fd
2025-12-04T09:33:41.9658214Z  * [new tag]                 ciflow/periodic/167207      -> ciflow/periodic/167207
2025-12-04T09:33:41.9659469Z  * [new tag]                 ciflow/periodic/167978      -> ciflow/periodic/167978
2025-12-04T09:33:41.9660335Z  * [new tag]                 ciflow/periodic/168096      -> ciflow/periodic/168096
2025-12-04T09:33:41.9661439Z  * [new tag]                 ciflow/periodic/169286      -> ciflow/periodic/169286
2025-12-04T09:33:41.9662766Z  * [new tag]                 ciflow/periodic/2a6d37d     -> ciflow/periodic/2a6d37d
2025-12-04T09:33:41.9663919Z  * [new tag]                 ciflow/periodic/317eeb8     -> ciflow/periodic/317eeb8
2025-12-04T09:33:41.9665168Z  * [new tag]                 ciflow/periodic/3c32        -> ciflow/periodic/3c32
2025-12-04T09:33:41.9666327Z  * [new tag]                 ciflow/periodic/3e98831     -> ciflow/periodic/3e98831
2025-12-04T09:33:41.9668372Z  * [new tag]                 ciflow/periodic/7c648509a7470ace9fb2bae960dd4790f7e943e9 -> ciflow/periodic/7c648509a7470ace9fb2bae960dd4790f7e943e9
2025-12-04T09:33:41.9669592Z  * [new tag]                 ciflow/periodic/94512-point -> ciflow/periodic/94512-point
2025-12-04T09:33:41.9671416Z  * [new tag]                 ciflow/periodic/csl/test87519 -> ciflow/periodic/csl/test87519
2025-12-04T09:33:41.9675876Z  * [new tag]                 ciflow/periodic/csltest88275 -> ciflow/periodic/csltest88275
2025-12-04T09:33:41.9677090Z  * [new tag]                 ciflow/periodic/csltest88761 -> ciflow/periodic/csltest88761
2025-12-04T09:33:41.9678379Z  * [new tag]                 ciflow/periodic/release_1.12 -> ciflow/periodic/release_1.12
2025-12-04T09:33:41.9679828Z  * [new tag]                 ciflow/periodic/release_1.12.0 -> ciflow/periodic/release_1.12.0
2025-12-04T09:33:41.9681212Z  * [new tag]                 ciflow/periodic/sha-ec5b83  -> ciflow/periodic/sha-ec5b83
2025-12-04T09:33:41.9682442Z  * [new tag]                 ciflow/pull/167207          -> ciflow/pull/167207
2025-12-04T09:33:41.9684095Z  * [new tag]                 ciflow/quantization-periodic/169207 -> ciflow/quantization-periodic/169207
2025-12-04T09:33:41.9685194Z  * [new tag]                 ciflow/rocm-mi200/165545    -> ciflow/rocm-mi200/165545
2025-12-04T09:33:41.9686075Z  * [new tag]                 ciflow/rocm-mi200/165997    -> ciflow/rocm-mi200/165997
2025-12-04T09:33:41.9687104Z  * [new tag]                 ciflow/rocm-mi200/168096    -> ciflow/rocm-mi200/168096
2025-12-04T09:33:41.9688324Z  * [new tag]                 ciflow/rocm-mi200/168275    -> ciflow/rocm-mi200/168275
2025-12-04T09:33:41.9689214Z  * [new tag]                 ciflow/rocm-mi200/169063    -> ciflow/rocm-mi200/169063
2025-12-04T09:33:41.9690484Z  * [new tag]                 ciflow/rocm-mi200/169356    -> ciflow/rocm-mi200/169356
2025-12-04T09:33:41.9691349Z  * [new tag]                 ciflow/rocm-mi200/169425    -> ciflow/rocm-mi200/169425
2025-12-04T09:33:41.9692655Z  * [new tag]                 ciflow/rocm-mi300/165545    -> ciflow/rocm-mi300/165545
2025-12-04T09:33:41.9693871Z  * [new tag]                 ciflow/rocm-mi300/167157    -> ciflow/rocm-mi300/167157
2025-12-04T09:33:41.9694862Z  * [new tag]                 ciflow/rocm-mi300/168096    -> ciflow/rocm-mi300/168096
2025-12-04T09:33:41.9695771Z  * [new tag]                 ciflow/rocm-mi300/169063    -> ciflow/rocm-mi300/169063
2025-12-04T09:33:41.9696868Z  * [new tag]                 ciflow/rocm-mi300/169425    -> ciflow/rocm-mi300/169425
2025-12-04T09:33:41.9698140Z  * [new tag]                 ciflow/rocm-mi355/167157    -> ciflow/rocm-mi355/167157
2025-12-04T09:33:41.9699180Z  * [new tag]                 ciflow/rocm-mi355/168275    -> ciflow/rocm-mi355/168275
2025-12-04T09:33:41.9700662Z  * [new tag]                 ciflow/rocm-mi355/169425    -> ciflow/rocm-mi355/169425
2025-12-04T09:33:41.9701965Z  * [new tag]                 ciflow/rocm-navi31/168275   -> ciflow/rocm-navi31/168275
2025-12-04T09:33:41.9702856Z  * [new tag]                 ciflow/rocm-navi31/169425   -> ciflow/rocm-navi31/169425
2025-12-04T09:33:41.9704153Z  * [new tag]                 ciflow/rocm/115316          -> ciflow/rocm/115316
2025-12-04T09:33:41.9705209Z  * [new tag]                 ciflow/rocm/148492          -> ciflow/rocm/148492
2025-12-04T09:33:41.9706203Z  * [new tag]                 ciflow/rocm/160685          -> ciflow/rocm/160685
2025-12-04T09:33:41.9707195Z  * [new tag]                 ciflow/rocm/161607          -> ciflow/rocm/161607
2025-12-04T09:33:41.9708201Z  * [new tag]                 ciflow/rocm/162052          -> ciflow/rocm/162052
2025-12-04T09:33:41.9709234Z  * [new tag]                 ciflow/rocm/165997          -> ciflow/rocm/165997
2025-12-04T09:33:41.9710345Z  * [new tag]                 ciflow/rocm/166165          -> ciflow/rocm/166165
2025-12-04T09:33:41.9711164Z  * [new tag]                 ciflow/rocm/166517          -> ciflow/rocm/166517
2025-12-04T09:33:41.9712244Z  * [new tag]                 ciflow/rocm/167207          -> ciflow/rocm/167207
2025-12-04T09:33:41.9713376Z  * [new tag]                 ciflow/rocm/167536          -> ciflow/rocm/167536
2025-12-04T09:33:41.9714242Z  * [new tag]                 ciflow/rocm/167781          -> ciflow/rocm/167781
2025-12-04T09:33:41.9716428Z  * [new tag]                 ciflow/rocm/167989          -> ciflow/rocm/167989
2025-12-04T09:33:41.9717171Z  * [new tag]                 ciflow/rocm/168073          -> ciflow/rocm/168073
2025-12-04T09:33:41.9718594Z  * [new tag]                 ciflow/rocm/168195          -> ciflow/rocm/168195
2025-12-04T09:33:41.9719659Z  * [new tag]                 ciflow/rocm/168939          -> ciflow/rocm/168939
2025-12-04T09:33:41.9720695Z  * [new tag]                 ciflow/rocm/168971          -> ciflow/rocm/168971
2025-12-04T09:33:41.9721807Z  * [new tag]                 ciflow/rocm/169024          -> ciflow/rocm/169024
2025-12-04T09:33:41.9722855Z  * [new tag]                 ciflow/rocm/169200          -> ciflow/rocm/169200
2025-12-04T09:33:41.9723897Z  * [new tag]                 ciflow/rocm/169216          -> ciflow/rocm/169216
2025-12-04T09:33:41.9724873Z  * [new tag]                 ciflow/rocm/169312          -> ciflow/rocm/169312
2025-12-04T09:33:41.9725988Z  * [new tag]                 ciflow/rocm/169380          -> ciflow/rocm/169380
2025-12-04T09:33:41.9727628Z  * [new tag]                 ciflow/rocm/169427          -> ciflow/rocm/169427
2025-12-04T09:33:41.9728648Z  * [new tag]                 ciflow/rocm/169455          -> ciflow/rocm/169455
2025-12-04T09:33:41.9729720Z  * [new tag]                 ciflow/rocm/169470          -> ciflow/rocm/169470
2025-12-04T09:33:41.9730776Z  * [new tag]                 ciflow/rocm/169471          -> ciflow/rocm/169471
2025-12-04T09:33:41.9731890Z  * [new tag]                 ciflow/rocm/169472          -> ciflow/rocm/169472
2025-12-04T09:33:41.9732860Z  * [new tag]                 ciflow/rocm/169514          -> ciflow/rocm/169514
2025-12-04T09:33:41.9734422Z  * [new tag]                 ciflow/slow/01c7106         -> ciflow/slow/01c7106
2025-12-04T09:33:41.9735600Z  * [new tag]                 ciflow/slow/0577043         -> ciflow/slow/0577043
2025-12-04T09:33:41.9737477Z  * [new tag]                 ciflow/slow/0d5b74da0cab798fbfdb9caa53fad816999c8386-sdym -> ciflow/slow/0d5b74da0cab798fbfdb9caa53fad816999c8386-sdym
2025-12-04T09:33:41.9738155Z  * [new tag]                 ciflow/slow/0e81104         -> ciflow/slow/0e81104
2025-12-04T09:33:41.9739260Z  * [new tag]                 ciflow/slow/167207          -> ciflow/slow/167207
2025-12-04T09:33:41.9740240Z  * [new tag]                 ciflow/slow/168050          -> ciflow/slow/168050
2025-12-04T09:33:41.9741558Z  * [new tag]                 ciflow/slow/1732077         -> ciflow/slow/1732077
2025-12-04T09:33:41.9742884Z  * [new tag]                 ciflow/slow/187eb7c         -> ciflow/slow/187eb7c
2025-12-04T09:33:41.9744407Z  * [new tag]                 ciflow/slow/1faef89         -> ciflow/slow/1faef89
2025-12-04T09:33:41.9745896Z  * [new tag]                 ciflow/slow/3920ec1         -> ciflow/slow/3920ec1
2025-12-04T09:33:41.9747366Z  * [new tag]                 ciflow/slow/3b7c6b2         -> ciflow/slow/3b7c6b2
2025-12-04T09:33:41.9748636Z  * [new tag]                 ciflow/slow/59a3759         -> ciflow/slow/59a3759
2025-12-04T09:33:41.9749877Z  * [new tag]                 ciflow/slow/70ef0bb         -> ciflow/slow/70ef0bb
2025-12-04T09:33:41.9751215Z  * [new tag]                 ciflow/slow/788ff06         -> ciflow/slow/788ff06
2025-12-04T09:33:41.9753058Z  * [new tag]                 ciflow/slow/8751002215790a3a88750faa8f4366933e296693-sdym -> ciflow/slow/8751002215790a3a88750faa8f4366933e296693-sdym
2025-12-04T09:33:41.9753796Z  * [new tag]                 ciflow/slow/9d85864         -> ciflow/slow/9d85864
2025-12-04T09:33:41.9755210Z  * [new tag]                 ciflow/slow/9ffad5b         -> ciflow/slow/9ffad5b
2025-12-04T09:33:41.9756367Z  * [new tag]                 ciflow/slow/a206e8b         -> ciflow/slow/a206e8b
2025-12-04T09:33:41.9757647Z  * [new tag]                 ciflow/slow/a837609         -> ciflow/slow/a837609
2025-12-04T09:33:41.9758976Z  * [new tag]                 ciflow/slow/af841f3         -> ciflow/slow/af841f3
2025-12-04T09:33:41.9760841Z  * [new tag]                 ciflow/slow/da3aba1e46157c4df504b067477cdf2b3c96b194-sdym -> ciflow/slow/da3aba1e46157c4df504b067477cdf2b3c96b194-sdym
2025-12-04T09:33:41.9761644Z  * [new tag]                 ciflow/torchbench/168175    -> ciflow/torchbench/168175
2025-12-04T09:33:41.9762950Z  * [new tag]                 ciflow/trunk/148492         -> ciflow/trunk/148492
2025-12-04T09:33:41.9763991Z  * [new tag]                 ciflow/trunk/157149         -> ciflow/trunk/157149
2025-12-04T09:33:41.9765015Z  * [new tag]                 ciflow/trunk/157994         -> ciflow/trunk/157994
2025-12-04T09:33:41.9766014Z  * [new tag]                 ciflow/trunk/159718         -> ciflow/trunk/159718
2025-12-04T09:33:41.9767064Z  * [new tag]                 ciflow/trunk/160685         -> ciflow/trunk/160685
2025-12-04T09:33:41.9768099Z  * [new tag]                 ciflow/trunk/160729         -> ciflow/trunk/160729
2025-12-04T09:33:41.9768992Z  * [new tag]                 ciflow/trunk/162275         -> ciflow/trunk/162275
2025-12-04T09:33:41.9770109Z  * [new tag]                 ciflow/trunk/162795         -> ciflow/trunk/162795
2025-12-04T09:33:41.9771421Z  * [new tag]                 ciflow/trunk/163245         -> ciflow/trunk/163245
2025-12-04T09:33:41.9772485Z  * [new tag]                 ciflow/trunk/163942         -> ciflow/trunk/163942
2025-12-04T09:33:41.9773393Z  * [new tag]                 ciflow/trunk/165274         -> ciflow/trunk/165274
2025-12-04T09:33:41.9775080Z  * [new tag]                 ciflow/trunk/165483         -> ciflow/trunk/165483
2025-12-04T09:33:41.9776506Z  * [new tag]                 ciflow/trunk/165728         -> ciflow/trunk/165728
2025-12-04T09:33:41.9777779Z  * [new tag]                 ciflow/trunk/165922         -> ciflow/trunk/165922
2025-12-04T09:33:41.9778843Z  * [new tag]                 ciflow/trunk/166075         -> ciflow/trunk/166075
2025-12-04T09:33:41.9779945Z  * [new tag]                 ciflow/trunk/166165         -> ciflow/trunk/166165
2025-12-04T09:33:41.9780989Z  * [new tag]                 ciflow/trunk/166829         -> ciflow/trunk/166829
2025-12-04T09:33:41.9782255Z  * [new tag]                 ciflow/trunk/166843         -> ciflow/trunk/166843
2025-12-04T09:33:41.9783303Z  * [new tag]                 ciflow/trunk/166876         -> ciflow/trunk/166876
2025-12-04T09:33:41.9784364Z  * [new tag]                 ciflow/trunk/167207         -> ciflow/trunk/167207
2025-12-04T09:33:41.9785399Z  * [new tag]                 ciflow/trunk/167536         -> ciflow/trunk/167536
2025-12-04T09:33:41.9787183Z  * [new tag]                 ciflow/trunk/167552         -> ciflow/trunk/167552
2025-12-04T09:33:41.9788274Z  * [new tag]                 ciflow/trunk/167555         -> ciflow/trunk/167555
2025-12-04T09:33:41.9789406Z  * [new tag]                 ciflow/trunk/167599         -> ciflow/trunk/167599
2025-12-04T09:33:41.9790472Z  * [new tag]                 ciflow/trunk/167659         -> ciflow/trunk/167659
2025-12-04T09:33:41.9791670Z  * [new tag]                 ciflow/trunk/167672         -> ciflow/trunk/167672
2025-12-04T09:33:41.9792731Z  * [new tag]                 ciflow/trunk/167742         -> ciflow/trunk/167742
2025-12-04T09:33:41.9793757Z  * [new tag]                 ciflow/trunk/167781         -> ciflow/trunk/167781
2025-12-04T09:33:41.9795054Z  * [new tag]                 ciflow/trunk/167837         -> ciflow/trunk/167837
2025-12-04T09:33:41.9796149Z  * [new tag]                 ciflow/trunk/167887         -> ciflow/trunk/167887
2025-12-04T09:33:41.9797181Z  * [new tag]                 ciflow/trunk/167978         -> ciflow/trunk/167978
2025-12-04T09:33:41.9798438Z  * [new tag]                 ciflow/trunk/168050         -> ciflow/trunk/168050
2025-12-04T09:33:41.9799175Z  * [new tag]                 ciflow/trunk/168051         -> ciflow/trunk/168051
2025-12-04T09:33:41.9800312Z  * [new tag]                 ciflow/trunk/168096         -> ciflow/trunk/168096
2025-12-04T09:33:41.9801352Z  * [new tag]                 ciflow/trunk/168127         -> ciflow/trunk/168127
2025-12-04T09:33:41.9802406Z  * [new tag]                 ciflow/trunk/168157         -> ciflow/trunk/168157
2025-12-04T09:33:41.9803498Z  * [new tag]                 ciflow/trunk/168175         -> ciflow/trunk/168175
2025-12-04T09:33:41.9804535Z  * [new tag]                 ciflow/trunk/168209         -> ciflow/trunk/168209
2025-12-04T09:33:41.9805754Z  * [new tag]                 ciflow/trunk/168213         -> ciflow/trunk/168213
2025-12-04T09:33:41.9807012Z  * [new tag]                 ciflow/trunk/168226         -> ciflow/trunk/168226
2025-12-04T09:33:41.9808095Z  * [new tag]                 ciflow/trunk/168262         -> ciflow/trunk/168262
2025-12-04T09:33:41.9809171Z  * [new tag]                 ciflow/trunk/168275         -> ciflow/trunk/168275
2025-12-04T09:33:41.9810403Z  * [new tag]                 ciflow/trunk/168328         -> ciflow/trunk/168328
2025-12-04T09:33:41.9811363Z  * [new tag]                 ciflow/trunk/168368         -> ciflow/trunk/168368
2025-12-04T09:33:41.9812486Z  * [new tag]                 ciflow/trunk/168917         -> ciflow/trunk/168917
2025-12-04T09:33:41.9813541Z  * [new tag]                 ciflow/trunk/168933         -> ciflow/trunk/168933
2025-12-04T09:33:41.9814794Z  * [new tag]                 ciflow/trunk/168941         -> ciflow/trunk/168941
2025-12-04T09:33:41.9815858Z  * [new tag]                 ciflow/trunk/168955         -> ciflow/trunk/168955
2025-12-04T09:33:41.9816989Z  * [new tag]                 ciflow/trunk/168980         -> ciflow/trunk/168980
2025-12-04T09:33:41.9818307Z  * [new tag]                 ciflow/trunk/169004         -> ciflow/trunk/169004
2025-12-04T09:33:41.9819430Z  * [new tag]                 ciflow/trunk/169006         -> ciflow/trunk/169006
2025-12-04T09:33:41.9820499Z  * [new tag]                 ciflow/trunk/169023         -> ciflow/trunk/169023
2025-12-04T09:33:41.9821554Z  * [new tag]                 ciflow/trunk/169025         -> ciflow/trunk/169025
2025-12-04T09:33:41.9822804Z  * [new tag]                 ciflow/trunk/169048         -> ciflow/trunk/169048
2025-12-04T09:33:41.9823869Z  * [new tag]                 ciflow/trunk/169066         -> ciflow/trunk/169066
2025-12-04T09:33:41.9824980Z  * [new tag]                 ciflow/trunk/169091         -> ciflow/trunk/169091
2025-12-04T09:33:41.9825956Z  * [new tag]                 ciflow/trunk/169102         -> ciflow/trunk/169102
2025-12-04T09:33:41.9827065Z  * [new tag]                 ciflow/trunk/169103         -> ciflow/trunk/169103
2025-12-04T09:33:41.9828282Z  * [new tag]                 ciflow/trunk/169125         -> ciflow/trunk/169125
2025-12-04T09:33:41.9829517Z  * [new tag]                 ciflow/trunk/169139         -> ciflow/trunk/169139
2025-12-04T09:33:41.9830786Z  * [new tag]                 ciflow/trunk/169148         -> ciflow/trunk/169148
2025-12-04T09:33:41.9831851Z  * [new tag]                 ciflow/trunk/169151         -> ciflow/trunk/169151
2025-12-04T09:33:41.9832957Z  * [new tag]                 ciflow/trunk/169156         -> ciflow/trunk/169156
2025-12-04T09:33:41.9834187Z  * [new tag]                 ciflow/trunk/169176         -> ciflow/trunk/169176
2025-12-04T09:33:41.9835250Z  * [new tag]                 ciflow/trunk/169204         -> ciflow/trunk/169204
2025-12-04T09:33:41.9836296Z  * [new tag]                 ciflow/trunk/169207         -> ciflow/trunk/169207
2025-12-04T09:33:41.9837349Z  * [new tag]                 ciflow/trunk/169211         -> ciflow/trunk/169211
2025-12-04T09:33:41.9838820Z  * [new tag]                 ciflow/trunk/169231         -> ciflow/trunk/169231
2025-12-04T09:33:41.9839889Z  * [new tag]                 ciflow/trunk/169260         -> ciflow/trunk/169260
2025-12-04T09:33:41.9841263Z  * [new tag]                 ciflow/trunk/169271         -> ciflow/trunk/169271
2025-12-04T09:33:41.9842318Z  * [new tag]                 ciflow/trunk/169280         -> ciflow/trunk/169280
2025-12-04T09:33:41.9843459Z  * [new tag]                 ciflow/trunk/169281         -> ciflow/trunk/169281
2025-12-04T09:33:41.9844331Z  * [new tag]                 ciflow/trunk/169286         -> ciflow/trunk/169286
2025-12-04T09:33:41.9845714Z  * [new tag]                 ciflow/trunk/169293         -> ciflow/trunk/169293
2025-12-04T09:33:41.9846779Z  * [new tag]                 ciflow/trunk/169296         -> ciflow/trunk/169296
2025-12-04T09:33:41.9847872Z  * [new tag]                 ciflow/trunk/169304         -> ciflow/trunk/169304
2025-12-04T09:33:41.9848951Z  * [new tag]                 ciflow/trunk/169305         -> ciflow/trunk/169305
2025-12-04T09:33:41.9849994Z  * [new tag]                 ciflow/trunk/169312         -> ciflow/trunk/169312
2025-12-04T09:33:41.9851479Z  * [new tag]                 ciflow/trunk/169328         -> ciflow/trunk/169328
2025-12-04T09:33:41.9852571Z  * [new tag]                 ciflow/trunk/169343         -> ciflow/trunk/169343
2025-12-04T09:33:41.9853614Z  * [new tag]                 ciflow/trunk/169355         -> ciflow/trunk/169355
2025-12-04T09:33:41.9854677Z  * [new tag]                 ciflow/trunk/169370         -> ciflow/trunk/169370
2025-12-04T09:33:41.9855944Z  * [new tag]                 ciflow/trunk/169379         -> ciflow/trunk/169379
2025-12-04T09:33:41.9857172Z  * [new tag]                 ciflow/trunk/169380         -> ciflow/trunk/169380
2025-12-04T09:33:41.9858245Z  * [new tag]                 ciflow/trunk/169385         -> ciflow/trunk/169385
2025-12-04T09:33:41.9859990Z  * [new tag]                 ciflow/trunk/169387         -> ciflow/trunk/169387
2025-12-04T09:33:41.9861267Z  * [new tag]                 ciflow/trunk/169410         -> ciflow/trunk/169410
2025-12-04T09:33:41.9862411Z  * [new tag]                 ciflow/trunk/169412         -> ciflow/trunk/169412
2025-12-04T09:33:41.9863454Z  * [new tag]                 ciflow/trunk/169418         -> ciflow/trunk/169418
2025-12-04T09:33:41.9864522Z  * [new tag]                 ciflow/trunk/169423         -> ciflow/trunk/169423
2025-12-04T09:33:41.9865589Z  * [new tag]                 ciflow/trunk/169427         -> ciflow/trunk/169427
2025-12-04T09:33:41.9866731Z  * [new tag]                 ciflow/trunk/169430         -> ciflow/trunk/169430
2025-12-04T09:33:41.9867827Z  * [new tag]                 ciflow/trunk/169437         -> ciflow/trunk/169437
2025-12-04T09:33:41.9868929Z  * [new tag]                 ciflow/trunk/169442         -> ciflow/trunk/169442
2025-12-04T09:33:41.9869999Z  * [new tag]                 ciflow/trunk/169452         -> ciflow/trunk/169452
2025-12-04T09:33:41.9871260Z  * [new tag]                 ciflow/trunk/169454         -> ciflow/trunk/169454
2025-12-04T09:33:41.9872368Z  * [new tag]                 ciflow/trunk/169459         -> ciflow/trunk/169459
2025-12-04T09:33:41.9873651Z  * [new tag]                 ciflow/trunk/169474         -> ciflow/trunk/169474
2025-12-04T09:33:41.9874724Z  * [new tag]                 ciflow/trunk/169475         -> ciflow/trunk/169475
2025-12-04T09:33:41.9875799Z  * [new tag]                 ciflow/trunk/169476         -> ciflow/trunk/169476
2025-12-04T09:33:41.9877092Z  * [new tag]                 ciflow/trunk/169487         -> ciflow/trunk/169487
2025-12-04T09:33:41.9878152Z  * [new tag]                 ciflow/trunk/169497         -> ciflow/trunk/169497
2025-12-04T09:33:41.9879263Z  * [new tag]                 ciflow/trunk/169503         -> ciflow/trunk/169503
2025-12-04T09:33:41.9880310Z  * [new tag]                 ciflow/trunk/169505         -> ciflow/trunk/169505
2025-12-04T09:33:41.9881371Z  * [new tag]                 ciflow/trunk/169507         -> ciflow/trunk/169507
2025-12-04T09:33:41.9882466Z  * [new tag]                 ciflow/trunk/169514         -> ciflow/trunk/169514
2025-12-04T09:33:41.9883699Z  * [new tag]                 ciflow/trunk/169517         -> ciflow/trunk/169517
2025-12-04T09:33:41.9884495Z  * [new tag]                 ciflow/trunk/169519         -> ciflow/trunk/169519
2025-12-04T09:33:41.9885608Z  * [new tag]                 ciflow/trunk/169528         -> ciflow/trunk/169528
2025-12-04T09:33:41.9886566Z  * [new tag]                 ciflow/trunk/169541         -> ciflow/trunk/169541
2025-12-04T09:33:41.9887831Z  * [new tag]                 ciflow/trunk/169555         -> ciflow/trunk/169555
2025-12-04T09:33:41.9889459Z  * [new tag]                 ciflow/unstable/123         -> ciflow/unstable/123
2025-12-04T09:33:41.9890765Z  * [new tag]                 ciflow/vllm/165270          -> ciflow/vllm/165270
2025-12-04T09:33:41.9891729Z  * [new tag]                 ciflow/vllm/165274          -> ciflow/vllm/165274
2025-12-04T09:33:41.9892804Z  * [new tag]                 ciflow/vllm/166494          -> ciflow/vllm/166494
2025-12-04T09:33:41.9893931Z  * [new tag]                 ciflow/vllm/169219          -> ciflow/vllm/169219
2025-12-04T09:33:41.9894920Z  * [new tag]                 ciflow/vllm/169220          -> ciflow/vllm/169220
2025-12-04T09:33:41.9896240Z  * [new tag]                 ciflow/xpu/157994           -> ciflow/xpu/157994
2025-12-04T09:33:41.9897319Z  * [new tag]                 ciflow/xpu/159718           -> ciflow/xpu/159718
2025-12-04T09:33:41.9898234Z  * [new tag]                 ciflow/xpu/161940           -> ciflow/xpu/161940
2025-12-04T09:33:41.9899467Z  * [new tag]                 ciflow/xpu/163251           -> ciflow/xpu/163251
2025-12-04T09:33:41.9900494Z  * [new tag]                 ciflow/xpu/166829           -> ciflow/xpu/166829
2025-12-04T09:33:41.9901452Z  * [new tag]                 ciflow/xpu/166843           -> ciflow/xpu/166843
2025-12-04T09:33:41.9902485Z  * [new tag]                 ciflow/xpu/167972           -> ciflow/xpu/167972
2025-12-04T09:33:41.9903517Z  * [new tag]                 ciflow/xpu/167981           -> ciflow/xpu/167981
2025-12-04T09:33:41.9904551Z  * [new tag]                 ciflow/xpu/168213           -> ciflow/xpu/168213
2025-12-04T09:33:41.9905589Z  * [new tag]                 ciflow/xpu/168262           -> ciflow/xpu/168262
2025-12-04T09:33:41.9906642Z  * [new tag]                 ciflow/xpu/168328           -> ciflow/xpu/168328
2025-12-04T09:33:41.9908122Z  * [new tag]                 ciflow/xpu/168950           -> ciflow/xpu/168950
2025-12-04T09:33:41.9909641Z  * [new tag]                 ciflow/xpu/169039           -> ciflow/xpu/169039
2025-12-04T09:33:41.9911368Z  * [new tag]                 ciflow/xpu/169200           -> ciflow/xpu/169200
2025-12-04T09:33:41.9911837Z  * [new tag]                 ciflow/xpu/169203           -> ciflow/xpu/169203
2025-12-04T09:33:41.9913057Z  * [new tag]                 ciflow/xpu/169230           -> ciflow/xpu/169230
2025-12-04T09:33:41.9914107Z  * [new tag]                 ciflow/xpu/169231           -> ciflow/xpu/169231
2025-12-04T09:33:41.9915365Z  * [new tag]                 ciflow/xpu/169241           -> ciflow/xpu/169241
2025-12-04T09:33:41.9916401Z  * [new tag]                 ciflow/xpu/169280           -> ciflow/xpu/169280
2025-12-04T09:33:41.9917478Z  * [new tag]                 ciflow/xpu/169296           -> ciflow/xpu/169296
2025-12-04T09:33:41.9918733Z  * [new tag]                 ciflow/xpu/169353           -> ciflow/xpu/169353
2025-12-04T09:33:41.9919876Z  * [new tag]                 ciflow/xpu/169410           -> ciflow/xpu/169410
2025-12-04T09:33:41.9920728Z  * [new tag]                 ciflow/xpu/169442           -> ciflow/xpu/169442
2025-12-04T09:33:41.9921959Z  * [new tag]                 ciflow/xpu/169555           -> ciflow/xpu/169555
2025-12-04T09:33:41.9923146Z  * [new tag]                 cslpull75                   -> cslpull75
2025-12-04T09:33:41.9924367Z  * [new tag]                 cslpull76                   -> cslpull76
2025-12-04T09:33:41.9925458Z  * [new tag]                 cslpull77                   -> cslpull77
2025-12-04T09:33:41.9926691Z  * [new tag]                 cslpull78                   -> cslpull78
2025-12-04T09:33:41.9928050Z  * [new tag]                 cslpull79                   -> cslpull79
2025-12-04T09:33:41.9929615Z  * [new tag]                 cslpull80                   -> cslpull80
2025-12-04T09:33:41.9930826Z  * [new tag]                 cslpull81                   -> cslpull81
2025-12-04T09:33:41.9932058Z  * [new tag]                 cslpull82                   -> cslpull82
2025-12-04T09:33:41.9933221Z  * [new tag]                 cslpull83                   -> cslpull83
2025-12-04T09:33:41.9934472Z  * [new tag]                 cslpull84                   -> cslpull84
2025-12-04T09:33:41.9935506Z  * [new tag]                 cslpull85                   -> cslpull85
2025-12-04T09:33:41.9936859Z  * [new tag]                 cslpull86                   -> cslpull86
2025-12-04T09:33:41.9938129Z  * [new tag]                 cslpull87                   -> cslpull87
2025-12-04T09:33:41.9939364Z  * [new tag]                 cslpull88                   -> cslpull88
2025-12-04T09:33:41.9940524Z  * [new tag]                 cslpull89                   -> cslpull89
2025-12-04T09:33:41.9941523Z  * [new tag]                 cslpull90                   -> cslpull90
2025-12-04T09:33:41.9943225Z  * [new tag]                 cslpull91                   -> cslpull91
2025-12-04T09:33:41.9944372Z  * [new tag]                 cslpull92                   -> cslpull92
2025-12-04T09:33:41.9945692Z  * [new tag]                 flight_5                    -> flight_5
2025-12-04T09:33:41.9947093Z  * [new tag]                 flight_5.1                  -> flight_5.1
2025-12-04T09:33:41.9948358Z  * [new tag]                 flight_5.2                  -> flight_5.2
2025-12-04T09:33:41.9949718Z  * [new tag]                 flight_5.3                  -> flight_5.3
2025-12-04T09:33:41.9950885Z  * [new tag]                 forpull1                    -> forpull1
2025-12-04T09:33:41.9952484Z  * [new tag]                 malfet/tag-2ef5611          -> malfet/tag-2ef5611
2025-12-04T09:33:41.9953857Z  * [new tag]                 malfet/tag-317b1a0          -> malfet/tag-317b1a0
2025-12-04T09:33:41.9955067Z  * [new tag]                 malfet/tag-ec6f767          -> malfet/tag-ec6f767
2025-12-04T09:33:41.9956361Z  * [new tag]                 nightly-binary              -> nightly-binary
2025-12-04T09:33:41.9957608Z  * [new tag]                 sqzhang_flight4_plus        -> sqzhang_flight4_plus
2025-12-04T09:33:41.9959073Z  * [new tag]                 sqzhang_flight_3            -> sqzhang_flight_3
2025-12-04T09:33:41.9960723Z  * [new tag]                 trunk/02d8bd6974cf84b721680d773dbdb1b6f40ce272 -> trunk/02d8bd6974cf84b721680d773dbdb1b6f40ce272
2025-12-04T09:33:41.9961857Z  * [new tag]                 trunk/066997fb38ade71e00d78e9d572e380b5f02bd3e -> trunk/066997fb38ade71e00d78e9d572e380b5f02bd3e
2025-12-04T09:33:41.9963572Z  * [new tag]                 trunk/076e7b19fa1d481ad778d06d2b49ba57d3ce8c88 -> trunk/076e7b19fa1d481ad778d06d2b49ba57d3ce8c88
2025-12-04T09:33:41.9964996Z  * [new tag]                 trunk/07dcc0b83db3211653a38565a24e15acdba75654 -> trunk/07dcc0b83db3211653a38565a24e15acdba75654
2025-12-04T09:33:41.9966306Z  * [new tag]                 trunk/082e96b68dfcd16cab7cfafc4d3d055767dab3eb -> trunk/082e96b68dfcd16cab7cfafc4d3d055767dab3eb
2025-12-04T09:33:41.9967306Z  * [new tag]                 trunk/088048f2fea28ff7d450f65c72419ca45780d30b -> trunk/088048f2fea28ff7d450f65c72419ca45780d30b
2025-12-04T09:33:41.9968848Z  * [new tag]                 trunk/09076941a95c76f4d9ad189d064dfd8baa39e672 -> trunk/09076941a95c76f4d9ad189d064dfd8baa39e672
2025-12-04T09:33:41.9970141Z  * [new tag]                 trunk/0b80a4c62b94402844bf221791c096b0035c6d75 -> trunk/0b80a4c62b94402844bf221791c096b0035c6d75
2025-12-04T09:33:41.9971803Z  * [new tag]                 trunk/0bbbdf1750567a980634ad907a325357ba8ba8f2 -> trunk/0bbbdf1750567a980634ad907a325357ba8ba8f2
2025-12-04T09:33:41.9974313Z  * [new tag]                 trunk/0c281dd78773b2bc17c58ead0e4cd4ac46e775c5 -> trunk/0c281dd78773b2bc17c58ead0e4cd4ac46e775c5
2025-12-04T09:33:41.9975692Z  * [new tag]                 trunk/135f3753c418a6879b1954904184937b67e61688 -> trunk/135f3753c418a6879b1954904184937b67e61688
2025-12-04T09:33:41.9976814Z  * [new tag]                 trunk/15da21026cb13cd20257dc9e96830db108743c10 -> trunk/15da21026cb13cd20257dc9e96830db108743c10
2025-12-04T09:33:41.9978118Z  * [new tag]                 trunk/166efdad2ac827f30fb02504c6017520257f88ec -> trunk/166efdad2ac827f30fb02504c6017520257f88ec
2025-12-04T09:33:41.9979084Z  * [new tag]                 trunk/174272c15fae553d8488140af931f7d8050a313f -> trunk/174272c15fae553d8488140af931f7d8050a313f
2025-12-04T09:33:41.9980536Z  * [new tag]                 trunk/18f3ca08f13b8de61307f5e8cd7d4cccb67e9d11 -> trunk/18f3ca08f13b8de61307f5e8cd7d4cccb67e9d11
2025-12-04T09:33:41.9981540Z  * [new tag]                 trunk/1902eddfe655a15ebcf2c72bd81ade110fdeef63 -> trunk/1902eddfe655a15ebcf2c72bd81ade110fdeef63
2025-12-04T09:33:41.9982581Z  * [new tag]                 trunk/195f92e98d3d66738577f11f22c4b5c8a1c76dd5 -> trunk/195f92e98d3d66738577f11f22c4b5c8a1c76dd5
2025-12-04T09:33:41.9983725Z  * [new tag]                 trunk/1aa13e17de39e3c768ea7aebaad166ce72a06676 -> trunk/1aa13e17de39e3c768ea7aebaad166ce72a06676
2025-12-04T09:33:41.9984793Z  * [new tag]                 trunk/1afe2832f58e24e54a5bfda5a5afa9b96fdea40e -> trunk/1afe2832f58e24e54a5bfda5a5afa9b96fdea40e
2025-12-04T09:33:41.9985955Z  * [new tag]                 trunk/1c87554d74140eaee964ca8b1832cede67f5f520 -> trunk/1c87554d74140eaee964ca8b1832cede67f5f520
2025-12-04T09:33:41.9987035Z  * [new tag]                 trunk/1ccb743b7b5be955f49736c162c4f5004b8a0dd8 -> trunk/1ccb743b7b5be955f49736c162c4f5004b8a0dd8
2025-12-04T09:33:41.9988300Z  * [new tag]                 trunk/1cee47d6ce0a02227185b566593f002dd639ca0c -> trunk/1cee47d6ce0a02227185b566593f002dd639ca0c
2025-12-04T09:33:41.9989115Z  * [new tag]                 trunk/1d21b4df2babe322e5d085ceb6de884eb260a62d -> trunk/1d21b4df2babe322e5d085ceb6de884eb260a62d
2025-12-04T09:33:41.9990217Z  * [new tag]                 trunk/1e34fb2550e4aa650314f7a6d9f6daf4da7478a8 -> trunk/1e34fb2550e4aa650314f7a6d9f6daf4da7478a8
2025-12-04T09:33:41.9993514Z  * [new tag]                 trunk/1e526fb5b1d93bfc70691c5c3955fdffc1b7b7de -> trunk/1e526fb5b1d93bfc70691c5c3955fdffc1b7b7de
2025-12-04T09:33:41.9993987Z  * [new tag]                 trunk/1ee32a8b1f554a312d79bad01ded24f38cd95543 -> trunk/1ee32a8b1f554a312d79bad01ded24f38cd95543
2025-12-04T09:33:41.9994441Z  * [new tag]                 trunk/201e2c4117eb9744594dad6a5c18213d7b4705d7 -> trunk/201e2c4117eb9744594dad6a5c18213d7b4705d7
2025-12-04T09:33:41.9994902Z  * [new tag]                 trunk/2353a0f60eb4b4cb6675907a7fa9fbedc1c02e7f -> trunk/2353a0f60eb4b4cb6675907a7fa9fbedc1c02e7f
2025-12-04T09:33:41.9995866Z  * [new tag]                 trunk/285779b1621cf9f073a062b0889a642d200308d9 -> trunk/285779b1621cf9f073a062b0889a642d200308d9
2025-12-04T09:33:41.9996697Z  * [new tag]                 trunk/2887faaec6295d081580d09fce161201826c6d87 -> trunk/2887faaec6295d081580d09fce161201826c6d87
2025-12-04T09:33:41.9997796Z  * [new tag]                 trunk/296e67c92635443c67b11c0ae1bd045f03ebb7bc -> trunk/296e67c92635443c67b11c0ae1bd045f03ebb7bc
2025-12-04T09:33:41.9999063Z  * [new tag]                 trunk/29856679769b3dede478767e2fe6cfb51197cb25 -> trunk/29856679769b3dede478767e2fe6cfb51197cb25
2025-12-04T09:33:42.0000054Z  * [new tag]                 trunk/29e5455a4740c326ab187c7aa7b5ef98034ea563 -> trunk/29e5455a4740c326ab187c7aa7b5ef98034ea563
2025-12-04T09:33:42.0001178Z  * [new tag]                 trunk/2ac3ef882afb23136adc188975f0a8802fc68adf -> trunk/2ac3ef882afb23136adc188975f0a8802fc68adf
2025-12-04T09:33:42.0002021Z  * [new tag]                 trunk/2bec68e73b64715354af076ad309335f943e36cd -> trunk/2bec68e73b64715354af076ad309335f943e36cd
2025-12-04T09:33:42.0003121Z  * [new tag]                 trunk/2c87367e6f88662cd5cedbd1537748b7948c38e1 -> trunk/2c87367e6f88662cd5cedbd1537748b7948c38e1
2025-12-04T09:33:42.0004292Z  * [new tag]                 trunk/2d1f78fe3ec13820f136a2e0336da12a25f41708 -> trunk/2d1f78fe3ec13820f136a2e0336da12a25f41708
2025-12-04T09:33:42.0005407Z  * [new tag]                 trunk/2df6058f116a65722a0e03073402feb242572d35 -> trunk/2df6058f116a65722a0e03073402feb242572d35
2025-12-04T09:33:42.0006459Z  * [new tag]                 trunk/2e0c2e170fe658c440775c8e5c44228aafcc47ec -> trunk/2e0c2e170fe658c440775c8e5c44228aafcc47ec
2025-12-04T09:33:42.0007821Z  * [new tag]                 trunk/2f9b7dad7b5419b063bd0f2e204de192720ebb94 -> trunk/2f9b7dad7b5419b063bd0f2e204de192720ebb94
2025-12-04T09:33:42.0008784Z  * [new tag]                 trunk/305168768a95d69c444df5cd334bb774edfe06f1 -> trunk/305168768a95d69c444df5cd334bb774edfe06f1
2025-12-04T09:33:42.0009831Z  * [new tag]                 trunk/31fc12773026e8e00f054dd79ad9b2491e693b48 -> trunk/31fc12773026e8e00f054dd79ad9b2491e693b48
2025-12-04T09:33:42.0010876Z  * [new tag]                 trunk/320de0c6b0a3e7c6d2693ea5c28d5d0156ba7991 -> trunk/320de0c6b0a3e7c6d2693ea5c28d5d0156ba7991
2025-12-04T09:33:42.0011942Z  * [new tag]                 trunk/3418bd29475dff06695045fcdf93e7d0dac67da8 -> trunk/3418bd29475dff06695045fcdf93e7d0dac67da8
2025-12-04T09:33:42.0012945Z  * [new tag]                 trunk/34a98608afa0cb5b48f0d6d30432fdd0a2614ddf -> trunk/34a98608afa0cb5b48f0d6d30432fdd0a2614ddf
2025-12-04T09:33:42.0013909Z  * [new tag]                 trunk/35b7a9a26c5923d98aebaa41a031dae21788a9ee -> trunk/35b7a9a26c5923d98aebaa41a031dae21788a9ee
2025-12-04T09:33:42.0015140Z  * [new tag]                 trunk/39d07dbf03a911bdd45d1af78d8638dc92074938 -> trunk/39d07dbf03a911bdd45d1af78d8638dc92074938
2025-12-04T09:33:42.0015954Z  * [new tag]                 trunk/3cd98b4205ada151042cc7ff097a82d4a4b18725 -> trunk/3cd98b4205ada151042cc7ff097a82d4a4b18725
2025-12-04T09:33:42.0017090Z  * [new tag]                 trunk/3d35fd20a78ff4d016fa80f4e5fad37191d7bcae -> trunk/3d35fd20a78ff4d016fa80f4e5fad37191d7bcae
2025-12-04T09:33:42.0018212Z  * [new tag]                 trunk/409a5fee945c46a3edaf5df162812f201bfd7b2f -> trunk/409a5fee945c46a3edaf5df162812f201bfd7b2f
2025-12-04T09:33:42.0019255Z  * [new tag]                 trunk/42e9005cda22da3f1c559c3649218cebd671027c -> trunk/42e9005cda22da3f1c559c3649218cebd671027c
2025-12-04T09:33:42.0020485Z  * [new tag]                 trunk/43b94713bbf340d3c124fde02d0f73add4021247 -> trunk/43b94713bbf340d3c124fde02d0f73add4021247
2025-12-04T09:33:42.0021485Z  * [new tag]                 trunk/44ac69388a4a5eb463dbd2a13f00d1e3b924566c -> trunk/44ac69388a4a5eb463dbd2a13f00d1e3b924566c
2025-12-04T09:33:42.0022601Z  * [new tag]                 trunk/45d14e2497292be06ad36eaa1aaaf7c630a2586a -> trunk/45d14e2497292be06ad36eaa1aaaf7c630a2586a
2025-12-04T09:33:42.0024245Z  * [new tag]                 trunk/45d310ad84854dff730c0b12e577d7998d978686 -> trunk/45d310ad84854dff730c0b12e577d7998d978686
2025-12-04T09:33:42.0025685Z  * [new tag]                 trunk/47b28ddf7bd74b50fa93b307a7d3b183a6d77f54 -> trunk/47b28ddf7bd74b50fa93b307a7d3b183a6d77f54
2025-12-04T09:33:42.0026463Z  * [new tag]                 trunk/481e5ab336275bd3acd5fa8a611b05b4469012af -> trunk/481e5ab336275bd3acd5fa8a611b05b4469012af
2025-12-04T09:33:42.0027801Z  * [new tag]                 trunk/491731647f6b8a9345dcfb3bc9416aea254a7d96 -> trunk/491731647f6b8a9345dcfb3bc9416aea254a7d96
2025-12-04T09:33:42.0028810Z  * [new tag]                 trunk/49a04d26088acc17d948ddd66920f3e16371e873 -> trunk/49a04d26088acc17d948ddd66920f3e16371e873
2025-12-04T09:33:42.0029875Z  * [new tag]                 trunk/4bebc827c47d2f1f0fa1a417a5201a97aef3d985 -> trunk/4bebc827c47d2f1f0fa1a417a5201a97aef3d985
2025-12-04T09:33:42.0030824Z  * [new tag]                 trunk/4c246677784c6a14bc2dbb9ff8773ef0a3a3222f -> trunk/4c246677784c6a14bc2dbb9ff8773ef0a3a3222f
2025-12-04T09:33:42.0032207Z  * [new tag]                 trunk/4cfb47ff548b6d996641058cf04a70e311a4c3aa -> trunk/4cfb47ff548b6d996641058cf04a70e311a4c3aa
2025-12-04T09:33:42.0033273Z  * [new tag]                 trunk/4e0061c1aa52f606dda8cfab0bd7591e588faf2c -> trunk/4e0061c1aa52f606dda8cfab0bd7591e588faf2c
2025-12-04T09:33:42.0035027Z  * [new tag]                 trunk/4fefb8e7e942386ffac764a41b232241f82bea3a -> trunk/4fefb8e7e942386ffac764a41b232241f82bea3a
2025-12-04T09:33:42.0036000Z  * [new tag]                 trunk/503b2640023521f5a35cd9a52fc8033d73a95d0d -> trunk/503b2640023521f5a35cd9a52fc8033d73a95d0d
2025-12-04T09:33:42.0037065Z  * [new tag]                 trunk/518c2b1b3dab9a2ef2849e04b3bc2f20c1c41db9 -> trunk/518c2b1b3dab9a2ef2849e04b3bc2f20c1c41db9
2025-12-04T09:33:42.0038140Z  * [new tag]                 trunk/5191b2fa68ba19960912bfd7fd721c79d76bb1f3 -> trunk/5191b2fa68ba19960912bfd7fd721c79d76bb1f3
2025-12-04T09:33:42.0039487Z  * [new tag]                 trunk/52ac0f0dc4acacd219f1317fbc28ec631c01e07a -> trunk/52ac0f0dc4acacd219f1317fbc28ec631c01e07a
2025-12-04T09:33:42.0040477Z  * [new tag]                 trunk/539ba711b029de9f191070f4f0d12f18f5b7f292 -> trunk/539ba711b029de9f191070f4f0d12f18f5b7f292
2025-12-04T09:33:42.0041573Z  * [new tag]                 trunk/556375b55deebebbc56cb7aef81f4d52f031ba28 -> trunk/556375b55deebebbc56cb7aef81f4d52f031ba28
2025-12-04T09:33:42.0043388Z  * [new tag]                 trunk/55c4ab554845481d0a69a3811937575fe8bb1a66 -> trunk/55c4ab554845481d0a69a3811937575fe8bb1a66
2025-12-04T09:33:42.0044332Z  * [new tag]                 trunk/5634469fda9e5d98869c82c7d03bb08914245f96 -> trunk/5634469fda9e5d98869c82c7d03bb08914245f96
2025-12-04T09:33:42.0045210Z  * [new tag]                 trunk/5778f6ff894686a975a9a23645178ae4c87ad5dc -> trunk/5778f6ff894686a975a9a23645178ae4c87ad5dc
2025-12-04T09:33:42.0046325Z  * [new tag]                 trunk/587d63a3e07de5dc91065f9ef70bcacda9989068 -> trunk/587d63a3e07de5dc91065f9ef70bcacda9989068
2025-12-04T09:33:42.0047351Z  * [new tag]                 trunk/597930f6b568852356ca9795dac76f9e4653adbd -> trunk/597930f6b568852356ca9795dac76f9e4653adbd
2025-12-04T09:33:42.0048261Z  * [new tag]                 trunk/597df3a4e2a67b9fdbe1a89b2f4d74f822274db6 -> trunk/597df3a4e2a67b9fdbe1a89b2f4d74f822274db6
2025-12-04T09:33:42.0049599Z  * [new tag]                 trunk/59abd50e931f4efb21b053f7a2911f5d8a49d883 -> trunk/59abd50e931f4efb21b053f7a2911f5d8a49d883
2025-12-04T09:33:42.0050558Z  * [new tag]                 trunk/5a607febc04c3a2b5824c75f3f60307867439a2c -> trunk/5a607febc04c3a2b5824c75f3f60307867439a2c
2025-12-04T09:33:42.0051676Z  * [new tag]                 trunk/5bf1cdf4755c54ef462b44cb8041b0a57311556b -> trunk/5bf1cdf4755c54ef462b44cb8041b0a57311556b
2025-12-04T09:33:42.0052540Z  * [new tag]                 trunk/5f0030ba63d334d7e8c93a09e41403b89e4c573c -> trunk/5f0030ba63d334d7e8c93a09e41403b89e4c573c
2025-12-04T09:33:42.0053587Z  * [new tag]                 trunk/5f21d27e71268464d362a96c9ac09ea475f7f202 -> trunk/5f21d27e71268464d362a96c9ac09ea475f7f202
2025-12-04T09:33:42.0054701Z  * [new tag]                 trunk/5fafc13038c9988d9ac21fa793fbd5890604b447 -> trunk/5fafc13038c9988d9ac21fa793fbd5890604b447
2025-12-04T09:33:42.0055849Z  * [new tag]                 trunk/61be54a31dc09b59d99b62176fb935aee0b924ef -> trunk/61be54a31dc09b59d99b62176fb935aee0b924ef
2025-12-04T09:33:42.0056979Z  * [new tag]                 trunk/62d3ccd71484ed6a760d909b41487101bbc65719 -> trunk/62d3ccd71484ed6a760d909b41487101bbc65719
2025-12-04T09:33:42.0058297Z  * [new tag]                 trunk/641cdb68ae27668eb441d0e49c87a0602c120c2b -> trunk/641cdb68ae27668eb441d0e49c87a0602c120c2b
2025-12-04T09:33:42.0059254Z  * [new tag]                 trunk/65c4620d6bb0c6029f69762c22b91dda2294da9a -> trunk/65c4620d6bb0c6029f69762c22b91dda2294da9a
2025-12-04T09:33:42.0060782Z  * [new tag]                 trunk/66004b993744b4106bf8afaba71f3c228a804206 -> trunk/66004b993744b4106bf8afaba71f3c228a804206
2025-12-04T09:33:42.0061413Z  * [new tag]                 trunk/6658a04c7ca67acb64512341342e7b3ee13ee386 -> trunk/6658a04c7ca67acb64512341342e7b3ee13ee386
2025-12-04T09:33:42.0062439Z  * [new tag]                 trunk/6864e309092a71f8ab0ca6a4dc7f8a4073fd31c4 -> trunk/6864e309092a71f8ab0ca6a4dc7f8a4073fd31c4
2025-12-04T09:33:42.0063606Z  * [new tag]                 trunk/6c261c6cb07892c90ca19ed51c9705b1659a3f7d -> trunk/6c261c6cb07892c90ca19ed51c9705b1659a3f7d
2025-12-04T09:33:42.0064581Z  * [new tag]                 trunk/6c8b6a043f1628188b6396b3a2a6e000ca68362b -> trunk/6c8b6a043f1628188b6396b3a2a6e000ca68362b
2025-12-04T09:33:42.0065634Z  * [new tag]                 trunk/6ceb4a32f92ae67ce5d7d97931d17401ebf5ffa5 -> trunk/6ceb4a32f92ae67ce5d7d97931d17401ebf5ffa5
2025-12-04T09:33:42.0066744Z  * [new tag]                 trunk/6e404e9b7d6f5fb0de86aa73888c3038248c17f8 -> trunk/6e404e9b7d6f5fb0de86aa73888c3038248c17f8
2025-12-04T09:33:42.0067993Z  * [new tag]                 trunk/6ec30b490aee1db6bcdc7340abddef25784f08ec -> trunk/6ec30b490aee1db6bcdc7340abddef25784f08ec
2025-12-04T09:33:42.0068954Z  * [new tag]                 trunk/6f2783a6c08e1db34275ff25176ffe9aebc30a71 -> trunk/6f2783a6c08e1db34275ff25176ffe9aebc30a71
2025-12-04T09:33:42.0070233Z  * [new tag]                 trunk/6f53fefeb90ad3281119b5cfc4aa9ffd8a066e3d -> trunk/6f53fefeb90ad3281119b5cfc4aa9ffd8a066e3d
2025-12-04T09:33:42.0071364Z  * [new tag]                 trunk/6f7dcf51e46d0c880db1a2f5c70de57adb576f4a -> trunk/6f7dcf51e46d0c880db1a2f5c70de57adb576f4a
2025-12-04T09:33:42.0072900Z  * [new tag]                 trunk/6ff831180d2fa436c7f1c1af3adac641fce9d60e -> trunk/6ff831180d2fa436c7f1c1af3adac641fce9d60e
2025-12-04T09:33:42.0073859Z  * [new tag]                 trunk/70076464a63ab218a7ceefb0e76ccd7131deb8f8 -> trunk/70076464a63ab218a7ceefb0e76ccd7131deb8f8
2025-12-04T09:33:42.0074921Z  * [new tag]                 trunk/70d797a5fc109b20a517646fcaa819477cd0d485 -> trunk/70d797a5fc109b20a517646fcaa819477cd0d485
2025-12-04T09:33:42.0075940Z  * [new tag]                 trunk/7348cb355ff0a6f79cd4871215aea72185748734 -> trunk/7348cb355ff0a6f79cd4871215aea72185748734
2025-12-04T09:33:42.0077040Z  * [new tag]                 trunk/74fe26a1ebe32931783569f2e762e3c2c974901f -> trunk/74fe26a1ebe32931783569f2e762e3c2c974901f
2025-12-04T09:33:42.0078158Z  * [new tag]                 trunk/76aeb8c7e0f795b3fddca134cbea9a69da3ee696 -> trunk/76aeb8c7e0f795b3fddca134cbea9a69da3ee696
2025-12-04T09:33:42.0079053Z  * [new tag]                 trunk/7716da9fb23f27a65b41f9f016a2afadf281c18f -> trunk/7716da9fb23f27a65b41f9f016a2afadf281c18f
2025-12-04T09:33:42.0080146Z  * [new tag]                 trunk/7741edd4ed665f3988052e260863efb508d61a03 -> trunk/7741edd4ed665f3988052e260863efb508d61a03
2025-12-04T09:33:42.0081387Z  * [new tag]                 trunk/78adb3b3df41b45d2368b67226d2f864b78939a6 -> trunk/78adb3b3df41b45d2368b67226d2f864b78939a6
2025-12-04T09:33:42.0082368Z  * [new tag]                 trunk/79d7b178225e5ed24d4e1db74e5abbff848f5fb7 -> trunk/79d7b178225e5ed24d4e1db74e5abbff848f5fb7
2025-12-04T09:33:42.0083261Z  * [new tag]                 trunk/7a1e316115fc6996b3f2336822ba5d5f6179f0c3 -> trunk/7a1e316115fc6996b3f2336822ba5d5f6179f0c3
2025-12-04T09:33:42.0084286Z  * [new tag]                 trunk/7a41b66367c38d0af3e8a90f7be48d6b281e7bca -> trunk/7a41b66367c38d0af3e8a90f7be48d6b281e7bca
2025-12-04T09:33:42.0085312Z  * [new tag]                 trunk/7b7af390ea8541c611d1ce2018a6934188fc197b -> trunk/7b7af390ea8541c611d1ce2018a6934188fc197b
2025-12-04T09:33:42.0086414Z  * [new tag]                 trunk/7ba4680f3755a560af81aa0f688791e367aa3609 -> trunk/7ba4680f3755a560af81aa0f688791e367aa3609
2025-12-04T09:33:42.0087674Z  * [new tag]                 trunk/7bc2a66ded06a0b2549aa51d807edc5dc3e73d1b -> trunk/7bc2a66ded06a0b2549aa51d807edc5dc3e73d1b
2025-12-04T09:33:42.0088462Z  * [new tag]                 trunk/7c648509a7470ace9fb2bae960dd4790f7e943e9 -> trunk/7c648509a7470ace9fb2bae960dd4790f7e943e9
2025-12-04T09:33:42.0089423Z  * [new tag]                 trunk/7cbc2d034cecd21ab5c9707d0a9c525c17143fb8 -> trunk/7cbc2d034cecd21ab5c9707d0a9c525c17143fb8
2025-12-04T09:33:42.0090514Z  * [new tag]                 trunk/7d1bbaf4ba301ea3fba6f3c7bc02d58f6417aaed -> trunk/7d1bbaf4ba301ea3fba6f3c7bc02d58f6417aaed
2025-12-04T09:33:42.0091705Z  * [new tag]                 trunk/7d2a33e4ebf60b217a3cd77feae19231eb996fc8 -> trunk/7d2a33e4ebf60b217a3cd77feae19231eb996fc8
2025-12-04T09:33:42.0092711Z  * [new tag]                 trunk/7eb625920054b1126a7d2d99818aaa188c6ba95e -> trunk/7eb625920054b1126a7d2d99818aaa188c6ba95e
2025-12-04T09:33:42.0093646Z  * [new tag]                 trunk/7f55ba19c456a3d6cc443dd9edb6bb7cca677ead -> trunk/7f55ba19c456a3d6cc443dd9edb6bb7cca677ead
2025-12-04T09:33:42.0095372Z  * [new tag]                 trunk/81af382128efa094d8702e18f2c133760904c718 -> trunk/81af382128efa094d8702e18f2c133760904c718
2025-12-04T09:33:42.0096885Z  * [new tag]                 trunk/84149583d483e9c973c9a0feda70e4f3964947b0 -> trunk/84149583d483e9c973c9a0feda70e4f3964947b0
2025-12-04T09:33:42.0098353Z  * [new tag]                 trunk/85a315917efe82c24306be805c584ec044951c75 -> trunk/85a315917efe82c24306be805c584ec044951c75
2025-12-04T09:33:42.0099354Z  * [new tag]                 trunk/87329491c82a5f8c1cc4ec11d8f55a5de2551ece -> trunk/87329491c82a5f8c1cc4ec11d8f55a5de2551ece
2025-12-04T09:33:42.0100247Z  * [new tag]                 trunk/892640e25aeefa8007c5af837214b4502b6b62a6 -> trunk/892640e25aeefa8007c5af837214b4502b6b62a6
2025-12-04T09:33:42.0101594Z  * [new tag]                 trunk/89e3bbcb5b5321dc8b9520b4d5a8ee60cea1d0b4 -> trunk/89e3bbcb5b5321dc8b9520b4d5a8ee60cea1d0b4
2025-12-04T09:33:42.0102527Z  * [new tag]                 trunk/8c73bbbb02159223c0c97d268a0a74cb78158a1c -> trunk/8c73bbbb02159223c0c97d268a0a74cb78158a1c
2025-12-04T09:33:42.0103625Z  * [new tag]                 trunk/8d56e98c8db988a22cb2dfaeefb30bc7d2a3cc43 -> trunk/8d56e98c8db988a22cb2dfaeefb30bc7d2a3cc43
2025-12-04T09:33:42.0104897Z  * [new tag]                 trunk/8d9dd9603e5ee26c01007f0cd4f018e584840922 -> trunk/8d9dd9603e5ee26c01007f0cd4f018e584840922
2025-12-04T09:33:42.0105968Z  * [new tag]                 trunk/8ef0c0b02b062d75e7c9be2594914a3e784d23ca -> trunk/8ef0c0b02b062d75e7c9be2594914a3e784d23ca
2025-12-04T09:33:42.0107039Z  * [new tag]                 trunk/90b27e7e8352cde97d32ddad24740ef819633f38 -> trunk/90b27e7e8352cde97d32ddad24740ef819633f38
2025-12-04T09:33:42.0107914Z  * [new tag]                 trunk/90f0139e64b2951815d524b6a373bed20c4fbf90 -> trunk/90f0139e64b2951815d524b6a373bed20c4fbf90
2025-12-04T09:33:42.0115058Z  * [new tag]                 trunk/93d0d6838c56af59b0dba794e6aa08f0c1c7799c -> trunk/93d0d6838c56af59b0dba794e6aa08f0c1c7799c
2025-12-04T09:33:42.0115692Z  * [new tag]                 trunk/94ca8d5f1e81fea3ae488650a0fb6795049a9f87 -> trunk/94ca8d5f1e81fea3ae488650a0fb6795049a9f87
2025-12-04T09:33:42.0116163Z  * [new tag]                 trunk/9844fbeadd5cebdf1281d6fbf79164139c352693 -> trunk/9844fbeadd5cebdf1281d6fbf79164139c352693
2025-12-04T09:33:42.0116634Z  * [new tag]                 trunk/99024dec888ec1e50b546822a32b6fb2f35e5eaa -> trunk/99024dec888ec1e50b546822a32b6fb2f35e5eaa
2025-12-04T09:33:42.0117084Z  * [new tag]                 trunk/9a296e640fc88aa44d275b48cd9cc30c573b169d -> trunk/9a296e640fc88aa44d275b48cd9cc30c573b169d
2025-12-04T09:33:42.0117539Z  * [new tag]                 trunk/9b3e34d8589b29f7b4e7fab6f78711b7ca6e4639 -> trunk/9b3e34d8589b29f7b4e7fab6f78711b7ca6e4639
2025-12-04T09:33:42.0118001Z  * [new tag]                 trunk/9cd055e547e9b67a5f9827f8999c38d7eda1bcb8 -> trunk/9cd055e547e9b67a5f9827f8999c38d7eda1bcb8
2025-12-04T09:33:42.0118460Z  * [new tag]                 trunk/9f0df5686cb4ada94f94620acba2e3c3f363b11d -> trunk/9f0df5686cb4ada94f94620acba2e3c3f363b11d
2025-12-04T09:33:42.0118931Z  * [new tag]                 trunk/9f7fceb887d0cfa0326a59b887821c63ff11340a -> trunk/9f7fceb887d0cfa0326a59b887821c63ff11340a
2025-12-04T09:33:42.0119373Z  * [new tag]                 trunk/9f8ef8855d3078d70f7b782540ff2aaf158d6742 -> trunk/9f8ef8855d3078d70f7b782540ff2aaf158d6742
2025-12-04T09:33:42.0119842Z  * [new tag]                 trunk/9fb52efc797b47a1f425a03aa5e47b866d8b1098 -> trunk/9fb52efc797b47a1f425a03aa5e47b866d8b1098
2025-12-04T09:33:42.0120703Z  * [new tag]                 trunk/9ff4a2ebc5762d46c73e46b1b523d7ff349fedfa -> trunk/9ff4a2ebc5762d46c73e46b1b523d7ff349fedfa
2025-12-04T09:33:42.0122121Z  * [new tag]                 trunk/a0f3937b94422354538ebbd47202d5b0e8a3fd0d -> trunk/a0f3937b94422354538ebbd47202d5b0e8a3fd0d
2025-12-04T09:33:42.0122901Z  * [new tag]                 trunk/a15066c28b3145e6edbfc88359d0411d14cfc70c -> trunk/a15066c28b3145e6edbfc88359d0411d14cfc70c
2025-12-04T09:33:42.0124004Z  * [new tag]                 trunk/a20f775e82564d2a9979221ed7f3b8d7cf54ce90 -> trunk/a20f775e82564d2a9979221ed7f3b8d7cf54ce90
2025-12-04T09:33:42.0125104Z  * [new tag]                 trunk/a2973fb00ec002dd4b6bbf07385f066efb259b8c -> trunk/a2973fb00ec002dd4b6bbf07385f066efb259b8c
2025-12-04T09:33:42.0125991Z  * [new tag]                 trunk/a7dc6dab9ad911259d4801c502907e531594db45 -> trunk/a7dc6dab9ad911259d4801c502907e531594db45
2025-12-04T09:33:42.0127233Z  * [new tag]                 trunk/a951a9cee65c01660bbc6e6fded90ecb10fa6109 -> trunk/a951a9cee65c01660bbc6e6fded90ecb10fa6109
2025-12-04T09:33:42.0128242Z  * [new tag]                 trunk/abfa1a6d65c7c159e35c72c25979b9da4971689e -> trunk/abfa1a6d65c7c159e35c72c25979b9da4971689e
2025-12-04T09:33:42.0129466Z  * [new tag]                 trunk/ae3a2395bf66151078e2d201716f7d63ce1c6f3e -> trunk/ae3a2395bf66151078e2d201716f7d63ce1c6f3e
2025-12-04T09:33:42.0130326Z  * [new tag]                 trunk/afdff7f0325080dedac44d080cb5a3b0e65e6c5e -> trunk/afdff7f0325080dedac44d080cb5a3b0e65e6c5e
2025-12-04T09:33:42.0131238Z  * [new tag]                 trunk/b1aed4e7a72c03a38f44543aaea0dae2e9b76d48 -> trunk/b1aed4e7a72c03a38f44543aaea0dae2e9b76d48
2025-12-04T09:33:42.0132611Z  * [new tag]                 trunk/b1decff555cd50e2123c8c6e25cc0d447c411f62 -> trunk/b1decff555cd50e2123c8c6e25cc0d447c411f62
2025-12-04T09:33:42.0133708Z  * [new tag]                 trunk/b2b6b034c9fd08672c40e63ef243556ad4c49bd2 -> trunk/b2b6b034c9fd08672c40e63ef243556ad4c49bd2
2025-12-04T09:33:42.0134767Z  * [new tag]                 trunk/b39813b4a04931682b0491adba2138d01d716d99 -> trunk/b39813b4a04931682b0491adba2138d01d716d99
2025-12-04T09:33:42.0135831Z  * [new tag]                 trunk/b3a7edb2311367974cc7cd764cfb11a5d6758b24 -> trunk/b3a7edb2311367974cc7cd764cfb11a5d6758b24
2025-12-04T09:33:42.0137535Z  * [new tag]                 trunk/b4cc1329c86acaef6d42c1fac7169b8d870ab0d7 -> trunk/b4cc1329c86acaef6d42c1fac7169b8d870ab0d7
2025-12-04T09:33:42.0138221Z  * [new tag]                 trunk/b555c39217f765759954a4f9f9bd1e9b87bed11a -> trunk/b555c39217f765759954a4f9f9bd1e9b87bed11a
2025-12-04T09:33:42.0139464Z  * [new tag]                 trunk/b6b6c80379388b7f9932c3e6a0f9907bf430e417 -> trunk/b6b6c80379388b7f9932c3e6a0f9907bf430e417
2025-12-04T09:33:42.0140568Z  * [new tag]                 trunk/b6b6d912df0b6f4082f8e50b18bd1de1dd7325f4 -> trunk/b6b6d912df0b6f4082f8e50b18bd1de1dd7325f4
2025-12-04T09:33:42.0141667Z  * [new tag]                 trunk/b7d60685f8cbc939b68a20871e90db67e729329b -> trunk/b7d60685f8cbc939b68a20871e90db67e729329b
2025-12-04T09:33:42.0142779Z  * [new tag]                 trunk/b7f6b9a4fc6259f7af068f31868b3119bb1bac3e -> trunk/b7f6b9a4fc6259f7af068f31868b3119bb1bac3e
2025-12-04T09:33:42.0144118Z  * [new tag]                 trunk/b8c4ba3593761e7b2a3ebd86f040fb07b47c02cf -> trunk/b8c4ba3593761e7b2a3ebd86f040fb07b47c02cf
2025-12-04T09:33:42.0144982Z  * [new tag]                 trunk/b9c8f3a4884befb965ff42620ce44a71b04887f5 -> trunk/b9c8f3a4884befb965ff42620ce44a71b04887f5
2025-12-04T09:33:42.0146057Z  * [new tag]                 trunk/ba1412546f3082c0958c077acc2025e4dbc33f1f -> trunk/ba1412546f3082c0958c077acc2025e4dbc33f1f
2025-12-04T09:33:42.0147174Z  * [new tag]                 trunk/bac403c0b38c63bdbcc0c31f1c2b0bc0260f610f -> trunk/bac403c0b38c63bdbcc0c31f1c2b0bc0260f610f
2025-12-04T09:33:42.0148230Z  * [new tag]                 trunk/bb3034198b459401fabeab254e1b99f0115046e2 -> trunk/bb3034198b459401fabeab254e1b99f0115046e2
2025-12-04T09:33:42.0149542Z  * [new tag]                 trunk/bc39b2b3bc7a6e19a42e62bd576974035086fe55 -> trunk/bc39b2b3bc7a6e19a42e62bd576974035086fe55
2025-12-04T09:33:42.0150912Z  * [new tag]                 trunk/bc43d5b297f207a11d83d77ddf0152bdaabe15a8 -> trunk/bc43d5b297f207a11d83d77ddf0152bdaabe15a8
2025-12-04T09:33:42.0151846Z  * [new tag]                 trunk/bc6a4863c7246a6493d16d4ea6eee71ec07c6a09 -> trunk/bc6a4863c7246a6493d16d4ea6eee71ec07c6a09
2025-12-04T09:33:42.0152906Z  * [new tag]                 trunk/bea4912944defdbcb8b061800caab6cbbbd01df5 -> trunk/bea4912944defdbcb8b061800caab6cbbbd01df5
2025-12-04T09:33:42.0154472Z  * [new tag]                 trunk/c04e2c656f48d82d1521b867bbbf03967b9b7564 -> trunk/c04e2c656f48d82d1521b867bbbf03967b9b7564
2025-12-04T09:33:42.0155473Z  * [new tag]                 trunk/c0660bcee27e7d7731634e274576a7081882bede -> trunk/c0660bcee27e7d7731634e274576a7081882bede
2025-12-04T09:33:42.0156586Z  * [new tag]                 trunk/c178ed43d3d99cbefe84fbfb21d6f282b20d62ac -> trunk/c178ed43d3d99cbefe84fbfb21d6f282b20d62ac
2025-12-04T09:33:42.0157696Z  * [new tag]                 trunk/c55b1e8f61d041ee436d697449eb028931d574fb -> trunk/c55b1e8f61d041ee436d697449eb028931d574fb
2025-12-04T09:33:42.0158678Z  * [new tag]                 trunk/c6ae7579fe12fe75f1a8f7043a494c90567273f1 -> trunk/c6ae7579fe12fe75f1a8f7043a494c90567273f1
2025-12-04T09:33:42.0160162Z  * [new tag]                 trunk/c8210e7d94bad5ae21ac389fa4ba8a463c76c4d0 -> trunk/c8210e7d94bad5ae21ac389fa4ba8a463c76c4d0
2025-12-04T09:33:42.0161213Z  * [new tag]                 trunk/cc0853af42122f8185321f542616f4474e717f09 -> trunk/cc0853af42122f8185321f542616f4474e717f09
2025-12-04T09:33:42.0162243Z  * [new tag]                 trunk/cddec6562eabfa390d014fa3741a5659cf9c94c9 -> trunk/cddec6562eabfa390d014fa3741a5659cf9c94c9
2025-12-04T09:33:42.0163366Z  * [new tag]                 trunk/ce5e7e3bf1f4b69a4f4f93d288ba75b906df492a -> trunk/ce5e7e3bf1f4b69a4f4f93d288ba75b906df492a
2025-12-04T09:33:42.0164481Z  * [new tag]                 trunk/d038b0130ec7c20ebcac219301292fd8e98a1ace -> trunk/d038b0130ec7c20ebcac219301292fd8e98a1ace
2025-12-04T09:33:42.0165488Z  * [new tag]                 trunk/d16447dacaf2420ea175f0c275c75da951f57d39 -> trunk/d16447dacaf2420ea175f0c275c75da951f57d39
2025-12-04T09:33:42.0167229Z  * [new tag]                 trunk/d19f1e8cab6810bb2e99141f9976665954c67a50 -> trunk/d19f1e8cab6810bb2e99141f9976665954c67a50
2025-12-04T09:33:42.0168224Z  * [new tag]                 trunk/d1c9f03b2a5af4104721712f8cdffe9b4f340c01 -> trunk/d1c9f03b2a5af4104721712f8cdffe9b4f340c01
2025-12-04T09:33:42.0169564Z  * [new tag]                 trunk/d40f4950f2b7f7aa380a22fe0f6166e71680fbcf -> trunk/d40f4950f2b7f7aa380a22fe0f6166e71680fbcf
2025-12-04T09:33:42.0170524Z  * [new tag]                 trunk/d5038950bacfe36bbf24a47a455fe76901deb8e8 -> trunk/d5038950bacfe36bbf24a47a455fe76901deb8e8
2025-12-04T09:33:42.0171594Z  * [new tag]                 trunk/d54ff42903c2ae0533931ff11d23b35f875bdb3d -> trunk/d54ff42903c2ae0533931ff11d23b35f875bdb3d
2025-12-04T09:33:42.0173073Z  * [new tag]                 trunk/d76697633a2d2b9cced1ae21161849b33bfe7e47 -> trunk/d76697633a2d2b9cced1ae21161849b33bfe7e47
2025-12-04T09:33:42.0174031Z  * [new tag]                 trunk/d78f52b199c547106d4cd9d2856dd0805c118bf1 -> trunk/d78f52b199c547106d4cd9d2856dd0805c118bf1
2025-12-04T09:33:42.0175129Z  * [new tag]                 trunk/d8fd5c6eed28e5004150691d048a3f6785e19a8e -> trunk/d8fd5c6eed28e5004150691d048a3f6785e19a8e
2025-12-04T09:33:42.0176169Z  * [new tag]                 trunk/d900f5e86745dec76713f4b0ef07005ef36b2f5a -> trunk/d900f5e86745dec76713f4b0ef07005ef36b2f5a
2025-12-04T09:33:42.0177712Z  * [new tag]                 trunk/d973dc6b87d763859fe1c5bd1287e3b6b1c49d1b -> trunk/d973dc6b87d763859fe1c5bd1287e3b6b1c49d1b
2025-12-04T09:33:42.0178653Z  * [new tag]                 trunk/d998c03304cb6ede76e1ed535b4ddeb6c2bf40ec -> trunk/d998c03304cb6ede76e1ed535b4ddeb6c2bf40ec
2025-12-04T09:33:42.0179892Z  * [new tag]                 trunk/d9cb8a70833101dbbe16b99520cfbdd70d0a87bf -> trunk/d9cb8a70833101dbbe16b99520cfbdd70d0a87bf
2025-12-04T09:33:42.0180981Z  * [new tag]                 trunk/d9d5e91b43f70eb8637af55db6856d49be391ffd -> trunk/d9d5e91b43f70eb8637af55db6856d49be391ffd
2025-12-04T09:33:42.0181945Z  * [new tag]                 trunk/dd18a75336a4fbd7497955cc5665904724fce889 -> trunk/dd18a75336a4fbd7497955cc5665904724fce889
2025-12-04T09:33:42.0183020Z  * [new tag]                 trunk/ded9bcd61a059bf723e6e84689552962b480ea77 -> trunk/ded9bcd61a059bf723e6e84689552962b480ea77
2025-12-04T09:33:42.0184143Z  * [new tag]                 trunk/dfbd3714d15c37a7b83b322a6b60f997fc00f50c -> trunk/dfbd3714d15c37a7b83b322a6b60f997fc00f50c
2025-12-04T09:33:42.0185519Z  * [new tag]                 trunk/e115f9f4e4b039f8e9a642aaa2bd8254a920541b -> trunk/e115f9f4e4b039f8e9a642aaa2bd8254a920541b
2025-12-04T09:33:42.0186305Z  * [new tag]                 trunk/e3f24fd73ad74c6e7176687986436956c7c18235 -> trunk/e3f24fd73ad74c6e7176687986436956c7c18235
2025-12-04T09:33:42.0187599Z  * [new tag]                 trunk/e7d24d3ff93d1503ba63860b7057438ad93f918e -> trunk/e7d24d3ff93d1503ba63860b7057438ad93f918e
2025-12-04T09:33:42.0188650Z  * [new tag]                 trunk/ea7035f462a0d2830865ee86c832bd101e1427fc -> trunk/ea7035f462a0d2830865ee86c832bd101e1427fc
2025-12-04T09:33:42.0189723Z  * [new tag]                 trunk/eabb7ad2128580ef674446027b95bcf4e21e8df3 -> trunk/eabb7ad2128580ef674446027b95bcf4e21e8df3
2025-12-04T09:33:42.0190857Z  * [new tag]                 trunk/eb5c63652a33da42e7018c23df5f20a3eb4c6ccf -> trunk/eb5c63652a33da42e7018c23df5f20a3eb4c6ccf
2025-12-04T09:33:42.0191988Z  * [new tag]                 trunk/ec2c71f5c85021b8938cdafadce24c15a36fd93e -> trunk/ec2c71f5c85021b8938cdafadce24c15a36fd93e
2025-12-04T09:33:42.0193113Z  * [new tag]                 trunk/ecbcc3f6bf327856b435b259ac63cc2f328c4b4e -> trunk/ecbcc3f6bf327856b435b259ac63cc2f328c4b4e
2025-12-04T09:33:42.0194772Z  * [new tag]                 trunk/ee87bbe876c42575e961b32a0827d76bc9782ca2 -> trunk/ee87bbe876c42575e961b32a0827d76bc9782ca2
2025-12-04T09:33:42.0195730Z  * [new tag]                 trunk/ef019d1d431c4c5a95b594cb90d40a50cd00f5e4 -> trunk/ef019d1d431c4c5a95b594cb90d40a50cd00f5e4
2025-12-04T09:33:42.0196805Z  * [new tag]                 trunk/ef8ecc13830a86c4b231f1aad9aba7851db61b53 -> trunk/ef8ecc13830a86c4b231f1aad9aba7851db61b53
2025-12-04T09:33:42.0197896Z  * [new tag]                 trunk/f1076f5510920044912247b1abb8760cb820f598 -> trunk/f1076f5510920044912247b1abb8760cb820f598
2025-12-04T09:33:42.0199036Z  * [new tag]                 trunk/f2d6a75a00a1d648ca9a0abc6a33e14c3dea6c40 -> trunk/f2d6a75a00a1d648ca9a0abc6a33e14c3dea6c40
2025-12-04T09:33:42.0200095Z  * [new tag]                 trunk/f47dd0ddef1359e5b43e4b962412f67b30ecde56 -> trunk/f47dd0ddef1359e5b43e4b962412f67b30ecde56
2025-12-04T09:33:42.0201186Z  * [new tag]                 trunk/f49d32dfa4730dcfb1b60eeeb369b5889da983c8 -> trunk/f49d32dfa4730dcfb1b60eeeb369b5889da983c8
2025-12-04T09:33:42.0202176Z  * [new tag]                 trunk/f4dedf78fc30fd4b93975787ca6074ee89db9467 -> trunk/f4dedf78fc30fd4b93975787ca6074ee89db9467
2025-12-04T09:33:42.0203266Z  * [new tag]                 trunk/f7c0d03819ebed05c4038f095d66d1b8c54aca17 -> trunk/f7c0d03819ebed05c4038f095d66d1b8c54aca17
2025-12-04T09:33:42.0204335Z  * [new tag]                 trunk/f7e1bd80a063e17453c361837ba6ea2570920a73 -> trunk/f7e1bd80a063e17453c361837ba6ea2570920a73
2025-12-04T09:33:42.0205288Z  * [new tag]                 trunk/f9bd6c53624c7c0ea3772de78498326e84c2f0e7 -> trunk/f9bd6c53624c7c0ea3772de78498326e84c2f0e7
2025-12-04T09:33:42.0206643Z  * [new tag]                 trunk/fb5be221a46b51bfc9509013b0d85bc5a9d4f15b -> trunk/fb5be221a46b51bfc9509013b0d85bc5a9d4f15b
2025-12-04T09:33:42.0207631Z  * [new tag]                 trunk/fdf863d5e1de3b2688c9511e96876e34581dbfd7 -> trunk/fdf863d5e1de3b2688c9511e96876e34581dbfd7
2025-12-04T09:33:42.0209438Z  * [new tag]                 trunk/fe0e65adfc0e7ca6e5f57e6ea8b16bd5cc967307 -> trunk/fe0e65adfc0e7ca6e5f57e6ea8b16bd5cc967307
2025-12-04T09:33:42.0210383Z  * [new tag]                 trunk/fec710bf89173f5355468a7ce1afe9157c3d9009 -> trunk/fec710bf89173f5355468a7ce1afe9157c3d9009
2025-12-04T09:33:42.0211658Z  * [new tag]                 trunk/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 -> trunk/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:33:42.0212359Z  * [new tag]                 v0.1.1                      -> v0.1.1
2025-12-04T09:33:42.0213637Z  * [new tag]                 v0.1.10                     -> v0.1.10
2025-12-04T09:33:42.0214668Z  * [new tag]                 v0.1.11                     -> v0.1.11
2025-12-04T09:33:42.0215714Z  * [new tag]                 v0.1.12                     -> v0.1.12
2025-12-04T09:33:42.0216824Z  * [new tag]                 v0.1.2                      -> v0.1.2
2025-12-04T09:33:42.0217808Z  * [new tag]                 v0.1.3                      -> v0.1.3
2025-12-04T09:33:42.0218744Z  * [new tag]                 v0.1.4                      -> v0.1.4
2025-12-04T09:33:42.0219738Z  * [new tag]                 v0.1.5                      -> v0.1.5
2025-12-04T09:33:42.0220694Z  * [new tag]                 v0.1.6                      -> v0.1.6
2025-12-04T09:33:42.0221682Z  * [new tag]                 v0.1.7                      -> v0.1.7
2025-12-04T09:33:42.0222473Z  * [new tag]                 v0.1.8                      -> v0.1.8
2025-12-04T09:33:42.0223614Z  * [new tag]                 v0.1.9                      -> v0.1.9
2025-12-04T09:33:42.0224613Z  * [new tag]                 v0.2.0                      -> v0.2.0
2025-12-04T09:33:42.0225675Z  * [new tag]                 v0.3.0                      -> v0.3.0
2025-12-04T09:33:42.0226777Z  * [new tag]                 v0.3.1                      -> v0.3.1
2025-12-04T09:33:42.0227761Z  * [new tag]                 v0.4.0                      -> v0.4.0
2025-12-04T09:33:42.0228612Z  * [new tag]                 v0.4.1                      -> v0.4.1
2025-12-04T09:33:42.0229731Z  * [new tag]                 v1.0.0                      -> v1.0.0
2025-12-04T09:33:42.0230792Z  * [new tag]                 v1.0.0a0                    -> v1.0.0a0
2025-12-04T09:33:42.0231623Z  * [new tag]                 v1.0.1                      -> v1.0.1
2025-12-04T09:33:42.0232780Z  * [new tag]                 v1.0rc0                     -> v1.0rc0
2025-12-04T09:33:42.0233457Z  * [new tag]                 v1.0rc1                     -> v1.0rc1
2025-12-04T09:33:42.0234590Z  * [new tag]                 v1.1.0                      -> v1.1.0
2025-12-04T09:33:42.0235853Z  * [new tag]                 v1.1.0a0                    -> v1.1.0a0
2025-12-04T09:33:42.0237086Z  * [new tag]                 v1.10.0                     -> v1.10.0
2025-12-04T09:33:42.0238186Z  * [new tag]                 v1.10.0-rc1                 -> v1.10.0-rc1
2025-12-04T09:33:42.0239278Z  * [new tag]                 v1.10.0-rc2                 -> v1.10.0-rc2
2025-12-04T09:33:42.0239955Z  * [new tag]                 v1.10.0-rc3                 -> v1.10.0-rc3
2025-12-04T09:33:42.0241134Z  * [new tag]                 v1.10.1                     -> v1.10.1
2025-12-04T09:33:42.0241830Z  * [new tag]                 v1.10.1-rc1                 -> v1.10.1-rc1
2025-12-04T09:33:42.0242805Z  * [new tag]                 v1.10.2                     -> v1.10.2
2025-12-04T09:33:42.0244123Z  * [new tag]                 v1.10.2-rc1                 -> v1.10.2-rc1
2025-12-04T09:33:42.0245201Z  * [new tag]                 v1.11.0                     -> v1.11.0
2025-12-04T09:33:42.0246482Z  * [new tag]                 v1.11.0-rc1                 -> v1.11.0-rc1
2025-12-04T09:33:42.0247612Z  * [new tag]                 v1.11.0-rc2                 -> v1.11.0-rc2
2025-12-04T09:33:42.0248738Z  * [new tag]                 v1.11.0-rc3                 -> v1.11.0-rc3
2025-12-04T09:33:42.0249841Z  * [new tag]                 v1.11.0-rc4                 -> v1.11.0-rc4
2025-12-04T09:33:42.0250925Z  * [new tag]                 v1.11.0-rc5                 -> v1.11.0-rc5
2025-12-04T09:33:42.0251644Z  * [new tag]                 v1.11.0-rc6                 -> v1.11.0-rc6
2025-12-04T09:33:42.0252489Z  * [new tag]                 v1.11.0-rc7                 -> v1.11.0-rc7
2025-12-04T09:33:42.0253803Z  * [new tag]                 v1.12.0                     -> v1.12.0
2025-12-04T09:33:42.0254603Z  * [new tag]                 v1.12.0-rc1                 -> v1.12.0-rc1
2025-12-04T09:33:42.0255791Z  * [new tag]                 v1.12.0-rc2                 -> v1.12.0-rc2
2025-12-04T09:33:42.0257004Z  * [new tag]                 v1.12.0-rc3                 -> v1.12.0-rc3
2025-12-04T09:33:42.0258230Z  * [new tag]                 v1.12.0-rc4                 -> v1.12.0-rc4
2025-12-04T09:33:42.0259235Z  * [new tag]                 v1.12.0-rc5                 -> v1.12.0-rc5
2025-12-04T09:33:42.0260421Z  * [new tag]                 v1.12.0-rc6                 -> v1.12.0-rc6
2025-12-04T09:33:42.0261181Z  * [new tag]                 v1.12.0-rc7                 -> v1.12.0-rc7
2025-12-04T09:33:42.0261940Z  * [new tag]                 v1.12.0-rc8                 -> v1.12.0-rc8
2025-12-04T09:33:42.0262766Z  * [new tag]                 v1.12.1                     -> v1.12.1
2025-12-04T09:33:42.0264066Z  * [new tag]                 v1.12.1-rc1                 -> v1.12.1-rc1
2025-12-04T09:33:42.0265168Z  * [new tag]                 v1.12.1-rc2                 -> v1.12.1-rc2
2025-12-04T09:33:42.0266244Z  * [new tag]                 v1.12.1-rc3                 -> v1.12.1-rc3
2025-12-04T09:33:42.0267323Z  * [new tag]                 v1.12.1-rc4                 -> v1.12.1-rc4
2025-12-04T09:33:42.0268045Z  * [new tag]                 v1.12.1-rc5                 -> v1.12.1-rc5
2025-12-04T09:33:42.0269289Z  * [new tag]                 v1.13.0                     -> v1.13.0
2025-12-04T09:33:42.0270309Z  * [new tag]                 v1.13.0-rc1                 -> v1.13.0-rc1
2025-12-04T09:33:42.0271499Z  * [new tag]                 v1.13.0-rc2                 -> v1.13.0-rc2
2025-12-04T09:33:42.0272754Z  * [new tag]                 v1.13.0-rc3                 -> v1.13.0-rc3
2025-12-04T09:33:42.0274040Z  * [new tag]                 v1.13.0-rc4                 -> v1.13.0-rc4
2025-12-04T09:33:42.0274699Z  * [new tag]                 v1.13.0-rc5                 -> v1.13.0-rc5
2025-12-04T09:33:42.0275675Z  * [new tag]                 v1.13.0-rc6                 -> v1.13.0-rc6
2025-12-04T09:33:42.0276782Z  * [new tag]                 v1.13.1                     -> v1.13.1
2025-12-04T09:33:42.0277545Z  * [new tag]                 v1.13.1-rc1                 -> v1.13.1-rc1
2025-12-04T09:33:42.0278627Z  * [new tag]                 v1.2.0                      -> v1.2.0
2025-12-04T09:33:42.0279793Z  * [new tag]                 v1.2.0a0                    -> v1.2.0a0
2025-12-04T09:33:42.0280822Z  * [new tag]                 v1.3.0                      -> v1.3.0
2025-12-04T09:33:42.0281854Z  * [new tag]                 v1.3.0a0                    -> v1.3.0a0
2025-12-04T09:33:42.0282557Z  * [new tag]                 v1.3.1                      -> v1.3.1
2025-12-04T09:33:42.0283659Z  * [new tag]                 v1.4.0                      -> v1.4.0
2025-12-04T09:33:42.0284651Z  * [new tag]                 v1.4.0a0                    -> v1.4.0a0
2025-12-04T09:33:42.0285470Z  * [new tag]                 v1.4.1                      -> v1.4.1
2025-12-04T09:33:42.0286602Z  * [new tag]                 v1.5.0                      -> v1.5.0
2025-12-04T09:33:42.0287742Z  * [new tag]                 v1.5.0-rc1                  -> v1.5.0-rc1
2025-12-04T09:33:42.0288856Z  * [new tag]                 v1.5.0-rc2                  -> v1.5.0-rc2
2025-12-04T09:33:42.0289978Z  * [new tag]                 v1.5.0-rc3                  -> v1.5.0-rc3
2025-12-04T09:33:42.0290820Z  * [new tag]                 v1.5.0-rc4                  -> v1.5.0-rc4
2025-12-04T09:33:42.0291636Z  * [new tag]                 v1.5.0-rc5                  -> v1.5.0-rc5
2025-12-04T09:33:42.0292893Z  * [new tag]                 v1.5.1                      -> v1.5.1
2025-12-04T09:33:42.0293623Z  * [new tag]                 v1.5.1-rc1                  -> v1.5.1-rc1
2025-12-04T09:33:42.0294411Z  * [new tag]                 v1.6.0                      -> v1.6.0
2025-12-04T09:33:42.0295544Z  * [new tag]                 v1.6.0-rc1                  -> v1.6.0-rc1
2025-12-04T09:33:42.0296943Z  * [new tag]                 v1.6.0-rc2                  -> v1.6.0-rc2
2025-12-04T09:33:42.0297928Z  * [new tag]                 v1.6.0-rc3                  -> v1.6.0-rc3
2025-12-04T09:33:42.0298940Z  * [new tag]                 v1.6.0-rc4                  -> v1.6.0-rc4
2025-12-04T09:33:42.0300014Z  * [new tag]                 v1.6.0-rc5                  -> v1.6.0-rc5
2025-12-04T09:33:42.0301012Z  * [new tag]                 v1.6.0-rc6                  -> v1.6.0-rc6
2025-12-04T09:33:42.0301727Z  * [new tag]                 v1.6.0-rc7                  -> v1.6.0-rc7
2025-12-04T09:33:42.0302957Z  * [new tag]                 v1.7.0                      -> v1.7.0
2025-12-04T09:33:42.0304028Z  * [new tag]                 v1.7.0-rc1                  -> v1.7.0-rc1
2025-12-04T09:33:42.0305217Z  * [new tag]                 v1.7.0-rc2                  -> v1.7.0-rc2
2025-12-04T09:33:42.0306357Z  * [new tag]                 v1.7.0-rc3                  -> v1.7.0-rc3
2025-12-04T09:33:42.0307056Z  * [new tag]                 v1.7.0-rc4                  -> v1.7.0-rc4
2025-12-04T09:33:42.0308222Z  * [new tag]                 v1.7.1                      -> v1.7.1
2025-12-04T09:33:42.0309500Z  * [new tag]                 v1.7.1-rc1                  -> v1.7.1-rc1
2025-12-04T09:33:42.0311232Z  * [new tag]                 v1.7.1-rc2                  -> v1.7.1-rc2
2025-12-04T09:33:42.0311926Z  * [new tag]                 v1.7.1-rc3                  -> v1.7.1-rc3
2025-12-04T09:33:42.0313261Z  * [new tag]                 v1.8.0                      -> v1.8.0
2025-12-04T09:33:42.0313960Z  * [new tag]                 v1.8.0-rc1                  -> v1.8.0-rc1
2025-12-04T09:33:42.0315211Z  * [new tag]                 v1.8.0-rc2                  -> v1.8.0-rc2
2025-12-04T09:33:42.0316244Z  * [new tag]                 v1.8.0-rc3                  -> v1.8.0-rc3
2025-12-04T09:33:42.0317240Z  * [new tag]                 v1.8.0-rc4                  -> v1.8.0-rc4
2025-12-04T09:33:42.0317987Z  * [new tag]                 v1.8.0-rc5                  -> v1.8.0-rc5
2025-12-04T09:33:42.0318807Z  * [new tag]                 v1.8.1                      -> v1.8.1
2025-12-04T09:33:42.0320050Z  * [new tag]                 v1.8.1-rc1                  -> v1.8.1-rc1
2025-12-04T09:33:42.0320736Z  * [new tag]                 v1.8.1-rc2                  -> v1.8.1-rc2
2025-12-04T09:33:42.0321548Z  * [new tag]                 v1.8.1-rc3                  -> v1.8.1-rc3
2025-12-04T09:33:42.0323337Z  * [new tag]                 v1.8.2                      -> v1.8.2
2025-12-04T09:33:42.0324036Z  * [new tag]                 v1.8.2-rc1                  -> v1.8.2-rc1
2025-12-04T09:33:42.0325270Z  * [new tag]                 v1.9.0                      -> v1.9.0
2025-12-04T09:33:42.0326349Z  * [new tag]                 v1.9.0-rc1                  -> v1.9.0-rc1
2025-12-04T09:33:42.0327507Z  * [new tag]                 v1.9.0-rc2                  -> v1.9.0-rc2
2025-12-04T09:33:42.0328539Z  * [new tag]                 v1.9.0-rc3                  -> v1.9.0-rc3
2025-12-04T09:33:42.0329282Z  * [new tag]                 v1.9.0-rc4                  -> v1.9.0-rc4
2025-12-04T09:33:42.0330506Z  * [new tag]                 v1.9.1                      -> v1.9.1
2025-12-04T09:33:42.0331744Z  * [new tag]                 v1.9.1-rc1                  -> v1.9.1-rc1
2025-12-04T09:33:42.0332584Z  * [new tag]                 v1.9.1-rc2                  -> v1.9.1-rc2
2025-12-04T09:33:42.0333662Z  * [new tag]                 v2.0.0                      -> v2.0.0
2025-12-04T09:33:42.0334681Z  * [new tag]                 v2.0.0-rc1                  -> v2.0.0-rc1
2025-12-04T09:33:42.0335764Z  * [new tag]                 v2.0.0-rc2                  -> v2.0.0-rc2
2025-12-04T09:33:42.0336951Z  * [new tag]                 v2.0.0-rc3                  -> v2.0.0-rc3
2025-12-04T09:33:42.0338017Z  * [new tag]                 v2.0.0-rc4                  -> v2.0.0-rc4
2025-12-04T09:33:42.0339097Z  * [new tag]                 v2.0.0-rc5                  -> v2.0.0-rc5
2025-12-04T09:33:42.0340010Z  * [new tag]                 v2.0.0-rc6                  -> v2.0.0-rc6
2025-12-04T09:33:42.0341053Z  * [new tag]                 v2.0.1                      -> v2.0.1
2025-12-04T09:33:42.0342169Z  * [new tag]                 v2.0.1-rc1                  -> v2.0.1-rc1
2025-12-04T09:33:42.0342915Z  * [new tag]                 v2.0.1-rc2                  -> v2.0.1-rc2
2025-12-04T09:33:42.0344085Z  * [new tag]                 v2.0.1-rc3                  -> v2.0.1-rc3
2025-12-04T09:33:42.0344723Z  * [new tag]                 v2.0.1-rc4                  -> v2.0.1-rc4
2025-12-04T09:33:42.0346596Z  * [new tag]                 v2.1.0                      -> v2.1.0
2025-12-04T09:33:42.0347648Z  * [new tag]                 v2.1.0-rc1                  -> v2.1.0-rc1
2025-12-04T09:33:42.0348812Z  * [new tag]                 v2.1.0-rc2                  -> v2.1.0-rc2
2025-12-04T09:33:42.0349911Z  * [new tag]                 v2.1.0-rc3                  -> v2.1.0-rc3
2025-12-04T09:33:42.0350987Z  * [new tag]                 v2.1.0-rc4                  -> v2.1.0-rc4
2025-12-04T09:33:42.0352089Z  * [new tag]                 v2.1.0-rc5                  -> v2.1.0-rc5
2025-12-04T09:33:42.0352870Z  * [new tag]                 v2.1.0-rc6                  -> v2.1.0-rc6
2025-12-04T09:33:42.0354155Z  * [new tag]                 v2.1.1                      -> v2.1.1
2025-12-04T09:33:42.0355345Z  * [new tag]                 v2.1.1-rc1                  -> v2.1.1-rc1
2025-12-04T09:33:42.0356368Z  * [new tag]                 v2.1.1-rc2                  -> v2.1.1-rc2
2025-12-04T09:33:42.0357628Z  * [new tag]                 v2.1.1-rc3                  -> v2.1.1-rc3
2025-12-04T09:33:42.0358655Z  * [new tag]                 v2.1.1-rc4                  -> v2.1.1-rc4
2025-12-04T09:33:42.0359751Z  * [new tag]                 v2.1.1-rc5                  -> v2.1.1-rc5
2025-12-04T09:33:42.0360552Z  * [new tag]                 v2.1.1-rc6                  -> v2.1.1-rc6
2025-12-04T09:33:42.0361624Z  * [new tag]                 v2.1.2                      -> v2.1.2
2025-12-04T09:33:42.0362775Z  * [new tag]                 v2.1.2-rc1                  -> v2.1.2-rc1
2025-12-04T09:33:42.0363862Z  * [new tag]                 v2.1.2-rc2                  -> v2.1.2-rc2
2025-12-04T09:33:42.0364604Z  * [new tag]                 v2.1.2-rc3                  -> v2.1.2-rc3
2025-12-04T09:33:42.0365849Z  * [new tag]                 v2.2.0                      -> v2.2.0
2025-12-04T09:33:42.0366827Z  * [new tag]                 v2.2.0-rc1                  -> v2.2.0-rc1
2025-12-04T09:33:42.0367923Z  * [new tag]                 v2.2.0-rc2                  -> v2.2.0-rc2
2025-12-04T09:33:42.0369409Z  * [new tag]                 v2.2.0-rc3                  -> v2.2.0-rc3
2025-12-04T09:33:42.0370500Z  * [new tag]                 v2.2.0-rc4                  -> v2.2.0-rc4
2025-12-04T09:33:42.0371395Z  * [new tag]                 v2.2.0-rc5                  -> v2.2.0-rc5
2025-12-04T09:33:42.0372663Z  * [new tag]                 v2.2.0-rc6                  -> v2.2.0-rc6
2025-12-04T09:33:42.0373444Z  * [new tag]                 v2.2.0-rc7                  -> v2.2.0-rc7
2025-12-04T09:33:42.0374291Z  * [new tag]                 v2.2.0-rc8                  -> v2.2.0-rc8
2025-12-04T09:33:42.0375503Z  * [new tag]                 v2.2.1                      -> v2.2.1
2025-12-04T09:33:42.0376742Z  * [new tag]                 v2.2.1-rc1                  -> v2.2.1-rc1
2025-12-04T09:33:42.0377520Z  * [new tag]                 v2.2.1-rc2                  -> v2.2.1-rc2
2025-12-04T09:33:42.0379024Z  * [new tag]                 v2.2.1-rc3                  -> v2.2.1-rc3
2025-12-04T09:33:42.0379755Z  * [new tag]                 v2.2.2                      -> v2.2.2
2025-12-04T09:33:42.0381220Z  * [new tag]                 v2.2.2-rc1                  -> v2.2.2-rc1
2025-12-04T09:33:42.0381956Z  * [new tag]                 v2.2.2-rc2                  -> v2.2.2-rc2
2025-12-04T09:33:42.0382982Z  * [new tag]                 v2.2.2-rc3                  -> v2.2.2-rc3
2025-12-04T09:33:42.0384188Z  * [new tag]                 v2.3.0                      -> v2.3.0
2025-12-04T09:33:42.0385003Z  * [new tag]                 v2.3.0-rc1                  -> v2.3.0-rc1
2025-12-04T09:33:42.0386267Z  * [new tag]                 v2.3.0-rc10                 -> v2.3.0-rc10
2025-12-04T09:33:42.0387417Z  * [new tag]                 v2.3.0-rc11                 -> v2.3.0-rc11
2025-12-04T09:33:42.0388158Z  * [new tag]                 v2.3.0-rc12                 -> v2.3.0-rc12
2025-12-04T09:33:42.0389308Z  * [new tag]                 v2.3.0-rc2                  -> v2.3.0-rc2
2025-12-04T09:33:42.0390411Z  * [new tag]                 v2.3.0-rc3                  -> v2.3.0-rc3
2025-12-04T09:33:42.0391506Z  * [new tag]                 v2.3.0-rc4                  -> v2.3.0-rc4
2025-12-04T09:33:42.0392493Z  * [new tag]                 v2.3.0-rc5                  -> v2.3.0-rc5
2025-12-04T09:33:42.0393306Z  * [new tag]                 v2.3.0-rc6                  -> v2.3.0-rc6
2025-12-04T09:33:42.0394470Z  * [new tag]                 v2.3.0-rc7                  -> v2.3.0-rc7
2025-12-04T09:33:42.0395552Z  * [new tag]                 v2.3.0-rc8                  -> v2.3.0-rc8
2025-12-04T09:33:42.0396286Z  * [new tag]                 v2.3.0-rc9                  -> v2.3.0-rc9
2025-12-04T09:33:42.0397076Z  * [new tag]                 v2.3.1                      -> v2.3.1
2025-12-04T09:33:42.0398330Z  * [new tag]                 v2.3.1-rc1                  -> v2.3.1-rc1
2025-12-04T09:33:42.0399379Z  * [new tag]                 v2.3.1-rc2                  -> v2.3.1-rc2
2025-12-04T09:33:42.0400470Z  * [new tag]                 v2.3.1-rc3                  -> v2.3.1-rc3
2025-12-04T09:33:42.0401571Z  * [new tag]                 v2.4.0                      -> v2.4.0
2025-12-04T09:33:42.0402681Z  * [new tag]                 v2.4.0-rc1                  -> v2.4.0-rc1
2025-12-04T09:33:42.0403739Z  * [new tag]                 v2.4.0-rc2                  -> v2.4.0-rc2
2025-12-04T09:33:42.0404742Z  * [new tag]                 v2.4.0-rc3                  -> v2.4.0-rc3
2025-12-04T09:33:42.0405789Z  * [new tag]                 v2.4.0-rc4                  -> v2.4.0-rc4
2025-12-04T09:33:42.0406878Z  * [new tag]                 v2.4.0-rc5                  -> v2.4.0-rc5
2025-12-04T09:33:42.0408032Z  * [new tag]                 v2.4.0-rc6                  -> v2.4.0-rc6
2025-12-04T09:33:42.0409073Z  * [new tag]                 v2.4.0-rc7                  -> v2.4.0-rc7
2025-12-04T09:33:42.0410090Z  * [new tag]                 v2.4.0-rc8                  -> v2.4.0-rc8
2025-12-04T09:33:42.0411237Z  * [new tag]                 v2.4.0-rc9                  -> v2.4.0-rc9
2025-12-04T09:33:42.0412201Z  * [new tag]                 v2.4.1                      -> v2.4.1
2025-12-04T09:33:42.0413366Z  * [new tag]                 v2.4.1-rc1                  -> v2.4.1-rc1
2025-12-04T09:33:42.0414559Z  * [new tag]                 v2.4.1-rc2                  -> v2.4.1-rc2
2025-12-04T09:33:42.0415660Z  * [new tag]                 v2.4.1-rc3                  -> v2.4.1-rc3
2025-12-04T09:33:42.0416856Z  * [new tag]                 v2.5.0                      -> v2.5.0
2025-12-04T09:33:42.0417904Z  * [new tag]                 v2.5.0-rc1                  -> v2.5.0-rc1
2025-12-04T09:33:42.0418678Z  * [new tag]                 v2.5.0-rc10                 -> v2.5.0-rc10
2025-12-04T09:33:42.0419776Z  * [new tag]                 v2.5.0-rc2                  -> v2.5.0-rc2
2025-12-04T09:33:42.0420862Z  * [new tag]                 v2.5.0-rc3                  -> v2.5.0-rc3
2025-12-04T09:33:42.0421926Z  * [new tag]                 v2.5.0-rc4                  -> v2.5.0-rc4
2025-12-04T09:33:42.0422994Z  * [new tag]                 v2.5.0-rc5                  -> v2.5.0-rc5
2025-12-04T09:33:42.0424182Z  * [new tag]                 v2.5.0-rc6                  -> v2.5.0-rc6
2025-12-04T09:33:42.0425249Z  * [new tag]                 v2.5.0-rc7                  -> v2.5.0-rc7
2025-12-04T09:33:42.0426348Z  * [new tag]                 v2.5.0-rc8                  -> v2.5.0-rc8
2025-12-04T09:33:42.0427520Z  * [new tag]                 v2.5.0-rc9                  -> v2.5.0-rc9
2025-12-04T09:33:42.0428175Z  * [new tag]                 v2.5.1                      -> v2.5.1
2025-12-04T09:33:42.0429005Z  * [new tag]                 v2.5.1-rc1                  -> v2.5.1-rc1
2025-12-04T09:33:42.0429812Z  * [new tag]                 v2.6.0                      -> v2.6.0
2025-12-04T09:33:42.0431029Z  * [new tag]                 v2.6.0-rc1                  -> v2.6.0-rc1
2025-12-04T09:33:42.0432239Z  * [new tag]                 v2.6.0-rc2                  -> v2.6.0-rc2
2025-12-04T09:33:42.0433328Z  * [new tag]                 v2.6.0-rc3                  -> v2.6.0-rc3
2025-12-04T09:33:42.0434337Z  * [new tag]                 v2.6.0-rc4                  -> v2.6.0-rc4
2025-12-04T09:33:42.0435651Z  * [new tag]                 v2.6.0-rc5                  -> v2.6.0-rc5
2025-12-04T09:33:42.0436857Z  * [new tag]                 v2.6.0-rc6                  -> v2.6.0-rc6
2025-12-04T09:33:42.0437979Z  * [new tag]                 v2.6.0-rc7                  -> v2.6.0-rc7
2025-12-04T09:33:42.0439168Z  * [new tag]                 v2.6.0-rc8                  -> v2.6.0-rc8
2025-12-04T09:33:42.0440275Z  * [new tag]                 v2.6.0-rc9                  -> v2.6.0-rc9
2025-12-04T09:33:42.0441580Z  * [new tag]                 v2.7.0                      -> v2.7.0
2025-12-04T09:33:42.0442659Z  * [new tag]                 v2.7.0-rc1                  -> v2.7.0-rc1
2025-12-04T09:33:42.0443419Z  * [new tag]                 v2.7.0-rc10                 -> v2.7.0-rc10
2025-12-04T09:33:42.0444705Z  * [new tag]                 v2.7.0-rc2                  -> v2.7.0-rc2
2025-12-04T09:33:42.0445879Z  * [new tag]                 v2.7.0-rc3                  -> v2.7.0-rc3
2025-12-04T09:33:42.0447575Z  * [new tag]                 v2.7.0-rc4                  -> v2.7.0-rc4
2025-12-04T09:33:42.0448613Z  * [new tag]                 v2.7.0-rc5                  -> v2.7.0-rc5
2025-12-04T09:33:42.0449764Z  * [new tag]                 v2.7.0-rc6                  -> v2.7.0-rc6
2025-12-04T09:33:42.0450899Z  * [new tag]                 v2.7.0-rc7                  -> v2.7.0-rc7
2025-12-04T09:33:42.0452037Z  * [new tag]                 v2.7.0-rc8                  -> v2.7.0-rc8
2025-12-04T09:33:42.0453235Z  * [new tag]                 v2.7.0-rc9                  -> v2.7.0-rc9
2025-12-04T09:33:42.0453993Z  * [new tag]                 v2.7.1                      -> v2.7.1
2025-12-04T09:33:42.0455190Z  * [new tag]                 v2.7.1-rc1                  -> v2.7.1-rc1
2025-12-04T09:33:42.0456419Z  * [new tag]                 v2.7.1-rc2                  -> v2.7.1-rc2
2025-12-04T09:33:42.0457648Z  * [new tag]                 v2.7.1-rc3                  -> v2.7.1-rc3
2025-12-04T09:33:42.0458780Z  * [new tag]                 v2.7.1-rc4                  -> v2.7.1-rc4
2025-12-04T09:33:42.0459892Z  * [new tag]                 v2.7.1-rc5                  -> v2.7.1-rc5
2025-12-04T09:33:42.0460725Z  * [new tag]                 v2.8.0                      -> v2.8.0
2025-12-04T09:33:42.0461956Z  * [new tag]                 v2.8.0-rc1                  -> v2.8.0-rc1
2025-12-04T09:33:42.0462995Z  * [new tag]                 v2.8.0-rc2                  -> v2.8.0-rc2
2025-12-04T09:33:42.0464337Z  * [new tag]                 v2.8.0-rc3                  -> v2.8.0-rc3
2025-12-04T09:33:42.0465459Z  * [new tag]                 v2.8.0-rc4                  -> v2.8.0-rc4
2025-12-04T09:33:42.0466564Z  * [new tag]                 v2.8.0-rc5                  -> v2.8.0-rc5
2025-12-04T09:33:42.0467730Z  * [new tag]                 v2.8.0-rc6                  -> v2.8.0-rc6
2025-12-04T09:33:42.0468834Z  * [new tag]                 v2.8.0-rc7                  -> v2.8.0-rc7
2025-12-04T09:33:42.0469907Z  * [new tag]                 v2.8.0-rc8                  -> v2.8.0-rc8
2025-12-04T09:33:42.0471172Z  * [new tag]                 v2.9.0                      -> v2.9.0
2025-12-04T09:33:42.0472338Z  * [new tag]                 v2.9.0-rc1                  -> v2.9.0-rc1
2025-12-04T09:33:42.0473576Z  * [new tag]                 v2.9.0-rc10                 -> v2.9.0-rc10
2025-12-04T09:33:42.0474591Z  * [new tag]                 v2.9.0-rc11                 -> v2.9.0-rc11
2025-12-04T09:33:42.0475920Z  * [new tag]                 v2.9.0-rc2                  -> v2.9.0-rc2
2025-12-04T09:33:42.0477048Z  * [new tag]                 v2.9.0-rc3                  -> v2.9.0-rc3
2025-12-04T09:33:42.0478183Z  * [new tag]                 v2.9.0-rc4                  -> v2.9.0-rc4
2025-12-04T09:33:42.0479333Z  * [new tag]                 v2.9.0-rc5                  -> v2.9.0-rc5
2025-12-04T09:33:42.0480659Z  * [new tag]                 v2.9.0-rc6                  -> v2.9.0-rc6
2025-12-04T09:33:42.0481769Z  * [new tag]                 v2.9.0-rc7                  -> v2.9.0-rc7
2025-12-04T09:33:42.0483081Z  * [new tag]                 v2.9.0-rc8                  -> v2.9.0-rc8
2025-12-04T09:33:42.0483909Z  * [new tag]                 v2.9.0-rc9                  -> v2.9.0-rc9
2025-12-04T09:33:42.0484829Z  * [new tag]                 v2.9.1                      -> v2.9.1
2025-12-04T09:33:42.0486068Z  * [new tag]                 v2.9.1-rc1                  -> v2.9.1-rc1
2025-12-04T09:33:42.0487229Z  * [new tag]                 v2.9.1-rc2                  -> v2.9.1-rc2
2025-12-04T09:33:42.0488844Z  * [new tag]                 viable/strict/1759343184    -> viable/strict/1759343184
2025-12-04T09:33:42.0489818Z  * [new tag]                 viable/strict/1759346540    -> viable/strict/1759346540
2025-12-04T09:33:42.0490902Z  * [new tag]                 viable/strict/1759348181    -> viable/strict/1759348181
2025-12-04T09:33:42.0491937Z  * [new tag]                 viable/strict/1759350324    -> viable/strict/1759350324
2025-12-04T09:33:42.0492950Z  * [new tag]                 viable/strict/1759351793    -> viable/strict/1759351793
2025-12-04T09:33:42.0493954Z  * [new tag]                 viable/strict/1759353844    -> viable/strict/1759353844
2025-12-04T09:33:42.0494952Z  * [new tag]                 viable/strict/1759355374    -> viable/strict/1759355374
2025-12-04T09:33:42.0495940Z  * [new tag]                 viable/strict/1759357472    -> viable/strict/1759357472
2025-12-04T09:33:42.0497042Z  * [new tag]                 viable/strict/1759361002    -> viable/strict/1759361002
2025-12-04T09:33:42.0498397Z  * [new tag]                 viable/strict/1759362585    -> viable/strict/1759362585
2025-12-04T09:33:42.0499712Z  * [new tag]                 viable/strict/1759365359    -> viable/strict/1759365359
2025-12-04T09:33:42.0500790Z  * [new tag]                 viable/strict/1759370089    -> viable/strict/1759370089
2025-12-04T09:33:42.0501868Z  * [new tag]                 viable/strict/1759377554    -> viable/strict/1759377554
2025-12-04T09:33:42.0502941Z  * [new tag]                 viable/strict/1759379133    -> viable/strict/1759379133
2025-12-04T09:33:42.0503999Z  * [new tag]                 viable/strict/1759389871    -> viable/strict/1759389871
2025-12-04T09:33:42.0505061Z  * [new tag]                 viable/strict/1759393562    -> viable/strict/1759393562
2025-12-04T09:33:42.0506175Z  * [new tag]                 viable/strict/1759395076    -> viable/strict/1759395076
2025-12-04T09:33:42.0507304Z  * [new tag]                 viable/strict/1759398579    -> viable/strict/1759398579
2025-12-04T09:33:42.0508343Z  * [new tag]                 viable/strict/1759404142    -> viable/strict/1759404142
2025-12-04T09:33:42.0509377Z  * [new tag]                 viable/strict/1759405773    -> viable/strict/1759405773
2025-12-04T09:33:42.0510403Z  * [new tag]                 viable/strict/1759408041    -> viable/strict/1759408041
2025-12-04T09:33:42.0511439Z  * [new tag]                 viable/strict/1759411593    -> viable/strict/1759411593
2025-12-04T09:33:42.0512482Z  * [new tag]                 viable/strict/1759427395    -> viable/strict/1759427395
2025-12-04T09:33:42.0513577Z  * [new tag]                 viable/strict/1759434582    -> viable/strict/1759434582
2025-12-04T09:33:42.0514641Z  * [new tag]                 viable/strict/1759436720    -> viable/strict/1759436720
2025-12-04T09:33:42.0515761Z  * [new tag]                 viable/strict/1759440219    -> viable/strict/1759440219
2025-12-04T09:33:42.0516736Z  * [new tag]                 viable/strict/1759441948    -> viable/strict/1759441948
2025-12-04T09:33:42.0517729Z  * [new tag]                 viable/strict/1759443860    -> viable/strict/1759443860
2025-12-04T09:33:42.0518842Z  * [new tag]                 viable/strict/1759445377    -> viable/strict/1759445377
2025-12-04T09:33:42.0519982Z  * [new tag]                 viable/strict/1759447415    -> viable/strict/1759447415
2025-12-04T09:33:42.0520999Z  * [new tag]                 viable/strict/1759451750    -> viable/strict/1759451750
2025-12-04T09:33:42.0522049Z  * [new tag]                 viable/strict/1759453910    -> viable/strict/1759453910
2025-12-04T09:33:42.0523144Z  * [new tag]                 viable/strict/1759456483    -> viable/strict/1759456483
2025-12-04T09:33:42.0524221Z  * [new tag]                 viable/strict/1759459279    -> viable/strict/1759459279
2025-12-04T09:33:42.0525284Z  * [new tag]                 viable/strict/1759460742    -> viable/strict/1759460742
2025-12-04T09:33:42.0526316Z  * [new tag]                 viable/strict/1759462025    -> viable/strict/1759462025
2025-12-04T09:33:42.0527428Z  * [new tag]                 viable/strict/1759469086    -> viable/strict/1759469086
2025-12-04T09:33:42.0528472Z  * [new tag]                 viable/strict/1759470581    -> viable/strict/1759470581
2025-12-04T09:33:42.0529504Z  * [new tag]                 viable/strict/1759472786    -> viable/strict/1759472786
2025-12-04T09:33:42.0530610Z  * [new tag]                 viable/strict/1759476294    -> viable/strict/1759476294
2025-12-04T09:33:42.0531626Z  * [new tag]                 viable/strict/1759479963    -> viable/strict/1759479963
2025-12-04T09:33:42.0532660Z  * [new tag]                 viable/strict/1759492177    -> viable/strict/1759492177
2025-12-04T09:33:42.0533693Z  * [new tag]                 viable/strict/1759519278    -> viable/strict/1759519278
2025-12-04T09:33:42.0534733Z  * [new tag]                 viable/strict/1759524580    -> viable/strict/1759524580
2025-12-04T09:33:42.0535726Z  * [new tag]                 viable/strict/1759528193    -> viable/strict/1759528193
2025-12-04T09:33:42.0537119Z  * [new tag]                 viable/strict/1759533797    -> viable/strict/1759533797
2025-12-04T09:33:42.0538180Z  * [new tag]                 viable/strict/1759542780    -> viable/strict/1759542780
2025-12-04T09:33:42.0539277Z  * [new tag]                 viable/strict/1759549779    -> viable/strict/1759549779
2025-12-04T09:33:42.0540343Z  * [new tag]                 viable/strict/1759555455    -> viable/strict/1759555455
2025-12-04T09:33:42.0541371Z  * [new tag]                 viable/strict/1759559176    -> viable/strict/1759559176
2025-12-04T09:33:42.0542948Z  * [new tag]                 viable/strict/1759560629    -> viable/strict/1759560629
2025-12-04T09:33:42.0543967Z  * [new tag]                 viable/strict/1759569848    -> viable/strict/1759569848
2025-12-04T09:33:42.0545262Z  * [new tag]                 viable/strict/1759571382    -> viable/strict/1759571382
2025-12-04T09:33:42.0546259Z  * [new tag]                 viable/strict/1759573474    -> viable/strict/1759573474
2025-12-04T09:33:42.0547315Z  * [new tag]                 viable/strict/1759618187    -> viable/strict/1759618187
2025-12-04T09:33:42.0548378Z  * [new tag]                 viable/strict/1759626742    -> viable/strict/1759626742
2025-12-04T09:33:42.0549479Z  * [new tag]                 viable/strict/1759632427    -> viable/strict/1759632427
2025-12-04T09:33:42.0550554Z  * [new tag]                 viable/strict/1759634971    -> viable/strict/1759634971
2025-12-04T09:33:42.0551645Z  * [new tag]                 viable/strict/1759661382    -> viable/strict/1759661382
2025-12-04T09:33:42.0552735Z  * [new tag]                 viable/strict/1759663294    -> viable/strict/1759663294
2025-12-04T09:33:42.0553749Z  * [new tag]                 viable/strict/1759708178    -> viable/strict/1759708178
2025-12-04T09:33:42.0554943Z  * [new tag]                 viable/strict/1759715695    -> viable/strict/1759715695
2025-12-04T09:33:42.0555940Z  * [new tag]                 viable/strict/1759728293    -> viable/strict/1759728293
2025-12-04T09:33:42.0556988Z  * [new tag]                 viable/strict/1759735513    -> viable/strict/1759735513
2025-12-04T09:33:42.0558089Z  * [new tag]                 viable/strict/1759739177    -> viable/strict/1759739177
2025-12-04T09:33:42.0559113Z  * [new tag]                 viable/strict/1759758635    -> viable/strict/1759758635
2025-12-04T09:33:42.0560222Z  * [new tag]                 viable/strict/1759765784    -> viable/strict/1759765784
2025-12-04T09:33:42.0561229Z  * [new tag]                 viable/strict/1759767948    -> viable/strict/1759767948
2025-12-04T09:33:42.0562319Z  * [new tag]                 viable/strict/1759771461    -> viable/strict/1759771461
2025-12-04T09:33:42.0563140Z  * [new tag]                 viable/strict/1759776706    -> viable/strict/1759776706
2025-12-04T09:33:42.0564304Z  * [new tag]                 viable/strict/1759782317    -> viable/strict/1759782317
2025-12-04T09:33:42.0565468Z  * [new tag]                 viable/strict/1759783777    -> viable/strict/1759783777
2025-12-04T09:33:42.0566651Z  * [new tag]                 viable/strict/1759785815    -> viable/strict/1759785815
2025-12-04T09:33:42.0567768Z  * [new tag]                 viable/strict/1759789459    -> viable/strict/1759789459
2025-12-04T09:33:42.0568804Z  * [new tag]                 viable/strict/1759790974    -> viable/strict/1759790974
2025-12-04T09:33:42.0569634Z  * [new tag]                 viable/strict/1759794583    -> viable/strict/1759794583
2025-12-04T09:33:42.0570847Z  * [new tag]                 viable/strict/1759797408    -> viable/strict/1759797408
2025-12-04T09:33:42.0572232Z  * [new tag]                 viable/strict/1759799518    -> viable/strict/1759799518
2025-12-04T09:33:42.0573283Z  * [new tag]                 viable/strict/1759804909    -> viable/strict/1759804909
2025-12-04T09:33:42.0574405Z  * [new tag]                 viable/strict/1759807643    -> viable/strict/1759807643
2025-12-04T09:33:42.0575455Z  * [new tag]                 viable/strict/1759809089    -> viable/strict/1759809089
2025-12-04T09:33:42.0576555Z  * [new tag]                 viable/strict/1759811145    -> viable/strict/1759811145
2025-12-04T09:33:42.0577693Z  * [new tag]                 viable/strict/1759812581    -> viable/strict/1759812581
2025-12-04T09:33:42.0578744Z  * [new tag]                 viable/strict/1759814683    -> viable/strict/1759814683
2025-12-04T09:33:42.0579848Z  * [new tag]                 viable/strict/1759821889    -> viable/strict/1759821889
2025-12-04T09:33:42.0580971Z  * [new tag]                 viable/strict/1759823376    -> viable/strict/1759823376
2025-12-04T09:33:42.0581994Z  * [new tag]                 viable/strict/1759827107    -> viable/strict/1759827107
2025-12-04T09:33:42.0583042Z  * [new tag]                 viable/strict/1759830577    -> viable/strict/1759830577
2025-12-04T09:33:42.0584199Z  * [new tag]                 viable/strict/1759832720    -> viable/strict/1759832720
2025-12-04T09:33:42.0585218Z  * [new tag]                 viable/strict/1759842063    -> viable/strict/1759842063
2025-12-04T09:33:42.0586279Z  * [new tag]                 viable/strict/1759847121    -> viable/strict/1759847121
2025-12-04T09:33:42.0587631Z  * [new tag]                 viable/strict/1759850721    -> viable/strict/1759850721
2025-12-04T09:33:42.0588770Z  * [new tag]                 viable/strict/1759857870    -> viable/strict/1759857870
2025-12-04T09:33:42.0589847Z  * [new tag]                 viable/strict/1759863143    -> viable/strict/1759863143
2025-12-04T09:33:42.0590880Z  * [new tag]                 viable/strict/1759875874    -> viable/strict/1759875874
2025-12-04T09:33:42.0591674Z  * [new tag]                 viable/strict/1759877385    -> viable/strict/1759877385
2025-12-04T09:33:42.0592854Z  * [new tag]                 viable/strict/1759883801    -> viable/strict/1759883801
2025-12-04T09:33:42.0594096Z  * [new tag]                 viable/strict/1759885922    -> viable/strict/1759885922
2025-12-04T09:33:42.0595060Z  * [new tag]                 viable/strict/1759888488    -> viable/strict/1759888488
2025-12-04T09:33:42.0596063Z  * [new tag]                 viable/strict/1759895471    -> viable/strict/1759895471
2025-12-04T09:33:42.0597073Z  * [new tag]                 viable/strict/1759904803    -> viable/strict/1759904803
2025-12-04T09:33:42.0598340Z  * [new tag]                 viable/strict/1759908300    -> viable/strict/1759908300
2025-12-04T09:33:42.0599490Z  * [new tag]                 viable/strict/1759915520    -> viable/strict/1759915520
2025-12-04T09:33:42.0600531Z  * [new tag]                 viable/strict/1759916978    -> viable/strict/1759916978
2025-12-04T09:33:42.0601360Z  * [new tag]                 viable/strict/1759930024    -> viable/strict/1759930024
2025-12-04T09:33:42.0602503Z  * [new tag]                 viable/strict/1759948122    -> viable/strict/1759948122
2025-12-04T09:33:42.0603616Z  * [new tag]                 viable/strict/1759952983    -> viable/strict/1759952983
2025-12-04T09:33:42.0604700Z  * [new tag]                 viable/strict/1759955121    -> viable/strict/1759955121
2025-12-04T09:33:42.0605748Z  * [new tag]                 viable/strict/1759962298    -> viable/strict/1759962298
2025-12-04T09:33:42.0606867Z  * [new tag]                 viable/strict/1759965837    -> viable/strict/1759965837
2025-12-04T09:33:42.0607995Z  * [new tag]                 viable/strict/1759970213    -> viable/strict/1759970213
2025-12-04T09:33:42.0609001Z  * [new tag]                 viable/strict/1759974894    -> viable/strict/1759974894
2025-12-04T09:33:42.0610021Z  * [new tag]                 viable/strict/1759977763    -> viable/strict/1759977763
2025-12-04T09:33:42.0611160Z  * [new tag]                 viable/strict/1759979241    -> viable/strict/1759979241
2025-12-04T09:33:42.0612726Z  * [new tag]                 viable/strict/1759985417    -> viable/strict/1759985417
2025-12-04T09:33:42.0613809Z  * [new tag]                 viable/strict/1759987490    -> viable/strict/1759987490
2025-12-04T09:33:42.0614864Z  * [new tag]                 viable/strict/1759996180    -> viable/strict/1759996180
2025-12-04T09:33:42.0615871Z  * [new tag]                 viable/strict/1760065682    -> viable/strict/1760065682
2025-12-04T09:33:42.0617093Z  * [new tag]                 viable/strict/1760066894    -> viable/strict/1760066894
2025-12-04T09:33:42.0618200Z  * [new tag]                 viable/strict/1760070345    -> viable/strict/1760070345
2025-12-04T09:33:42.0619416Z  * [new tag]                 viable/strict/1760089782    -> viable/strict/1760089782
2025-12-04T09:33:42.0620499Z  * [new tag]                 viable/strict/1760091921    -> viable/strict/1760091921
2025-12-04T09:33:42.0621543Z  * [new tag]                 viable/strict/1760127924    -> viable/strict/1760127924
2025-12-04T09:33:42.0622612Z  * [new tag]                 viable/strict/1760129489    -> viable/strict/1760129489
2025-12-04T09:33:42.0623839Z  * [new tag]                 viable/strict/1760132980    -> viable/strict/1760132980
2025-12-04T09:33:42.0624982Z  * [new tag]                 viable/strict/1760135060    -> viable/strict/1760135060
2025-12-04T09:33:42.0626023Z  * [new tag]                 viable/strict/1760215782    -> viable/strict/1760215782
2025-12-04T09:33:42.0627104Z  * [new tag]                 viable/strict/1760273849    -> viable/strict/1760273849
2025-12-04T09:33:42.0628139Z  * [new tag]                 viable/strict/1760275517    -> viable/strict/1760275517
2025-12-04T09:33:42.0629216Z  * [new tag]                 viable/strict/1760276979    -> viable/strict/1760276979
2025-12-04T09:33:42.0630306Z  * [new tag]                 viable/strict/1760279007    -> viable/strict/1760279007
2025-12-04T09:33:42.0631174Z  * [new tag]                 viable/strict/1760286328    -> viable/strict/1760286328
2025-12-04T09:33:42.0632058Z  * [new tag]                 viable/strict/1760493304    -> viable/strict/1760493304
2025-12-04T09:33:42.0633223Z  * [new tag]                 viable/strict/1760496298    -> viable/strict/1760496298
2025-12-04T09:33:42.0634225Z  * [new tag]                 viable/strict/1760518396    -> viable/strict/1760518396
2025-12-04T09:33:42.0635273Z  * [new tag]                 viable/strict/1760534864    -> viable/strict/1760534864
2025-12-04T09:33:42.0636302Z  * [new tag]                 viable/strict/1760549062    -> viable/strict/1760549062
2025-12-04T09:33:42.0637491Z  * [new tag]                 viable/strict/1760552799    -> viable/strict/1760552799
2025-12-04T09:33:42.0638555Z  * [new tag]                 viable/strict/1760554355    -> viable/strict/1760554355
2025-12-04T09:33:42.0639662Z  * [new tag]                 viable/strict/1760556275    -> viable/strict/1760556275
2025-12-04T09:33:42.0640746Z  * [new tag]                 viable/strict/1760564979    -> viable/strict/1760564979
2025-12-04T09:33:42.0641932Z  * [new tag]                 viable/strict/1760567049    -> viable/strict/1760567049
2025-12-04T09:33:42.0643421Z  * [new tag]                 viable/strict/1760568585    -> viable/strict/1760568585
2025-12-04T09:33:42.0644424Z  * [new tag]                 viable/strict/1760570630    -> viable/strict/1760570630
2025-12-04T09:33:42.0645496Z  * [new tag]                 viable/strict/1760572180    -> viable/strict/1760572180
2025-12-04T09:33:42.0646542Z  * [new tag]                 viable/strict/1760575094    -> viable/strict/1760575094
2025-12-04T09:33:42.0647724Z  * [new tag]                 viable/strict/1760579709    -> viable/strict/1760579709
2025-12-04T09:33:42.0649287Z  * [new tag]                 viable/strict/1760582614    -> viable/strict/1760582614
2025-12-04T09:33:42.0650390Z  * [new tag]                 viable/strict/1760586815    -> viable/strict/1760586815
2025-12-04T09:33:42.0651239Z  * [new tag]                 viable/strict/1760588829    -> viable/strict/1760588829
2025-12-04T09:33:42.0652344Z  * [new tag]                 viable/strict/1760590200    -> viable/strict/1760590200
2025-12-04T09:33:42.0653492Z  * [new tag]                 viable/strict/1760592311    -> viable/strict/1760592311
2025-12-04T09:33:42.0654534Z  * [new tag]                 viable/strict/1760619733    -> viable/strict/1760619733
2025-12-04T09:33:42.0655390Z  * [new tag]                 viable/strict/1760628335    -> viable/strict/1760628335
2025-12-04T09:33:42.0656524Z  * [new tag]                 viable/strict/1760635490    -> viable/strict/1760635490
2025-12-04T09:33:42.0657702Z  * [new tag]                 viable/strict/1760640743    -> viable/strict/1760640743
2025-12-04T09:33:42.0658855Z  * [new tag]                 viable/strict/1760642528    -> viable/strict/1760642528
2025-12-04T09:33:42.0659993Z  * [new tag]                 viable/strict/1760646330    -> viable/strict/1760646330
2025-12-04T09:33:42.0660857Z  * [new tag]                 viable/strict/1760666101    -> viable/strict/1760666101
2025-12-04T09:33:42.0662010Z  * [new tag]                 viable/strict/1760668990    -> viable/strict/1760668990
2025-12-04T09:33:42.0663057Z  * [new tag]                 viable/strict/1760670600    -> viable/strict/1760670600
2025-12-04T09:33:42.0664145Z  * [new tag]                 viable/strict/1760671704    -> viable/strict/1760671704
2025-12-04T09:33:42.0665205Z  * [new tag]                 viable/strict/1760673121    -> viable/strict/1760673121
2025-12-04T09:33:42.0666253Z  * [new tag]                 viable/strict/1760675352    -> viable/strict/1760675352
2025-12-04T09:33:42.0667347Z  * [new tag]                 viable/strict/1760696731    -> viable/strict/1760696731
2025-12-04T09:33:42.0670082Z  * [new tag]                 viable/strict/1760723515    -> viable/strict/1760723515
2025-12-04T09:33:42.0671292Z  * [new tag]                 viable/strict/1760727234    -> viable/strict/1760727234
2025-12-04T09:33:42.0672472Z  * [new tag]                 viable/strict/1760730578    -> viable/strict/1760730578
2025-12-04T09:33:42.0673570Z  * [new tag]                 viable/strict/1760732726    -> viable/strict/1760732726
2025-12-04T09:33:42.0674814Z  * [new tag]                 viable/strict/1760734180    -> viable/strict/1760734180
2025-12-04T09:33:42.0675653Z  * [new tag]                 viable/strict/1760736251    -> viable/strict/1760736251
2025-12-04T09:33:42.0676807Z  * [new tag]                 viable/strict/1760737772    -> viable/strict/1760737772
2025-12-04T09:33:42.0677930Z  * [new tag]                 viable/strict/1760758005    -> viable/strict/1760758005
2025-12-04T09:33:42.0678948Z  * [new tag]                 viable/strict/1760761532    -> viable/strict/1760761532
2025-12-04T09:33:42.0680043Z  * [new tag]                 viable/strict/1760802581    -> viable/strict/1760802581
2025-12-04T09:33:42.0681103Z  * [new tag]                 viable/strict/1760827772    -> viable/strict/1760827772
2025-12-04T09:33:42.0682143Z  * [new tag]                 viable/strict/1760834524    -> viable/strict/1760834524
2025-12-04T09:33:42.0683275Z  * [new tag]                 viable/strict/1760845009    -> viable/strict/1760845009
2025-12-04T09:33:42.0684852Z  * [new tag]                 viable/strict/1760876836    -> viable/strict/1760876836
2025-12-04T09:33:42.0685978Z  * [new tag]                 viable/strict/1760880329    -> viable/strict/1760880329
2025-12-04T09:33:42.0687045Z  * [new tag]                 viable/strict/1760888987    -> viable/strict/1760888987
2025-12-04T09:33:42.0687974Z  * [new tag]                 viable/strict/1760912664    -> viable/strict/1760912664
2025-12-04T09:33:42.0689104Z  * [new tag]                 viable/strict/1760925321    -> viable/strict/1760925321
2025-12-04T09:33:42.0690147Z  * [new tag]                 viable/strict/1760931488    -> viable/strict/1760931488
2025-12-04T09:33:42.0691261Z  * [new tag]                 viable/strict/1760932693    -> viable/strict/1760932693
2025-12-04T09:33:42.0692313Z  * [new tag]                 viable/strict/1761004184    -> viable/strict/1761004184
2025-12-04T09:33:42.0693374Z  * [new tag]                 viable/strict/1761014748    -> viable/strict/1761014748
2025-12-04T09:33:42.0694472Z  * [new tag]                 viable/strict/1761017491    -> viable/strict/1761017491
2025-12-04T09:33:42.0695617Z  * [new tag]                 viable/strict/1761018806    -> viable/strict/1761018806
2025-12-04T09:33:42.0696862Z  * [new tag]                 viable/strict/1761020754    -> viable/strict/1761020754
2025-12-04T09:33:42.0697926Z  * [new tag]                 viable/strict/1761024303    -> viable/strict/1761024303
2025-12-04T09:33:42.0698956Z  * [new tag]                 viable/strict/1761029582    -> viable/strict/1761029582
2025-12-04T09:33:42.0699984Z  * [new tag]                 viable/strict/1761031535    -> viable/strict/1761031535
2025-12-04T09:33:42.0701333Z  * [new tag]                 viable/strict/1761035196    -> viable/strict/1761035196
2025-12-04T09:33:42.0702574Z  * [new tag]                 viable/strict/1761045825    -> viable/strict/1761045825
2025-12-04T09:33:42.0703680Z  * [new tag]                 viable/strict/1761054796    -> viable/strict/1761054796
2025-12-04T09:33:42.0704787Z  * [new tag]                 viable/strict/1761060314    -> viable/strict/1761060314
2025-12-04T09:33:42.0705895Z  * [new tag]                 viable/strict/1761071198    -> viable/strict/1761071198
2025-12-04T09:33:42.0707014Z  * [new tag]                 viable/strict/1761074628    -> viable/strict/1761074628
2025-12-04T09:33:42.0708060Z  * [new tag]                 viable/strict/1761078351    -> viable/strict/1761078351
2025-12-04T09:33:42.0709116Z  * [new tag]                 viable/strict/1761079822    -> viable/strict/1761079822
2025-12-04T09:33:42.0710179Z  * [new tag]                 viable/strict/1761081873    -> viable/strict/1761081873
2025-12-04T09:33:42.0711281Z  * [new tag]                 viable/strict/1761083392    -> viable/strict/1761083392
2025-12-04T09:33:42.0712387Z  * [new tag]                 viable/strict/1761085465    -> viable/strict/1761085465
2025-12-04T09:33:42.0713531Z  * [new tag]                 viable/strict/1761089099    -> viable/strict/1761089099
2025-12-04T09:33:42.0714718Z  * [new tag]                 viable/strict/1761095535    -> viable/strict/1761095535
2025-12-04T09:33:42.0715543Z  * [new tag]                 viable/strict/1761098119    -> viable/strict/1761098119
2025-12-04T09:33:42.0717211Z  * [new tag]                 viable/strict/1761101330    -> viable/strict/1761101330
2025-12-04T09:33:42.0718315Z  * [new tag]                 viable/strict/1761114425    -> viable/strict/1761114425
2025-12-04T09:33:42.0719369Z  * [new tag]                 viable/strict/1761116036    -> viable/strict/1761116036
2025-12-04T09:33:42.0720419Z  * [new tag]                 viable/strict/1761119379    -> viable/strict/1761119379
2025-12-04T09:33:42.0721458Z  * [new tag]                 viable/strict/1761121601    -> viable/strict/1761121601
2025-12-04T09:33:42.0722472Z  * [new tag]                 viable/strict/1761123234    -> viable/strict/1761123234
2025-12-04T09:33:42.0723482Z  * [new tag]                 viable/strict/1761126621    -> viable/strict/1761126621
2025-12-04T09:33:42.0724555Z  * [new tag]                 viable/strict/1761132259    -> viable/strict/1761132259
2025-12-04T09:33:42.0725776Z  * [new tag]                 viable/strict/1761146746    -> viable/strict/1761146746
2025-12-04T09:33:42.0726848Z  * [new tag]                 viable/strict/1761164752    -> viable/strict/1761164752
2025-12-04T09:33:42.0727784Z  * [new tag]                 viable/strict/1761166198    -> viable/strict/1761166198
2025-12-04T09:33:42.0728945Z  * [new tag]                 viable/strict/1761175424    -> viable/strict/1761175424
2025-12-04T09:33:42.0729996Z  * [new tag]                 viable/strict/1761176983    -> viable/strict/1761176983
2025-12-04T09:33:42.0731412Z  * [new tag]                 viable/strict/1761179891    -> viable/strict/1761179891
2025-12-04T09:33:42.0732456Z  * [new tag]                 viable/strict/1761181930    -> viable/strict/1761181930
2025-12-04T09:33:42.0733510Z  * [new tag]                 viable/strict/1761184516    -> viable/strict/1761184516
2025-12-04T09:33:42.0734597Z  * [new tag]                 viable/strict/1761190179    -> viable/strict/1761190179
2025-12-04T09:33:42.0735700Z  * [new tag]                 viable/strict/1761193558    -> viable/strict/1761193558
2025-12-04T09:33:42.0736789Z  * [new tag]                 viable/strict/1761207990    -> viable/strict/1761207990
2025-12-04T09:33:42.0737907Z  * [new tag]                 viable/strict/1761229539    -> viable/strict/1761229539
2025-12-04T09:33:42.0739223Z  * [new tag]                 viable/strict/1761244031    -> viable/strict/1761244031
2025-12-04T09:33:42.0740300Z  * [new tag]                 viable/strict/1761248986    -> viable/strict/1761248986
2025-12-04T09:33:42.0741333Z  * [new tag]                 viable/strict/1761259791    -> viable/strict/1761259791
2025-12-04T09:33:42.0742344Z  * [new tag]                 viable/strict/1761266139    -> viable/strict/1761266139
2025-12-04T09:33:42.0743528Z  * [new tag]                 viable/strict/1761268316    -> viable/strict/1761268316
2025-12-04T09:33:42.0744591Z  * [new tag]                 viable/strict/1761273805    -> viable/strict/1761273805
2025-12-04T09:33:42.0745570Z  * [new tag]                 viable/strict/1761275261    -> viable/strict/1761275261
2025-12-04T09:33:42.0746740Z  * [new tag]                 viable/strict/1761277913    -> viable/strict/1761277913
2025-12-04T09:33:42.0747824Z  * [new tag]                 viable/strict/1761290701    -> viable/strict/1761290701
2025-12-04T09:33:42.0749005Z  * [new tag]                 viable/strict/1761294396    -> viable/strict/1761294396
2025-12-04T09:33:42.0750000Z  * [new tag]                 viable/strict/1761303047    -> viable/strict/1761303047
2025-12-04T09:33:42.0751060Z  * [new tag]                 viable/strict/1761335388    -> viable/strict/1761335388
2025-12-04T09:33:42.0752121Z  * [new tag]                 viable/strict/1761337551    -> viable/strict/1761337551
2025-12-04T09:33:42.0753251Z  * [new tag]                 viable/strict/1761339007    -> viable/strict/1761339007
2025-12-04T09:33:42.0754390Z  * [new tag]                 viable/strict/1761341050    -> viable/strict/1761341050
2025-12-04T09:33:42.0755915Z  * [new tag]                 viable/strict/1761346188    -> viable/strict/1761346188
2025-12-04T09:33:42.0757131Z  * [new tag]                 viable/strict/1761349792    -> viable/strict/1761349792
2025-12-04T09:33:42.0758173Z  * [new tag]                 viable/strict/1761352620    -> viable/strict/1761352620
2025-12-04T09:33:42.0759260Z  * [new tag]                 viable/strict/1761354730    -> viable/strict/1761354730
2025-12-04T09:33:42.0760330Z  * [new tag]                 viable/strict/1761357298    -> viable/strict/1761357298
2025-12-04T09:33:42.0761396Z  * [new tag]                 viable/strict/1761360201    -> viable/strict/1761360201
2025-12-04T09:33:42.0762484Z  * [new tag]                 viable/strict/1761361753    -> viable/strict/1761361753
2025-12-04T09:33:42.0763553Z  * [new tag]                 viable/strict/1761364351    -> viable/strict/1761364351
2025-12-04T09:33:42.0764611Z  * [new tag]                 viable/strict/1761366338    -> viable/strict/1761366338
2025-12-04T09:33:42.0765832Z  * [new tag]                 viable/strict/1761367802    -> viable/strict/1761367802
2025-12-04T09:33:42.0767038Z  * [new tag]                 viable/strict/1761369889    -> viable/strict/1761369889
2025-12-04T09:33:42.0768173Z  * [new tag]                 viable/strict/1761371385    -> viable/strict/1761371385
2025-12-04T09:33:42.0769247Z  * [new tag]                 viable/strict/1761373581    -> viable/strict/1761373581
2025-12-04T09:33:42.0770486Z  * [new tag]                 viable/strict/1761375054    -> viable/strict/1761375054
2025-12-04T09:33:42.0771818Z  * [new tag]                 viable/strict/1761421785    -> viable/strict/1761421785
2025-12-04T09:33:42.0773022Z  * [new tag]                 viable/strict/1761434614    -> viable/strict/1761434614
2025-12-04T09:33:42.0774522Z  * [new tag]                 viable/strict/1761439254    -> viable/strict/1761439254
2025-12-04T09:33:42.0775577Z  * [new tag]                 viable/strict/1761454187    -> viable/strict/1761454187
2025-12-04T09:33:42.0776786Z  * [new tag]                 viable/strict/1761459991    -> viable/strict/1761459991
2025-12-04T09:33:42.0778122Z  * [new tag]                 viable/strict/1761470668    -> viable/strict/1761470668
2025-12-04T09:33:42.0779721Z  * [new tag]                 viable/strict/1761472188    -> viable/strict/1761472188
2025-12-04T09:33:42.0780797Z  * [new tag]                 viable/strict/1761503178    -> viable/strict/1761503178
2025-12-04T09:33:42.0781905Z  * [new tag]                 viable/strict/1761517492    -> viable/strict/1761517492
2025-12-04T09:33:42.0782992Z  * [new tag]                 viable/strict/1761518981    -> viable/strict/1761518981
2025-12-04T09:33:42.0784115Z  * [new tag]                 viable/strict/1761533609    -> viable/strict/1761533609
2025-12-04T09:33:42.0784984Z  * [new tag]                 viable/strict/1761546438    -> viable/strict/1761546438
2025-12-04T09:33:42.0786255Z  * [new tag]                 viable/strict/1761548133    -> viable/strict/1761548133
2025-12-04T09:33:42.0787620Z  * [new tag]                 viable/strict/1761555186    -> viable/strict/1761555186
2025-12-04T09:33:42.0788772Z  * [new tag]                 viable/strict/1761557178    -> viable/strict/1761557178
2025-12-04T09:33:42.0789861Z  * [new tag]                 viable/strict/1761560772    -> viable/strict/1761560772
2025-12-04T09:33:42.0790942Z  * [new tag]                 viable/strict/1761562266    -> viable/strict/1761562266
2025-12-04T09:33:42.0792088Z  * [new tag]                 viable/strict/1761564260    -> viable/strict/1761564260
2025-12-04T09:33:42.0793152Z  * [new tag]                 viable/strict/1761568072    -> viable/strict/1761568072
2025-12-04T09:33:42.0794209Z  * [new tag]                 viable/strict/1761571683    -> viable/strict/1761571683
2025-12-04T09:33:42.0795205Z  * [new tag]                 viable/strict/1761580199    -> viable/strict/1761580199
2025-12-04T09:33:42.0796248Z  * [new tag]                 viable/strict/1761587383    -> viable/strict/1761587383
2025-12-04T09:33:42.0797371Z  * [new tag]                 viable/strict/1761591165    -> viable/strict/1761591165
2025-12-04T09:33:42.0798451Z  * [new tag]                 viable/strict/1761594575    -> viable/strict/1761594575
2025-12-04T09:33:42.0799494Z  * [new tag]                 viable/strict/1761596710    -> viable/strict/1761596710
2025-12-04T09:33:42.0800700Z  * [new tag]                 viable/strict/1761598189    -> viable/strict/1761598189
2025-12-04T09:33:42.0801783Z  * [new tag]                 viable/strict/1761600254    -> viable/strict/1761600254
2025-12-04T09:33:42.0802899Z  * [new tag]                 viable/strict/1761603879    -> viable/strict/1761603879
2025-12-04T09:33:42.0804044Z  * [new tag]                 viable/strict/1761605429    -> viable/strict/1761605429
2025-12-04T09:33:42.0805269Z  * [new tag]                 viable/strict/1761607468    -> viable/strict/1761607468
2025-12-04T09:33:42.0806358Z  * [new tag]                 viable/strict/1761608983    -> viable/strict/1761608983
2025-12-04T09:33:42.0807442Z  * [new tag]                 viable/strict/1761611846    -> viable/strict/1761611846
2025-12-04T09:33:42.0808592Z  * [new tag]                 viable/strict/1761613922    -> viable/strict/1761613922
2025-12-04T09:33:42.0809443Z  * [new tag]                 viable/strict/1761616504    -> viable/strict/1761616504
2025-12-04T09:33:42.0810452Z  * [new tag]                 viable/strict/1761619599    -> viable/strict/1761619599
2025-12-04T09:33:42.0811518Z  * [new tag]                 viable/strict/1761686693    -> viable/strict/1761686693
2025-12-04T09:33:42.0812572Z  * [new tag]                 viable/strict/1761688179    -> viable/strict/1761688179
2025-12-04T09:33:42.0813656Z  * [new tag]                 viable/strict/1761691973    -> viable/strict/1761691973
2025-12-04T09:33:42.0814903Z  * [new tag]                 viable/strict/1761693884    -> viable/strict/1761693884
2025-12-04T09:33:42.0816013Z  * [new tag]                 viable/strict/1761695389    -> viable/strict/1761695389
2025-12-04T09:33:42.0817277Z  * [new tag]                 viable/strict/1761698408    -> viable/strict/1761698408
2025-12-04T09:33:42.0818334Z  * [new tag]                 viable/strict/1761702931    -> viable/strict/1761702931
2025-12-04T09:33:42.0819457Z  * [new tag]                 viable/strict/1761706307    -> viable/strict/1761706307
2025-12-04T09:33:42.0820554Z  * [new tag]                 viable/strict/1761709065    -> viable/strict/1761709065
2025-12-04T09:33:42.0821779Z  * [new tag]                 viable/strict/1761710285    -> viable/strict/1761710285
2025-12-04T09:33:42.0822889Z  * [new tag]                 viable/strict/1761711983    -> viable/strict/1761711983
2025-12-04T09:33:42.0824036Z  * [new tag]                 viable/strict/1761713514    -> viable/strict/1761713514
2025-12-04T09:33:42.0825274Z  * [new tag]                 viable/strict/1761715523    -> viable/strict/1761715523
2025-12-04T09:33:42.0826411Z  * [new tag]                 viable/strict/1761727973    -> viable/strict/1761727973
2025-12-04T09:33:42.0827569Z  * [new tag]                 viable/strict/1761751558    -> viable/strict/1761751558
2025-12-04T09:33:42.0829143Z  * [new tag]                 viable/strict/1761755187    -> viable/strict/1761755187
2025-12-04T09:33:42.0830313Z  * [new tag]                 viable/strict/1761756826    -> viable/strict/1761756826
2025-12-04T09:33:42.0831488Z  * [new tag]                 viable/strict/1761769551    -> viable/strict/1761769551
2025-12-04T09:33:42.0832707Z  * [new tag]                 viable/strict/1761771032    -> viable/strict/1761771032
2025-12-04T09:33:42.0833589Z  * [new tag]                 viable/strict/1761773101    -> viable/strict/1761773101
2025-12-04T09:33:42.0834784Z  * [new tag]                 viable/strict/1761781792    -> viable/strict/1761781792
2025-12-04T09:33:42.0836015Z  * [new tag]                 viable/strict/1761784788    -> viable/strict/1761784788
2025-12-04T09:33:42.0837178Z  * [new tag]                 viable/strict/1761786740    -> viable/strict/1761786740
2025-12-04T09:33:42.0838329Z  * [new tag]                 viable/strict/1761789332    -> viable/strict/1761789332
2025-12-04T09:33:42.0839966Z  * [new tag]                 viable/strict/1761792569    -> viable/strict/1761792569
2025-12-04T09:33:42.0841218Z  * [new tag]                 viable/strict/1761795289    -> viable/strict/1761795289
2025-12-04T09:33:42.0842366Z  * [new tag]                 viable/strict/1761798345    -> viable/strict/1761798345
2025-12-04T09:33:42.0843443Z  * [new tag]                 viable/strict/1761799827    -> viable/strict/1761799827
2025-12-04T09:33:42.0844610Z  * [new tag]                 viable/strict/1761805604    -> viable/strict/1761805604
2025-12-04T09:33:42.0845710Z  * [new tag]                 viable/strict/1761807202    -> viable/strict/1761807202
2025-12-04T09:33:42.0846863Z  * [new tag]                 viable/strict/1761809094    -> viable/strict/1761809094
2025-12-04T09:33:42.0847980Z  * [new tag]                 viable/strict/1761810576    -> viable/strict/1761810576
2025-12-04T09:33:42.0849139Z  * [new tag]                 viable/strict/1761812771    -> viable/strict/1761812771
2025-12-04T09:33:42.0850498Z  * [new tag]                 viable/strict/1761814363    -> viable/strict/1761814363
2025-12-04T09:33:42.0851640Z  * [new tag]                 viable/strict/1761857410    -> viable/strict/1761857410
2025-12-04T09:33:42.0852825Z  * [new tag]                 viable/strict/1761860985    -> viable/strict/1761860985
2025-12-04T09:33:42.0853913Z  * [new tag]                 viable/strict/1761863094    -> viable/strict/1761863094
2025-12-04T09:33:42.0854997Z  * [new tag]                 viable/strict/1761864590    -> viable/strict/1761864590
2025-12-04T09:33:42.0856094Z  * [new tag]                 viable/strict/1761866675    -> viable/strict/1761866675
2025-12-04T09:33:42.0857620Z  * [new tag]                 viable/strict/1761868178    -> viable/strict/1761868178
2025-12-04T09:33:42.0858723Z  * [new tag]                 viable/strict/1761871111    -> viable/strict/1761871111
2025-12-04T09:33:42.0859854Z  * [new tag]                 viable/strict/1761873126    -> viable/strict/1761873126
2025-12-04T09:33:42.0861051Z  * [new tag]                 viable/strict/1761875714    -> viable/strict/1761875714
2025-12-04T09:33:42.0862203Z  * [new tag]                 viable/strict/1761878924    -> viable/strict/1761878924
2025-12-04T09:33:42.0863351Z  * [new tag]                 viable/strict/1761881727    -> viable/strict/1761881727
2025-12-04T09:33:42.0864437Z  * [new tag]                 viable/strict/1761882959    -> viable/strict/1761882959
2025-12-04T09:33:42.0865589Z  * [new tag]                 viable/strict/1761886268    -> viable/strict/1761886268
2025-12-04T09:33:42.0866721Z  * [new tag]                 viable/strict/1761893641    -> viable/strict/1761893641
2025-12-04T09:33:42.0867820Z  * [new tag]                 viable/strict/1761931517    -> viable/strict/1761931517
2025-12-04T09:33:42.0868983Z  * [new tag]                 viable/strict/1761933080    -> viable/strict/1761933080
2025-12-04T09:33:42.0870076Z  * [new tag]                 viable/strict/1761935217    -> viable/strict/1761935217
2025-12-04T09:33:42.0871377Z  * [new tag]                 viable/strict/1761938533    -> viable/strict/1761938533
2025-12-04T09:33:42.0872548Z  * [new tag]                 viable/strict/1761940184    -> viable/strict/1761940184
2025-12-04T09:33:42.0873660Z  * [new tag]                 viable/strict/1761942338    -> viable/strict/1761942338
2025-12-04T09:33:42.0874783Z  * [new tag]                 viable/strict/1761946100    -> viable/strict/1761946100
2025-12-04T09:33:42.0875909Z  * [new tag]                 viable/strict/1761947374    -> viable/strict/1761947374
2025-12-04T09:33:42.0878891Z  * [new tag]                 viable/strict/1761950978    -> viable/strict/1761950978
2025-12-04T09:33:42.0879284Z  * [new tag]                 viable/strict/1761957727    -> viable/strict/1761957727
2025-12-04T09:33:42.0879978Z  * [new tag]                 viable/strict/1761959532    -> viable/strict/1761959532
2025-12-04T09:33:42.0880225Z  * [new tag]                 viable/strict/1761965366    -> viable/strict/1761965366
2025-12-04T09:33:42.0881630Z  * [new tag]                 viable/strict/1761968066    -> viable/strict/1761968066
2025-12-04T09:33:42.0882717Z  * [new tag]                 viable/strict/1761969322    -> viable/strict/1761969322
2025-12-04T09:33:42.0883800Z  * [new tag]                 viable/strict/1761974723    -> viable/strict/1761974723
2025-12-04T09:33:42.0884909Z  * [new tag]                 viable/strict/1761981837    -> viable/strict/1761981837
2025-12-04T09:33:42.0886141Z  * [new tag]                 viable/strict/1761985546    -> viable/strict/1761985546
2025-12-04T09:33:42.0887238Z  * [new tag]                 viable/strict/1761987030    -> viable/strict/1761987030
2025-12-04T09:33:42.0888406Z  * [new tag]                 viable/strict/1762003554    -> viable/strict/1762003554
2025-12-04T09:33:42.0889508Z  * [new tag]                 viable/strict/1762021560    -> viable/strict/1762021560
2025-12-04T09:33:42.0890620Z  * [new tag]                 viable/strict/1762032190    -> viable/strict/1762032190
2025-12-04T09:33:42.0891760Z  * [new tag]                 viable/strict/1762040981    -> viable/strict/1762040981
2025-12-04T09:33:42.0892900Z  * [new tag]                 viable/strict/1762048525    -> viable/strict/1762048525
2025-12-04T09:33:42.0894098Z  * [new tag]                 viable/strict/1762104223    -> viable/strict/1762104223
2025-12-04T09:33:42.0895140Z  * [new tag]                 viable/strict/1762105778    -> viable/strict/1762105778
2025-12-04T09:33:42.0896354Z  * [new tag]                 viable/strict/1762115109    -> viable/strict/1762115109
2025-12-04T09:33:42.0897535Z  * [new tag]                 viable/strict/1762125840    -> viable/strict/1762125840
2025-12-04T09:33:42.0898416Z  * [new tag]                 viable/strict/1762127377    -> viable/strict/1762127377
2025-12-04T09:33:42.0900007Z  * [new tag]                 viable/strict/1762134925    -> viable/strict/1762134925
2025-12-04T09:33:42.0901024Z  * [new tag]                 viable/strict/1762138338    -> viable/strict/1762138338
2025-12-04T09:33:42.0902170Z  * [new tag]                 viable/strict/1762148993    -> viable/strict/1762148993
2025-12-04T09:33:42.0903785Z  * [new tag]                 viable/strict/1762152871    -> viable/strict/1762152871
2025-12-04T09:33:42.0904933Z  * [new tag]                 viable/strict/1762156183    -> viable/strict/1762156183
2025-12-04T09:33:42.0906023Z  * [new tag]                 viable/strict/1762163457    -> viable/strict/1762163457
2025-12-04T09:33:42.0907165Z  * [new tag]                 viable/strict/1762165569    -> viable/strict/1762165569
2025-12-04T09:33:42.0908428Z  * [new tag]                 viable/strict/1762169035    -> viable/strict/1762169035
2025-12-04T09:33:42.0909602Z  * [new tag]                 viable/strict/1762174936    -> viable/strict/1762174936
2025-12-04T09:33:42.0910739Z  * [new tag]                 viable/strict/1762194412    -> viable/strict/1762194412
2025-12-04T09:33:42.0911805Z  * [new tag]                 viable/strict/1762195876    -> viable/strict/1762195876
2025-12-04T09:33:42.0912930Z  * [new tag]                 viable/strict/1762197788    -> viable/strict/1762197788
2025-12-04T09:33:42.0914099Z  * [new tag]                 viable/strict/1762199389    -> viable/strict/1762199389
2025-12-04T09:33:42.0915619Z  * [new tag]                 viable/strict/1762206585    -> viable/strict/1762206585
2025-12-04T09:33:42.0916869Z  * [new tag]                 viable/strict/1762210184    -> viable/strict/1762210184
2025-12-04T09:33:42.0917884Z  * [new tag]                 viable/strict/1762218736    -> viable/strict/1762218736
2025-12-04T09:33:42.0919039Z  * [new tag]                 viable/strict/1762224529    -> viable/strict/1762224529
2025-12-04T09:33:42.0920289Z  * [new tag]                 viable/strict/1762227253    -> viable/strict/1762227253
2025-12-04T09:33:42.0921125Z  * [new tag]                 viable/strict/1762228515    -> viable/strict/1762228515
2025-12-04T09:33:42.0922447Z  * [new tag]                 viable/strict/1762230349    -> viable/strict/1762230349
2025-12-04T09:33:42.0923550Z  * [new tag]                 viable/strict/1762231859    -> viable/strict/1762231859
2025-12-04T09:33:42.0924707Z  * [new tag]                 viable/strict/1762233925    -> viable/strict/1762233925
2025-12-04T09:33:42.0925972Z  * [new tag]                 viable/strict/1762237630    -> viable/strict/1762237630
2025-12-04T09:33:42.0926946Z  * [new tag]                 viable/strict/1762253522    -> viable/strict/1762253522
2025-12-04T09:33:42.0928304Z  * [new tag]                 viable/strict/1762278588    -> viable/strict/1762278588
2025-12-04T09:33:42.0929446Z  * [new tag]                 viable/strict/1762284203    -> viable/strict/1762284203
2025-12-04T09:33:42.0930602Z  * [new tag]                 viable/strict/1762289446    -> viable/strict/1762289446
2025-12-04T09:33:42.0931719Z  * [new tag]                 viable/strict/1762291515    -> viable/strict/1762291515
2025-12-04T09:33:42.0932934Z  * [new tag]                 viable/strict/1762295100    -> viable/strict/1762295100
2025-12-04T09:33:42.0933815Z  * [new tag]                 viable/strict/1762296590    -> viable/strict/1762296590
2025-12-04T09:33:42.0934871Z  * [new tag]                 viable/strict/1762300179    -> viable/strict/1762300179
2025-12-04T09:33:42.0936110Z  * [new tag]                 viable/strict/1762303207    -> viable/strict/1762303207
2025-12-04T09:33:42.0937053Z  * [new tag]                 viable/strict/1762386584    -> viable/strict/1762386584
2025-12-04T09:33:42.0938276Z  * [new tag]                 viable/strict/1762391537    -> viable/strict/1762391537
2025-12-04T09:33:42.0939157Z  * [new tag]                 viable/strict/1762394119    -> viable/strict/1762394119
2025-12-04T09:33:42.0940783Z  * [new tag]                 viable/strict/1762397437    -> viable/strict/1762397437
2025-12-04T09:33:42.0941895Z  * [new tag]                 viable/strict/1762400256    -> viable/strict/1762400256
2025-12-04T09:33:42.0942987Z  * [new tag]                 viable/strict/1762401469    -> viable/strict/1762401469
2025-12-04T09:33:42.0944204Z  * [new tag]                 viable/strict/1762408195    -> viable/strict/1762408195
2025-12-04T09:33:42.0945414Z  * [new tag]                 viable/strict/1762410411    -> viable/strict/1762410411
2025-12-04T09:33:42.0946503Z  * [new tag]                 viable/strict/1762417613    -> viable/strict/1762417613
2025-12-04T09:33:42.0947638Z  * [new tag]                 viable/strict/1762419198    -> viable/strict/1762419198
2025-12-04T09:33:42.0948773Z  * [new tag]                 viable/strict/1762422656    -> viable/strict/1762422656
2025-12-04T09:33:42.0950364Z  * [new tag]                 viable/strict/1762424746    -> viable/strict/1762424746
2025-12-04T09:33:42.0951576Z  * [new tag]                 viable/strict/1762446386    -> viable/strict/1762446386
2025-12-04T09:33:42.0952853Z  * [new tag]                 viable/strict/1762449912    -> viable/strict/1762449912
2025-12-04T09:33:42.0953987Z  * [new tag]                 viable/strict/1762457031    -> viable/strict/1762457031
2025-12-04T09:33:42.0955142Z  * [new tag]                 viable/strict/1762462441    -> viable/strict/1762462441
2025-12-04T09:33:42.0956258Z  * [new tag]                 viable/strict/1762467909    -> viable/strict/1762467909
2025-12-04T09:33:42.0957420Z  * [new tag]                 viable/strict/1762471493    -> viable/strict/1762471493
2025-12-04T09:33:42.0958613Z  * [new tag]                 viable/strict/1762475990    -> viable/strict/1762475990
2025-12-04T09:33:42.0959851Z  * [new tag]                 viable/strict/1762477933    -> viable/strict/1762477933
2025-12-04T09:33:42.0960966Z  * [new tag]                 viable/strict/1762491053    -> viable/strict/1762491053
2025-12-04T09:33:42.0962238Z  * [new tag]                 viable/strict/1762493118    -> viable/strict/1762493118
2025-12-04T09:33:42.0963116Z  * [new tag]                 viable/strict/1762498442    -> viable/strict/1762498442
2025-12-04T09:33:42.0964428Z  * [new tag]                 viable/strict/1762501778    -> viable/strict/1762501778
2025-12-04T09:33:42.0965539Z  * [new tag]                 viable/strict/1762504001    -> viable/strict/1762504001
2025-12-04T09:33:42.0966779Z  * [new tag]                 viable/strict/1762505583    -> viable/strict/1762505583
2025-12-04T09:33:42.0968000Z  * [new tag]                 viable/strict/1762507523    -> viable/strict/1762507523
2025-12-04T09:33:42.0969241Z  * [new tag]                 viable/strict/1762511140    -> viable/strict/1762511140
2025-12-04T09:33:42.0970593Z  * [new tag]                 viable/strict/1762512632    -> viable/strict/1762512632
2025-12-04T09:33:42.0972035Z  * [new tag]                 viable/strict/1762520467    -> viable/strict/1762520467
2025-12-04T09:33:42.0973200Z  * [new tag]                 viable/strict/1762522016    -> viable/strict/1762522016
2025-12-04T09:33:42.0974315Z  * [new tag]                 viable/strict/1762530591    -> viable/strict/1762530591
2025-12-04T09:33:42.0975531Z  * [new tag]                 viable/strict/1762543405    -> viable/strict/1762543405
2025-12-04T09:33:42.0976477Z  * [new tag]                 viable/strict/1762544998    -> viable/strict/1762544998
2025-12-04T09:33:42.0977736Z  * [new tag]                 viable/strict/1762552182    -> viable/strict/1762552182
2025-12-04T09:33:42.0979333Z  * [new tag]                 viable/strict/1762554297    -> viable/strict/1762554297
2025-12-04T09:33:42.0980203Z  * [new tag]                 viable/strict/1762559381    -> viable/strict/1762559381
2025-12-04T09:33:42.0981581Z  * [new tag]                 viable/strict/1762562222    -> viable/strict/1762562222
2025-12-04T09:33:42.0982793Z  * [new tag]                 viable/strict/1762564319    -> viable/strict/1762564319
2025-12-04T09:33:42.0983665Z  * [new tag]                 viable/strict/1762566904    -> viable/strict/1762566904
2025-12-04T09:33:42.0984898Z  * [new tag]                 viable/strict/1762569781    -> viable/strict/1762569781
2025-12-04T09:33:42.0985893Z  * [new tag]                 viable/strict/1762575940    -> viable/strict/1762575940
2025-12-04T09:33:42.0987177Z  * [new tag]                 viable/strict/1762580974    -> viable/strict/1762580974
2025-12-04T09:33:42.0988304Z  * [new tag]                 viable/strict/1762583185    -> viable/strict/1762583185
2025-12-04T09:33:42.0989429Z  * [new tag]                 viable/strict/1762586647    -> viable/strict/1762586647
2025-12-04T09:33:42.0990670Z  * [new tag]                 viable/strict/1762588183    -> viable/strict/1762588183
2025-12-04T09:33:42.0991837Z  * [new tag]                 viable/strict/1762593886    -> viable/strict/1762593886
2025-12-04T09:33:42.0993003Z  * [new tag]                 viable/strict/1762650743    -> viable/strict/1762650743
2025-12-04T09:33:42.0994201Z  * [new tag]                 viable/strict/1762653328    -> viable/strict/1762653328
2025-12-04T09:33:42.0995861Z  * [new tag]                 viable/strict/1762659342    -> viable/strict/1762659342
2025-12-04T09:33:42.0996917Z  * [new tag]                 viable/strict/1762662360    -> viable/strict/1762662360
2025-12-04T09:33:42.0997779Z  * [new tag]                 viable/strict/1762667377    -> viable/strict/1762667377
2025-12-04T09:33:42.0998923Z  * [new tag]                 viable/strict/1762671090    -> viable/strict/1762671090
2025-12-04T09:33:42.1000049Z  * [new tag]                 viable/strict/1762680284    -> viable/strict/1762680284
2025-12-04T09:33:42.1001188Z  * [new tag]                 viable/strict/1762683900    -> viable/strict/1762683900
2025-12-04T09:33:42.1002331Z  * [new tag]                 viable/strict/1762705541    -> viable/strict/1762705541
2025-12-04T09:33:42.1003443Z  * [new tag]                 viable/strict/1762709004    -> viable/strict/1762709004
2025-12-04T09:33:42.1004761Z  * [new tag]                 viable/strict/1762746004    -> viable/strict/1762746004
2025-12-04T09:33:42.1005928Z  * [new tag]                 viable/strict/1762748799    -> viable/strict/1762748799
2025-12-04T09:33:42.1007046Z  * [new tag]                 viable/strict/1762759504    -> viable/strict/1762759504
2025-12-04T09:33:42.1008442Z  * [new tag]                 viable/strict/1762760973    -> viable/strict/1762760973
2025-12-04T09:33:42.1009427Z  * [new tag]                 viable/strict/1762775374    -> viable/strict/1762775374
2025-12-04T09:33:42.1010629Z  * [new tag]                 viable/strict/1762777661    -> viable/strict/1762777661
2025-12-04T09:33:42.1011740Z  * [new tag]                 viable/strict/1762779774    -> viable/strict/1762779774
2025-12-04T09:33:42.1013117Z  * [new tag]                 viable/strict/1762781259    -> viable/strict/1762781259
2025-12-04T09:33:42.1014201Z  * [new tag]                 viable/strict/1762793628    -> viable/strict/1762793628
2025-12-04T09:33:42.1015425Z  * [new tag]                 viable/strict/1762800711    -> viable/strict/1762800711
2025-12-04T09:33:42.1016607Z  * [new tag]                 viable/strict/1762809894    -> viable/strict/1762809894
2025-12-04T09:33:42.1017794Z  * [new tag]                 viable/strict/1762811384    -> viable/strict/1762811384
2025-12-04T09:33:42.1018999Z  * [new tag]                 viable/strict/1762813841    -> viable/strict/1762813841
2025-12-04T09:33:42.1020203Z  * [new tag]                 viable/strict/1762815047    -> viable/strict/1762815047
2025-12-04T09:33:42.1022202Z  * [new tag]                 viable/strict/1762817094    -> viable/strict/1762817094
2025-12-04T09:33:42.1023284Z  * [new tag]                 viable/strict/1762818582    -> viable/strict/1762818582
2025-12-04T09:33:42.1024255Z  * [new tag]                 viable/strict/1762821623    -> viable/strict/1762821623
2025-12-04T09:33:42.1025214Z  * [new tag]                 viable/strict/1762823531    -> viable/strict/1762823531
2025-12-04T09:33:42.1026318Z  * [new tag]                 viable/strict/1762849583    -> viable/strict/1762849583
2025-12-04T09:33:42.1027552Z  * [new tag]                 viable/strict/1762851200    -> viable/strict/1762851200
2025-12-04T09:33:42.1028715Z  * [new tag]                 viable/strict/1762854603    -> viable/strict/1762854603
2025-12-04T09:33:42.1029893Z  * [new tag]                 viable/strict/1762858276    -> viable/strict/1762858276
2025-12-04T09:33:42.1031125Z  * [new tag]                 viable/strict/1762860891    -> viable/strict/1762860891
2025-12-04T09:33:42.1032958Z  * [new tag]                 viable/strict/1762866174    -> viable/strict/1762866174
2025-12-04T09:33:42.1034018Z  * [new tag]                 viable/strict/1762867653    -> viable/strict/1762867653
2025-12-04T09:33:42.1035196Z  * [new tag]                 viable/strict/1762872669    -> viable/strict/1762872669
2025-12-04T09:33:42.1036185Z  * [new tag]                 viable/strict/1762878380    -> viable/strict/1762878380
2025-12-04T09:33:42.1037270Z  * [new tag]                 viable/strict/1762889003    -> viable/strict/1762889003
2025-12-04T09:33:42.1038515Z  * [new tag]                 viable/strict/1762890589    -> viable/strict/1762890589
2025-12-04T09:33:42.1039636Z  * [new tag]                 viable/strict/1762892743    -> viable/strict/1762892743
2025-12-04T09:33:42.1040759Z  * [new tag]                 viable/strict/1762894271    -> viable/strict/1762894271
2025-12-04T09:33:42.1041817Z  * [new tag]                 viable/strict/1762896287    -> viable/strict/1762896287
2025-12-04T09:33:42.1042810Z  * [new tag]                 viable/strict/1762915871    -> viable/strict/1762915871
2025-12-04T09:33:42.1043881Z  * [new tag]                 viable/strict/1762918569    -> viable/strict/1762918569
2025-12-04T09:33:42.1044821Z  * [new tag]                 viable/strict/1762919776    -> viable/strict/1762919776
2025-12-04T09:33:42.1045960Z  * [new tag]                 viable/strict/1762923072    -> viable/strict/1762923072
2025-12-04T09:33:42.1047272Z  * [new tag]                 viable/strict/1762928826    -> viable/strict/1762928826
2025-12-04T09:33:42.1048456Z  * [new tag]                 viable/strict/1762930451    -> viable/strict/1762930451
2025-12-04T09:33:42.1049535Z  * [new tag]                 viable/strict/1762933780    -> viable/strict/1762933780
2025-12-04T09:33:42.1051333Z  * [new tag]                 viable/strict/1762937638    -> viable/strict/1762937638
2025-12-04T09:33:42.1052735Z  * [new tag]                 viable/strict/1762939545    -> viable/strict/1762939545
2025-12-04T09:33:42.1053788Z  * [new tag]                 viable/strict/1762962692    -> viable/strict/1762962692
2025-12-04T09:33:42.1055567Z  * [new tag]                 viable/strict/1762979143    -> viable/strict/1762979143
2025-12-04T09:33:42.1056695Z  * [new tag]                 viable/strict/1762984188    -> viable/strict/1762984188
2025-12-04T09:33:42.1057746Z  * [new tag]                 viable/strict/1762986306    -> viable/strict/1762986306
2025-12-04T09:33:42.1058908Z  * [new tag]                 viable/strict/1762989903    -> viable/strict/1762989903
2025-12-04T09:33:42.1060047Z  * [new tag]                 viable/strict/1762991377    -> viable/strict/1762991377
2025-12-04T09:33:42.1061177Z  * [new tag]                 viable/strict/1762998921    -> viable/strict/1762998921
2025-12-04T09:33:42.1062471Z  * [new tag]                 viable/strict/1763002287    -> viable/strict/1763002287
2025-12-04T09:33:42.1063678Z  * [new tag]                 viable/strict/1763016840    -> viable/strict/1763016840
2025-12-04T09:33:42.1064834Z  * [new tag]                 viable/strict/1763020180    -> viable/strict/1763020180
2025-12-04T09:33:42.1066041Z  * [new tag]                 viable/strict/1763027421    -> viable/strict/1763027421
2025-12-04T09:33:42.1067403Z  * [new tag]                 viable/strict/1763031120    -> viable/strict/1763031120
2025-12-04T09:33:42.1068472Z  * [new tag]                 viable/strict/1763036861    -> viable/strict/1763036861
2025-12-04T09:33:42.1069723Z  * [new tag]                 viable/strict/1763038993    -> viable/strict/1763038993
2025-12-04T09:33:42.1071185Z  * [new tag]                 viable/strict/1763054703    -> viable/strict/1763054703
2025-12-04T09:33:42.1074276Z  * [new tag]                 viable/strict/1763067061    -> viable/strict/1763067061
2025-12-04T09:33:42.1075466Z  * [new tag]                 viable/strict/1763070847    -> viable/strict/1763070847
2025-12-04T09:33:42.1076642Z  * [new tag]                 viable/strict/1763072706    -> viable/strict/1763072706
2025-12-04T09:33:42.1077865Z  * [new tag]                 viable/strict/1763076302    -> viable/strict/1763076302
2025-12-04T09:33:42.1079051Z  * [new tag]                 viable/strict/1763080816    -> viable/strict/1763080816
2025-12-04T09:33:42.1080153Z  * [new tag]                 viable/strict/1763082732    -> viable/strict/1763082732
2025-12-04T09:33:42.1081279Z  * [new tag]                 viable/strict/1763085329    -> viable/strict/1763085329
2025-12-04T09:33:42.1082429Z  * [new tag]                 viable/strict/1763088623    -> viable/strict/1763088623
2025-12-04T09:33:42.1083782Z  * [new tag]                 viable/strict/1763091402    -> viable/strict/1763091402
2025-12-04T09:33:42.1084850Z  * [new tag]                 viable/strict/1763092602    -> viable/strict/1763092602
2025-12-04T09:33:42.1085990Z  * [new tag]                 viable/strict/1763094355    -> viable/strict/1763094355
2025-12-04T09:33:42.1087152Z  * [new tag]                 viable/strict/1763099390    -> viable/strict/1763099390
2025-12-04T09:33:42.1088294Z  * [new tag]                 viable/strict/1763101608    -> viable/strict/1763101608
2025-12-04T09:33:42.1089513Z  * [new tag]                 viable/strict/1763105102    -> viable/strict/1763105102
2025-12-04T09:33:42.1090729Z  * [new tag]                 viable/strict/1763112347    -> viable/strict/1763112347
2025-12-04T09:33:42.1091877Z  * [new tag]                 viable/strict/1763119471    -> viable/strict/1763119471
2025-12-04T09:33:42.1093006Z  * [new tag]                 viable/strict/1763126835    -> viable/strict/1763126835
2025-12-04T09:33:42.1093833Z  * [new tag]                 viable/strict/1763149779    -> viable/strict/1763149779
2025-12-04T09:33:42.1094980Z  * [new tag]                 viable/strict/1763164178    -> viable/strict/1763164178
2025-12-04T09:33:42.1096146Z  * [new tag]                 viable/strict/1763167104    -> viable/strict/1763167104
2025-12-04T09:33:42.1097390Z  * [new tag]                 viable/strict/1763169132    -> viable/strict/1763169132
2025-12-04T09:33:42.1098495Z  * [new tag]                 viable/strict/1763171708    -> viable/strict/1763171708
2025-12-04T09:33:42.1099620Z  * [new tag]                 viable/strict/1763174759    -> viable/strict/1763174759
2025-12-04T09:33:42.1100806Z  * [new tag]                 viable/strict/1763180744    -> viable/strict/1763180744
2025-12-04T09:33:42.1101977Z  * [new tag]                 viable/strict/1763182227    -> viable/strict/1763182227
2025-12-04T09:33:42.1103118Z  * [new tag]                 viable/strict/1763184309    -> viable/strict/1763184309
2025-12-04T09:33:42.1104936Z  * [new tag]                 viable/strict/1763187991    -> viable/strict/1763187991
2025-12-04T09:33:42.1105940Z  * [new tag]                 viable/strict/1763191445    -> viable/strict/1763191445
2025-12-04T09:33:42.1107649Z  * [new tag]                 viable/strict/1763195152    -> viable/strict/1763195152
2025-12-04T09:33:42.1108513Z  * [new tag]                 viable/strict/1763205769    -> viable/strict/1763205769
2025-12-04T09:33:42.1109689Z  * [new tag]                 viable/strict/1763246990    -> viable/strict/1763246990
2025-12-04T09:33:42.1110882Z  * [new tag]                 viable/strict/1763261578    -> viable/strict/1763261578
2025-12-04T09:33:42.1111959Z  * [new tag]                 viable/strict/1763286573    -> viable/strict/1763286573
2025-12-04T09:33:42.1112926Z  * [new tag]                 viable/strict/1763292167    -> viable/strict/1763292167
2025-12-04T09:33:42.1114119Z  * [new tag]                 viable/strict/1763333386    -> viable/strict/1763333386
2025-12-04T09:33:42.1115229Z  * [new tag]                 viable/strict/1763340082    -> viable/strict/1763340082
2025-12-04T09:33:42.1117352Z  * [new tag]                 viable/strict/1763364324    -> viable/strict/1763364324
2025-12-04T09:33:42.1118343Z  * [new tag]                 viable/strict/1763371569    -> viable/strict/1763371569
2025-12-04T09:33:42.1119481Z  * [new tag]                 viable/strict/1763373067    -> viable/strict/1763373067
2025-12-04T09:33:42.1120601Z  * [new tag]                 viable/strict/1763375157    -> viable/strict/1763375157
2025-12-04T09:33:42.1121771Z  * [new tag]                 viable/strict/1763382462    -> viable/strict/1763382462
2025-12-04T09:33:42.1122982Z  * [new tag]                 viable/strict/1763394661    -> viable/strict/1763394661
2025-12-04T09:33:42.1124442Z  * [new tag]                 viable/strict/1763396797    -> viable/strict/1763396797
2025-12-04T09:33:42.1125553Z  * [new tag]                 viable/strict/1763398542    -> viable/strict/1763398542
2025-12-04T09:33:42.1126697Z  * [new tag]                 viable/strict/1763401807    -> viable/strict/1763401807
2025-12-04T09:33:42.1127696Z  * [new tag]                 viable/strict/1763414698    -> viable/strict/1763414698
2025-12-04T09:33:42.1128912Z  * [new tag]                 viable/strict/1763419807    -> viable/strict/1763419807
2025-12-04T09:33:42.1130051Z  * [new tag]                 viable/strict/1763426369    -> viable/strict/1763426369
2025-12-04T09:33:42.1131291Z  * [new tag]                 viable/strict/1763428331    -> viable/strict/1763428331
2025-12-04T09:33:42.1132502Z  * [new tag]                 viable/strict/1763430922    -> viable/strict/1763430922
2025-12-04T09:33:42.1134064Z  * [new tag]                 viable/strict/1763434184    -> viable/strict/1763434184
2025-12-04T09:33:42.1135100Z  * [new tag]                 viable/strict/1763439973    -> viable/strict/1763439973
2025-12-04T09:33:42.1136435Z  * [new tag]                 viable/strict/1763444995    -> viable/strict/1763444995
2025-12-04T09:33:42.1137593Z  * [new tag]                 viable/strict/1763447206    -> viable/strict/1763447206
2025-12-04T09:33:42.1138825Z  * [new tag]                 viable/strict/1763448826    -> viable/strict/1763448826
2025-12-04T09:33:42.1139974Z  * [new tag]                 viable/strict/1763450717    -> viable/strict/1763450717
2025-12-04T09:33:42.1141152Z  * [new tag]                 viable/strict/1763452183    -> viable/strict/1763452183
2025-12-04T09:33:42.1142539Z  * [new tag]                 viable/strict/1763457945    -> viable/strict/1763457945
2025-12-04T09:33:42.1143541Z  * [new tag]                 viable/strict/1763459439    -> viable/strict/1763459439
2025-12-04T09:33:42.1144532Z  * [new tag]                 viable/strict/1763461556    -> viable/strict/1763461556
2025-12-04T09:33:42.1145952Z  * [new tag]                 viable/strict/1763463103    -> viable/strict/1763463103
2025-12-04T09:33:42.1147052Z  * [new tag]                 viable/strict/1763465100    -> viable/strict/1763465100
2025-12-04T09:33:42.1148073Z  * [new tag]                 viable/strict/1763468866    -> viable/strict/1763468866
2025-12-04T09:33:42.1149021Z  * [new tag]                 viable/strict/1763493823    -> viable/strict/1763493823
2025-12-04T09:33:42.1149995Z  * [new tag]                 viable/strict/1763496249    -> viable/strict/1763496249
2025-12-04T09:33:42.1151162Z  * [new tag]                 viable/strict/1763502620    -> viable/strict/1763502620
2025-12-04T09:33:42.1152338Z  * [new tag]                 viable/strict/1763504715    -> viable/strict/1763504715
2025-12-04T09:33:42.1153534Z  * [new tag]                 viable/strict/1763506208    -> viable/strict/1763506208
2025-12-04T09:33:42.1154671Z  * [new tag]                 viable/strict/1763520590    -> viable/strict/1763520590
2025-12-04T09:33:42.1155890Z  * [new tag]                 viable/strict/1763523357    -> viable/strict/1763523357
2025-12-04T09:33:42.1157082Z  * [new tag]                 viable/strict/1763529922    -> viable/strict/1763529922
2025-12-04T09:33:42.1158326Z  * [new tag]                 viable/strict/1763531408    -> viable/strict/1763531408
2025-12-04T09:33:42.1159455Z  * [new tag]                 viable/strict/1763533622    -> viable/strict/1763533622
2025-12-04T09:33:42.1160618Z  * [new tag]                 viable/strict/1763538576    -> viable/strict/1763538576
2025-12-04T09:33:42.1161828Z  * [new tag]                 viable/strict/1763545823    -> viable/strict/1763545823
2025-12-04T09:33:42.1162785Z  * [new tag]                 viable/strict/1763547951    -> viable/strict/1763547951
2025-12-04T09:33:42.1164159Z  * [new tag]                 viable/strict/1763551477    -> viable/strict/1763551477
2025-12-04T09:33:42.1165207Z  * [new tag]                 viable/strict/1763552982    -> viable/strict/1763552982
2025-12-04T09:33:42.1166379Z  * [new tag]                 viable/strict/1763594698    -> viable/strict/1763594698
2025-12-04T09:33:42.1167505Z  * [new tag]                 viable/strict/1763596178    -> viable/strict/1763596178
2025-12-04T09:33:42.1168651Z  * [new tag]                 viable/strict/1763599155    -> viable/strict/1763599155
2025-12-04T09:33:42.1169838Z  * [new tag]                 viable/strict/1763603717    -> viable/strict/1763603717
2025-12-04T09:33:42.1171177Z  * [new tag]                 viable/strict/1763606923    -> viable/strict/1763606923
2025-12-04T09:33:42.1172572Z  * [new tag]                 viable/strict/1763609715    -> viable/strict/1763609715
2025-12-04T09:33:42.1173526Z  * [new tag]                 viable/strict/1763612757    -> viable/strict/1763612757
2025-12-04T09:33:42.1174705Z  * [new tag]                 viable/strict/1763616325    -> viable/strict/1763616325
2025-12-04T09:33:42.1175867Z  * [new tag]                 viable/strict/1763623509    -> viable/strict/1763623509
2025-12-04T09:33:42.1177379Z  * [new tag]                 viable/strict/1763624984    -> viable/strict/1763624984
2025-12-04T09:33:42.1178621Z  * [new tag]                 viable/strict/1763628796    -> viable/strict/1763628796
2025-12-04T09:33:42.1179612Z  * [new tag]                 viable/strict/1763634343    -> viable/strict/1763634343
2025-12-04T09:33:42.1180717Z  * [new tag]                 viable/strict/1763635867    -> viable/strict/1763635867
2025-12-04T09:33:42.1182144Z  * [new tag]                 viable/strict/1763639382    -> viable/strict/1763639382
2025-12-04T09:33:42.1183291Z  * [new tag]                 viable/strict/1763646626    -> viable/strict/1763646626
2025-12-04T09:33:42.1184725Z  * [new tag]                 viable/strict/1763655997    -> viable/strict/1763655997
2025-12-04T09:33:42.1185778Z  * [new tag]                 viable/strict/1763659444    -> viable/strict/1763659444
2025-12-04T09:33:42.1187009Z  * [new tag]                 viable/strict/1763660992    -> viable/strict/1763660992
2025-12-04T09:33:42.1188116Z  * [new tag]                 viable/strict/1763663201    -> viable/strict/1763663201
2025-12-04T09:33:42.1189335Z  * [new tag]                 viable/strict/1763670362    -> viable/strict/1763670362
2025-12-04T09:33:42.1190305Z  * [new tag]                 viable/strict/1763675378    -> viable/strict/1763675378
2025-12-04T09:33:42.1191437Z  * [new tag]                 viable/strict/1763693343    -> viable/strict/1763693343
2025-12-04T09:33:42.1192558Z  * [new tag]                 viable/strict/1763696088    -> viable/strict/1763696088
2025-12-04T09:33:42.1194059Z  * [new tag]                 viable/strict/1763697343    -> viable/strict/1763697343
2025-12-04T09:33:42.1195089Z  * [new tag]                 viable/strict/1763699165    -> viable/strict/1763699165
2025-12-04T09:33:42.1196290Z  * [new tag]                 viable/strict/1763700660    -> viable/strict/1763700660
2025-12-04T09:33:42.1197358Z  * [new tag]                 viable/strict/1763704209    -> viable/strict/1763704209
2025-12-04T09:33:42.1198732Z  * [new tag]                 viable/strict/1763706411    -> viable/strict/1763706411
2025-12-04T09:33:42.1199711Z  * [new tag]                 viable/strict/1763708082    -> viable/strict/1763708082
2025-12-04T09:33:42.1200766Z  * [new tag]                 viable/strict/1763711381    -> viable/strict/1763711381
2025-12-04T09:33:42.1201808Z  * [new tag]                 viable/strict/1763713593    -> viable/strict/1763713593
2025-12-04T09:33:42.1202931Z  * [new tag]                 viable/strict/1763715201    -> viable/strict/1763715201
2025-12-04T09:33:42.1204064Z  * [new tag]                 viable/strict/1763733017    -> viable/strict/1763733017
2025-12-04T09:33:42.1205240Z  * [new tag]                 viable/strict/1763735108    -> viable/strict/1763735108
2025-12-04T09:33:42.1206385Z  * [new tag]                 viable/strict/1763749579    -> viable/strict/1763749579
2025-12-04T09:33:42.1207503Z  * [new tag]                 viable/strict/1763751113    -> viable/strict/1763751113
2025-12-04T09:33:42.1209278Z  * [new tag]                 viable/strict/1763753035    -> viable/strict/1763753035
2025-12-04T09:33:42.1210386Z  * [new tag]                 viable/strict/1763754578    -> viable/strict/1763754578
2025-12-04T09:33:42.1211533Z  * [new tag]                 viable/strict/1763756748    -> viable/strict/1763756748
2025-12-04T09:33:42.1212717Z  * [new tag]                 viable/strict/1763758205    -> viable/strict/1763758205
2025-12-04T09:33:42.1213697Z  * [new tag]                 viable/strict/1763764050    -> viable/strict/1763764050
2025-12-04T09:33:42.1214890Z  * [new tag]                 viable/strict/1763771887    -> viable/strict/1763771887
2025-12-04T09:33:42.1216252Z  * [new tag]                 viable/strict/1763773920    -> viable/strict/1763773920
2025-12-04T09:33:42.1217402Z  * [new tag]                 viable/strict/1763776501    -> viable/strict/1763776501
2025-12-04T09:33:42.1218520Z  * [new tag]                 viable/strict/1763779437    -> viable/strict/1763779437
2025-12-04T09:33:42.1219995Z  * [new tag]                 viable/strict/1763781038    -> viable/strict/1763781038
2025-12-04T09:33:42.1221071Z  * [new tag]                 viable/strict/1763782245    -> viable/strict/1763782245
2025-12-04T09:33:42.1222117Z  * [new tag]                 viable/strict/1763785568    -> viable/strict/1763785568
2025-12-04T09:33:42.1223355Z  * [new tag]                 viable/strict/1763787006    -> viable/strict/1763787006
2025-12-04T09:33:42.1224533Z  * [new tag]                 viable/strict/1763789103    -> viable/strict/1763789103
2025-12-04T09:33:42.1225672Z  * [new tag]                 viable/strict/1763790578    -> viable/strict/1763790578
2025-12-04T09:33:42.1226818Z  * [new tag]                 viable/strict/1763796275    -> viable/strict/1763796275
2025-12-04T09:33:42.1228362Z  * [new tag]                 viable/strict/1763801465    -> viable/strict/1763801465
2025-12-04T09:33:42.1229334Z  * [new tag]                 viable/strict/1763803522    -> viable/strict/1763803522
2025-12-04T09:33:42.1230402Z  * [new tag]                 viable/strict/1763808581    -> viable/strict/1763808581
2025-12-04T09:33:42.1231605Z  * [new tag]                 viable/strict/1763840977    -> viable/strict/1763840977
2025-12-04T09:33:42.1232706Z  * [new tag]                 viable/strict/1763846659    -> viable/strict/1763846659
2025-12-04T09:33:42.1233785Z  * [new tag]                 viable/strict/1763872065    -> viable/strict/1763872065
2025-12-04T09:33:42.1234979Z  * [new tag]                 viable/strict/1763873648    -> viable/strict/1763873648
2025-12-04T09:33:42.1236175Z  * [new tag]                 viable/strict/1763875506    -> viable/strict/1763875506
2025-12-04T09:33:42.1237163Z  * [new tag]                 viable/strict/1763889904    -> viable/strict/1763889904
2025-12-04T09:33:42.1238272Z  * [new tag]                 viable/strict/1763930999    -> viable/strict/1763930999
2025-12-04T09:33:42.1239440Z  * [new tag]                 viable/strict/1763944964    -> viable/strict/1763944964
2025-12-04T09:33:42.1240459Z  * [new tag]                 viable/strict/1763958474    -> viable/strict/1763958474
2025-12-04T09:33:42.1241600Z  * [new tag]                 viable/strict/1763967263    -> viable/strict/1763967263
2025-12-04T09:33:42.1242709Z  * [new tag]                 viable/strict/1763972803    -> viable/strict/1763972803
2025-12-04T09:33:42.1243800Z  * [new tag]                 viable/strict/1763976376    -> viable/strict/1763976376
2025-12-04T09:33:42.1244994Z  * [new tag]                 viable/strict/1763989404    -> viable/strict/1763989404
2025-12-04T09:33:42.1246138Z  * [new tag]                 viable/strict/1763990887    -> viable/strict/1763990887
2025-12-04T09:33:42.1247278Z  * [new tag]                 viable/strict/1764019919    -> viable/strict/1764019919
2025-12-04T09:33:42.1248481Z  * [new tag]                 viable/strict/1764023134    -> viable/strict/1764023134
2025-12-04T09:33:42.1249451Z  * [new tag]                 viable/strict/1764024593    -> viable/strict/1764024593
2025-12-04T09:33:42.1250549Z  * [new tag]                 viable/strict/1764026706    -> viable/strict/1764026706
2025-12-04T09:33:42.1252172Z  * [new tag]                 viable/strict/1764031139    -> viable/strict/1764031139
2025-12-04T09:33:42.1253180Z  * [new tag]                 viable/strict/1764033131    -> viable/strict/1764033131
2025-12-04T09:33:42.1254152Z  * [new tag]                 viable/strict/1764035725    -> viable/strict/1764035725
2025-12-04T09:33:42.1255150Z  * [new tag]                 viable/strict/1764624265    -> viable/strict/1764624265
2025-12-04T09:33:42.1256066Z  * [new tag]                 viable/strict/1764631514    -> viable/strict/1764631514
2025-12-04T09:33:42.1257291Z  * [new tag]                 viable/strict/1764632987    -> viable/strict/1764632987
2025-12-04T09:33:42.1258262Z  * [new tag]                 viable/strict/1764636063    -> viable/strict/1764636063
2025-12-04T09:33:42.1259203Z  * [new tag]                 viable/strict/1764643975    -> viable/strict/1764643975
2025-12-04T09:33:42.1260176Z  * [new tag]                 viable/strict/1764646859    -> viable/strict/1764646859
2025-12-04T09:33:42.1261225Z  * [new tag]                 viable/strict/1764653120    -> viable/strict/1764653120
2025-12-04T09:33:42.1262100Z  * [new tag]                 viable/strict/1764654632    -> viable/strict/1764654632
2025-12-04T09:33:42.1263050Z  * [new tag]                 viable/strict/1764656821    -> viable/strict/1764656821
2025-12-04T09:33:42.1264011Z  * [new tag]                 viable/strict/1764658557    -> viable/strict/1764658557
2025-12-04T09:33:42.1264936Z  * [new tag]                 viable/strict/1764660333    -> viable/strict/1764660333
2025-12-04T09:33:42.1265933Z  * [new tag]                 viable/strict/1764661812    -> viable/strict/1764661812
2025-12-04T09:33:42.1266904Z  * [new tag]                 viable/strict/1764664023    -> viable/strict/1764664023
2025-12-04T09:33:42.1267837Z  * [new tag]                 viable/strict/1764669150    -> viable/strict/1764669150
2025-12-04T09:33:42.1268793Z  * [new tag]                 viable/strict/1764680709    -> viable/strict/1764680709
2025-12-04T09:33:42.1269767Z  * [new tag]                 viable/strict/1764687619    -> viable/strict/1764687619
2025-12-04T09:33:42.1270683Z  * [new tag]                 viable/strict/1764696355    -> viable/strict/1764696355
2025-12-04T09:33:42.1271851Z  * [new tag]                 viable/strict/1764701767    -> viable/strict/1764701767
2025-12-04T09:33:42.1272790Z  * [new tag]                 viable/strict/1764710768    -> viable/strict/1764710768
2025-12-04T09:33:42.1273736Z  * [new tag]                 viable/strict/1764716202    -> viable/strict/1764716202
2025-12-04T09:33:42.1274687Z  * [new tag]                 viable/strict/1764793566    -> viable/strict/1764793566
2025-12-04T09:33:42.1275670Z  * [new tag]                 viable/strict/1764797093    -> viable/strict/1764797093
2025-12-04T09:33:42.1276620Z  * [new tag]                 viable/strict/1764800729    -> viable/strict/1764800729
2025-12-04T09:33:42.1278016Z  * [new tag]                 whc_flight_1                -> whc_flight_1
2025-12-04T09:33:42.1279018Z  * [new tag]                 whc_flight_2                -> whc_flight_2
2025-12-04T09:33:42.1280533Z  * [new tag]                 whc_flight_4                -> whc_flight_4
2025-12-04T09:33:42.2142141Z [command]/usr/bin/git rev-parse --verify --quiet ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32^{object}
2025-12-04T09:33:42.2169451Z ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:33:42.2173609Z ##[endgroup]
2025-12-04T09:33:42.2174082Z ##[group]Determining the checkout info
2025-12-04T09:33:42.2175087Z ##[endgroup]
2025-12-04T09:33:42.2179557Z [command]/usr/bin/git sparse-checkout disable
2025-12-04T09:33:42.2214376Z [command]/usr/bin/git config --local --unset-all extensions.worktreeConfig
2025-12-04T09:33:42.2243368Z ##[group]Checking out the ref
2025-12-04T09:33:42.2247064Z [command]/usr/bin/git checkout --progress --force ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:33:43.2786650Z Updating files:  75% (15192/20121)
2025-12-04T09:33:43.2947418Z Updating files:  76% (15292/20121)
2025-12-04T09:33:43.3092080Z Updating files:  77% (15494/20121)
2025-12-04T09:33:43.3321933Z Updating files:  78% (15695/20121)
2025-12-04T09:33:43.3617185Z Updating files:  79% (15896/20121)
2025-12-04T09:33:43.3978185Z Updating files:  80% (16097/20121)
2025-12-04T09:33:43.4300786Z Updating files:  81% (16299/20121)
2025-12-04T09:33:43.4539638Z Updating files:  82% (16500/20121)
2025-12-04T09:33:43.4709233Z Updating files:  83% (16701/20121)
2025-12-04T09:33:43.4864371Z Updating files:  84% (16902/20121)
2025-12-04T09:33:43.5044775Z Updating files:  85% (17103/20121)
2025-12-04T09:33:43.5217224Z Updating files:  86% (17305/20121)
2025-12-04T09:33:43.5371501Z Updating files:  87% (17506/20121)
2025-12-04T09:33:43.5497959Z Updating files:  88% (17707/20121)
2025-12-04T09:33:43.5648352Z Updating files:  89% (17908/20121)
2025-12-04T09:33:43.5841485Z Updating files:  90% (18109/20121)
2025-12-04T09:33:43.5969520Z Updating files:  91% (18311/20121)
2025-12-04T09:33:43.6142156Z Updating files:  92% (18512/20121)
2025-12-04T09:33:43.6347494Z Updating files:  93% (18713/20121)
2025-12-04T09:33:43.6576671Z Updating files:  94% (18914/20121)
2025-12-04T09:33:43.6771077Z Updating files:  95% (19115/20121)
2025-12-04T09:33:43.6946944Z Updating files:  96% (19317/20121)
2025-12-04T09:33:43.7130752Z Updating files:  97% (19518/20121)
2025-12-04T09:33:43.7450067Z Updating files:  98% (19719/20121)
2025-12-04T09:33:43.7647769Z Updating files:  99% (19920/20121)
2025-12-04T09:33:43.7648141Z Updating files: 100% (20121/20121)
2025-12-04T09:33:43.7648515Z Updating files: 100% (20121/20121), done.
2025-12-04T09:33:43.7966534Z Note: switching to 'ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32'.
2025-12-04T09:33:43.7966929Z 
2025-12-04T09:33:43.7967188Z You are in 'detached HEAD' state. You can look around, make experimental
2025-12-04T09:33:43.7967841Z changes and commit them, and you can discard any commits you make in this
2025-12-04T09:33:43.7968495Z state without impacting any branches by switching back to a branch.
2025-12-04T09:33:43.7968885Z 
2025-12-04T09:33:43.7969127Z If you want to create a new branch to retain commits you create, you may
2025-12-04T09:33:43.7969721Z do so (now or later) by using -c with the switch command. Example:
2025-12-04T09:33:43.7970074Z 
2025-12-04T09:33:43.7970203Z   git switch -c <new-branch-name>
2025-12-04T09:33:43.7970658Z 
2025-12-04T09:33:43.7970800Z Or undo this operation with:
2025-12-04T09:33:43.7971164Z 
2025-12-04T09:33:43.7971269Z   git switch -
2025-12-04T09:33:43.7971436Z 
2025-12-04T09:33:43.7971711Z Turn off this advice by setting config variable advice.detachedHead to false
2025-12-04T09:33:43.7972466Z 
2025-12-04T09:33:43.7973095Z HEAD is now at ffd9b0fb435 Resolve collective autotuning test failure on arm (#168919)
2025-12-04T09:33:43.8061366Z ##[endgroup]
2025-12-04T09:33:43.8061864Z ##[group]Setting up auth for fetching submodules
2025-12-04T09:33:43.8068191Z [command]/usr/bin/git config --global http.https://github.com/.extraheader AUTHORIZATION: basic ***
2025-12-04T09:33:43.8121459Z [command]/usr/bin/git config --global --unset-all url.https://github.com/.insteadOf
2025-12-04T09:33:43.8152148Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf git@github.com:
2025-12-04T09:33:43.8182164Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf org-21003710@github.com:
2025-12-04T09:33:43.8208055Z ##[endgroup]
2025-12-04T09:33:43.8208914Z ##[group]Fetching submodules
2025-12-04T09:33:43.8212793Z [command]/usr/bin/git submodule sync --recursive
2025-12-04T09:33:43.8589320Z [command]/usr/bin/git -c protocol.version=2 submodule update --init --force --recursive
2025-12-04T09:33:43.8932149Z Submodule 'android/libs/fbjni' (https://github.com/facebookincubator/fbjni.git) registered for path 'android/libs/fbjni'
2025-12-04T09:33:43.8933408Z Submodule 'third_party/NNPACK_deps/FP16' (https://github.com/Maratyszcza/FP16.git) registered for path 'third_party/FP16'
2025-12-04T09:33:43.8935990Z Submodule 'third_party/NNPACK_deps/FXdiv' (https://github.com/Maratyszcza/FXdiv.git) registered for path 'third_party/FXdiv'
2025-12-04T09:33:43.8942243Z Submodule 'third_party/NNPACK' (https://github.com/Maratyszcza/NNPACK.git) registered for path 'third_party/NNPACK'
2025-12-04T09:33:43.8943309Z Submodule 'third_party/NVTX' (https://github.com/NVIDIA/NVTX.git) registered for path 'third_party/NVTX'
2025-12-04T09:33:43.8945685Z Submodule 'third_party/VulkanMemoryAllocator' (https://github.com/GPUOpen-LibrariesAndSDKs/VulkanMemoryAllocator.git) registered for path 'third_party/VulkanMemoryAllocator'
2025-12-04T09:33:43.8949225Z Submodule 'third_party/XNNPACK' (https://github.com/google/XNNPACK.git) registered for path 'third_party/XNNPACK'
2025-12-04T09:33:43.8952650Z Submodule 'third_party/aiter' (https://github.com/ROCm/aiter.git) registered for path 'third_party/aiter'
2025-12-04T09:33:43.8956356Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/benchmark'
2025-12-04T09:33:43.8960563Z Submodule 'third_party/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/composable_kernel'
2025-12-04T09:33:43.8964117Z Submodule 'third_party/cpp-httplib' (https://github.com/yhirose/cpp-httplib.git) registered for path 'third_party/cpp-httplib'
2025-12-04T09:33:43.8968044Z Submodule 'third_party/cpuinfo' (https://github.com/pytorch/cpuinfo.git) registered for path 'third_party/cpuinfo'
2025-12-04T09:33:43.8972956Z Submodule 'third_party/cudnn_frontend' (https://github.com/NVIDIA/cudnn-frontend.git) registered for path 'third_party/cudnn_frontend'
2025-12-04T09:33:43.8977238Z Submodule 'third_party/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path 'third_party/cutlass'
2025-12-04T09:33:43.8981448Z Submodule 'third_party/fbgemm' (https://github.com/pytorch/fbgemm) registered for path 'third_party/fbgemm'
2025-12-04T09:33:43.8987191Z Submodule 'third_party/flash-attention' (https://github.com/Dao-AILab/flash-attention.git) registered for path 'third_party/flash-attention'
2025-12-04T09:33:43.8993224Z Submodule 'third_party/flatbuffers' (https://github.com/google/flatbuffers.git) registered for path 'third_party/flatbuffers'
2025-12-04T09:33:43.8997828Z Submodule 'third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/fmt'
2025-12-04T09:33:43.9002703Z Submodule 'third_party/gemmlowp/gemmlowp' (https://github.com/google/gemmlowp.git) registered for path 'third_party/gemmlowp/gemmlowp'
2025-12-04T09:33:43.9007335Z Submodule 'third_party/gloo' (https://github.com/pytorch/gloo) registered for path 'third_party/gloo'
2025-12-04T09:33:43.9012374Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/googletest'
2025-12-04T09:33:43.9017299Z Submodule 'third_party/ideep' (https://github.com/intel/ideep) registered for path 'third_party/ideep'
2025-12-04T09:33:43.9022519Z Submodule 'third_party/ittapi' (https://github.com/intel/ittapi.git) registered for path 'third_party/ittapi'
2025-12-04T09:33:43.9027615Z Submodule 'third_party/kineto' (https://github.com/pytorch/kineto) registered for path 'third_party/kineto'
2025-12-04T09:33:43.9033487Z Submodule 'third_party/kleidiai' (https://github.com/ARM-software/kleidiai.git) registered for path 'third_party/kleidiai'
2025-12-04T09:33:43.9039465Z Submodule 'third_party/mimalloc' (https://github.com/microsoft/mimalloc.git) registered for path 'third_party/mimalloc'
2025-12-04T09:33:43.9044841Z Submodule 'third_party/nlohmann' (https://github.com/nlohmann/json.git) registered for path 'third_party/nlohmann'
2025-12-04T09:33:43.9051423Z Submodule 'third_party/onnx' (https://github.com/onnx/onnx.git) registered for path 'third_party/onnx'
2025-12-04T09:33:43.9057780Z Submodule 'third_party/opentelemetry-cpp' (https://github.com/open-telemetry/opentelemetry-cpp.git) registered for path 'third_party/opentelemetry-cpp'
2025-12-04T09:33:43.9063097Z Submodule 'third_party/pocketfft' (https://github.com/mreineck/pocketfft) registered for path 'third_party/pocketfft'
2025-12-04T09:33:43.9069196Z Submodule 'third_party/protobuf' (https://github.com/protocolbuffers/protobuf.git) registered for path 'third_party/protobuf'
2025-12-04T09:33:43.9075539Z Submodule 'third_party/NNPACK_deps/psimd' (https://github.com/Maratyszcza/psimd.git) registered for path 'third_party/psimd'
2025-12-04T09:33:43.9082290Z Submodule 'third_party/NNPACK_deps/pthreadpool' (https://github.com/Maratyszcza/pthreadpool.git) registered for path 'third_party/pthreadpool'
2025-12-04T09:33:43.9089390Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/pybind11'
2025-12-04T09:33:43.9095917Z Submodule 'third_party/python-peachpy' (https://github.com/malfet/PeachPy.git) registered for path 'third_party/python-peachpy'
2025-12-04T09:33:43.9102528Z Submodule 'third_party/sleef' (https://github.com/shibatch/sleef) registered for path 'third_party/sleef'
2025-12-04T09:33:43.9109575Z Submodule 'third_party/tensorpipe' (https://github.com/pytorch/tensorpipe.git) registered for path 'third_party/tensorpipe'
2025-12-04T09:33:43.9147061Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/android/libs/fbjni'...
2025-12-04T09:33:44.1532485Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/FXdiv'...
2025-12-04T09:33:44.1533401Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/FP16'...
2025-12-04T09:33:44.1564841Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fmt'...
2025-12-04T09:33:47.9757592Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/NNPACK'...
2025-12-04T09:33:47.9759721Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/benchmark'...
2025-12-04T09:33:47.9761635Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/NVTX'...
2025-12-04T09:33:47.9763414Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/gloo'...
2025-12-04T09:33:47.9765421Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/gemmlowp/gemmlowp'...
2025-12-04T09:33:47.9816588Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cpuinfo'...
2025-12-04T09:33:47.9818662Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/flash-attention'...
2025-12-04T09:33:47.9820823Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cpp-httplib'...
2025-12-04T09:33:47.9822646Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ideep'...
2025-12-04T09:33:47.9824369Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ittapi'...
2025-12-04T09:33:47.9826063Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kleidiai'...
2025-12-04T09:33:47.9827855Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pocketfft'...
2025-12-04T09:33:47.9829782Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cudnn_frontend'...
2025-12-04T09:33:47.9831503Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/psimd'...
2025-12-04T09:33:47.9833258Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pthreadpool'...
2025-12-04T09:33:47.9835211Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/googletest'...
2025-12-04T09:33:47.9836974Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/mimalloc'...
2025-12-04T09:33:48.0100469Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/flatbuffers'...
2025-12-04T09:33:48.1887646Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp'...
2025-12-04T09:34:08.3052849Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/python-peachpy'...
2025-12-04T09:34:08.3055358Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/VulkanMemoryAllocator'...
2025-12-04T09:34:08.3058975Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe'...
2025-12-04T09:34:08.3060700Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto'...
2025-12-04T09:34:08.3062312Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/sleef'...
2025-12-04T09:34:08.3063928Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pybind11'...
2025-12-04T09:34:08.3065583Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cutlass'...
2025-12-04T09:34:08.3067183Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm'...
2025-12-04T09:34:08.3068767Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx'...
2025-12-04T09:34:08.3070486Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/composable_kernel'...
2025-12-04T09:34:08.3072774Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/nlohmann'...
2025-12-04T09:34:08.4054235Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/XNNPACK'...
2025-12-04T09:34:13.0307069Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/aiter'...
2025-12-04T09:34:13.0307997Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf'...
2025-12-04T09:34:13.0493852Z Submodule path 'android/libs/fbjni': checked out '7e1e1fe3858c63c251c637ae41a20de425dde96f'
2025-12-04T09:34:13.0641141Z Submodule path 'third_party/FP16': checked out '4dfe081cf6bcd15db339cf2680b9281b8451eeb3'
2025-12-04T09:34:13.0752508Z Submodule path 'third_party/FXdiv': checked out 'b408327ac2a15ec3e43352421954f5b1967701d1'
2025-12-04T09:34:13.1044538Z Submodule path 'third_party/NNPACK': checked out 'c07e3a0400713d546e0dea2d5466dd22ea389c73'
2025-12-04T09:34:13.2021568Z Submodule path 'third_party/NVTX': checked out '3ebbc93ded7285963bff932c678fa367eb393ba6'
2025-12-04T09:34:13.2606513Z Submodule path 'third_party/VulkanMemoryAllocator': checked out '1d8f600fd424278486eade7ed3e877c99f0846b1'
2025-12-04T09:34:14.1198546Z Submodule path 'third_party/XNNPACK': checked out '51a0103656eff6fc9bfd39a4597923c4b542c883'
2025-12-04T09:34:14.3405823Z Submodule path 'third_party/aiter': checked out '01aae101b9e5e94d6c16a9514c9fb8df99c93150'
2025-12-04T09:34:14.3428444Z Submodule '3rdparty/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/aiter/3rdparty/composable_kernel'
2025-12-04T09:34:14.3459191Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/aiter/3rdparty/composable_kernel'...
2025-12-04T09:34:19.5728741Z Submodule path 'third_party/aiter/3rdparty/composable_kernel': checked out 'cffe8fa2a442ac8e80dd236a1a5d24fe3d7e0cbf'
2025-12-04T09:34:19.6010872Z Submodule path 'third_party/benchmark': checked out '299e5928955cc62af9968370293b916f5130916f'
2025-12-04T09:34:20.0146752Z Submodule path 'third_party/composable_kernel': checked out '7fe50dc3da2069d6645d9deb8c017a876472a977'
2025-12-04T09:34:20.0718003Z Submodule path 'third_party/cpp-httplib': checked out '89c932f313c6437c38f2982869beacc89c2f2246'
2025-12-04T09:34:20.1855339Z Submodule path 'third_party/cpuinfo': checked out 'f858c30bcb16f8effd5ff46996f0514539e17abc'
2025-12-04T09:34:20.2422923Z Submodule path 'third_party/cudnn_frontend': checked out '0b1577c8c83401237d601d0d0db5210506705396'
2025-12-04T09:34:20.9940338Z Submodule path 'third_party/cutlass': checked out 'f88806b1e31dfa579842638740216dd41fc6c588'
2025-12-04T09:34:21.1755571Z Submodule path 'third_party/fbgemm': checked out 'c0b988d39a9e47c794d699f29930ed4d7c7e13a4'
2025-12-04T09:34:21.1780869Z Submodule 'external/asmjit' (https://github.com/asmjit/asmjit.git) registered for path 'third_party/fbgemm/external/asmjit'
2025-12-04T09:34:21.1784361Z Submodule 'external/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/fbgemm/external/composable_kernel'
2025-12-04T09:34:21.1787172Z Submodule 'external/cpuinfo' (https://github.com/pytorch/cpuinfo) registered for path 'third_party/fbgemm/external/cpuinfo'
2025-12-04T09:34:21.1790273Z Submodule 'external/cutlass' (https://github.com/jwfromm/cutlass) registered for path 'third_party/fbgemm/external/cutlass'
2025-12-04T09:34:21.1793616Z Submodule 'external/googletest' (https://github.com/google/googletest) registered for path 'third_party/fbgemm/external/googletest'
2025-12-04T09:34:21.1797111Z Submodule 'external/hipify_torch' (https://github.com/ROCmSoftwarePlatform/hipify_torch.git) registered for path 'third_party/fbgemm/external/hipify_torch'
2025-12-04T09:34:21.1800399Z Submodule 'external/json' (https://github.com/nlohmann/json.git) registered for path 'third_party/fbgemm/external/json'
2025-12-04T09:34:21.1834085Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/asmjit'...
2025-12-04T09:34:22.5489865Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/hipify_torch'...
2025-12-04T09:34:22.5491038Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/cpuinfo'...
2025-12-04T09:34:22.5492075Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/googletest'...
2025-12-04T09:34:22.6491049Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/composable_kernel'...
2025-12-04T09:34:26.3528584Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/cutlass'...
2025-12-04T09:34:26.4529459Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/json'...
2025-12-04T09:34:29.6880160Z Submodule path 'third_party/fbgemm/external/asmjit': checked out 'a3199e8857792cd10b7589ff5d58343d2c9008ea'
2025-12-04T09:34:30.1032375Z Submodule path 'third_party/fbgemm/external/composable_kernel': checked out '7fe50dc3da2069d6645d9deb8c017a876472a977'
2025-12-04T09:34:30.2211509Z Submodule path 'third_party/fbgemm/external/cpuinfo': checked out '6543fec09b2f04ac4a666882998b534afc9c1349'
2025-12-04T09:34:30.9661096Z Submodule path 'third_party/fbgemm/external/cutlass': checked out '98125ce499b0fdf7ffbe0e3052f5b8709f4840f8'
2025-12-04T09:34:31.0219980Z Submodule path 'third_party/fbgemm/external/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723'
2025-12-04T09:34:31.0364687Z Submodule path 'third_party/fbgemm/external/hipify_torch': checked out '63b6a7b541fa7f08f8475ca7d74054db36ff2691'
2025-12-04T09:34:31.1630956Z Submodule path 'third_party/fbgemm/external/json': checked out '9cca280a4d0ccf0c08f47a99aa71d1b0e52f8d03'
2025-12-04T09:34:31.2450617Z Submodule path 'third_party/flash-attention': checked out '979702c87a8713a8e0a5e9fee122b90d2ef13be5'
2025-12-04T09:34:31.2474473Z Submodule 'csrc/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/flash-attention/csrc/composable_kernel'
2025-12-04T09:34:31.2477210Z Submodule 'csrc/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path 'third_party/flash-attention/csrc/cutlass'
2025-12-04T09:34:31.2509922Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/flash-attention/csrc/composable_kernel'...
2025-12-04T09:34:36.0998357Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/flash-attention/csrc/cutlass'...
2025-12-04T09:34:36.3858536Z Submodule path 'third_party/flash-attention/csrc/composable_kernel': checked out '888317e698e9803c62bd38568abc9e05d7709f33'
2025-12-04T09:34:37.0364638Z Submodule path 'third_party/flash-attention/csrc/cutlass': checked out 'c506e16788cb08416a4a57e11a9067beeee29420'
2025-12-04T09:34:37.1999067Z Submodule path 'third_party/flatbuffers': checked out 'a2cd1ea3b6d3fee220106b5fed3f7ce8da9eb757'
2025-12-04T09:34:37.2350077Z Submodule path 'third_party/fmt': checked out '407c905e45ad75fc29bf0f9bb7c5c2fd3475976f'
2025-12-04T09:34:37.2828923Z Submodule path 'third_party/gemmlowp/gemmlowp': checked out '3fb5c176c17c765a3492cd2f0321b0dab712f350'
2025-12-04T09:34:37.3134553Z Submodule path 'third_party/gloo': checked out '54cbae0d3a67fa890b4c3d9ee162b7860315e341'
2025-12-04T09:34:37.3668468Z Submodule path 'third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723'
2025-12-04T09:34:37.3827035Z Submodule path 'third_party/ideep': checked out '719d8e6cd7f7a0e01b155657526d693acf97c2b3'
2025-12-04T09:34:37.3846730Z Submodule 'mkl-dnn' (https://github.com/intel/mkl-dnn.git) registered for path 'third_party/ideep/mkl-dnn'
2025-12-04T09:34:37.3875472Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ideep/mkl-dnn'...
2025-12-04T09:34:53.9708436Z Submodule path 'third_party/ideep/mkl-dnn': checked out '8d263e693366ef8db40acc569cc7d8edf644556d'
2025-12-04T09:34:53.9954712Z Submodule path 'third_party/ittapi': checked out 'dec1d23ca65ab069d225dfe40dea14f455170959'
2025-12-04T09:34:54.1006644Z Submodule path 'third_party/kineto': checked out '31f85df8fbd89c188f14ef10f1ec65379786b943'
2025-12-04T09:34:54.1028926Z Submodule 'libkineto/third_party/dynolog' (https://github.com/facebookincubator/dynolog.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog'
2025-12-04T09:34:54.1031696Z Submodule 'libkineto/third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/kineto/libkineto/third_party/fmt'
2025-12-04T09:34:54.1034907Z Submodule 'libkineto/third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/kineto/libkineto/third_party/googletest'
2025-12-04T09:34:54.1067647Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog'...
2025-12-04T09:34:55.1857095Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/fmt'...
2025-12-04T09:34:55.7793603Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/googletest'...
2025-12-04T09:34:55.8849306Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog': checked out 'd2ffe0a4e3acace628db49974246b66fc3e85fb1'
2025-12-04T09:34:55.8871203Z Submodule 'third_party/DCGM' (https://github.com/NVIDIA/DCGM.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'
2025-12-04T09:34:55.8873956Z Submodule 'third_party/cpr' (https://github.com/libcpr/cpr.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'
2025-12-04T09:34:55.8876923Z Submodule 'third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'
2025-12-04T09:34:55.8880188Z Submodule 'third_party/gflags' (https://github.com/gflags/gflags.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'
2025-12-04T09:34:55.8883485Z Submodule 'third_party/glog' (https://github.com/google/glog.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog'
2025-12-04T09:34:55.8887061Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'
2025-12-04T09:34:55.8890642Z Submodule 'third_party/json' (https://github.com/nlohmann/json.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/json'
2025-12-04T09:34:55.8894839Z Submodule 'third_party/pfs' (https://github.com/dtrugman/pfs.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'
2025-12-04T09:34:55.8898927Z Submodule 'third_party/prometheus-cpp' (https://github.com/jupp0r/prometheus-cpp.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'
2025-12-04T09:34:55.8931866Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'...
2025-12-04T09:34:57.9006827Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'...
2025-12-04T09:34:57.9008298Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'...
2025-12-04T09:34:57.9009764Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'...
2025-12-04T09:34:57.9011116Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'...
2025-12-04T09:34:57.9012438Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/glog'...
2025-12-04T09:34:57.9013814Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'...
2025-12-04T09:34:57.9015401Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'...
2025-12-04T09:34:58.0007824Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/json'...
2025-12-04T09:35:04.6101574Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM': checked out 'ffde4e54bc7249a6039a5e6b45b395141e1217f9'
2025-12-04T09:35:04.6319827Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr': checked out '871ed52d350214a034f6ef8a3b8f51c5ce1bd400'
2025-12-04T09:35:04.6752441Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt': checked out 'cd4af11efc9c622896a3e4cb599fa28668ca3d05'
2025-12-04T09:35:04.6918097Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags': checked out 'e171aa2d15ed9eb17054558e0b3a6a413bb01067'
2025-12-04T09:35:04.6936190Z Submodule 'doc' (https://github.com/gflags/gflags.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'
2025-12-04T09:35:04.6965567Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'...
2025-12-04T09:35:04.9908510Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc': checked out '8411df715cf522606e3b1aca386ddfc0b63d34b4'
2025-12-04T09:35:05.0130472Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog': checked out 'b33e3bad4c46c8a6345525fd822af355e5ef9446'
2025-12-04T09:35:05.0665296Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723'
2025-12-04T09:35:05.1813525Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/json': checked out '4f8fba14066156b73f1189a2b8bd568bde5284c5'
2025-12-04T09:35:05.2010253Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs': checked out 'f68a2fa8ea36c783bdd760371411fcb495aa3150'
2025-12-04T09:35:05.2213005Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp': checked out 'b1234816facfdda29845c46696a02998a4af115a'
2025-12-04T09:35:05.2233378Z Submodule 'civetweb' (https://github.com/civetweb/civetweb.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T09:35:05.2236527Z Submodule 'googletest' (https://github.com/google/googletest.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T09:35:05.2269170Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'...
2025-12-04T09:35:07.6168875Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'...
2025-12-04T09:35:07.9080612Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb': checked out 'd7ba35bbb649209c66e582d5a0244ba988a15159'
2025-12-04T09:35:07.9632498Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929'
2025-12-04T09:35:08.0011510Z Submodule path 'third_party/kineto/libkineto/third_party/fmt': checked out '40626af88bd7df9a5fb80be7b25ac85b122d6c21'
2025-12-04T09:35:08.0548578Z Submodule path 'third_party/kineto/libkineto/third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723'
2025-12-04T09:35:08.1179655Z Submodule path 'third_party/kleidiai': checked out 'd7770c89632329a9914ef1a90289917597639cbe'
2025-12-04T09:35:08.1634715Z Submodule path 'third_party/mimalloc': checked out 'fbd8b99c2b828428947d70fdc046bb55609be93e'
2025-12-04T09:35:08.2814700Z Submodule path 'third_party/nlohmann': checked out '55f93686c01528224f448c19128836e7df245f72'
2025-12-04T09:35:08.7565235Z Submodule path 'third_party/onnx': checked out 'e709452ef2bbc1d113faf678c24e6d3467696e83'
2025-12-04T09:35:08.7608527Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/onnx/third_party/pybind11'
2025-12-04T09:35:08.7639370Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx/third_party/pybind11'...
2025-12-04T09:35:09.7396557Z Submodule path 'third_party/onnx/third_party/pybind11': checked out 'a2e59f0e7065404b44dfe92a28aca47ba1378dc4'
2025-12-04T09:35:09.8223424Z Submodule path 'third_party/opentelemetry-cpp': checked out 'a799f4aed9c94b765dcdaabaeab7d5e7e2310878'
2025-12-04T09:35:09.8246289Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark) registered for path 'third_party/opentelemetry-cpp/third_party/benchmark'
2025-12-04T09:35:09.8249038Z Submodule 'third_party/googletest' (https://github.com/google/googletest) registered for path 'third_party/opentelemetry-cpp/third_party/googletest'
2025-12-04T09:35:09.8251980Z Submodule 'third_party/ms-gsl' (https://github.com/microsoft/GSL) registered for path 'third_party/opentelemetry-cpp/third_party/ms-gsl'
2025-12-04T09:35:09.8255375Z Submodule 'third_party/nlohmann-json' (https://github.com/nlohmann/json) registered for path 'third_party/opentelemetry-cpp/third_party/nlohmann-json'
2025-12-04T09:35:09.8258849Z Submodule 'third_party/opentelemetry-proto' (https://github.com/open-telemetry/opentelemetry-proto) registered for path 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto'
2025-12-04T09:35:09.8262120Z Submodule 'third_party/opentracing-cpp' (https://github.com/opentracing/opentracing-cpp.git) registered for path 'third_party/opentelemetry-cpp/third_party/opentracing-cpp'
2025-12-04T09:35:09.8266615Z Submodule 'third_party/prometheus-cpp' (https://github.com/jupp0r/prometheus-cpp) registered for path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp'
2025-12-04T09:35:09.8270180Z Submodule 'tools/vcpkg' (https://github.com/Microsoft/vcpkg) registered for path 'third_party/opentelemetry-cpp/tools/vcpkg'
2025-12-04T09:35:09.8305530Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/benchmark'...
2025-12-04T09:35:10.2775317Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/opentracing-cpp'...
2025-12-04T09:35:10.2776748Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/opentelemetry-proto'...
2025-12-04T09:35:10.2778059Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp'...
2025-12-04T09:35:10.2779275Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/ms-gsl'...
2025-12-04T09:35:10.3776393Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/googletest'...
2025-12-04T09:35:11.0679408Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/nlohmann-json'...
2025-12-04T09:35:19.1967840Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/tools/vcpkg'...
2025-12-04T09:35:19.9318253Z Submodule path 'third_party/opentelemetry-cpp/third_party/benchmark': checked out 'd572f4777349d43653b21d6c2fc63020ab326db2'
2025-12-04T09:35:19.9791216Z Submodule path 'third_party/opentelemetry-cpp/third_party/googletest': checked out 'b796f7d44681514f58a683a3a71ff17c94edb0c1'
2025-12-04T09:35:19.9989273Z Submodule path 'third_party/opentelemetry-cpp/third_party/ms-gsl': checked out '6f4529395c5b7c2d661812257cd6780c67e54afa'
2025-12-04T09:35:20.1203125Z Submodule path 'third_party/opentelemetry-cpp/third_party/nlohmann-json': checked out 'bc889afb4c5bf1c0d8ee29ef35eaaf4c8bef8a5d'
2025-12-04T09:35:20.1371116Z Submodule path 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto': checked out '4ca4f0335c63cda7ab31ea7ed70d6553aee14dce'
2025-12-04T09:35:20.1552293Z Submodule path 'third_party/opentelemetry-cpp/third_party/opentracing-cpp': checked out '06b57f48ded1fa3bdd3d4346f6ef29e40e08eaf5'
2025-12-04T09:35:20.1742002Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp': checked out 'c9ffcdda9086ffd9e1283ea7a0276d831f3c8a8d'
2025-12-04T09:35:20.1759788Z Submodule 'civetweb' (https://github.com/civetweb/civetweb.git) registered for path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T09:35:20.1762824Z Submodule 'googletest' (https://github.com/google/googletest.git) registered for path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T09:35:20.1793558Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'...
2025-12-04T09:35:22.6049454Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'...
2025-12-04T09:35:22.8966280Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb': checked out 'eefb26f82b233268fc98577d265352720d477ba4'
2025-12-04T09:35:22.9513399Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929'
2025-12-04T09:35:23.5125817Z Submodule path 'third_party/opentelemetry-cpp/tools/vcpkg': checked out '8eb57355a4ffb410a2e94c07b4dca2dffbee8e50'
2025-12-04T09:35:23.5268396Z Submodule path 'third_party/pocketfft': checked out '0fa0ef591e38c2758e3184c6c23e497b9f732ffa'
2025-12-04T09:35:23.8427896Z Submodule path 'third_party/protobuf': checked out 'd1eca4e4b421cd2997495c4b4e65cea6be4e9b8a'
2025-12-04T09:35:23.8454575Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/protobuf/third_party/benchmark'
2025-12-04T09:35:23.8457745Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/protobuf/third_party/googletest'
2025-12-04T09:35:23.8491562Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf/third_party/benchmark'...
2025-12-04T09:35:24.3866227Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf/third_party/googletest'...
2025-12-04T09:35:24.8813874Z Submodule path 'third_party/protobuf/third_party/benchmark': checked out '5b7683f49e1e9223cf9927b24f6fd3d6bd82e3f8'
2025-12-04T09:35:24.9655601Z Submodule path 'third_party/protobuf/third_party/googletest': checked out '5ec7f0c4a113e2f18ac2c6cc7df51ad6afc24081'
2025-12-04T09:35:24.9769859Z Submodule path 'third_party/psimd': checked out '072586a71b55b7f8c584153d223e95687148a900'
2025-12-04T09:35:24.9917185Z Submodule path 'third_party/pthreadpool': checked out '4fe0e1e183925bf8cfa6aae24237e724a96479b8'
2025-12-04T09:35:25.0413858Z Submodule path 'third_party/pybind11': checked out 'f5fbe867d2d26e4a0a9177a51f6e568868ad3dc8'
2025-12-04T09:35:25.0769664Z Submodule path 'third_party/python-peachpy': checked out 'f45429b087dd7d5bc78bb40dc7cf06425c252d67'
2025-12-04T09:35:25.1291680Z Submodule path 'third_party/sleef': checked out '5a1d179df9cf652951b59010a2d2075372d67f68'
2025-12-04T09:35:25.1619253Z Submodule path 'third_party/tensorpipe': checked out '2b4cd91092d335a697416b2a3cb398283246849d'
2025-12-04T09:35:25.1640325Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/tensorpipe/third_party/googletest'
2025-12-04T09:35:25.1643227Z Submodule 'third_party/libnop' (https://github.com/google/libnop.git) registered for path 'third_party/tensorpipe/third_party/libnop'
2025-12-04T09:35:25.1646664Z Submodule 'third_party/libuv' (https://github.com/libuv/libuv.git) registered for path 'third_party/tensorpipe/third_party/libuv'
2025-12-04T09:35:25.1649559Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/tensorpipe/third_party/pybind11'
2025-12-04T09:35:25.1681988Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/googletest'...
2025-12-04T09:35:26.5005563Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/libnop'...
2025-12-04T09:35:26.5006733Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/pybind11'...
2025-12-04T09:35:26.5007831Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/libuv'...
2025-12-04T09:35:26.5674068Z Submodule path 'third_party/tensorpipe/third_party/googletest': checked out 'aee0f9d9b5b87796ee8a0ab26b7587ec30e8858e'
2025-12-04T09:35:26.5861766Z Submodule path 'third_party/tensorpipe/third_party/libnop': checked out '910b55815be16109f04f4180e9adee14fb4ce281'
2025-12-04T09:35:26.6726144Z Submodule path 'third_party/tensorpipe/third_party/libuv': checked out '5152db2cbfeb5582e9c27c5ea1dba2cd9e10759b'
2025-12-04T09:35:26.7067808Z Submodule path 'third_party/tensorpipe/third_party/pybind11': checked out 'a23996fce38ff6ccfbcdc09f1e63f2c4be5ea2ef'
2025-12-04T09:35:26.7086080Z Submodule 'tools/clang' (https://github.com/wjakob/clang-cindex-python3) registered for path 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2025-12-04T09:35:26.7115615Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/pybind11/tools/clang'...
2025-12-04T09:35:26.9184401Z Submodule path 'third_party/tensorpipe/third_party/pybind11/tools/clang': checked out '6a00cbc4a9b8e68b71caf7f774b3f9c753ae84d5'
2025-12-04T09:35:26.9224208Z [command]/usr/bin/git submodule foreach --recursive git config --local gc.auto 0
2025-12-04T09:35:26.9570467Z Entering 'android/libs/fbjni'
2025-12-04T09:35:26.9621879Z Entering 'third_party/FP16'
2025-12-04T09:35:26.9671153Z Entering 'third_party/FXdiv'
2025-12-04T09:35:26.9721959Z Entering 'third_party/NNPACK'
2025-12-04T09:35:26.9769808Z Entering 'third_party/NVTX'
2025-12-04T09:35:26.9818990Z Entering 'third_party/VulkanMemoryAllocator'
2025-12-04T09:35:26.9867130Z Entering 'third_party/XNNPACK'
2025-12-04T09:35:26.9932414Z Entering 'third_party/aiter'
2025-12-04T09:35:26.9982851Z Entering 'third_party/aiter/3rdparty/composable_kernel'
2025-12-04T09:35:27.0039945Z Entering 'third_party/benchmark'
2025-12-04T09:35:27.0088300Z Entering 'third_party/composable_kernel'
2025-12-04T09:35:27.0147410Z Entering 'third_party/cpp-httplib'
2025-12-04T09:35:27.0195187Z Entering 'third_party/cpuinfo'
2025-12-04T09:35:27.0244244Z Entering 'third_party/cudnn_frontend'
2025-12-04T09:35:27.0292617Z Entering 'third_party/cutlass'
2025-12-04T09:35:27.0351043Z Entering 'third_party/fbgemm'
2025-12-04T09:35:27.0402024Z Entering 'third_party/fbgemm/external/asmjit'
2025-12-04T09:35:27.0449060Z Entering 'third_party/fbgemm/external/composable_kernel'
2025-12-04T09:35:27.0508811Z Entering 'third_party/fbgemm/external/cpuinfo'
2025-12-04T09:35:27.0558010Z Entering 'third_party/fbgemm/external/cutlass'
2025-12-04T09:35:27.0616068Z Entering 'third_party/fbgemm/external/googletest'
2025-12-04T09:35:27.0663102Z Entering 'third_party/fbgemm/external/hipify_torch'
2025-12-04T09:35:27.0709565Z Entering 'third_party/fbgemm/external/json'
2025-12-04T09:35:27.0760542Z Entering 'third_party/flash-attention'
2025-12-04T09:35:27.0810682Z Entering 'third_party/flash-attention/csrc/composable_kernel'
2025-12-04T09:35:27.0864809Z Entering 'third_party/flash-attention/csrc/cutlass'
2025-12-04T09:35:27.0923625Z Entering 'third_party/flatbuffers'
2025-12-04T09:35:27.0975582Z Entering 'third_party/fmt'
2025-12-04T09:35:27.1026371Z Entering 'third_party/gemmlowp/gemmlowp'
2025-12-04T09:35:27.1076445Z Entering 'third_party/gloo'
2025-12-04T09:35:27.1125081Z Entering 'third_party/googletest'
2025-12-04T09:35:27.1175383Z Entering 'third_party/ideep'
2025-12-04T09:35:27.1221526Z Entering 'third_party/ideep/mkl-dnn'
2025-12-04T09:35:27.1280021Z Entering 'third_party/ittapi'
2025-12-04T09:35:27.1328628Z Entering 'third_party/kineto'
2025-12-04T09:35:27.1378054Z Entering 'third_party/kineto/libkineto/third_party/dynolog'
2025-12-04T09:35:27.1425161Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'
2025-12-04T09:35:27.1474193Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'
2025-12-04T09:35:27.1520497Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'
2025-12-04T09:35:27.1568409Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'
2025-12-04T09:35:27.1614952Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'
2025-12-04T09:35:27.1663824Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog'
2025-12-04T09:35:27.1711247Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'
2025-12-04T09:35:27.1758523Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json'
2025-12-04T09:35:27.1806859Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'
2025-12-04T09:35:27.1854403Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'
2025-12-04T09:35:27.1900658Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T09:35:27.1950491Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T09:35:27.2002054Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2025-12-04T09:35:27.2050202Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2025-12-04T09:35:27.2098697Z Entering 'third_party/kleidiai'
2025-12-04T09:35:27.2148381Z Entering 'third_party/mimalloc'
2025-12-04T09:35:27.2198263Z Entering 'third_party/nlohmann'
2025-12-04T09:35:27.2247590Z Entering 'third_party/onnx'
2025-12-04T09:35:27.2316926Z Entering 'third_party/onnx/third_party/pybind11'
2025-12-04T09:35:27.2365724Z Entering 'third_party/opentelemetry-cpp'
2025-12-04T09:35:27.2415709Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark'
2025-12-04T09:35:27.2461976Z Entering 'third_party/opentelemetry-cpp/third_party/googletest'
2025-12-04T09:35:27.2507582Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl'
2025-12-04T09:35:27.2553337Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json'
2025-12-04T09:35:27.2603186Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto'
2025-12-04T09:35:27.2649002Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp'
2025-12-04T09:35:27.2700396Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp'
2025-12-04T09:35:27.2746645Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T09:35:27.2796244Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T09:35:27.2846497Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg'
2025-12-04T09:35:27.2917503Z Entering 'third_party/pocketfft'
2025-12-04T09:35:27.2970219Z Entering 'third_party/protobuf'
2025-12-04T09:35:27.3024291Z Entering 'third_party/protobuf/third_party/benchmark'
2025-12-04T09:35:27.3075440Z Entering 'third_party/protobuf/third_party/googletest'
2025-12-04T09:35:27.3119985Z Entering 'third_party/psimd'
2025-12-04T09:35:27.3169021Z Entering 'third_party/pthreadpool'
2025-12-04T09:35:27.3219624Z Entering 'third_party/pybind11'
2025-12-04T09:35:27.3267708Z Entering 'third_party/python-peachpy'
2025-12-04T09:35:27.3319068Z Entering 'third_party/sleef'
2025-12-04T09:35:27.3367110Z Entering 'third_party/tensorpipe'
2025-12-04T09:35:27.3415758Z Entering 'third_party/tensorpipe/third_party/googletest'
2025-12-04T09:35:27.3465461Z Entering 'third_party/tensorpipe/third_party/libnop'
2025-12-04T09:35:27.3514828Z Entering 'third_party/tensorpipe/third_party/libuv'
2025-12-04T09:35:27.3562768Z Entering 'third_party/tensorpipe/third_party/pybind11'
2025-12-04T09:35:27.3611480Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2025-12-04T09:35:27.3673571Z ##[endgroup]
2025-12-04T09:35:27.3674183Z ##[group]Persisting credentials for submodules
2025-12-04T09:35:27.3681499Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'url\.https\:\/\/github\.com\/\.insteadOf' && git config --local --unset-all 'url.https://github.com/.insteadOf' || :"
2025-12-04T09:35:27.4020284Z Entering 'android/libs/fbjni'
2025-12-04T09:35:27.4085493Z Entering 'third_party/FP16'
2025-12-04T09:35:27.4149769Z Entering 'third_party/FXdiv'
2025-12-04T09:35:27.4212058Z Entering 'third_party/NNPACK'
2025-12-04T09:35:27.4274871Z Entering 'third_party/NVTX'
2025-12-04T09:35:27.4338239Z Entering 'third_party/VulkanMemoryAllocator'
2025-12-04T09:35:27.4400654Z Entering 'third_party/XNNPACK'
2025-12-04T09:35:27.4479739Z Entering 'third_party/aiter'
2025-12-04T09:35:27.4542706Z Entering 'third_party/aiter/3rdparty/composable_kernel'
2025-12-04T09:35:27.4615544Z Entering 'third_party/benchmark'
2025-12-04T09:35:27.4679575Z Entering 'third_party/composable_kernel'
2025-12-04T09:35:27.4752498Z Entering 'third_party/cpp-httplib'
2025-12-04T09:35:27.4814567Z Entering 'third_party/cpuinfo'
2025-12-04T09:35:27.4882564Z Entering 'third_party/cudnn_frontend'
2025-12-04T09:35:27.4947095Z Entering 'third_party/cutlass'
2025-12-04T09:35:27.5021383Z Entering 'third_party/fbgemm'
2025-12-04T09:35:27.5089381Z Entering 'third_party/fbgemm/external/asmjit'
2025-12-04T09:35:27.5150241Z Entering 'third_party/fbgemm/external/composable_kernel'
2025-12-04T09:35:27.5228828Z Entering 'third_party/fbgemm/external/cpuinfo'
2025-12-04T09:35:27.5290704Z Entering 'third_party/fbgemm/external/cutlass'
2025-12-04T09:35:27.5364057Z Entering 'third_party/fbgemm/external/googletest'
2025-12-04T09:35:27.5426382Z Entering 'third_party/fbgemm/external/hipify_torch'
2025-12-04T09:35:27.5488037Z Entering 'third_party/fbgemm/external/json'
2025-12-04T09:35:27.5552915Z Entering 'third_party/flash-attention'
2025-12-04T09:35:27.5619206Z Entering 'third_party/flash-attention/csrc/composable_kernel'
2025-12-04T09:35:27.5694133Z Entering 'third_party/flash-attention/csrc/cutlass'
2025-12-04T09:35:27.5767609Z Entering 'third_party/flatbuffers'
2025-12-04T09:35:27.5834303Z Entering 'third_party/fmt'
2025-12-04T09:35:27.5897034Z Entering 'third_party/gemmlowp/gemmlowp'
2025-12-04T09:35:27.5960736Z Entering 'third_party/gloo'
2025-12-04T09:35:27.6023938Z Entering 'third_party/googletest'
2025-12-04T09:35:27.6089385Z Entering 'third_party/ideep'
2025-12-04T09:35:27.6151098Z Entering 'third_party/ideep/mkl-dnn'
2025-12-04T09:35:27.6221896Z Entering 'third_party/ittapi'
2025-12-04T09:35:27.6285911Z Entering 'third_party/kineto'
2025-12-04T09:35:27.6349326Z Entering 'third_party/kineto/libkineto/third_party/dynolog'
2025-12-04T09:35:27.6414385Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'
2025-12-04T09:35:27.6479786Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'
2025-12-04T09:35:27.6542320Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'
2025-12-04T09:35:27.6605572Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'
2025-12-04T09:35:27.6669712Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'
2025-12-04T09:35:27.6736537Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog'
2025-12-04T09:35:27.6801345Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'
2025-12-04T09:35:27.6866185Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json'
2025-12-04T09:35:27.6931989Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'
2025-12-04T09:35:27.6998359Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'
2025-12-04T09:35:27.7058692Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T09:35:27.7126234Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T09:35:27.7194565Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2025-12-04T09:35:27.7257284Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2025-12-04T09:35:27.7322301Z Entering 'third_party/kleidiai'
2025-12-04T09:35:27.7392858Z Entering 'third_party/mimalloc'
2025-12-04T09:35:27.7456106Z Entering 'third_party/nlohmann'
2025-12-04T09:35:27.7521414Z Entering 'third_party/onnx'
2025-12-04T09:35:27.7606930Z Entering 'third_party/onnx/third_party/pybind11'
2025-12-04T09:35:27.7673857Z Entering 'third_party/opentelemetry-cpp'
2025-12-04T09:35:27.7738340Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark'
2025-12-04T09:35:27.7801060Z Entering 'third_party/opentelemetry-cpp/third_party/googletest'
2025-12-04T09:35:27.7863077Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl'
2025-12-04T09:35:27.7925052Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json'
2025-12-04T09:35:27.7987926Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto'
2025-12-04T09:35:27.8048327Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp'
2025-12-04T09:35:27.8109951Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp'
2025-12-04T09:35:27.8171454Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T09:35:27.8237736Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T09:35:27.8302639Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg'
2025-12-04T09:35:27.8387706Z Entering 'third_party/pocketfft'
2025-12-04T09:35:27.8450936Z Entering 'third_party/protobuf'
2025-12-04T09:35:27.8520148Z Entering 'third_party/protobuf/third_party/benchmark'
2025-12-04T09:35:27.8583104Z Entering 'third_party/protobuf/third_party/googletest'
2025-12-04T09:35:27.8647633Z Entering 'third_party/psimd'
2025-12-04T09:35:27.8712052Z Entering 'third_party/pthreadpool'
2025-12-04T09:35:27.8778949Z Entering 'third_party/pybind11'
2025-12-04T09:35:27.8841107Z Entering 'third_party/python-peachpy'
2025-12-04T09:35:27.8903372Z Entering 'third_party/sleef'
2025-12-04T09:35:27.8965963Z Entering 'third_party/tensorpipe'
2025-12-04T09:35:27.9031537Z Entering 'third_party/tensorpipe/third_party/googletest'
2025-12-04T09:35:27.9092881Z Entering 'third_party/tensorpipe/third_party/libnop'
2025-12-04T09:35:27.9154016Z Entering 'third_party/tensorpipe/third_party/libuv'
2025-12-04T09:35:27.9214763Z Entering 'third_party/tensorpipe/third_party/pybind11'
2025-12-04T09:35:27.9279163Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2025-12-04T09:35:27.9359686Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local 'http.https://github.com/.extraheader' 'AUTHORIZATION: basic ***' && git config --local --show-origin --name-only --get-regexp remote.origin.url"
2025-12-04T09:35:27.9708958Z Entering 'android/libs/fbjni'
2025-12-04T09:35:27.9768187Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config	remote.origin.url
2025-12-04T09:35:27.9786392Z Entering 'third_party/FP16'
2025-12-04T09:35:27.9845597Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config	remote.origin.url
2025-12-04T09:35:27.9863926Z Entering 'third_party/FXdiv'
2025-12-04T09:35:27.9921921Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config	remote.origin.url
2025-12-04T09:35:27.9940931Z Entering 'third_party/NNPACK'
2025-12-04T09:35:27.9999329Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config	remote.origin.url
2025-12-04T09:35:28.0137728Z Entering 'third_party/NVTX'
2025-12-04T09:35:28.0200459Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config	remote.origin.url
2025-12-04T09:35:28.0247809Z Entering 'third_party/VulkanMemoryAllocator'
2025-12-04T09:35:28.0324145Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config	remote.origin.url
2025-12-04T09:35:28.0354815Z Entering 'third_party/XNNPACK'
2025-12-04T09:35:28.0413200Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config	remote.origin.url
2025-12-04T09:35:28.0448741Z Entering 'third_party/aiter'
2025-12-04T09:35:28.0508478Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config	remote.origin.url
2025-12-04T09:35:28.0527706Z Entering 'third_party/aiter/3rdparty/composable_kernel'
2025-12-04T09:35:28.0586482Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config	remote.origin.url
2025-12-04T09:35:28.0614797Z Entering 'third_party/benchmark'
2025-12-04T09:35:28.0674276Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config	remote.origin.url
2025-12-04T09:35:28.0692727Z Entering 'third_party/composable_kernel'
2025-12-04T09:35:28.0751774Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config	remote.origin.url
2025-12-04T09:35:28.0780599Z Entering 'third_party/cpp-httplib'
2025-12-04T09:35:28.0839861Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config	remote.origin.url
2025-12-04T09:35:28.0858802Z Entering 'third_party/cpuinfo'
2025-12-04T09:35:28.0919763Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config	remote.origin.url
2025-12-04T09:35:28.0939062Z Entering 'third_party/cudnn_frontend'
2025-12-04T09:35:28.0999936Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config	remote.origin.url
2025-12-04T09:35:28.1019264Z Entering 'third_party/cutlass'
2025-12-04T09:35:28.1081027Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config	remote.origin.url
2025-12-04T09:35:28.1110551Z Entering 'third_party/fbgemm'
2025-12-04T09:35:28.1173256Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config	remote.origin.url
2025-12-04T09:35:28.1194327Z Entering 'third_party/fbgemm/external/asmjit'
2025-12-04T09:35:28.1254412Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config	remote.origin.url
2025-12-04T09:35:28.1272949Z Entering 'third_party/fbgemm/external/composable_kernel'
2025-12-04T09:35:28.1331687Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config	remote.origin.url
2025-12-04T09:35:28.1358763Z Entering 'third_party/fbgemm/external/cpuinfo'
2025-12-04T09:35:28.1416444Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config	remote.origin.url
2025-12-04T09:35:28.1434711Z Entering 'third_party/fbgemm/external/cutlass'
2025-12-04T09:35:28.1493252Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config	remote.origin.url
2025-12-04T09:35:28.1521868Z Entering 'third_party/fbgemm/external/googletest'
2025-12-04T09:35:28.1582743Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config	remote.origin.url
2025-12-04T09:35:28.1600486Z Entering 'third_party/fbgemm/external/hipify_torch'
2025-12-04T09:35:28.1659863Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config	remote.origin.url
2025-12-04T09:35:28.1677526Z Entering 'third_party/fbgemm/external/json'
2025-12-04T09:35:28.1735691Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config	remote.origin.url
2025-12-04T09:35:28.1757098Z Entering 'third_party/flash-attention'
2025-12-04T09:35:28.1817497Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config	remote.origin.url
2025-12-04T09:35:28.1837059Z Entering 'third_party/flash-attention/csrc/composable_kernel'
2025-12-04T09:35:28.1898060Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config	remote.origin.url
2025-12-04T09:35:28.1922942Z Entering 'third_party/flash-attention/csrc/cutlass'
2025-12-04T09:35:28.1982659Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config	remote.origin.url
2025-12-04T09:35:28.2010740Z Entering 'third_party/flatbuffers'
2025-12-04T09:35:28.2070885Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config	remote.origin.url
2025-12-04T09:35:28.2093098Z Entering 'third_party/fmt'
2025-12-04T09:35:28.2152750Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config	remote.origin.url
2025-12-04T09:35:28.2171847Z Entering 'third_party/gemmlowp/gemmlowp'
2025-12-04T09:35:28.2230645Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config	remote.origin.url
2025-12-04T09:35:28.2248968Z Entering 'third_party/gloo'
2025-12-04T09:35:28.2307111Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config	remote.origin.url
2025-12-04T09:35:28.2325736Z Entering 'third_party/googletest'
2025-12-04T09:35:28.2384362Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config	remote.origin.url
2025-12-04T09:35:28.2403007Z Entering 'third_party/ideep'
2025-12-04T09:35:28.2463464Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config	remote.origin.url
2025-12-04T09:35:28.2482210Z Entering 'third_party/ideep/mkl-dnn'
2025-12-04T09:35:28.2539658Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config	remote.origin.url
2025-12-04T09:35:28.2567279Z Entering 'third_party/ittapi'
2025-12-04T09:35:28.2627004Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config	remote.origin.url
2025-12-04T09:35:28.2645640Z Entering 'third_party/kineto'
2025-12-04T09:35:28.2704796Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config	remote.origin.url
2025-12-04T09:35:28.2722990Z Entering 'third_party/kineto/libkineto/third_party/dynolog'
2025-12-04T09:35:28.2783274Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config	remote.origin.url
2025-12-04T09:35:28.2800972Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'
2025-12-04T09:35:28.2860374Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config	remote.origin.url
2025-12-04T09:35:28.2879907Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'
2025-12-04T09:35:28.2938665Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config	remote.origin.url
2025-12-04T09:35:28.2956330Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'
2025-12-04T09:35:28.3016250Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config	remote.origin.url
2025-12-04T09:35:28.3034328Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'
2025-12-04T09:35:28.3094838Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config	remote.origin.url
2025-12-04T09:35:28.3111766Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'
2025-12-04T09:35:28.3172008Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config	remote.origin.url
2025-12-04T09:35:28.3191729Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog'
2025-12-04T09:35:28.3251227Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config	remote.origin.url
2025-12-04T09:35:28.3269016Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'
2025-12-04T09:35:28.3328777Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config	remote.origin.url
2025-12-04T09:35:28.3346844Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json'
2025-12-04T09:35:28.3406460Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config	remote.origin.url
2025-12-04T09:35:28.3425318Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'
2025-12-04T09:35:28.3484580Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config	remote.origin.url
2025-12-04T09:35:28.3502863Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'
2025-12-04T09:35:28.3565955Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config	remote.origin.url
2025-12-04T09:35:28.3584364Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T09:35:28.3643946Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config	remote.origin.url
2025-12-04T09:35:28.3664004Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T09:35:28.3724075Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config	remote.origin.url
2025-12-04T09:35:28.3746535Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2025-12-04T09:35:28.3805888Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config	remote.origin.url
2025-12-04T09:35:28.3823984Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2025-12-04T09:35:28.3881601Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config	remote.origin.url
2025-12-04T09:35:28.3901658Z Entering 'third_party/kleidiai'
2025-12-04T09:35:28.3960023Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config	remote.origin.url
2025-12-04T09:35:28.3980381Z Entering 'third_party/mimalloc'
2025-12-04T09:35:28.4040103Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config	remote.origin.url
2025-12-04T09:35:28.4059502Z Entering 'third_party/nlohmann'
2025-12-04T09:35:28.4120412Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config	remote.origin.url
2025-12-04T09:35:28.4141164Z Entering 'third_party/onnx'
2025-12-04T09:35:28.4199805Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config	remote.origin.url
2025-12-04T09:35:28.4238124Z Entering 'third_party/onnx/third_party/pybind11'
2025-12-04T09:35:28.4300303Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config	remote.origin.url
2025-12-04T09:35:28.4321789Z Entering 'third_party/opentelemetry-cpp'
2025-12-04T09:35:28.4384137Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config	remote.origin.url
2025-12-04T09:35:28.4403971Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark'
2025-12-04T09:35:28.4461944Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config	remote.origin.url
2025-12-04T09:35:28.4481278Z Entering 'third_party/opentelemetry-cpp/third_party/googletest'
2025-12-04T09:35:28.4538523Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config	remote.origin.url
2025-12-04T09:35:28.4559846Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl'
2025-12-04T09:35:28.4618498Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config	remote.origin.url
2025-12-04T09:35:28.4636180Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json'
2025-12-04T09:35:28.4694341Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config	remote.origin.url
2025-12-04T09:35:28.4713884Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto'
2025-12-04T09:35:28.4772504Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config	remote.origin.url
2025-12-04T09:35:28.4789632Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp'
2025-12-04T09:35:28.4848052Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config	remote.origin.url
2025-12-04T09:35:28.4865664Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp'
2025-12-04T09:35:28.4923237Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config	remote.origin.url
2025-12-04T09:35:28.4939895Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T09:35:28.4999450Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config	remote.origin.url
2025-12-04T09:35:28.5019259Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T09:35:28.5082890Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config	remote.origin.url
2025-12-04T09:35:28.5103033Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg'
2025-12-04T09:35:28.5160670Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config	remote.origin.url
2025-12-04T09:35:28.5201189Z Entering 'third_party/pocketfft'
2025-12-04T09:35:28.5260782Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config	remote.origin.url
2025-12-04T09:35:28.5279818Z Entering 'third_party/protobuf'
2025-12-04T09:35:28.5339151Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config	remote.origin.url
2025-12-04T09:35:28.5360949Z Entering 'third_party/protobuf/third_party/benchmark'
2025-12-04T09:35:28.5420368Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config	remote.origin.url
2025-12-04T09:35:28.5438499Z Entering 'third_party/protobuf/third_party/googletest'
2025-12-04T09:35:28.5497197Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config	remote.origin.url
2025-12-04T09:35:28.5517486Z Entering 'third_party/psimd'
2025-12-04T09:35:28.5581047Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config	remote.origin.url
2025-12-04T09:35:28.5600782Z Entering 'third_party/pthreadpool'
2025-12-04T09:35:28.5661129Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config	remote.origin.url
2025-12-04T09:35:28.5680638Z Entering 'third_party/pybind11'
2025-12-04T09:35:28.5740181Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config	remote.origin.url
2025-12-04T09:35:28.5759432Z Entering 'third_party/python-peachpy'
2025-12-04T09:35:28.5819977Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config	remote.origin.url
2025-12-04T09:35:28.5838376Z Entering 'third_party/sleef'
2025-12-04T09:35:28.5898883Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config	remote.origin.url
2025-12-04T09:35:28.5918291Z Entering 'third_party/tensorpipe'
2025-12-04T09:35:28.5979401Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config	remote.origin.url
2025-12-04T09:35:28.5997866Z Entering 'third_party/tensorpipe/third_party/googletest'
2025-12-04T09:35:28.6058848Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config	remote.origin.url
2025-12-04T09:35:28.6077231Z Entering 'third_party/tensorpipe/third_party/libnop'
2025-12-04T09:35:28.6135893Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config	remote.origin.url
2025-12-04T09:35:28.6153505Z Entering 'third_party/tensorpipe/third_party/libuv'
2025-12-04T09:35:28.6211809Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config	remote.origin.url
2025-12-04T09:35:28.6230288Z Entering 'third_party/tensorpipe/third_party/pybind11'
2025-12-04T09:35:28.6288645Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config	remote.origin.url
2025-12-04T09:35:28.6305690Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2025-12-04T09:35:28.6366393Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config	remote.origin.url
2025-12-04T09:35:28.7613160Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'git@github.com:'
2025-12-04T09:35:28.7962990Z Entering 'android/libs/fbjni'
2025-12-04T09:35:28.8012731Z Entering 'third_party/FP16'
2025-12-04T09:35:28.8059889Z Entering 'third_party/FXdiv'
2025-12-04T09:35:28.8106677Z Entering 'third_party/NNPACK'
2025-12-04T09:35:28.8154434Z Entering 'third_party/NVTX'
2025-12-04T09:35:28.8202186Z Entering 'third_party/VulkanMemoryAllocator'
2025-12-04T09:35:28.8252709Z Entering 'third_party/XNNPACK'
2025-12-04T09:35:28.8319063Z Entering 'third_party/aiter'
2025-12-04T09:35:28.8368149Z Entering 'third_party/aiter/3rdparty/composable_kernel'
2025-12-04T09:35:28.8424473Z Entering 'third_party/benchmark'
2025-12-04T09:35:28.8474457Z Entering 'third_party/composable_kernel'
2025-12-04T09:35:28.8536725Z Entering 'third_party/cpp-httplib'
2025-12-04T09:35:28.8587443Z Entering 'third_party/cpuinfo'
2025-12-04T09:35:28.8636642Z Entering 'third_party/cudnn_frontend'
2025-12-04T09:35:28.8685340Z Entering 'third_party/cutlass'
2025-12-04T09:35:28.8743108Z Entering 'third_party/fbgemm'
2025-12-04T09:35:28.8794643Z Entering 'third_party/fbgemm/external/asmjit'
2025-12-04T09:35:28.8841125Z Entering 'third_party/fbgemm/external/composable_kernel'
2025-12-04T09:35:28.8896812Z Entering 'third_party/fbgemm/external/cpuinfo'
2025-12-04T09:35:28.8945210Z Entering 'third_party/fbgemm/external/cutlass'
2025-12-04T09:35:28.9001915Z Entering 'third_party/fbgemm/external/googletest'
2025-12-04T09:35:28.9051711Z Entering 'third_party/fbgemm/external/hipify_torch'
2025-12-04T09:35:28.9100408Z Entering 'third_party/fbgemm/external/json'
2025-12-04T09:35:28.9157702Z Entering 'third_party/flash-attention'
2025-12-04T09:35:28.9207203Z Entering 'third_party/flash-attention/csrc/composable_kernel'
2025-12-04T09:35:28.9261123Z Entering 'third_party/flash-attention/csrc/cutlass'
2025-12-04T09:35:28.9318471Z Entering 'third_party/flatbuffers'
2025-12-04T09:35:28.9369495Z Entering 'third_party/fmt'
2025-12-04T09:35:28.9421048Z Entering 'third_party/gemmlowp/gemmlowp'
2025-12-04T09:35:28.9469556Z Entering 'third_party/gloo'
2025-12-04T09:35:28.9517184Z Entering 'third_party/googletest'
2025-12-04T09:35:28.9566356Z Entering 'third_party/ideep'
2025-12-04T09:35:28.9613999Z Entering 'third_party/ideep/mkl-dnn'
2025-12-04T09:35:28.9672404Z Entering 'third_party/ittapi'
2025-12-04T09:35:28.9720386Z Entering 'third_party/kineto'
2025-12-04T09:35:28.9769135Z Entering 'third_party/kineto/libkineto/third_party/dynolog'
2025-12-04T09:35:28.9817809Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'
2025-12-04T09:35:28.9867040Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'
2025-12-04T09:35:28.9916625Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'
2025-12-04T09:35:28.9964095Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'
2025-12-04T09:35:29.0010661Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'
2025-12-04T09:35:29.0061056Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog'
2025-12-04T09:35:29.0108595Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'
2025-12-04T09:35:29.0155584Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json'
2025-12-04T09:35:29.0209191Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'
2025-12-04T09:35:29.0256338Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'
2025-12-04T09:35:29.0303202Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T09:35:29.0352328Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T09:35:29.0404254Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2025-12-04T09:35:29.0455872Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2025-12-04T09:35:29.0506427Z Entering 'third_party/kleidiai'
2025-12-04T09:35:29.0555745Z Entering 'third_party/mimalloc'
2025-12-04T09:35:29.0604718Z Entering 'third_party/nlohmann'
2025-12-04T09:35:29.0653153Z Entering 'third_party/onnx'
2025-12-04T09:35:29.0723559Z Entering 'third_party/onnx/third_party/pybind11'
2025-12-04T09:35:29.0777273Z Entering 'third_party/opentelemetry-cpp'
2025-12-04T09:35:29.0827992Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark'
2025-12-04T09:35:29.0876367Z Entering 'third_party/opentelemetry-cpp/third_party/googletest'
2025-12-04T09:35:29.0923471Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl'
2025-12-04T09:35:29.0969872Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json'
2025-12-04T09:35:29.1020161Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto'
2025-12-04T09:35:29.1067732Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp'
2025-12-04T09:35:29.1114920Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp'
2025-12-04T09:35:29.1160950Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T09:35:29.1213714Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T09:35:29.1262731Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg'
2025-12-04T09:35:29.1333182Z Entering 'third_party/pocketfft'
2025-12-04T09:35:29.1382369Z Entering 'third_party/protobuf'
2025-12-04T09:35:29.1433554Z Entering 'third_party/protobuf/third_party/benchmark'
2025-12-04T09:35:29.1480613Z Entering 'third_party/protobuf/third_party/googletest'
2025-12-04T09:35:29.1530061Z Entering 'third_party/psimd'
2025-12-04T09:35:29.1577542Z Entering 'third_party/pthreadpool'
2025-12-04T09:35:29.1624119Z Entering 'third_party/pybind11'
2025-12-04T09:35:29.1673444Z Entering 'third_party/python-peachpy'
2025-12-04T09:35:29.1720683Z Entering 'third_party/sleef'
2025-12-04T09:35:29.1769050Z Entering 'third_party/tensorpipe'
2025-12-04T09:35:29.1816692Z Entering 'third_party/tensorpipe/third_party/googletest'
2025-12-04T09:35:29.1863497Z Entering 'third_party/tensorpipe/third_party/libnop'
2025-12-04T09:35:29.1910952Z Entering 'third_party/tensorpipe/third_party/libuv'
2025-12-04T09:35:29.1957029Z Entering 'third_party/tensorpipe/third_party/pybind11'
2025-12-04T09:35:29.2001976Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2025-12-04T09:35:29.2068493Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'org-21003710@github.com:'
2025-12-04T09:35:29.2409565Z Entering 'android/libs/fbjni'
2025-12-04T09:35:29.2455820Z Entering 'third_party/FP16'
2025-12-04T09:35:29.2505313Z Entering 'third_party/FXdiv'
2025-12-04T09:35:29.2553935Z Entering 'third_party/NNPACK'
2025-12-04T09:35:29.2603363Z Entering 'third_party/NVTX'
2025-12-04T09:35:29.2651376Z Entering 'third_party/VulkanMemoryAllocator'
2025-12-04T09:35:29.2700461Z Entering 'third_party/XNNPACK'
2025-12-04T09:35:29.2764658Z Entering 'third_party/aiter'
2025-12-04T09:35:29.2813015Z Entering 'third_party/aiter/3rdparty/composable_kernel'
2025-12-04T09:35:29.2873060Z Entering 'third_party/benchmark'
2025-12-04T09:35:29.2922134Z Entering 'third_party/composable_kernel'
2025-12-04T09:35:29.2981674Z Entering 'third_party/cpp-httplib'
2025-12-04T09:35:29.3029889Z Entering 'third_party/cpuinfo'
2025-12-04T09:35:29.3078640Z Entering 'third_party/cudnn_frontend'
2025-12-04T09:35:29.3125969Z Entering 'third_party/cutlass'
2025-12-04T09:35:29.3189149Z Entering 'third_party/fbgemm'
2025-12-04T09:35:29.3241008Z Entering 'third_party/fbgemm/external/asmjit'
2025-12-04T09:35:29.3288724Z Entering 'third_party/fbgemm/external/composable_kernel'
2025-12-04T09:35:29.3347307Z Entering 'third_party/fbgemm/external/cpuinfo'
2025-12-04T09:35:29.3394429Z Entering 'third_party/fbgemm/external/cutlass'
2025-12-04T09:35:29.3450993Z Entering 'third_party/fbgemm/external/googletest'
2025-12-04T09:35:29.3498046Z Entering 'third_party/fbgemm/external/hipify_torch'
2025-12-04T09:35:29.3545025Z Entering 'third_party/fbgemm/external/json'
2025-12-04T09:35:29.3595688Z Entering 'third_party/flash-attention'
2025-12-04T09:35:29.3645117Z Entering 'third_party/flash-attention/csrc/composable_kernel'
2025-12-04T09:35:29.3700612Z Entering 'third_party/flash-attention/csrc/cutlass'
2025-12-04T09:35:29.3760342Z Entering 'third_party/flatbuffers'
2025-12-04T09:35:29.3812072Z Entering 'third_party/fmt'
2025-12-04T09:35:29.3859714Z Entering 'third_party/gemmlowp/gemmlowp'
2025-12-04T09:35:29.3908199Z Entering 'third_party/gloo'
2025-12-04T09:35:29.3957580Z Entering 'third_party/googletest'
2025-12-04T09:35:29.4006316Z Entering 'third_party/ideep'
2025-12-04T09:35:29.4053154Z Entering 'third_party/ideep/mkl-dnn'
2025-12-04T09:35:29.4109399Z Entering 'third_party/ittapi'
2025-12-04T09:35:29.4158196Z Entering 'third_party/kineto'
2025-12-04T09:35:29.4205199Z Entering 'third_party/kineto/libkineto/third_party/dynolog'
2025-12-04T09:35:29.4251773Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'
2025-12-04T09:35:29.4300965Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'
2025-12-04T09:35:29.4353794Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'
2025-12-04T09:35:29.4400767Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'
2025-12-04T09:35:29.4447885Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'
2025-12-04T09:35:29.4496882Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog'
2025-12-04T09:35:29.4544079Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'
2025-12-04T09:35:29.4592439Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json'
2025-12-04T09:35:29.4641564Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'
2025-12-04T09:35:29.4688289Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'
2025-12-04T09:35:29.4734093Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T09:35:29.4784606Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T09:35:29.4837262Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2025-12-04T09:35:29.4884766Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2025-12-04T09:35:29.4934596Z Entering 'third_party/kleidiai'
2025-12-04T09:35:29.4993609Z Entering 'third_party/mimalloc'
2025-12-04T09:35:29.5035381Z Entering 'third_party/nlohmann'
2025-12-04T09:35:29.5088081Z Entering 'third_party/onnx'
2025-12-04T09:35:29.5156523Z Entering 'third_party/onnx/third_party/pybind11'
2025-12-04T09:35:29.5208657Z Entering 'third_party/opentelemetry-cpp'
2025-12-04T09:35:29.5261478Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark'
2025-12-04T09:35:29.5309120Z Entering 'third_party/opentelemetry-cpp/third_party/googletest'
2025-12-04T09:35:29.5357955Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl'
2025-12-04T09:35:29.5405273Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json'
2025-12-04T09:35:29.5454517Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto'
2025-12-04T09:35:29.5503243Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp'
2025-12-04T09:35:29.5549839Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp'
2025-12-04T09:35:29.5598354Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T09:35:29.5646114Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T09:35:29.5696088Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg'
2025-12-04T09:35:29.5766242Z Entering 'third_party/pocketfft'
2025-12-04T09:35:29.5815048Z Entering 'third_party/protobuf'
2025-12-04T09:35:29.5867965Z Entering 'third_party/protobuf/third_party/benchmark'
2025-12-04T09:35:29.5915103Z Entering 'third_party/protobuf/third_party/googletest'
2025-12-04T09:35:29.5964794Z Entering 'third_party/psimd'
2025-12-04T09:35:29.6013233Z Entering 'third_party/pthreadpool'
2025-12-04T09:35:29.6061225Z Entering 'third_party/pybind11'
2025-12-04T09:35:29.6110508Z Entering 'third_party/python-peachpy'
2025-12-04T09:35:29.6157734Z Entering 'third_party/sleef'
2025-12-04T09:35:29.6206808Z Entering 'third_party/tensorpipe'
2025-12-04T09:35:29.6254755Z Entering 'third_party/tensorpipe/third_party/googletest'
2025-12-04T09:35:29.6301806Z Entering 'third_party/tensorpipe/third_party/libnop'
2025-12-04T09:35:29.6348321Z Entering 'third_party/tensorpipe/third_party/libuv'
2025-12-04T09:35:29.6399965Z Entering 'third_party/tensorpipe/third_party/pybind11'
2025-12-04T09:35:29.6444345Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2025-12-04T09:35:29.6507253Z ##[endgroup]
2025-12-04T09:35:29.6545834Z [command]/usr/bin/git log -1 --format=%H
2025-12-04T09:35:29.6569927Z ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:35:29.6677676Z ##[group]Run cd "${GITHUB_WORKSPACE}"
2025-12-04T09:35:29.6678101Z cd "${GITHUB_WORKSPACE}"
2025-12-04T09:35:29.6678585Z # Clean stale submodule dirs
2025-12-04T09:35:29.6678967Z if [ -z "${NO_SUDO}" ]; then
2025-12-04T09:35:29.6679427Z   sudo git submodule foreach --recursive git clean -ffdx
2025-12-04T09:35:29.6679873Z else
2025-12-04T09:35:29.6680230Z   git submodule foreach --recursive git clean -ffdx
2025-12-04T09:35:29.6680677Z fi
2025-12-04T09:35:29.6689991Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:35:29.6690439Z env:
2025-12-04T09:35:29.6690696Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:35:29.6690991Z   NO_SUDO: true
2025-12-04T09:35:29.6691255Z ##[endgroup]
2025-12-04T09:35:29.7056551Z Entering 'android/libs/fbjni'
2025-12-04T09:35:29.7097025Z Entering 'third_party/FP16'
2025-12-04T09:35:29.7134037Z Entering 'third_party/FXdiv'
2025-12-04T09:35:29.7171358Z Entering 'third_party/NNPACK'
2025-12-04T09:35:29.7212068Z Entering 'third_party/NVTX'
2025-12-04T09:35:29.7256671Z Entering 'third_party/VulkanMemoryAllocator'
2025-12-04T09:35:29.7294228Z Entering 'third_party/XNNPACK'
2025-12-04T09:35:29.7435463Z Entering 'third_party/aiter'
2025-12-04T09:35:29.7484734Z Entering 'third_party/aiter/3rdparty/composable_kernel'
2025-12-04T09:35:29.7607027Z Entering 'third_party/benchmark'
2025-12-04T09:35:29.7646951Z Entering 'third_party/composable_kernel'
2025-12-04T09:35:29.7783412Z Entering 'third_party/cpp-httplib'
2025-12-04T09:35:29.7821224Z Entering 'third_party/cpuinfo'
2025-12-04T09:35:29.7863502Z Entering 'third_party/cudnn_frontend'
2025-12-04T09:35:29.7903247Z Entering 'third_party/cutlass'
2025-12-04T09:35:29.8017528Z Entering 'third_party/fbgemm'
2025-12-04T09:35:29.8085485Z Entering 'third_party/fbgemm/external/asmjit'
2025-12-04T09:35:29.8120165Z Entering 'third_party/fbgemm/external/composable_kernel'
2025-12-04T09:35:29.8252491Z Entering 'third_party/fbgemm/external/cpuinfo'
2025-12-04T09:35:29.8291326Z Entering 'third_party/fbgemm/external/cutlass'
2025-12-04T09:35:29.8402602Z Entering 'third_party/fbgemm/external/googletest'
2025-12-04T09:35:29.8439699Z Entering 'third_party/fbgemm/external/hipify_torch'
2025-12-04T09:35:29.8473218Z Entering 'third_party/fbgemm/external/json'
2025-12-04T09:35:29.8523165Z Entering 'third_party/flash-attention'
2025-12-04T09:35:29.8568921Z Entering 'third_party/flash-attention/csrc/composable_kernel'
2025-12-04T09:35:29.8682119Z Entering 'third_party/flash-attention/csrc/cutlass'
2025-12-04T09:35:29.8782933Z Entering 'third_party/flatbuffers'
2025-12-04T09:35:29.8861333Z Entering 'third_party/fmt'
2025-12-04T09:35:29.8898993Z Entering 'third_party/gemmlowp/gemmlowp'
2025-12-04T09:35:29.8935128Z Entering 'third_party/gloo'
2025-12-04T09:35:29.8972655Z Entering 'third_party/googletest'
2025-12-04T09:35:29.9010834Z Entering 'third_party/ideep'
2025-12-04T09:35:29.9043834Z Entering 'third_party/ideep/mkl-dnn'
2025-12-04T09:35:29.9139742Z Entering 'third_party/ittapi'
2025-12-04T09:35:29.9178810Z Entering 'third_party/kineto'
2025-12-04T09:35:29.9217706Z Entering 'third_party/kineto/libkineto/third_party/dynolog'
2025-12-04T09:35:29.9259742Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'
2025-12-04T09:35:29.9315122Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'
2025-12-04T09:35:29.9350683Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'
2025-12-04T09:35:29.9387136Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'
2025-12-04T09:35:29.9421401Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'
2025-12-04T09:35:29.9457738Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog'
2025-12-04T09:35:29.9492892Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'
2025-12-04T09:35:29.9530646Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json'
2025-12-04T09:35:29.9576436Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'
2025-12-04T09:35:29.9611489Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'
2025-12-04T09:35:29.9647024Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T09:35:29.9704045Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T09:35:29.9747701Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2025-12-04T09:35:29.9784186Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2025-12-04T09:35:29.9823949Z Entering 'third_party/kleidiai'
2025-12-04T09:35:29.9868866Z Entering 'third_party/mimalloc'
2025-12-04T09:35:29.9907116Z Entering 'third_party/nlohmann'
2025-12-04T09:35:29.9957660Z Entering 'third_party/onnx'
2025-12-04T09:35:30.0333608Z Entering 'third_party/onnx/third_party/pybind11'
2025-12-04T09:35:30.0375424Z Entering 'third_party/opentelemetry-cpp'
2025-12-04T09:35:30.0439018Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark'
2025-12-04T09:35:30.0475873Z Entering 'third_party/opentelemetry-cpp/third_party/googletest'
2025-12-04T09:35:30.0512789Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl'
2025-12-04T09:35:30.0547141Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json'
2025-12-04T09:35:30.0595291Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto'
2025-12-04T09:35:30.0630513Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp'
2025-12-04T09:35:30.0666233Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp'
2025-12-04T09:35:30.0701955Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T09:35:30.0754868Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T09:35:30.0797777Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg'
2025-12-04T09:35:30.1097407Z Entering 'third_party/pocketfft'
2025-12-04T09:35:30.1134998Z Entering 'third_party/protobuf'
2025-12-04T09:35:30.1223456Z Entering 'third_party/protobuf/third_party/benchmark'
2025-12-04T09:35:30.1257796Z Entering 'third_party/protobuf/third_party/googletest'
2025-12-04T09:35:30.1304072Z Entering 'third_party/psimd'
2025-12-04T09:35:30.1338504Z Entering 'third_party/pthreadpool'
2025-12-04T09:35:30.1374076Z Entering 'third_party/pybind11'
2025-12-04T09:35:30.1412511Z Entering 'third_party/python-peachpy'
2025-12-04T09:35:30.1448686Z Entering 'third_party/sleef'
2025-12-04T09:35:30.1487305Z Entering 'third_party/tensorpipe'
2025-12-04T09:35:30.1526720Z Entering 'third_party/tensorpipe/third_party/googletest'
2025-12-04T09:35:30.1562956Z Entering 'third_party/tensorpipe/third_party/libnop'
2025-12-04T09:35:30.1597822Z Entering 'third_party/tensorpipe/third_party/libuv'
2025-12-04T09:35:30.1638198Z Entering 'third_party/tensorpipe/third_party/pybind11'
2025-12-04T09:35:30.1671414Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2025-12-04T09:35:30.1864161Z Prepare all required actions
2025-12-04T09:35:30.1864785Z Getting action download info
2025-12-04T09:35:30.3423620Z ##[group]Run ./.github/actions/setup-linux
2025-12-04T09:35:30.3423997Z env:
2025-12-04T09:35:30.3424251Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:35:30.3424553Z ##[endgroup]
2025-12-04T09:35:30.3468875Z ##[group]Run set -euo pipefail
2025-12-04T09:35:30.3469315Z [36;1mset -euo pipefail[0m
2025-12-04T09:35:30.3469661Z [36;1mfunction get_ec2_metadata() {[0m
2025-12-04T09:35:30.3470107Z [36;1m  # Pulled from instance metadata endpoint for EC2[0m
2025-12-04T09:35:30.3470839Z [36;1m  # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html[0m
2025-12-04T09:35:30.3471835Z [36;1m  category=$1[0m
2025-12-04T09:35:30.3472261Z [36;1m  # If it is GCP runner (runner name contains gcp), do not run this[0m
2025-12-04T09:35:30.3472765Z [36;1m  runner_name_str=i-00bb8650059fae3eb[0m
2025-12-04T09:35:30.3473223Z [36;1m  if [[ -f /.inarc ]]; then[0m
2025-12-04T09:35:30.3473617Z [36;1m    echo "ARC Runner, no info on ec2 metadata"[0m
2025-12-04T09:35:30.3474078Z [36;1m  elif [[ $runner_name_str == *"gcp"* ]]; then[0m
2025-12-04T09:35:30.3474634Z [36;1m    echo "Runner is from Google Cloud Platform, No info on ec2 metadata"[0m
2025-12-04T09:35:30.3475142Z [36;1m  else[0m
2025-12-04T09:35:30.3476157Z [36;1m    curl -H "X-aws-ec2-metadata-token: $(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 30")" -fsSL "http://169.254.169.254/latest/meta-data/${category}"[0m
2025-12-04T09:35:30.3477253Z [36;1m  fi[0m
2025-12-04T09:35:30.3477509Z [36;1m}[0m
2025-12-04T09:35:30.3477814Z [36;1mecho "ami-id: $(get_ec2_metadata ami-id)"[0m
2025-12-04T09:35:30.3478301Z [36;1mecho "instance-id: $(get_ec2_metadata instance-id)"[0m
2025-12-04T09:35:30.3478869Z [36;1mecho "instance-type: $(get_ec2_metadata instance-type)"[0m
2025-12-04T09:35:30.3479364Z [36;1mecho "system info $(uname -a)"[0m
2025-12-04T09:35:30.3486395Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:35:30.3486842Z env:
2025-12-04T09:35:30.3487089Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:35:30.3487379Z ##[endgroup]
2025-12-04T09:35:30.3647605Z ami-id: ami-08982f1c5bf93d976
2025-12-04T09:35:30.3762574Z instance-id: i-00bb8650059fae3eb
2025-12-04T09:35:30.3873475Z instance-type: g4dn.4xlarge
2025-12-04T09:35:30.3885296Z system info Linux ip-10-0-51-5.ec2.internal 6.1.150-174.273.amzn2023.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Sep  9 12:21:26 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
2025-12-04T09:35:30.3908944Z ##[group]Run if [ -f /usr/bin/nvidia-smi ]; then nvidia-smi; fi
2025-12-04T09:35:30.3909540Z [36;1mif [ -f /usr/bin/nvidia-smi ]; then nvidia-smi; fi[0m
2025-12-04T09:35:30.3918054Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:35:30.3918489Z env:
2025-12-04T09:35:30.3918757Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:35:30.3919072Z ##[endgroup]
2025-12-04T09:35:31.7511673Z Thu Dec  4 09:35:31 2025       
2025-12-04T09:35:31.7512839Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:35:31.7513490Z | NVIDIA-SMI 580.82.07              Driver Version: 580.82.07      CUDA Version: 13.0     |
2025-12-04T09:35:31.7514107Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:35:31.7514758Z | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
2025-12-04T09:35:31.7515434Z | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
2025-12-04T09:35:31.7515983Z |                                         |                        |               MIG M. |
2025-12-04T09:35:31.7516392Z |=========================================+========================+======================|
2025-12-04T09:35:31.7614688Z |   0  Tesla T4                       Off |   00000000:00:1E.0 Off |                    0 |
2025-12-04T09:35:31.7615641Z | N/A   36C    P0             25W /   70W |       0MiB /  15360MiB |      9%      Default |
2025-12-04T09:35:31.7616135Z |                                         |                        |                  N/A |
2025-12-04T09:35:31.7616692Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:35:31.7617078Z 
2025-12-04T09:35:31.7617306Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:35:31.7617856Z | Processes:                                                                              |
2025-12-04T09:35:31.7618413Z |  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
2025-12-04T09:35:31.7618926Z |        ID   ID                                                               Usage      |
2025-12-04T09:35:31.7619359Z |=========================================================================================|
2025-12-04T09:35:31.7619896Z |  No running processes found                                                             |
2025-12-04T09:35:31.7620501Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:35:32.1765859Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-12-04T09:35:32.1766992Z [36;1mecho "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"[0m
2025-12-04T09:35:32.1776015Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:35:32.1776559Z env:
2025-12-04T09:35:32.1776815Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:35:32.1777132Z ##[endgroup]
2025-12-04T09:35:32.1853846Z ##[group]Run if systemctl is-active --quiet docker; then
2025-12-04T09:35:32.1854371Z [36;1mif systemctl is-active --quiet docker; then[0m
2025-12-04T09:35:32.1854838Z [36;1m    echo "Docker daemon is running...";[0m
2025-12-04T09:35:32.1855249Z [36;1melse[0m
2025-12-04T09:35:32.1855656Z [36;1m    echo "Starting docker daemon..." && sudo systemctl start docker;[0m
2025-12-04T09:35:32.1856162Z [36;1mfi[0m
2025-12-04T09:35:32.1863569Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:35:32.1864021Z env:
2025-12-04T09:35:32.1864279Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:35:32.1864573Z ##[endgroup]
2025-12-04T09:35:32.1955603Z Docker daemon is running...
2025-12-04T09:35:32.2001091Z ##[group]Run nick-fields/retry@v3.0.0
2025-12-04T09:35:32.2001453Z with:
2025-12-04T09:35:32.2001693Z   shell: bash
2025-12-04T09:35:32.2001936Z   timeout_minutes: 5
2025-12-04T09:35:32.2002221Z   max_attempts: 3
2025-12-04T09:35:32.2002497Z   retry_wait_seconds: 30
2025-12-04T09:35:32.2005225Z   command: AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\")
aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \
    --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com"

# For LF Runners we need to make sure we also login to Meta's ECR docker registry too.
META_AWS_ACCOUNT_ID=308535385114
if [ "$AWS_ACCOUNT_ID" != "$META_AWS_ACCOUNT_ID" ] ; then
    aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \
        --password-stdin "$META_AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com"
fi

2025-12-04T09:35:32.2008000Z   polling_interval_seconds: 1
2025-12-04T09:35:32.2008335Z   warning_on_retry: true
2025-12-04T09:35:32.2008628Z   continue_on_error: false
2025-12-04T09:35:32.2008930Z env:
2025-12-04T09:35:32.2009177Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:35:32.2009474Z   AWS_RETRY_MODE: standard
2025-12-04T09:35:32.2009777Z   AWS_MAX_ATTEMPTS: 5
2025-12-04T09:35:32.2010072Z   AWS_DEFAULT_REGION: us-east-1
2025-12-04T09:35:32.2010374Z ##[endgroup]
2025-12-04T09:35:33.5309946Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json.
2025-12-04T09:35:33.5310973Z Configure a credential helper to remove this warning. See
2025-12-04T09:35:33.5311658Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store
2025-12-04T09:35:33.5312114Z 
2025-12-04T09:35:33.5312251Z Login Succeeded
2025-12-04T09:35:34.2959261Z Command completed after 1 attempt(s).
2025-12-04T09:35:34.3018505Z ##[group]Run env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}"
2025-12-04T09:35:34.3019252Z [36;1menv | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}"[0m
2025-12-04T09:35:34.3019800Z [36;1menv | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}"[0m
2025-12-04T09:35:34.3028310Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:35:34.3028763Z env:
2025-12-04T09:35:34.3029024Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:35:34.3029316Z ##[endgroup]
2025-12-04T09:35:34.3120663Z ##[group]Run # ignore expansion of "docker ps -q" since it could be empty
2025-12-04T09:35:34.3121334Z [36;1m# ignore expansion of "docker ps -q" since it could be empty[0m
2025-12-04T09:35:34.3121861Z [36;1m# shellcheck disable=SC2046[0m
2025-12-04T09:35:34.3122262Z [36;1mdocker stop $(docker ps -q) || true[0m
2025-12-04T09:35:34.3122671Z [36;1m# Prune all of the docker images[0m
2025-12-04T09:35:34.3123047Z [36;1mdocker system prune -af[0m
2025-12-04T09:35:34.3130042Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:35:34.3130486Z env:
2025-12-04T09:35:34.3130753Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:35:34.3131049Z ##[endgroup]
2025-12-04T09:35:34.3401700Z "docker stop" requires at least 1 argument.
2025-12-04T09:35:34.3402156Z See 'docker stop --help'.
2025-12-04T09:35:34.3402376Z 
2025-12-04T09:35:34.3402565Z Usage:  docker stop [OPTIONS] CONTAINER [CONTAINER...]
2025-12-04T09:35:34.3402897Z 
2025-12-04T09:35:34.3403023Z Stop one or more running containers
2025-12-04T09:35:34.3590001Z Total reclaimed space: 0B
2025-12-04T09:35:34.3802461Z ##[group]Run pytorch/test-infra/.github/actions/calculate-docker-image@main
2025-12-04T09:35:34.3803025Z with:
2025-12-04T09:35:34.3803969Z   docker-image-name: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:35:34.3805019Z   use-custom-docker-registry: true
2025-12-04T09:35:34.3805391Z   docker-build-dir: .ci/docker
2025-12-04T09:35:34.3805736Z   docker-build-script: ./build.sh
2025-12-04T09:35:34.3806070Z   working-directory: .
2025-12-04T09:35:34.3806644Z   docker-registry: 308535385114.dkr.ecr.us-east-1.amazonaws.com
2025-12-04T09:35:34.3807115Z   force-push: false
2025-12-04T09:35:34.3807369Z env:
2025-12-04T09:35:34.3807621Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:35:34.3807920Z ##[endgroup]
2025-12-04T09:35:34.3829831Z ##[group]Run set -ex
2025-12-04T09:35:34.3830200Z [36;1mset -ex[0m
2025-12-04T09:35:34.3830451Z [36;1m[0m
2025-12-04T09:35:34.3830991Z [36;1m# If the docker build directory or the build script doesn't exist, the action will[0m
2025-12-04T09:35:34.3831794Z [36;1m# gracefully return the docker image name as it is.  Pulling docker image in Linux[0m
2025-12-04T09:35:34.3832455Z [36;1m# job could then download the pre-built image as usual[0m
2025-12-04T09:35:34.3833276Z [36;1mif [[ -d "${DOCKER_BUILD_DIR}" ]] && [[ -f "${DOCKER_BUILD_DIR}/${DOCKER_BUILD_SCRIPT}" ]] && [[ "${USE_CUSTOM_DOCKER_REGISTRY}" == "true" ]]; then[0m
2025-12-04T09:35:34.3834043Z [36;1m  echo "skip=false" >> "${GITHUB_OUTPUT}"[0m
2025-12-04T09:35:34.3834439Z [36;1melse[0m
2025-12-04T09:35:34.3834742Z [36;1m  echo "skip=true" >> "${GITHUB_OUTPUT}"[0m
2025-12-04T09:35:34.3835265Z [36;1m  echo "docker-image=${DOCKER_IMAGE_NAME}" >> "${GITHUB_OUTPUT}"[0m
2025-12-04T09:35:34.3835748Z [36;1m[0m
2025-12-04T09:35:34.3836389Z [36;1m  echo "Not using custom ECR registry.  Either it was not requested or there is no Docker build script in the ${REPO_NAME} repo..."[0m
2025-12-04T09:35:34.3837145Z [36;1m  exit 0[0m
2025-12-04T09:35:34.3837401Z [36;1mfi[0m
2025-12-04T09:35:34.3837626Z [36;1m[0m
2025-12-04T09:35:34.3838021Z [36;1mif [[ "${DOCKER_IMAGE_NAME}" == *"${DOCKER_REGISTRY}/${REPO_NAME}"* ]]; then[0m
2025-12-04T09:35:34.3838735Z [36;1m  # The docker image name already includes the ECR prefix and tag, so we can just[0m
2025-12-04T09:35:34.3839374Z [36;1m  # use it as it is, but first let's extract the tag[0m
2025-12-04T09:35:34.3839928Z [36;1m  DOCKER_TAG=$(echo "${DOCKER_IMAGE_NAME}" | awk -F '[:,]' '{print $2}')[0m
2025-12-04T09:35:34.3840533Z [36;1m  echo "docker-tag=${DOCKER_TAG}" >> "${GITHUB_OUTPUT}"[0m
2025-12-04T09:35:34.3841106Z [36;1m  echo "docker-image=${DOCKER_IMAGE_NAME}" >> "${GITHUB_OUTPUT}"[0m
2025-12-04T09:35:34.3841595Z [36;1melse[0m
2025-12-04T09:35:34.3841890Z [36;1m  if [[ "${DOCKER_IMAGE_NAME}" == *:* ]]; then[0m
2025-12-04T09:35:34.3842341Z [36;1m    CUSTOM_TAG_PREFIX=${DOCKER_IMAGE_NAME#*:}[0m
2025-12-04T09:35:34.3842808Z [36;1m    DOCKER_IMAGE_NAME=${DOCKER_IMAGE_NAME%%:*}[0m
2025-12-04T09:35:34.3843193Z [36;1m  fi[0m
2025-12-04T09:35:34.3843725Z [36;1m  DOCKER_TAG=${CUSTOM_TAG_PREFIX:+${CUSTOM_TAG_PREFIX}-}$(git rev-parse HEAD:"${DOCKER_BUILD_DIR}")[0m
2025-12-04T09:35:34.3844440Z [36;1m  echo "docker-tag=${DOCKER_TAG}" >> "${GITHUB_OUTPUT}"[0m
2025-12-04T09:35:34.3845194Z [36;1m  echo "docker-image=${DOCKER_REGISTRY}/${REPO_NAME}/${DOCKER_IMAGE_NAME}:${DOCKER_TAG}" >> "${GITHUB_OUTPUT}"[0m
2025-12-04T09:35:34.3846010Z [36;1m  echo "custom-tag-prefix=${CUSTOM_TAG_PREFIX}" >> "${GITHUB_OUTPUT}"[0m
2025-12-04T09:35:34.3846521Z [36;1mfi[0m
2025-12-04T09:35:34.3854076Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:35:34.3854501Z env:
2025-12-04T09:35:34.3854751Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:35:34.3855066Z   REPO_NAME: pytorch
2025-12-04T09:35:34.3856173Z   DOCKER_IMAGE_NAME: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:35:34.3857335Z   DOCKER_BUILD_DIR: .ci/docker
2025-12-04T09:35:34.3857680Z   DOCKER_BUILD_SCRIPT: ./build.sh
2025-12-04T09:35:34.3858132Z   DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com
2025-12-04T09:35:34.3858602Z   USE_CUSTOM_DOCKER_REGISTRY: true
2025-12-04T09:35:34.3858953Z   CUSTOM_TAG_PREFIX: 
2025-12-04T09:35:34.3859242Z ##[endgroup]
2025-12-04T09:35:34.3887992Z + [[ -d .ci/docker ]]
2025-12-04T09:35:34.3888327Z + [[ -f .ci/docker/./build.sh ]]
2025-12-04T09:35:34.3888816Z + [[ true == \t\r\u\e ]]
2025-12-04T09:35:34.3889113Z + echo skip=false
2025-12-04T09:35:34.3890380Z + [[ 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a == *\3\0\8\5\3\5\3\8\5\1\1\4\.\d\k\r\.\e\c\r\.\u\s\-\e\a\s\t\-\1\.\a\m\a\z\o\n\a\w\s\.\c\o\m\/\p\y\t\o\r\c\h* ]]
2025-12-04T09:35:34.3896410Z ++ echo 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:35:34.3897393Z ++ awk -F '[:,]' '{print $2}'
2025-12-04T09:35:34.3920834Z + DOCKER_TAG=pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:35:34.3921890Z + echo docker-tag=pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:35:34.3923426Z + echo docker-image=308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:35:34.3950578Z ##[group]Run set +e
2025-12-04T09:35:34.3950953Z [36;1mset +e[0m
2025-12-04T09:35:34.3951207Z [36;1mset -x[0m
2025-12-04T09:35:34.3951471Z [36;1m[0m
2025-12-04T09:35:34.3951722Z [36;1mlogin() {[0m
2025-12-04T09:35:34.3952268Z [36;1m  aws ecr get-login-password --region us-east-1 | docker login -u AWS --password-stdin "$1"[0m
2025-12-04T09:35:34.3952882Z [36;1m}[0m
2025-12-04T09:35:34.3953138Z [36;1m[0m
2025-12-04T09:35:34.3953366Z [36;1mretry () {[0m
2025-12-04T09:35:34.3953679Z [36;1m  $*  || (sleep 1 && $*) || (sleep 2 && $*)[0m
2025-12-04T09:35:34.3954047Z [36;1m}[0m
2025-12-04T09:35:34.3954272Z [36;1m[0m
2025-12-04T09:35:34.3954543Z [36;1mretry login "${DOCKER_REGISTRY}"[0m
2025-12-04T09:35:34.3954900Z [36;1m[0m
2025-12-04T09:35:34.3955149Z [36;1mSTART_TIME=$(date +%s)[0m
2025-12-04T09:35:34.3955480Z [36;1m# Wait up to 120 minutes[0m
2025-12-04T09:35:34.3955903Z [36;1mwhile [[ $(( $(date +%s) - 7200 )) -lt $START_TIME ]]; do[0m
2025-12-04T09:35:34.3956486Z [36;1m  # Check if image already exists, if it does then skip building it[0m
2025-12-04T09:35:34.3957061Z [36;1m  if docker manifest inspect "${DOCKER_IMAGE}"; then[0m
2025-12-04T09:35:34.3957489Z [36;1m    exit 0[0m
2025-12-04T09:35:34.3957758Z [36;1m  fi[0m
2025-12-04T09:35:34.3957994Z [36;1m[0m
2025-12-04T09:35:34.3958452Z [36;1m  # NB: This flag is used by Docker build workflow to push the image to ECR, so we can[0m
2025-12-04T09:35:34.3959248Z [36;1m  # use this to differentiate between the Docker build and regular build jobs. For the[0m
2025-12-04T09:35:34.3960041Z [36;1m  # latter, it will wait for the Docker images to become available before continuing[0m
2025-12-04T09:35:34.3960642Z [36;1m  if [ "${DOCKER_PUSH:-false}" == "true" ]; then[0m
2025-12-04T09:35:34.3961114Z [36;1m    # It's a Docker build job, let's build the image[0m
2025-12-04T09:35:34.3961525Z [36;1m    break[0m
2025-12-04T09:35:34.3961797Z [36;1m  else[0m
2025-12-04T09:35:34.3962192Z [36;1m    # It's a regular build job, wait for the image to become available[0m
2025-12-04T09:35:34.3962678Z [36;1m    sleep 300[0m
2025-12-04T09:35:34.3962962Z [36;1m  fi[0m
2025-12-04T09:35:34.3963200Z [36;1mdone[0m
2025-12-04T09:35:34.3963447Z [36;1m[0m
2025-12-04T09:35:34.3963858Z [36;1m# NB: This part requires a full checkout. Otherwise, the merge base will[0m
2025-12-04T09:35:34.3964692Z [36;1m# be empty.  The default action would be to continue rebuild the image[0m
2025-12-04T09:35:34.3965308Z [36;1mif [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then[0m
2025-12-04T09:35:34.3965846Z [36;1m  # if we're on the base branch then use the parent commit[0m
2025-12-04T09:35:34.3966325Z [36;1m  MERGE_BASE=$(git rev-parse HEAD~)[0m
2025-12-04T09:35:34.3966688Z [36;1melse[0m
2025-12-04T09:35:34.3967075Z [36;1m  # otherwise we're on a PR, so use the most recent base commit[0m
2025-12-04T09:35:34.3967643Z [36;1m  MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION")[0m
2025-12-04T09:35:34.3968157Z [36;1mfi[0m
2025-12-04T09:35:34.3968403Z [36;1m[0m
2025-12-04T09:35:34.3968674Z [36;1mif [[ -z "${MERGE_BASE}" ]]; then[0m
2025-12-04T09:35:34.3969104Z [36;1m  echo "rebuild=true" >> "${GITHUB_OUTPUT}"[0m
2025-12-04T09:35:34.3969493Z [36;1m[0m
2025-12-04T09:35:34.3970046Z [36;1m  echo "Finding merge base only works with full checkout, please set fetch-depth to 0, continuing ..."[0m
2025-12-04T09:35:34.3970709Z [36;1m  exit 0[0m
2025-12-04T09:35:34.3971224Z [36;1mfi[0m
2025-12-04T09:35:34.3971477Z [36;1m[0m
2025-12-04T09:35:34.3971836Z [36;1mif ! git rev-parse "${MERGE_BASE}:${DOCKER_BUILD_DIR}"; then[0m
2025-12-04T09:35:34.3972651Z [36;1m  echo "Directory '${DOCKER_BUILD_DIR}' not found in commit $MERGE_BASE, you should rebase onto a more recent commit"[0m
2025-12-04T09:35:34.3973332Z [36;1m  exit 1[0m
2025-12-04T09:35:34.3973588Z [36;1mfi[0m
2025-12-04T09:35:34.3973829Z [36;1m[0m
2025-12-04T09:35:34.3974233Z [36;1mPREVIOUS_DOCKER_TAG=$(git rev-parse "${MERGE_BASE}:${DOCKER_BUILD_DIR}")[0m
2025-12-04T09:35:34.3975021Z [36;1m# If no image exists but the hash is the same as the previous hash then we should error out here[0m
2025-12-04T09:35:34.3975730Z [36;1mif [[ "${PREVIOUS_DOCKER_TAG}" == "${DOCKER_TAG}" ]]; then[0m
2025-12-04T09:35:34.3976606Z [36;1m  echo "WARNING: Something has gone wrong and the previous image isn't available for the merge-base of your branch"[0m
2025-12-04T09:35:34.3977536Z [36;1m  echo "         Will re-build docker image to store in local cache, TTS may be longer"[0m
2025-12-04T09:35:34.3978067Z [36;1mfi[0m
2025-12-04T09:35:34.3978316Z [36;1m[0m
2025-12-04T09:35:34.3978623Z [36;1mecho "rebuild=true" >> "${GITHUB_OUTPUT}"[0m
2025-12-04T09:35:34.3985540Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:35:34.3985992Z env:
2025-12-04T09:35:34.3986250Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:35:34.3986564Z   DOCKER_BUILD_DIR: .ci/docker
2025-12-04T09:35:34.3986979Z   BASE_REVISION: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:35:34.3988083Z   DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:35:34.3989422Z   DOCKER_TAG: pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:35:34.3990205Z   DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com
2025-12-04T09:35:34.3990680Z   DOCKER_PUSH: 
2025-12-04T09:35:34.3990949Z ##[endgroup]
2025-12-04T09:35:34.4019598Z + retry login 308535385114.dkr.ecr.us-east-1.amazonaws.com
2025-12-04T09:35:34.4020120Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com
2025-12-04T09:35:34.4022690Z + aws ecr get-login-password --region us-east-1
2025-12-04T09:35:34.4024183Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com
2025-12-04T09:35:35.0202103Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json.
2025-12-04T09:35:35.0202853Z Configure a credential helper to remove this warning. See
2025-12-04T09:35:35.0203530Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store
2025-12-04T09:35:35.0203987Z 
2025-12-04T09:35:35.0204124Z Login Succeeded
2025-12-04T09:35:35.0219404Z ++ date +%s
2025-12-04T09:35:35.0230365Z + START_TIME=1764840935
2025-12-04T09:35:35.0233660Z ++ date +%s
2025-12-04T09:35:35.0244383Z + [[ 1764833735 -lt 1764840935 ]]
2025-12-04T09:35:35.0245479Z + docker manifest inspect 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:35:35.2899016Z {
2025-12-04T09:35:35.2899645Z 	"schemaVersion": 2,
2025-12-04T09:35:35.2900504Z 	"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
2025-12-04T09:35:35.2901031Z 	"config": {
2025-12-04T09:35:35.2901414Z 		"mediaType": "application/vnd.docker.container.image.v1+json",
2025-12-04T09:35:35.2901888Z 		"size": 34787,
2025-12-04T09:35:35.2902646Z 		"digest": "sha256:5465aa79632b68f6240c23f0d0b021df4d0fd595333b61a40d36a0cf73656024"
2025-12-04T09:35:35.2903190Z 	},
2025-12-04T09:35:35.2903431Z 	"layers": [
2025-12-04T09:35:35.2903679Z 		{
2025-12-04T09:35:35.2904047Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.2904534Z 			"size": 30447951,
2025-12-04T09:35:35.2905037Z 			"digest": "sha256:63e5bc7682b85ae57a1221210f64d62e7a90b0a30f19af4ca734b8242ae49d63"
2025-12-04T09:35:35.2905589Z 		},
2025-12-04T09:35:35.2905797Z 		{
2025-12-04T09:35:35.2906171Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.2906652Z 			"size": 1554,
2025-12-04T09:35:35.2907107Z 			"digest": "sha256:835841cca3b7e1464290cdb78e48773e03583413fbed852c3cc5165a392ea44d"
2025-12-04T09:35:35.2907653Z 		},
2025-12-04T09:35:35.2907872Z 		{
2025-12-04T09:35:35.2908235Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.2908719Z 			"size": 313276213,
2025-12-04T09:35:35.2909230Z 			"digest": "sha256:1bf1bb125deaa5b8a3adf121671e87ba2fa7e229f9eb1dff7ade581cb737175a"
2025-12-04T09:35:35.2909777Z 		},
2025-12-04T09:35:35.2909997Z 		{
2025-12-04T09:35:35.2910371Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.2910837Z 			"size": 787,
2025-12-04T09:35:35.2911309Z 			"digest": "sha256:b21856d1bf420da6fa8ec7331b82ab355d4f4178644e7d3a3d3d0fbc3610109a"
2025-12-04T09:35:35.2911866Z 		},
2025-12-04T09:35:35.2912090Z 		{
2025-12-04T09:35:35.2912448Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.2912928Z 			"size": 106,
2025-12-04T09:35:35.2913416Z 			"digest": "sha256:848ba2c095e2b9e6acfb0ecf077adb526fb2fa82ed44cf6648ebde97f296f8ec"
2025-12-04T09:35:35.2913967Z 		},
2025-12-04T09:35:35.2914187Z 		{
2025-12-04T09:35:35.2914557Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.2915026Z 			"size": 704,
2025-12-04T09:35:35.2915503Z 			"digest": "sha256:029495b23122c840ca0e52d487afa8d2c4dbf1991cd7f204ec3e434dcf947bf4"
2025-12-04T09:35:35.2916052Z 		},
2025-12-04T09:35:35.2916257Z 		{
2025-12-04T09:35:35.2916787Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.2917316Z 			"size": 1216,
2025-12-04T09:35:35.2917781Z 			"digest": "sha256:073bb82063cfba4639b11fea43753dbb128f9238353189fc02d2e2aa0b2ad359"
2025-12-04T09:35:35.2918333Z 		},
2025-12-04T09:35:35.2918555Z 		{
2025-12-04T09:35:35.2918929Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.2919398Z 			"size": 484,
2025-12-04T09:35:35.2919867Z 			"digest": "sha256:59b63930883363c7d2aaab27cc61555d9f3e119dc18247a8624c98ebdaa354a5"
2025-12-04T09:35:35.2920409Z 		},
2025-12-04T09:35:35.2920613Z 		{
2025-12-04T09:35:35.2920982Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.2921465Z 			"size": 110362071,
2025-12-04T09:35:35.2921934Z 			"digest": "sha256:1c6177b2970db2d7743b4337c420a35f2ec79f338c30d97d534a1f0987c00913"
2025-12-04T09:35:35.2922482Z 		},
2025-12-04T09:35:35.2922703Z 		{
2025-12-04T09:35:35.2923062Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.2923544Z 			"size": 4961,
2025-12-04T09:35:35.2924024Z 			"digest": "sha256:fabe466dd5f33c3209a56abf5cb46b9b07fe21c57fb43b98e13308c8665c0864"
2025-12-04T09:35:35.2924579Z 		},
2025-12-04T09:35:35.2924781Z 		{
2025-12-04T09:35:35.2925368Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.2925859Z 			"size": 1755,
2025-12-04T09:35:35.2926314Z 			"digest": "sha256:2b5a11b41761d8ea3b829e4772e4064cb6c4e4989126af324d0057661e4493a1"
2025-12-04T09:35:35.2926863Z 		},
2025-12-04T09:35:35.2927080Z 		{
2025-12-04T09:35:35.2927439Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.2927925Z 			"size": 724,
2025-12-04T09:35:35.2928388Z 			"digest": "sha256:9681563a88ff9e62494a2740e537440d3df978d466c9478d6a941fae8b57b084"
2025-12-04T09:35:35.2928994Z 		},
2025-12-04T09:35:35.2929211Z 		{
2025-12-04T09:35:35.2929582Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.2930046Z 			"size": 544,
2025-12-04T09:35:35.2930518Z 			"digest": "sha256:dc0780902fca810498f16efa71f8e5990385f141a0cfcc552616a4acc434f79a"
2025-12-04T09:35:35.2931070Z 		},
2025-12-04T09:35:35.2931296Z 		{
2025-12-04T09:35:35.2931661Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.2932151Z 			"size": 3185191720,
2025-12-04T09:35:35.2932646Z 			"digest": "sha256:5b09a2b135c8e540e2b9374b68991afdd63a5dfaba75fb44efe054a591f400c2"
2025-12-04T09:35:35.2933185Z 		},
2025-12-04T09:35:35.2933410Z 		{
2025-12-04T09:35:35.2933783Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.2934248Z 			"size": 32,
2025-12-04T09:35:35.2934724Z 			"digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"
2025-12-04T09:35:35.2935274Z 		},
2025-12-04T09:35:35.2935482Z 		{
2025-12-04T09:35:35.2935854Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.2936426Z 			"size": 396,
2025-12-04T09:35:35.2936902Z 			"digest": "sha256:5bfdaeb5578d6ffcd7db29c48303cbceb13c591210feaa216a8daa7a6d445b4b"
2025-12-04T09:35:35.2937470Z 		},
2025-12-04T09:35:35.2937690Z 		{
2025-12-04T09:35:35.2938065Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.2938547Z 			"size": 236865,
2025-12-04T09:35:35.2939023Z 			"digest": "sha256:0ef42867f370b8a14b8c301388793b78a0bd2533bb2a317b129b03c8667dc767"
2025-12-04T09:35:35.2939567Z 		},
2025-12-04T09:35:35.2939775Z 		{
2025-12-04T09:35:35.2940149Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.2940628Z 			"size": 230,
2025-12-04T09:35:35.2941076Z 			"digest": "sha256:446083e497f322789c2d87933a77fb2dfd94e18d2e85f6d4362e6e9521b82c4e"
2025-12-04T09:35:35.2941619Z 		},
2025-12-04T09:35:35.2941836Z 		{
2025-12-04T09:35:35.2942203Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.2942684Z 			"size": 3043500,
2025-12-04T09:35:35.2943167Z 			"digest": "sha256:d8a170bef0f4e0e28f5ba0952320dd465552adf74f0864b4f47cc11f4c4f82f7"
2025-12-04T09:35:35.2943717Z 		},
2025-12-04T09:35:35.2943922Z 		{
2025-12-04T09:35:35.2944296Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.2944779Z 			"size": 1472,
2025-12-04T09:35:35.2945248Z 			"digest": "sha256:e2b6cd6a5bd0418a1e4aca3f37942324d4d9f9b0177597e37fc8d1a5626048e1"
2025-12-04T09:35:35.2945797Z 		},
2025-12-04T09:35:35.2946015Z 		{
2025-12-04T09:35:35.2946378Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.2946873Z 			"size": 481,
2025-12-04T09:35:35.2947341Z 			"digest": "sha256:93efc0181a22218a544413f1d57e9e0e7a0f492e41bef598084c5b9177e3987a"
2025-12-04T09:35:35.2947885Z 		},
2025-12-04T09:35:35.2948091Z 		{
2025-12-04T09:35:35.2948468Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.2948956Z 			"size": 202,
2025-12-04T09:35:35.2949411Z 			"digest": "sha256:7454c938f17425bcf167ad28a62b42b95f638a7d2cf0840885cfe5ffe8480a12"
2025-12-04T09:35:35.2949963Z 		},
2025-12-04T09:35:35.2950188Z 		{
2025-12-04T09:35:35.2950550Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.2951037Z 			"size": 607,
2025-12-04T09:35:35.2951613Z 			"digest": "sha256:4d57ff55f6d4161cb6c29e2c0b08d47e65898427db3938479158684899f0023d"
2025-12-04T09:35:35.2952163Z 		},
2025-12-04T09:35:35.2952371Z 		{
2025-12-04T09:35:35.2952747Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.2953236Z 			"size": 6243016141,
2025-12-04T09:35:35.2953717Z 			"digest": "sha256:b0301534b4a58072d5b140b08a7608bbead41d126fa29fdc78c1e8a43ebb865d"
2025-12-04T09:35:35.2954274Z 		},
2025-12-04T09:35:35.2954500Z 		{
2025-12-04T09:35:35.2954857Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.2955410Z 			"size": 829,
2025-12-04T09:35:35.2955875Z 			"digest": "sha256:1969e15d0c13874ea5883ed829235a19ef6dc21c8aa6172032b78a8ffa6ff262"
2025-12-04T09:35:35.2956404Z 		},
2025-12-04T09:35:35.2956624Z 		{
2025-12-04T09:35:35.2956995Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.2957462Z 			"size": 33450177,
2025-12-04T09:35:35.2957961Z 			"digest": "sha256:73180a0f2d5a961a0cc0ba2c3cf375fdcfb43ae5e4e5c63a000c4b4366d52a64"
2025-12-04T09:35:35.2958516Z 		},
2025-12-04T09:35:35.2958735Z 		{
2025-12-04T09:35:35.2959092Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.2959576Z 			"size": 104,
2025-12-04T09:35:35.2960048Z 			"digest": "sha256:ad81b25cb69f8cf42a4a96678a64b7d0598a8f95236a3e63d1fec4e53edff613"
2025-12-04T09:35:35.2960588Z 		},
2025-12-04T09:35:35.2960806Z 		{
2025-12-04T09:35:35.2961184Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.2961650Z 			"size": 1496,
2025-12-04T09:35:35.2962132Z 			"digest": "sha256:8165374f8dccf88a7791a5d31afbe29e4d4542b4f1cf1904945e07f9af6bf8ba"
2025-12-04T09:35:35.2962685Z 		},
2025-12-04T09:35:35.2962890Z 		{
2025-12-04T09:35:35.2963262Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.2963744Z 			"size": 458786969,
2025-12-04T09:35:35.2964223Z 			"digest": "sha256:7779c0bb9be2030df9060b526b98d0afeed1ce5b61ee0530321ef04a4e145e8c"
2025-12-04T09:35:35.2964781Z 		},
2025-12-04T09:35:35.2965001Z 		{
2025-12-04T09:35:35.2965374Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.2965845Z 			"size": 164,
2025-12-04T09:35:35.2966314Z 			"digest": "sha256:4d0a1c027262ed8c83181b931b64afa1c41c3cac97580231c4cae3a524ebd7d5"
2025-12-04T09:35:35.2966861Z 		},
2025-12-04T09:35:35.2967071Z 		{
2025-12-04T09:35:35.2967449Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.2967932Z 			"size": 346,
2025-12-04T09:35:35.2968385Z 			"digest": "sha256:a51e0dab2d596e6563483f27c12660007160847d177ba4c31812a8f44ada5754"
2025-12-04T09:35:35.2968929Z 		},
2025-12-04T09:35:35.2969148Z 		{
2025-12-04T09:35:35.2969509Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.2969989Z 			"size": 32,
2025-12-04T09:35:35.2970459Z 			"digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"
2025-12-04T09:35:35.2971266Z 		},
2025-12-04T09:35:35.2971492Z 		{
2025-12-04T09:35:35.2971871Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.2972355Z 			"size": 106,
2025-12-04T09:35:35.2972817Z 			"digest": "sha256:3eb6d4ff040b8761b1e3e1da768bdb884ce0e5324e3d0f6471b0a8b2ddf4736f"
2025-12-04T09:35:35.2973371Z 		},
2025-12-04T09:35:35.2973590Z 		{
2025-12-04T09:35:35.2973951Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.2974438Z 			"size": 424,
2025-12-04T09:35:35.2974903Z 			"digest": "sha256:b168858b85373f8ddca549d79267a06de4fa945d04bf791c55c9ddc93957fa3c"
2025-12-04T09:35:35.2975446Z 		},
2025-12-04T09:35:35.2975663Z 		{
2025-12-04T09:35:35.2976034Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.2976578Z 			"size": 19309367,
2025-12-04T09:35:35.2977060Z 			"digest": "sha256:d77a39278026a8899e2f97643918bdcf96e711ca26951880b4841b319dc71321"
2025-12-04T09:35:35.2977595Z 		},
2025-12-04T09:35:35.2977812Z 		{
2025-12-04T09:35:35.2978336Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.2978819Z 			"size": 108,
2025-12-04T09:35:35.2979302Z 			"digest": "sha256:36fbd357280b6b40e90f36ac3d19da3da10e5dbf0027a5cfe8e2f29d1870d347"
2025-12-04T09:35:35.2979846Z 		},
2025-12-04T09:35:35.2980067Z 		{
2025-12-04T09:35:35.2980442Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.2980912Z 			"size": 826,
2025-12-04T09:35:35.2981390Z 			"digest": "sha256:4e3b10a5dd6aed29f238d604925e2a4f873141c1087c8dd4fdde5c61e7560893"
2025-12-04T09:35:35.2982057Z 		},
2025-12-04T09:35:35.2982266Z 		{
2025-12-04T09:35:35.2982640Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.2983125Z 			"size": 724,
2025-12-04T09:35:35.2983572Z 			"digest": "sha256:9681563a88ff9e62494a2740e537440d3df978d466c9478d6a941fae8b57b084"
2025-12-04T09:35:35.2984114Z 		},
2025-12-04T09:35:35.2984329Z 		{
2025-12-04T09:35:35.2999624Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.3000270Z 			"size": 149,
2025-12-04T09:35:35.3000763Z 			"digest": "sha256:3092fab73b59190b9facfc49bf18f58612172bc2fd68dfa339a1118632616939"
2025-12-04T09:35:35.3001312Z 		},
2025-12-04T09:35:35.3001542Z 		{
2025-12-04T09:35:35.3001929Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.3002403Z 			"size": 136,
2025-12-04T09:35:35.3002890Z 			"digest": "sha256:20020dd28a15ba092fcbfe906ee39cdddfcc9d0b7eb42fdd6f4c08a984fa9c00"
2025-12-04T09:35:35.3003453Z 		},
2025-12-04T09:35:35.3003687Z 		{
2025-12-04T09:35:35.3004054Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.3004534Z 			"size": 140,
2025-12-04T09:35:35.3005049Z 			"digest": "sha256:ae5280ce969dcff08c091e9a5f7641f13561b2b0ee44d78b7c3f81d8fe8e6d32"
2025-12-04T09:35:35.3005594Z 		},
2025-12-04T09:35:35.3005813Z 		{
2025-12-04T09:35:35.3006186Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.3006662Z 			"size": 32,
2025-12-04T09:35:35.3007136Z 			"digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"
2025-12-04T09:35:35.3007689Z 		},
2025-12-04T09:35:35.3007896Z 		{
2025-12-04T09:35:35.3008273Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.3008750Z 			"size": 223,
2025-12-04T09:35:35.3009205Z 			"digest": "sha256:026e4484b749dfc556dcf7c8f45c1759518a89072e4dbc974d9405ada1582d03"
2025-12-04T09:35:35.3009749Z 		},
2025-12-04T09:35:35.3009964Z 		{
2025-12-04T09:35:35.3010344Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.3010812Z 			"size": 256,
2025-12-04T09:35:35.3011300Z 			"digest": "sha256:1be9da2ce53d20d8befad5c024ee0eb41ee35984307cbd5621d8effae0353073"
2025-12-04T09:35:35.3011864Z 		},
2025-12-04T09:35:35.3012069Z 		{
2025-12-04T09:35:35.3012445Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.3012924Z 			"size": 32,
2025-12-04T09:35:35.3013389Z 			"digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"
2025-12-04T09:35:35.3013941Z 		},
2025-12-04T09:35:35.3014156Z 		{
2025-12-04T09:35:35.3014514Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.3014987Z 			"size": 106,
2025-12-04T09:35:35.3015450Z 			"digest": "sha256:6481b7a1d9fb4001fd6f9e2a8d1600192529ddb957128e41671ca4630fa06ad4"
2025-12-04T09:35:35.3015993Z 		},
2025-12-04T09:35:35.3016198Z 		{
2025-12-04T09:35:35.3016662Z + exit 0
2025-12-04T09:35:35.3017046Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.3017536Z 			"size": 312293471,
2025-12-04T09:35:35.3018026Z 			"digest": "sha256:fa519d18c39d8f297109c056017ebce7efc322d058afd27fdac5880d6c8d35b0"
2025-12-04T09:35:35.3018580Z 		},
2025-12-04T09:35:35.3018799Z 		{
2025-12-04T09:35:35.3019176Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.3019649Z 			"size": 3058012325,
2025-12-04T09:35:35.3020313Z 			"digest": "sha256:d172f25b97f78fce0f6c6701f0db794b1c994a9cdf8cff9ddc6bdd1a1bea835c"
2025-12-04T09:35:35.3020884Z 		},
2025-12-04T09:35:35.3021092Z 		{
2025-12-04T09:35:35.3021471Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.3021958Z 			"size": 129,
2025-12-04T09:35:35.3022420Z 			"digest": "sha256:fd60ab6b1c2c85a932e9894b5d0cf5c9e75fa21782e3028ea40d76017ecfbf85"
2025-12-04T09:35:35.3022977Z 		},
2025-12-04T09:35:35.3023197Z 		{
2025-12-04T09:35:35.3023563Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.3024153Z 			"size": 880,
2025-12-04T09:35:35.3024632Z 			"digest": "sha256:0afe45579c2c87002db8c1abf7b32a748e6cb3b9b57e9b391f91cad9f84df476"
2025-12-04T09:35:35.3025187Z 		},
2025-12-04T09:35:35.3025393Z 		{
2025-12-04T09:35:35.3025765Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.3026252Z 			"size": 724,
2025-12-04T09:35:35.3026707Z 			"digest": "sha256:9681563a88ff9e62494a2740e537440d3df978d466c9478d6a941fae8b57b084"
2025-12-04T09:35:35.3027248Z 		},
2025-12-04T09:35:35.3027466Z 		{
2025-12-04T09:35:35.3027826Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.3028305Z 			"size": 139,
2025-12-04T09:35:35.3028769Z 			"digest": "sha256:5884ffd6720b47274f651262d5f9224f55960f9ea717faafe332aa20afb0ffa4"
2025-12-04T09:35:35.3029302Z 		},
2025-12-04T09:35:35.3029519Z 		{
2025-12-04T09:35:35.3029891Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.3030363Z 			"size": 32,
2025-12-04T09:35:35.3030834Z 			"digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"
2025-12-04T09:35:35.3031384Z 		},
2025-12-04T09:35:35.3031601Z 		{
2025-12-04T09:35:35.3031961Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.3032439Z 			"size": 160,
2025-12-04T09:35:35.3032931Z 			"digest": "sha256:ab7a7c316fa7a9b7a96304ce96fafdffbc5cc6b960a4bb2def9131b36d9225c5"
2025-12-04T09:35:35.3033483Z 		},
2025-12-04T09:35:35.3033705Z 		{
2025-12-04T09:35:35.3034081Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.3034549Z 			"size": 1012,
2025-12-04T09:35:35.3035032Z 			"digest": "sha256:c7775ce5574bdde75b4c09a1db19f7d0dc027f1f4c1f961022fc55833133e616"
2025-12-04T09:35:35.3035587Z 		},
2025-12-04T09:35:35.3035794Z 		{
2025-12-04T09:35:35.3036170Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.3036651Z 			"size": 724,
2025-12-04T09:35:35.3037107Z 			"digest": "sha256:9681563a88ff9e62494a2740e537440d3df978d466c9478d6a941fae8b57b084"
2025-12-04T09:35:35.3037652Z 		},
2025-12-04T09:35:35.3037870Z 		{
2025-12-04T09:35:35.3038241Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.3038710Z 			"size": 134,
2025-12-04T09:35:35.3039180Z 			"digest": "sha256:81945c4fb228ca73f4bac38b6d8a1eca7139585d4a078219dfaa16ea13945949"
2025-12-04T09:35:35.3039735Z 		},
2025-12-04T09:35:35.3039950Z 		{
2025-12-04T09:35:35.3040323Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.3040807Z 			"size": 32,
2025-12-04T09:35:35.3041265Z 			"digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"
2025-12-04T09:35:35.3041817Z 		},
2025-12-04T09:35:35.3042034Z 		{
2025-12-04T09:35:35.3042398Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.3042873Z 			"size": 158,
2025-12-04T09:35:35.3043346Z 			"digest": "sha256:663cbe24d60bf42bc7a440cb4867e4287cacf54194dd3152406668e61d7e92e5"
2025-12-04T09:35:35.3043905Z 		},
2025-12-04T09:35:35.3044108Z 		{
2025-12-04T09:35:35.3044480Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.3044962Z 			"size": 603,
2025-12-04T09:35:35.3045404Z 			"digest": "sha256:43f216b027865c8ca16f855703465445f3a548614a4d7e29387337b9651ac25c"
2025-12-04T09:35:35.3045936Z 		},
2025-12-04T09:35:35.3046151Z 		{
2025-12-04T09:35:35.3046601Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.3047090Z 			"size": 724,
2025-12-04T09:35:35.3047553Z 			"digest": "sha256:9681563a88ff9e62494a2740e537440d3df978d466c9478d6a941fae8b57b084"
2025-12-04T09:35:35.3048079Z 		},
2025-12-04T09:35:35.3048302Z 		{
2025-12-04T09:35:35.3048678Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.3049144Z 			"size": 155,
2025-12-04T09:35:35.3049618Z 			"digest": "sha256:c47c3cfeb68763aa19727693ad52fe0c80561a98139adaa2ab5eccea35c2d1b4"
2025-12-04T09:35:35.3050239Z 		},
2025-12-04T09:35:35.3050457Z 		{
2025-12-04T09:35:35.3050817Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.3051297Z 			"size": 32,
2025-12-04T09:35:35.3051773Z 			"digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"
2025-12-04T09:35:35.3052311Z 		},
2025-12-04T09:35:35.3052533Z 		{
2025-12-04T09:35:35.3052910Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.3053377Z 			"size": 188,
2025-12-04T09:35:35.3053854Z 			"digest": "sha256:7d326b9e267322de9337ac2a71ddeac4cb61f28a018a6155863f83a164ad9437"
2025-12-04T09:35:35.3054407Z 		},
2025-12-04T09:35:35.3054612Z 		{
2025-12-04T09:35:35.3054987Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.3055469Z 			"size": 1370,
2025-12-04T09:35:35.3055930Z 			"digest": "sha256:7ec8f17141c8335192fa21b660dfe1fe0ad16b202bc234e7d4ef063b35124158"
2025-12-04T09:35:35.3056566Z 		},
2025-12-04T09:35:35.3056790Z 		{
2025-12-04T09:35:35.3057161Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.3057630Z 			"size": 32,
2025-12-04T09:35:35.3058107Z 			"digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"
2025-12-04T09:35:35.3058663Z 		},
2025-12-04T09:35:35.3058868Z 		{
2025-12-04T09:35:35.3059239Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.3059725Z 			"size": 136,
2025-12-04T09:35:35.3060184Z 			"digest": "sha256:26249ea175bf816b87c4c83e5efb78fd386a800fa10e819ba85b06858bcf877e"
2025-12-04T09:35:35.3060734Z 		},
2025-12-04T09:35:35.3060951Z 		{
2025-12-04T09:35:35.3061310Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.3061790Z 			"size": 529,
2025-12-04T09:35:35.3062259Z 			"digest": "sha256:5e8e9ccb36f30a8c3a7e6a5011ee5001152f36c9c749397f3e234b1822326dd0"
2025-12-04T09:35:35.3062806Z 		},
2025-12-04T09:35:35.3063010Z 		{
2025-12-04T09:35:35.3063387Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.3063865Z 			"size": 32,
2025-12-04T09:35:35.3064324Z 			"digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"
2025-12-04T09:35:35.3064877Z 		},
2025-12-04T09:35:35.3065095Z 		{
2025-12-04T09:35:35.3065450Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.3065928Z 			"size": 104,
2025-12-04T09:35:35.3066400Z 			"digest": "sha256:5bc72d4e1de83a1a254e8808f727118dd54cf048c14ff298a5299e015a116bfd"
2025-12-04T09:35:35.3066934Z 		},
2025-12-04T09:35:35.3067152Z 		{
2025-12-04T09:35:35.3067525Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.3067993Z 			"size": 436,
2025-12-04T09:35:35.3068461Z 			"digest": "sha256:83cddbd497794c27254e11c4c00105d1f61399e7fef9d208a0be250724efd2c0"
2025-12-04T09:35:35.3069009Z 		},
2025-12-04T09:35:35.3069224Z 		{
2025-12-04T09:35:35.3069580Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.3070071Z 			"size": 32,
2025-12-04T09:35:35.3070544Z 			"digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"
2025-12-04T09:35:35.3071276Z 		},
2025-12-04T09:35:35.3071502Z 		{
2025-12-04T09:35:35.3071877Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.3072345Z 			"size": 109,
2025-12-04T09:35:35.3072984Z 			"digest": "sha256:60c25d8c3dd2d78785f659204d0b1e64954ca581f89874b68ffe8fee23c6b661"
2025-12-04T09:35:35.3073534Z 		},
2025-12-04T09:35:35.3073762Z 		{
2025-12-04T09:35:35.3074119Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.3074603Z 			"size": 1896,
2025-12-04T09:35:35.3075095Z 			"digest": "sha256:a534dcf4b9a9e5fabed742c8a8fc43c9cfe7346ea88ab3c177c3b14fd3afe00a"
2025-12-04T09:35:35.3075664Z 		},
2025-12-04T09:35:35.3075868Z 		{
2025-12-04T09:35:35.3076241Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.3076821Z 			"size": 245582017,
2025-12-04T09:35:35.3077299Z 			"digest": "sha256:10138310c65c78d7de8375225ce37f5f7bfae7898e4e8bbcb90bd56a1bd05db4"
2025-12-04T09:35:35.3077849Z 		},
2025-12-04T09:35:35.3078066Z 		{
2025-12-04T09:35:35.3078423Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.3078904Z 			"size": 106,
2025-12-04T09:35:35.3079383Z 			"digest": "sha256:8487679f252b6fb703dc9398d73aaeec68df724bfc961579ec5bdae62ebe3a37"
2025-12-04T09:35:35.3079918Z 		},
2025-12-04T09:35:35.3080135Z 		{
2025-12-04T09:35:35.3080503Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.3080981Z 			"size": 162,
2025-12-04T09:35:35.3081439Z 			"digest": "sha256:52580ee2caa9ab69b0ac640315ee350e847cd0955c0a1eafa933a076669e87ad"
2025-12-04T09:35:35.3081980Z 		},
2025-12-04T09:35:35.3082194Z 		{
2025-12-04T09:35:35.3082551Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.3083029Z 			"size": 7944,
2025-12-04T09:35:35.3083518Z 			"digest": "sha256:741c215cb2ffb295ab6a07fab3f0dfdde029463779ff9c0bbff4add26a340cfb"
2025-12-04T09:35:35.3084060Z 		},
2025-12-04T09:35:35.3084273Z 		{
2025-12-04T09:35:35.3084641Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.3085106Z 			"size": 8070,
2025-12-04T09:35:35.3085568Z 			"digest": "sha256:d17f5aba17a608d1c7851cb3940a25d43f063385813051127074f693d0ede19b"
2025-12-04T09:35:35.3086117Z 		},
2025-12-04T09:35:35.3086323Z 		{
2025-12-04T09:35:35.3086688Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.3087163Z 			"size": 304,
2025-12-04T09:35:35.3087638Z 			"digest": "sha256:bc08246bb4ba18c3ec5bc69e16b6b4e929c5bd0f3fae10eeb0b1a622a63d6fa2"
2025-12-04T09:35:35.3088187Z 		},
2025-12-04T09:35:35.3088409Z 		{
2025-12-04T09:35:35.3088780Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.3089249Z 			"size": 23755574,
2025-12-04T09:35:35.3089730Z 			"digest": "sha256:7323bf084bf98f915db061b178c56525a0f95bd34d211b381c7527ad242c5a58"
2025-12-04T09:35:35.3090272Z 		},
2025-12-04T09:35:35.3090472Z 		{
2025-12-04T09:35:35.3090836Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.3091314Z 			"size": 108,
2025-12-04T09:35:35.3091786Z 			"digest": "sha256:d344ecc97fd77c7d12fd68ddb67aeb6cc3dd2e723de5ad1ca2c80b45c8d6bd77"
2025-12-04T09:35:35.3092341Z 		},
2025-12-04T09:35:35.3092553Z 		{
2025-12-04T09:35:35.3092912Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.3093393Z 			"size": 54145663,
2025-12-04T09:35:35.3093881Z 			"digest": "sha256:fb60b2d2147ff57c218f449f5b680132af8f7f8032ed69f422b48a3c3c1424f4"
2025-12-04T09:35:35.3094429Z 		},
2025-12-04T09:35:35.3094636Z 		{
2025-12-04T09:35:35.3095003Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:35.3095485Z 			"size": 32,
2025-12-04T09:35:35.3095941Z 			"digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"
2025-12-04T09:35:35.3096610Z 		}
2025-12-04T09:35:35.3096832Z 	]
2025-12-04T09:35:35.3097035Z }
2025-12-04T09:35:35.3127369Z ##[group]Run set -eux
2025-12-04T09:35:35.3127703Z set -eux
2025-12-04T09:35:35.3128186Z # It's ok if this step fails, it would then be an anonymous user like what we used to have
2025-12-04T09:35:35.3129683Z aws secretsmanager get-secret-value --secret-id docker_hub_readonly_token | jq --raw-output '.SecretString' | jq -r .docker_hub_readonly_token | docker login --username pytorchbot --password-stdin || true
2025-12-04T09:35:35.3138172Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:35:35.3138627Z env:
2025-12-04T09:35:35.3138864Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:35:35.3139167Z ##[endgroup]
2025-12-04T09:35:35.3170444Z + aws secretsmanager get-secret-value --secret-id docker_hub_readonly_token
2025-12-04T09:35:35.3171386Z + jq --raw-output .SecretString
2025-12-04T09:35:35.3172740Z + jq -r .docker_hub_readonly_token
2025-12-04T09:35:35.3173846Z + docker login --username pytorchbot --password-stdin
2025-12-04T09:35:35.9808775Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json.
2025-12-04T09:35:35.9809586Z Configure a credential helper to remove this warning. See
2025-12-04T09:35:35.9810665Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store
2025-12-04T09:35:35.9811139Z 
2025-12-04T09:35:35.9811280Z Login Succeeded
2025-12-04T09:35:35.9907156Z ##[group]Run tag=${ECR_DOCKER_IMAGE##*:}
2025-12-04T09:35:35.9907607Z tag=${ECR_DOCKER_IMAGE##*:}
2025-12-04T09:35:35.9908079Z echo "docker pull ghcr.io/pytorch/ci-image:${tag/:/-}"
2025-12-04T09:35:35.9914992Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:35:35.9915438Z env:
2025-12-04T09:35:35.9915689Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:35:35.9916675Z   ECR_DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:35:35.9917712Z ##[endgroup]
2025-12-04T09:35:35.9947780Z docker pull ghcr.io/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:35:35.9999448Z ##[group]Run pytorch/test-infra/.github/actions/pull-docker-image@main
2025-12-04T09:35:35.9999957Z with:
2025-12-04T09:35:36.0000878Z   docker-image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:35:36.0002005Z   docker-registry: 308535385114.dkr.ecr.us-east-1.amazonaws.com
2025-12-04T09:35:36.0002458Z env:
2025-12-04T09:35:36.0002697Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:35:36.0003004Z ##[endgroup]
2025-12-04T09:35:36.0019692Z ##[group]Run set -x
2025-12-04T09:35:36.0020017Z set -x
2025-12-04T09:35:36.0020291Z set +e
2025-12-04T09:35:36.0020533Z 
2025-12-04T09:35:36.0020801Z login() {
2025-12-04T09:35:36.0021364Z   aws ecr get-login-password --region us-east-1 | docker login -u AWS --password-stdin "$1"
2025-12-04T09:35:36.0021965Z }
2025-12-04T09:35:36.0022207Z 
2025-12-04T09:35:36.0022502Z retry () {
2025-12-04T09:35:36.0022808Z   $*  || (sleep 1 && $*) || (sleep 2 && $*)
2025-12-04T09:35:36.0023174Z }
2025-12-04T09:35:36.0023414Z 
2025-12-04T09:35:36.0023668Z retry login "${DOCKER_REGISTRY}"
2025-12-04T09:35:36.0024023Z 
2025-12-04T09:35:36.0024599Z IMAGE_SIZE=$(docker manifest inspect "${DOCKER_IMAGE}" | jq '[.layers[].size, .config.size] | add / 1024 / 1024')
2025-12-04T09:35:36.0025383Z echo "Compressed size of image in MB: ${IMAGE_SIZE}"
2025-12-04T09:35:36.0025811Z 
2025-12-04T09:35:36.0026058Z set -e
2025-12-04T09:35:36.0026458Z # ignore output since only exit code is used for conditional
2025-12-04T09:35:36.0027030Z # only pull docker image if it's not available locally
2025-12-04T09:35:36.0027673Z if ! docker inspect --type=image "${DOCKER_IMAGE}" >/dev/null 2>/dev/null; then
2025-12-04T09:35:36.0028267Z   retry docker pull "${DOCKER_IMAGE}"
2025-12-04T09:35:36.0028654Z fi
2025-12-04T09:35:36.0035248Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:35:36.0035689Z env:
2025-12-04T09:35:36.0035943Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:35:36.0036903Z   DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:35:36.0038028Z   DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com
2025-12-04T09:35:36.0038482Z ##[endgroup]
2025-12-04T09:35:36.0065055Z + set +e
2025-12-04T09:35:36.0065677Z + retry login 308535385114.dkr.ecr.us-east-1.amazonaws.com
2025-12-04T09:35:36.0066207Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com
2025-12-04T09:35:36.0068937Z + aws ecr get-login-password --region us-east-1
2025-12-04T09:35:36.0070205Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com
2025-12-04T09:35:36.6280919Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json.
2025-12-04T09:35:36.6281681Z Configure a credential helper to remove this warning. See
2025-12-04T09:35:36.6282531Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store
2025-12-04T09:35:36.6282990Z 
2025-12-04T09:35:36.6283109Z Login Succeeded
2025-12-04T09:35:36.6305759Z ++ docker manifest inspect 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:35:36.6306907Z ++ jq '[.layers[].size, .config.size] | add / 1024 / 1024'
2025-12-04T09:35:36.8582579Z + IMAGE_SIZE=13438.219573020935
2025-12-04T09:35:36.8583080Z + echo 'Compressed size of image in MB: 13438.219573020935'
2025-12-04T09:35:36.8583568Z + set -e
2025-12-04T09:35:36.8584574Z + docker inspect --type=image 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:35:36.8586038Z Compressed size of image in MB: 13438.219573020935
2025-12-04T09:35:36.8716317Z + retry docker pull 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:35:36.8718025Z + docker pull 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:35:37.0519833Z pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a: Pulling from pytorch/ci-image
2025-12-04T09:35:37.0521350Z 63e5bc7682b8: Pulling fs layer
2025-12-04T09:35:37.0521881Z 835841cca3b7: Pulling fs layer
2025-12-04T09:35:37.0522554Z 1bf1bb125dea: Pulling fs layer
2025-12-04T09:35:37.0523102Z b21856d1bf42: Pulling fs layer
2025-12-04T09:35:37.0523505Z 848ba2c095e2: Pulling fs layer
2025-12-04T09:35:37.0523834Z 029495b23122: Pulling fs layer
2025-12-04T09:35:37.0524175Z 073bb82063cf: Pulling fs layer
2025-12-04T09:35:37.0524499Z 59b639308833: Pulling fs layer
2025-12-04T09:35:37.0524822Z 1c6177b2970d: Pulling fs layer
2025-12-04T09:35:37.0525153Z fabe466dd5f3: Pulling fs layer
2025-12-04T09:35:37.0525469Z 2b5a11b41761: Pulling fs layer
2025-12-04T09:35:37.0525794Z 9681563a88ff: Pulling fs layer
2025-12-04T09:35:37.0526120Z dc0780902fca: Pulling fs layer
2025-12-04T09:35:37.0526435Z 5b09a2b135c8: Pulling fs layer
2025-12-04T09:35:37.0526850Z 4f4fb700ef54: Pulling fs layer
2025-12-04T09:35:37.0527359Z 5bfdaeb5578d: Pulling fs layer
2025-12-04T09:35:37.0527866Z 0ef42867f370: Pulling fs layer
2025-12-04T09:35:37.0528309Z 446083e497f3: Pulling fs layer
2025-12-04T09:35:37.0528937Z d8a170bef0f4: Pulling fs layer
2025-12-04T09:35:37.0529290Z e2b6cd6a5bd0: Pulling fs layer
2025-12-04T09:35:37.0529604Z 93efc0181a22: Pulling fs layer
2025-12-04T09:35:37.0530038Z 7454c938f174: Pulling fs layer
2025-12-04T09:35:37.0530575Z 4d57ff55f6d4: Pulling fs layer
2025-12-04T09:35:37.0530905Z b0301534b4a5: Pulling fs layer
2025-12-04T09:35:37.0531314Z 1969e15d0c13: Pulling fs layer
2025-12-04T09:35:37.0531681Z 73180a0f2d5a: Pulling fs layer
2025-12-04T09:35:37.0532060Z ad81b25cb69f: Pulling fs layer
2025-12-04T09:35:37.0532378Z 029495b23122: Waiting
2025-12-04T09:35:37.0532666Z 8165374f8dcc: Pulling fs layer
2025-12-04T09:35:37.0532990Z 7779c0bb9be2: Pulling fs layer
2025-12-04T09:35:37.0533307Z 4d0a1c027262: Pulling fs layer
2025-12-04T09:35:37.0533636Z a51e0dab2d59: Pulling fs layer
2025-12-04T09:35:37.0533965Z 3eb6d4ff040b: Pulling fs layer
2025-12-04T09:35:37.0534317Z b168858b8537: Pulling fs layer
2025-12-04T09:35:37.0534640Z d77a39278026: Pulling fs layer
2025-12-04T09:35:37.0534965Z 36fbd357280b: Pulling fs layer
2025-12-04T09:35:37.0535294Z 4e3b10a5dd6a: Pulling fs layer
2025-12-04T09:35:37.0535890Z 3092fab73b59: Pulling fs layer
2025-12-04T09:35:37.0536218Z 20020dd28a15: Pulling fs layer
2025-12-04T09:35:37.0536634Z ae5280ce969d: Pulling fs layer
2025-12-04T09:35:37.0536948Z 026e4484b749: Pulling fs layer
2025-12-04T09:35:37.0537289Z 1be9da2ce53d: Pulling fs layer
2025-12-04T09:35:37.0537624Z 6481b7a1d9fb: Pulling fs layer
2025-12-04T09:35:37.0537940Z fa519d18c39d: Pulling fs layer
2025-12-04T09:35:37.0538266Z d172f25b97f7: Pulling fs layer
2025-12-04T09:35:37.0538675Z fd60ab6b1c2c: Pulling fs layer
2025-12-04T09:35:37.0538996Z 0afe45579c2c: Pulling fs layer
2025-12-04T09:35:37.0539328Z 5884ffd6720b: Pulling fs layer
2025-12-04T09:35:37.0539660Z ab7a7c316fa7: Pulling fs layer
2025-12-04T09:35:37.0539977Z c7775ce5574b: Pulling fs layer
2025-12-04T09:35:37.0540306Z 81945c4fb228: Pulling fs layer
2025-12-04T09:35:37.0540618Z 1c6177b2970d: Waiting
2025-12-04T09:35:37.0540883Z 073bb82063cf: Waiting
2025-12-04T09:35:37.0541203Z 848ba2c095e2: Waiting
2025-12-04T09:35:37.0541479Z 59b639308833: Waiting
2025-12-04T09:35:37.0541752Z 5b09a2b135c8: Waiting
2025-12-04T09:35:37.0542018Z fabe466dd5f3: Waiting
2025-12-04T09:35:37.0542302Z 0ef42867f370: Waiting
2025-12-04T09:35:37.0542579Z 5bfdaeb5578d: Waiting
2025-12-04T09:35:37.0543007Z 663cbe24d60b: Pulling fs layer
2025-12-04T09:35:37.0543330Z b21856d1bf42: Waiting
2025-12-04T09:35:37.0543936Z d8a170bef0f4: Waiting
2025-12-04T09:35:37.0544207Z ae5280ce969d: Waiting
2025-12-04T09:35:37.0544496Z 43f216b02786: Pulling fs layer
2025-12-04T09:35:37.0544812Z 4d57ff55f6d4: Waiting
2025-12-04T09:35:37.0545073Z 4f4fb700ef54: Waiting
2025-12-04T09:35:37.0545349Z 446083e497f3: Waiting
2025-12-04T09:35:37.0545618Z 9681563a88ff: Waiting
2025-12-04T09:35:37.0545878Z e2b6cd6a5bd0: Waiting
2025-12-04T09:35:37.0546155Z b0301534b4a5: Waiting
2025-12-04T09:35:37.0546444Z c47c3cfeb687: Pulling fs layer
2025-12-04T09:35:37.0546750Z dc0780902fca: Waiting
2025-12-04T09:35:37.0547024Z 1969e15d0c13: Waiting
2025-12-04T09:35:37.0547300Z 2b5a11b41761: Waiting
2025-12-04T09:35:37.0547611Z 8165374f8dcc: Waiting
2025-12-04T09:35:37.0547958Z 7d326b9e2673: Pulling fs layer
2025-12-04T09:35:37.0548277Z 7779c0bb9be2: Waiting
2025-12-04T09:35:37.0548575Z 5884ffd6720b: Waiting
2025-12-04T09:35:37.0548861Z d172f25b97f7: Waiting
2025-12-04T09:35:37.0549138Z fa519d18c39d: Waiting
2025-12-04T09:35:37.0549398Z 026e4484b749: Waiting
2025-12-04T09:35:37.0549676Z 1be9da2ce53d: Waiting
2025-12-04T09:35:37.0549971Z 7ec8f17141c8: Pulling fs layer
2025-12-04T09:35:37.0550279Z 4d0a1c027262: Waiting
2025-12-04T09:35:37.0550558Z 663cbe24d60b: Waiting
2025-12-04T09:35:37.0550842Z ad81b25cb69f: Waiting
2025-12-04T09:35:37.0551122Z 26249ea175bf: Pulling fs layer
2025-12-04T09:35:37.0551466Z 43f216b02786: Waiting
2025-12-04T09:35:37.0551759Z 5e8e9ccb36f3: Pulling fs layer
2025-12-04T09:35:37.0552081Z 93efc0181a22: Waiting
2025-12-04T09:35:37.0552345Z 7454c938f174: Waiting
2025-12-04T09:35:37.0552624Z fd60ab6b1c2c: Waiting
2025-12-04T09:35:37.0552920Z 5bc72d4e1de8: Pulling fs layer
2025-12-04T09:35:37.0553234Z 4e3b10a5dd6a: Waiting
2025-12-04T09:35:37.0553509Z 81945c4fb228: Waiting
2025-12-04T09:35:37.0553794Z 83cddbd49779: Pulling fs layer
2025-12-04T09:35:37.0554097Z c47c3cfeb687: Waiting
2025-12-04T09:35:37.0554394Z 60c25d8c3dd2: Pulling fs layer
2025-12-04T09:35:37.0554812Z 7d326b9e2673: Waiting
2025-12-04T09:35:37.0555086Z 5bc72d4e1de8: Waiting
2025-12-04T09:35:37.0555344Z 26249ea175bf: Waiting
2025-12-04T09:35:37.0555628Z a534dcf4b9a9: Pulling fs layer
2025-12-04T09:35:37.0555946Z 83cddbd49779: Waiting
2025-12-04T09:35:37.0556210Z 5e8e9ccb36f3: Waiting
2025-12-04T09:35:37.0556497Z 10138310c65c: Pulling fs layer
2025-12-04T09:35:37.0556806Z 7ec8f17141c8: Waiting
2025-12-04T09:35:37.0557231Z 3092fab73b59: Waiting
2025-12-04T09:35:37.0557520Z 8487679f252b: Pulling fs layer
2025-12-04T09:35:37.0557938Z 20020dd28a15: Waiting
2025-12-04T09:35:37.0558244Z a534dcf4b9a9: Waiting
2025-12-04T09:35:37.0558516Z d77a39278026: Waiting
2025-12-04T09:35:37.0558828Z 10138310c65c: Waiting
2025-12-04T09:35:37.0559230Z 52580ee2caa9: Pulling fs layer
2025-12-04T09:35:37.0559547Z 8487679f252b: Waiting
2025-12-04T09:35:37.0559818Z b168858b8537: Waiting
2025-12-04T09:35:37.0560088Z 741c215cb2ff: Pulling fs layer
2025-12-04T09:35:37.0560423Z d17f5aba17a6: Pulling fs layer
2025-12-04T09:35:37.0560740Z 36fbd357280b: Waiting
2025-12-04T09:35:37.0561016Z bc08246bb4ba: Pulling fs layer
2025-12-04T09:35:37.0561334Z 60c25d8c3dd2: Waiting
2025-12-04T09:35:37.0561622Z 7323bf084bf9: Pulling fs layer
2025-12-04T09:35:37.0561932Z 741c215cb2ff: Waiting
2025-12-04T09:35:37.0562194Z d17f5aba17a6: Waiting
2025-12-04T09:35:37.0562481Z d344ecc97fd7: Pulling fs layer
2025-12-04T09:35:37.0562811Z fb60b2d2147f: Pulling fs layer
2025-12-04T09:35:37.0563113Z c7775ce5574b: Waiting
2025-12-04T09:35:37.0563388Z fb60b2d2147f: Waiting
2025-12-04T09:35:37.0563664Z d344ecc97fd7: Waiting
2025-12-04T09:35:37.0563924Z 7323bf084bf9: Waiting
2025-12-04T09:35:37.0564197Z bc08246bb4ba: Waiting
2025-12-04T09:35:37.0564469Z 73180a0f2d5a: Waiting
2025-12-04T09:35:37.0564734Z ab7a7c316fa7: Waiting
2025-12-04T09:35:37.0565009Z 6481b7a1d9fb: Waiting
2025-12-04T09:35:37.0565285Z a51e0dab2d59: Waiting
2025-12-04T09:35:37.0565547Z 3eb6d4ff040b: Waiting
2025-12-04T09:35:37.1269239Z 835841cca3b7: Verifying Checksum
2025-12-04T09:35:37.1269902Z 835841cca3b7: Download complete
2025-12-04T09:35:37.2024193Z b21856d1bf42: Verifying Checksum
2025-12-04T09:35:37.2024600Z b21856d1bf42: Download complete
2025-12-04T09:35:37.2802841Z 848ba2c095e2: Verifying Checksum
2025-12-04T09:35:37.2803217Z 848ba2c095e2: Download complete
2025-12-04T09:35:37.3666160Z 029495b23122: Download complete
2025-12-04T09:35:37.4341239Z 63e5bc7682b8: Verifying Checksum
2025-12-04T09:35:37.4341633Z 63e5bc7682b8: Download complete
2025-12-04T09:35:37.4607554Z 073bb82063cf: Verifying Checksum
2025-12-04T09:35:37.4608224Z 073bb82063cf: Download complete
2025-12-04T09:35:37.4999646Z 59b639308833: Download complete
2025-12-04T09:35:37.5747119Z fabe466dd5f3: Verifying Checksum
2025-12-04T09:35:37.5747590Z fabe466dd5f3: Download complete
2025-12-04T09:35:37.6754374Z 2b5a11b41761: Verifying Checksum
2025-12-04T09:35:37.6754818Z 2b5a11b41761: Download complete
2025-12-04T09:35:37.7474671Z 9681563a88ff: Verifying Checksum
2025-12-04T09:35:37.7475324Z 9681563a88ff: Download complete
2025-12-04T09:35:37.8248418Z dc0780902fca: Verifying Checksum
2025-12-04T09:35:37.8248859Z dc0780902fca: Download complete
2025-12-04T09:35:38.4201140Z 63e5bc7682b8: Pull complete
2025-12-04T09:35:38.4411491Z 835841cca3b7: Pull complete
2025-12-04T09:35:38.7304314Z 1c6177b2970d: Verifying Checksum
2025-12-04T09:35:38.7305046Z 1c6177b2970d: Download complete
2025-12-04T09:35:38.7377065Z 4f4fb700ef54: Download complete
2025-12-04T09:35:38.8172522Z 5bfdaeb5578d: Verifying Checksum
2025-12-04T09:35:38.8172972Z 5bfdaeb5578d: Download complete
2025-12-04T09:35:38.8929047Z 0ef42867f370: Download complete
2025-12-04T09:35:38.9812799Z 446083e497f3: Verifying Checksum
2025-12-04T09:35:38.9813481Z 446083e497f3: Download complete
2025-12-04T09:35:39.0827640Z d8a170bef0f4: Verifying Checksum
2025-12-04T09:35:39.0828075Z d8a170bef0f4: Download complete
2025-12-04T09:35:39.1736835Z e2b6cd6a5bd0: Verifying Checksum
2025-12-04T09:35:39.1737285Z e2b6cd6a5bd0: Download complete
2025-12-04T09:35:39.2682152Z 93efc0181a22: Verifying Checksum
2025-12-04T09:35:39.2696973Z 93efc0181a22: Download complete
2025-12-04T09:35:39.3768558Z 7454c938f174: Verifying Checksum
2025-12-04T09:35:39.3769063Z 7454c938f174: Download complete
2025-12-04T09:35:39.4613140Z 4d57ff55f6d4: Verifying Checksum
2025-12-04T09:35:39.4613868Z 4d57ff55f6d4: Download complete
2025-12-04T09:35:40.8572046Z 1bf1bb125dea: Verifying Checksum
2025-12-04T09:35:40.8572501Z 1bf1bb125dea: Download complete
2025-12-04T09:35:40.9470825Z 1969e15d0c13: Verifying Checksum
2025-12-04T09:35:40.9471443Z 1969e15d0c13: Download complete
2025-12-04T09:35:41.4242374Z 73180a0f2d5a: Verifying Checksum
2025-12-04T09:35:41.4243000Z 73180a0f2d5a: Download complete
2025-12-04T09:35:41.5222086Z ad81b25cb69f: Verifying Checksum
2025-12-04T09:35:41.5222829Z ad81b25cb69f: Download complete
2025-12-04T09:35:41.6100832Z 8165374f8dcc: Verifying Checksum
2025-12-04T09:35:41.6101202Z 8165374f8dcc: Download complete
2025-12-04T09:35:49.3125913Z 7779c0bb9be2: Verifying Checksum
2025-12-04T09:35:49.3126410Z 7779c0bb9be2: Download complete
2025-12-04T09:35:49.4217943Z 4d0a1c027262: Verifying Checksum
2025-12-04T09:35:49.4218545Z 4d0a1c027262: Download complete
2025-12-04T09:35:49.5047087Z a51e0dab2d59: Verifying Checksum
2025-12-04T09:35:49.5047716Z a51e0dab2d59: Download complete
2025-12-04T09:35:49.5272777Z 1bf1bb125dea: Pull complete
2025-12-04T09:35:49.6016546Z 3eb6d4ff040b: Download complete
2025-12-04T09:35:49.6811081Z b168858b8537: Verifying Checksum
2025-12-04T09:35:49.6811697Z b168858b8537: Download complete
2025-12-04T09:35:49.7264009Z b21856d1bf42: Pull complete
2025-12-04T09:35:49.8883692Z 848ba2c095e2: Pull complete
2025-12-04T09:35:50.0018687Z 029495b23122: Pull complete
2025-12-04T09:35:50.1363959Z d77a39278026: Verifying Checksum
2025-12-04T09:35:50.1364394Z d77a39278026: Download complete
2025-12-04T09:35:50.1910543Z 073bb82063cf: Pull complete
2025-12-04T09:35:50.2424293Z 36fbd357280b: Verifying Checksum
2025-12-04T09:35:50.2425011Z 36fbd357280b: Download complete
2025-12-04T09:35:50.3155927Z 59b639308833: Pull complete
2025-12-04T09:35:50.3395806Z 4e3b10a5dd6a: Verifying Checksum
2025-12-04T09:35:50.3396453Z 4e3b10a5dd6a: Download complete
2025-12-04T09:35:50.4218646Z 3092fab73b59: Verifying Checksum
2025-12-04T09:35:50.4219376Z 3092fab73b59: Download complete
2025-12-04T09:35:50.5242079Z 20020dd28a15: Verifying Checksum
2025-12-04T09:35:50.5242672Z 20020dd28a15: Download complete
2025-12-04T09:35:50.6102568Z ae5280ce969d: Download complete
2025-12-04T09:35:50.6999671Z 026e4484b749: Verifying Checksum
2025-12-04T09:35:50.7000211Z 026e4484b749: Download complete
2025-12-04T09:35:50.7790183Z 1be9da2ce53d: Verifying Checksum
2025-12-04T09:35:50.7790677Z 1be9da2ce53d: Download complete
2025-12-04T09:35:50.8728955Z 6481b7a1d9fb: Verifying Checksum
2025-12-04T09:35:50.8729451Z 6481b7a1d9fb: Download complete
2025-12-04T09:35:53.0623928Z 1c6177b2970d: Pull complete
2025-12-04T09:35:53.2668915Z fabe466dd5f3: Pull complete
2025-12-04T09:35:53.4794517Z 2b5a11b41761: Pull complete
2025-12-04T09:35:53.7023864Z 9681563a88ff: Pull complete
2025-12-04T09:35:53.9147689Z dc0780902fca: Pull complete
2025-12-04T09:35:55.7686524Z fa519d18c39d: Verifying Checksum
2025-12-04T09:36:25.4925628Z fa519d18c39d: Download complete
2025-12-04T09:36:25.4926078Z 5b09a2b135c8: Verifying Checksum
2025-12-04T09:36:25.4926418Z 5b09a2b135c8: Download complete
2025-12-04T09:36:25.5923811Z fd60ab6b1c2c: Verifying Checksum
2025-12-04T09:36:25.5924448Z fd60ab6b1c2c: Download complete
2025-12-04T09:36:25.6861000Z 0afe45579c2c: Verifying Checksum
2025-12-04T09:36:25.6861498Z 0afe45579c2c: Download complete
2025-12-04T09:36:25.7783866Z 5884ffd6720b: Verifying Checksum
2025-12-04T09:36:25.7784493Z 5884ffd6720b: Download complete
2025-12-04T09:36:25.9505977Z ab7a7c316fa7: Verifying Checksum
2025-12-04T09:36:25.9506456Z ab7a7c316fa7: Download complete
2025-12-04T09:36:26.0142132Z c7775ce5574b: Verifying Checksum
2025-12-04T09:36:26.0142837Z c7775ce5574b: Download complete
2025-12-04T09:36:26.1046311Z 81945c4fb228: Verifying Checksum
2025-12-04T09:36:26.1046781Z 81945c4fb228: Download complete
2025-12-04T09:36:26.1813211Z 663cbe24d60b: Verifying Checksum
2025-12-04T09:36:26.1813638Z 663cbe24d60b: Download complete
2025-12-04T09:36:26.2748229Z 43f216b02786: Download complete
2025-12-04T09:36:26.4057984Z c47c3cfeb687: Download complete
2025-12-04T09:36:26.4893757Z 7d326b9e2673: Verifying Checksum
2025-12-04T09:36:26.4894278Z 7d326b9e2673: Download complete
2025-12-04T09:36:26.5896530Z 7ec8f17141c8: Download complete
2025-12-04T09:36:26.7125442Z 26249ea175bf: Verifying Checksum
2025-12-04T09:36:26.7125901Z 26249ea175bf: Download complete
2025-12-04T09:36:26.8018817Z 5e8e9ccb36f3: Download complete
2025-12-04T09:36:26.8727983Z 5bc72d4e1de8: Download complete
2025-12-04T09:36:26.9792648Z 83cddbd49779: Verifying Checksum
2025-12-04T09:36:26.9793366Z 83cddbd49779: Download complete
2025-12-04T09:36:27.0569091Z 60c25d8c3dd2: Verifying Checksum
2025-12-04T09:36:27.0569581Z 60c25d8c3dd2: Download complete
2025-12-04T09:36:27.1287948Z a534dcf4b9a9: Verifying Checksum
2025-12-04T09:36:27.1288437Z a534dcf4b9a9: Download complete
2025-12-04T09:36:30.6454194Z 10138310c65c: Verifying Checksum
2025-12-04T09:36:30.8135186Z 10138310c65c: Download complete
2025-12-04T09:36:30.8135882Z 52580ee2caa9: Download complete
2025-12-04T09:36:30.8996765Z 741c215cb2ff: Verifying Checksum
2025-12-04T09:36:30.8997184Z 741c215cb2ff: Download complete
2025-12-04T09:36:31.0253513Z d17f5aba17a6: Download complete
2025-12-04T09:36:31.5527761Z 7323bf084bf9: Verifying Checksum
2025-12-04T09:36:31.5528203Z 7323bf084bf9: Download complete
2025-12-04T09:36:31.6562320Z d344ecc97fd7: Download complete
2025-12-04T09:36:32.6236846Z fb60b2d2147f: Verifying Checksum
2025-12-04T09:36:32.6237394Z fb60b2d2147f: Download complete
2025-12-04T09:36:46.7134522Z d172f25b97f7: Verifying Checksum
2025-12-04T09:36:46.7134954Z d172f25b97f7: Download complete
2025-12-04T09:37:18.1359843Z 5b09a2b135c8: Pull complete
2025-12-04T09:37:18.3621287Z 4f4fb700ef54: Pull complete
2025-12-04T09:37:18.5760913Z 5bfdaeb5578d: Pull complete
2025-12-04T09:37:18.8273791Z 0ef42867f370: Pull complete
2025-12-04T09:37:19.0561997Z 446083e497f3: Pull complete
2025-12-04T09:37:19.3478293Z d8a170bef0f4: Pull complete
2025-12-04T09:37:19.5648449Z e2b6cd6a5bd0: Pull complete
2025-12-04T09:37:19.7899298Z 93efc0181a22: Pull complete
2025-12-04T09:37:20.0154941Z 7454c938f174: Pull complete
2025-12-04T09:37:20.2436138Z 4d57ff55f6d4: Pull complete
2025-12-04T09:37:20.4956107Z b0301534b4a5: Verifying Checksum
2025-12-04T09:37:20.4957792Z b0301534b4a5: Download complete
2025-12-04T09:38:38.6098927Z b0301534b4a5: Pull complete
2025-12-04T09:38:38.8263105Z 1969e15d0c13: Pull complete
2025-12-04T09:38:39.5864570Z 73180a0f2d5a: Pull complete
2025-12-04T09:38:39.7973528Z ad81b25cb69f: Pull complete
2025-12-04T09:38:40.0254374Z 8165374f8dcc: Pull complete
2025-12-04T09:38:48.3902357Z 7779c0bb9be2: Pull complete
2025-12-04T09:38:48.6095369Z 4d0a1c027262: Pull complete
2025-12-04T09:38:48.8267620Z a51e0dab2d59: Pull complete
2025-12-04T09:38:49.1640060Z 3eb6d4ff040b: Pull complete
2025-12-04T09:38:49.3626019Z b168858b8537: Pull complete
2025-12-04T09:38:49.8015313Z d77a39278026: Pull complete
2025-12-04T09:38:50.0234852Z 36fbd357280b: Pull complete
2025-12-04T09:38:50.2405731Z 4e3b10a5dd6a: Pull complete
2025-12-04T09:38:50.6338185Z 3092fab73b59: Pull complete
2025-12-04T09:38:50.8650041Z 20020dd28a15: Pull complete
2025-12-04T09:38:51.0934213Z ae5280ce969d: Pull complete
2025-12-04T09:38:51.5114541Z 026e4484b749: Pull complete
2025-12-04T09:38:51.7183653Z 1be9da2ce53d: Pull complete
2025-12-04T09:38:52.1108693Z 6481b7a1d9fb: Pull complete
2025-12-04T09:38:53.9475480Z fa519d18c39d: Pull complete
2025-12-04T09:39:54.1719381Z d172f25b97f7: Pull complete
2025-12-04T09:39:54.2611874Z fd60ab6b1c2c: Pull complete
2025-12-04T09:39:54.4058510Z 0afe45579c2c: Pull complete
2025-12-04T09:39:54.6742695Z 5884ffd6720b: Pull complete
2025-12-04T09:39:54.9071645Z ab7a7c316fa7: Pull complete
2025-12-04T09:39:55.0381296Z c7775ce5574b: Pull complete
2025-12-04T09:39:55.3482946Z 81945c4fb228: Pull complete
2025-12-04T09:39:55.5947516Z 663cbe24d60b: Pull complete
2025-12-04T09:39:55.7045319Z 43f216b02786: Pull complete
2025-12-04T09:39:55.8876068Z c47c3cfeb687: Pull complete
2025-12-04T09:39:56.1569512Z 7d326b9e2673: Pull complete
2025-12-04T09:39:56.2843815Z 7ec8f17141c8: Pull complete
2025-12-04T09:39:56.5366894Z 26249ea175bf: Pull complete
2025-12-04T09:39:56.6791294Z 5e8e9ccb36f3: Pull complete
2025-12-04T09:39:57.0140280Z 5bc72d4e1de8: Pull complete
2025-12-04T09:39:57.2378689Z 83cddbd49779: Pull complete
2025-12-04T09:39:57.6460501Z 60c25d8c3dd2: Pull complete
2025-12-04T09:39:57.8508626Z a534dcf4b9a9: Pull complete
2025-12-04T09:40:04.5379920Z 10138310c65c: Pull complete
2025-12-04T09:40:04.7595992Z 8487679f252b: Pull complete
2025-12-04T09:40:04.9868373Z 52580ee2caa9: Pull complete
2025-12-04T09:40:05.2033172Z 741c215cb2ff: Pull complete
2025-12-04T09:40:05.4282347Z d17f5aba17a6: Pull complete
2025-12-04T09:40:05.6616193Z bc08246bb4ba: Pull complete
2025-12-04T09:40:07.0220123Z 7323bf084bf9: Pull complete
2025-12-04T09:40:07.1545922Z d344ecc97fd7: Pull complete
2025-12-04T09:40:08.8963067Z fb60b2d2147f: Pull complete
2025-12-04T09:40:09.0771504Z Digest: sha256:ae30f11a5b50741bd652aa0c94ad89ef791c4e50157eff642748620825cf7940
2025-12-04T09:40:09.1095583Z Status: Downloaded newer image for 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:40:09.1263005Z 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:40:09.1341211Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-12-04T09:40:09.1342496Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-12-04T09:40:09.1352125Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:40:09.1352573Z env:
2025-12-04T09:40:09.1352820Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:40:09.1353129Z ##[endgroup]
2025-12-04T09:40:09.1602256Z ##[group]Run pytorch/test-infra/.github/actions/setup-nvidia@main
2025-12-04T09:40:09.1602776Z with:
2025-12-04T09:40:09.1603026Z   driver-version: 525.105.17
2025-12-04T09:40:09.1603333Z env:
2025-12-04T09:40:09.1603583Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:40:09.1603873Z ##[endgroup]
2025-12-04T09:40:09.1657747Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-12-04T09:40:09.1658852Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-12-04T09:40:09.1666074Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:40:09.1666524Z env:
2025-12-04T09:40:09.1666777Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:40:09.1667073Z ##[endgroup]
2025-12-04T09:40:09.1725593Z ##[group]Run set -euo pipefail
2025-12-04T09:40:09.1725999Z set -euo pipefail
2025-12-04T09:40:09.1726355Z
2025-12-04T09:40:09.1726596Z has_gpu=false
2025-12-04T09:40:09.1726894Z devices=""
2025-12-04T09:40:09.1727167Z
2025-12-04T09:40:09.1727478Z if command -v nvidia-smi >/dev/null 2>&1; then
2025-12-04T09:40:09.1728014Z   if nvidia-smi -L >/tmp/nvidia_devices 2>/dev/null; then
2025-12-04T09:40:09.1728475Z     has_gpu=true
2025-12-04T09:40:09.1728823Z     devices=$(cat /tmp/nvidia_devices)
2025-12-04T09:40:09.1729187Z   fi
2025-12-04T09:40:09.1729456Z fi
2025-12-04T09:40:09.1729698Z
2025-12-04T09:40:09.1729950Z if [ "$has_gpu" = false ]; then
2025-12-04T09:40:09.1730416Z   if ls /dev/nvidia* >/tmp/nvidia_devices 2>/dev/null; then
2025-12-04T09:40:09.1730873Z     has_gpu=true
2025-12-04T09:40:09.1731250Z     devices=$(cat /tmp/nvidia_devices)
2025-12-04T09:40:09.1731608Z   fi
2025-12-04T09:40:09.1731854Z fi
2025-12-04T09:40:09.1732100Z
2025-12-04T09:40:09.1732449Z if [ "$has_gpu" = false ] && command -v lspci >/dev/null 2>&1; then
2025-12-04T09:40:09.1733057Z   if lspci | grep -i 'nvidia' >/tmp/nvidia_devices 2>/dev/null; then
2025-12-04T09:40:09.1733551Z     has_gpu=true
2025-12-04T09:40:09.1733883Z     devices=$(cat /tmp/nvidia_devices)
2025-12-04T09:40:09.1734255Z   fi
2025-12-04T09:40:09.1734503Z fi
2025-12-04T09:40:09.1734730Z
2025-12-04T09:40:09.1735083Z printf 'HAS_NVIDIA=%s\n' "$has_gpu" >> "$GITHUB_OUTPUT"
2025-12-04T09:40:09.1735919Z printf 'DETECTED_DEVICES<<EOF\n%s\nEOF\n' "$devices" >> "$GITHUB_OUTPUT"
2025-12-04T09:40:09.1742791Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:40:09.1743224Z env:
2025-12-04T09:40:09.1743475Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:40:09.1743796Z ##[endgroup]
2025-12-04T09:40:10.7294604Z ##[group]Run if [ "${HAS_NVIDIA}" = "true" ]; then
2025-12-04T09:40:10.7295087Z if [ "${HAS_NVIDIA}" = "true" ]; then
2025-12-04T09:40:10.7295539Z   echo "HAS_NVIDIA_GPU=true" >> "${GITHUB_ENV}"
2025-12-04T09:40:10.7296319Z   echo "GPU_FLAG=--gpus all -e NVIDIA_DRIVER_CAPABILITIES=all" >> "${GITHUB_ENV}"
2025-12-04T09:40:10.7296879Z else
2025-12-04T09:40:10.7297200Z   echo "HAS_NVIDIA_GPU=false" >> "${GITHUB_ENV}"
2025-12-04T09:40:10.7297613Z fi
2025-12-04T09:40:10.7304527Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:40:10.7304979Z env:
2025-12-04T09:40:10.7305233Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:40:10.7305542Z   HAS_NVIDIA: true
2025-12-04T09:40:10.7305796Z ##[endgroup]
2025-12-04T09:40:10.7388988Z ##[group]Run nick-fields/retry@3e91a01664abd3c5cd539100d10d33b9c5b68482
2025-12-04T09:40:10.7389505Z with:
2025-12-04T09:40:10.7389746Z   timeout_minutes: 10
2025-12-04T09:40:10.7390045Z   max_attempts: 3
2025-12-04T09:40:10.7422919Z   command: # Is it disgusting to have a full shell script here in this github action? Sure
# But is it the best way to make it so that this action relies on nothing else? Absolutely
set -eou pipefail

DISTRIBUTION=$(. /etc/os-release;echo $ID$VERSION_ID)
DRIVER_FN="NVIDIA-Linux-x86_64-${DRIVER_VERSION}.run"

install_nvidia_docker2_amzn2() {
    (
        set -x
        # Needed for yum-config-manager
        sudo yum install -y yum-utils
        if [[ "${DISTRIBUTION}" == "amzn2023" ]] ; then
          YUM_REPO_URL="https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo"
        else
          # Amazon Linux 2
          YUM_REPO_URL="https://nvidia.github.io/nvidia-docker/${DISTRIBUTION}/nvidia-docker.repo"
        fi

        sudo yum-config-manager --add-repo "${YUM_REPO_URL}"
        sudo yum install -y \
          nvidia-container-toolkit-1.17.8 \
          libnvidia-container-tools-1.17.8 \
          libnvidia-container1-1.17.8 \
          nvidia-container-toolkit-base-1.17.8
        sudo systemctl restart docker
    )
}

install_nvidia_docker2_ubuntu20() {
    (
        set -x
        # Install the NVIDIA container toolkit if nvidia-docker2 is not already installed
        status="$(dpkg-query -W --showformat='${db:Status-Status}' nvidia-docker2 2>&1)"
        if [ ! $? = 0 ] || [ ! "$status" = installed ]; then
          sudo apt-get install -y nvidia-container-toolkit-1.17.8
          sudo systemctl restart docker
        fi
    )
}

pre_install_nvidia_driver_amzn2() {
    (
        # Purge any nvidia driver installed from RHEL repo
        sudo yum remove -y nvidia-driver-latest-dkms
    )
}

install_nvidia_driver_common() {
    (
        # Try to gather more information about the runner and its existing NVIDIA driver if any
        echo "Before installing NVIDIA driver"
        lspci
        lsmod
        modinfo nvidia || true

        HAS_NVIDIA_DRIVER=0
        # Check if NVIDIA driver has already been installed
        if [ -x "$(command -v nvidia-smi)" ]; then
            set +e
            # The driver exists, check its version next. Also check only the first GPU if there is more than one,
            # so that the same driver version is not printed over multiple lines
            INSTALLED_DRIVER_VERSION=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0)
            NVIDIA_SMI_STATUS=$?

            if [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then
                echo "Failed to get NVIDIA driver version ($INSTALLED_DRIVER_VERSION). Continuing"
            elif [ "$INSTALLED_DRIVER_VERSION" != "$DRIVER_VERSION" ]; then
                echo "NVIDIA driver ($INSTALLED_DRIVER_VERSION) has been installed, but we expect to have $DRIVER_VERSION instead. Continuing"

                # Turn off persistent mode so that the installation script can unload the kernel module
                sudo killall nvidia-persistenced || true
            else
                HAS_NVIDIA_DRIVER=1
                echo "NVIDIA driver ($INSTALLED_DRIVER_VERSION) has already been installed. Skipping NVIDIA driver installation"
            fi
            set -e
        fi

        if [ "$HAS_NVIDIA_DRIVER" -eq 0 ]; then
            # CAUTION: this may need to be updated in future
            if [ "${DISTRIBUTION}" != ubuntu20.04 ]; then
                  sudo yum groupinstall -y "Development Tools"
                  # ensure our kernel install is the same as our underlying kernel,
                  # groupinstall "Development Tools" has a habit of mismatching kernel headers
                  sudo yum install -y "kernel-devel-uname-r == $(uname -r)"
                  sudo modprobe backlight
            fi
            sudo curl -fsL -o /tmp/nvidia_driver "https://s3.amazonaws.com/ossci-linux/nvidia_driver/$DRIVER_FN"

            set +e
            sudo /bin/bash /tmp/nvidia_driver -s --no-drm
            NVIDIA_INSTALLATION_STATUS=$?

            RESET_GPU=0
            if [ "$NVIDIA_INSTALLATION_STATUS" -ne 0 ]; then
                sudo cat /var/log/nvidia-installer.log
                # Failed to install the NVIDIA driver, try to reset the GPU
                RESET_GPU=1
            elif [ -x "$(command -v nvidia-smi)" ]; then
                # Check again if nvidia-smi works even if the driver installation completes successfully
                INSTALLED_DRIVER_VERSION=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0)
                NVIDIA_SMI_STATUS=$?

                if [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then
                    RESET_GPU=1
                fi
            fi

            if [ "$RESET_GPU" -eq 1 ]; then
                NVIDIA_DEVICES=$(lspci -D | grep -i NVIDIA | cut -d' ' -f1)
                # The GPU can get stuck in a failure state if a test somehow crashes the GPU microcode. When this
                # happens, we'll try to reset all NVIDIA devices https://github.com/pytorch/pytorch/issues/88388
                for PCI_ID in $NVIDIA_DEVICES; do
                    DEVICE_ENABLED=$(cat /sys/bus/pci/devices/$PCI_ID/enable)

                    echo "Reseting $PCI_ID (enabled state: $DEVICE_ENABLED)"
                    # This requires sudo permission of course
                    echo "1" | sudo tee /sys/bus/pci/devices/$PCI_ID/reset
                    sleep 1
                done
            fi

            sudo rm -fv /tmp/nvidia_driver
            set -e
        fi
    )
}

post_install_nvidia_driver_common() {
    (
        sudo modprobe nvidia || true
        echo "After installing NVIDIA driver"
        lspci
        lsmod
        modinfo nvidia || true

        (
            set +e

            nvidia-smi
            # NB: Annoyingly, nvidia-smi command returns successfully with return code 0 even in
            # the case where the driver has already crashed as it still can get the driver version
            # and some basic information like the bus ID.  However, the rest of the information
            # would be missing (ERR!), for example:
            #
            # +-----------------------------------------------------------------------------+
            # | NVIDIA-SMI 525.89.02    Driver Version: 525.89.02    CUDA Version: 12.0     |
            # |-------------------------------+----------------------+----------------------+
            # | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
            # | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
            # |                               |                      |               MIG M. |
            # |===============================+======================+======================|
            # |   0  ERR!                Off  | 00000000:00:1E.0 Off |                 ERR! |
            # |ERR!  ERR! ERR!    ERR! / ERR! |   4184MiB / 23028MiB |    ERR!      Default |
            # |                               |                      |                 ERR! |
            # +-------------------------------+----------------------+----------------------+
            #
            # +-----------------------------------------------------------------------------+
            # | Processes:                                                                  |
            # |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
            # |        ID   ID                                                   Usage      |
            # |=============================================================================|
            # +-----------------------------------------------------------------------------+
            #
            # This should be reported as a failure instead, as it is guaranteed to fail when
            # Docker tries to run with --gpus all
            #
            # So, the correct check here is to query one of the missing pieces of info, like
            # the GPU name, so that the command can fail accordingly
            nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0
            NVIDIA_SMI_STATUS=$?

            # Allowable exit statuses for nvidia-smi, see: https://github.com/NVIDIA/gpu-operator/issues/285
            if [ "$NVIDIA_SMI_STATUS" -eq 0 ] || [ "$NVIDIA_SMI_STATUS" -eq 14 ]; then
                echo "INFO: Ignoring allowed status ${NVIDIA_SMI_STATUS}"
            else
                echo "ERROR: nvidia-smi exited with unresolved status ${NVIDIA_SMI_STATUS}"
                exit ${NVIDIA_SMI_STATUS}
            fi
            set -e
        )
    )
}

install_nvidia_driver_amzn2() {
    (
        set -x
        pre_install_nvidia_driver_amzn2
        install_nvidia_driver_common
        post_install_nvidia_driver_common
    )
}

install_nvidia_driver_ubuntu20() {
    (
        set -x
        install_nvidia_driver_common
        post_install_nvidia_driver_common
    )
}

echo "== Installing nvidia driver ${DRIVER_FN} =="
case "${DISTRIBUTION}" in
    amzn*)
        install_nvidia_driver_amzn2
        ;;
    ubuntu20.04)
        install_nvidia_driver_ubuntu20
        ;;
    *)
        echo "ERROR: Unknown distribution ${DISTRIBUTION}"
        exit 1
        ;;
esac

# Install container toolkit based on distribution
echo "== Installing nvidia container toolkit for ${DISTRIBUTION} =="
case "${DISTRIBUTION}" in
    amzn*)
        install_nvidia_docker2_amzn2
        ;;
    ubuntu20.04)
        install_nvidia_docker2_ubuntu20
        ;;
    *)
        echo "ERROR: Unknown distribution ${DISTRIBUTION}"
        exit 1
        ;;
esac

# Fix https://github.com/NVIDIA/nvidia-docker/issues/1648 on runners with
# more than one GPU. This just needs to be run once. The command fails
# on subsequent runs and complains that the mode is already on, but that's
# ok
sudo nvidia-persistenced || true
# This should show persistence mode ON
nvidia-smi

# Check that the container toolkit is correctly installed and CUDA is available inside a container
docker run --rm -t --gpus=all public.ecr.aws/docker/library/python:3.13 nvidia-smi

2025-12-04T09:40:10.7456274Z   retry_wait_seconds: 10
2025-12-04T09:40:10.7456615Z   polling_interval_seconds: 1
2025-12-04T09:40:10.7456942Z   warning_on_retry: true
2025-12-04T09:40:10.7457264Z   continue_on_error: false
2025-12-04T09:40:10.7457578Z env:
2025-12-04T09:40:10.7457811Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:40:10.7458121Z   HAS_NVIDIA_GPU: true
2025-12-04T09:40:10.7458488Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:40:10.7458919Z   DRIVER_VERSION: 525.105.17
2025-12-04T09:40:10.7459260Z ##[endgroup]
2025-12-04T09:40:10.8245529Z == Installing nvidia driver NVIDIA-Linux-x86_64-525.105.17.run ==
2025-12-04T09:40:10.8246636Z + pre_install_nvidia_driver_amzn2
2025-12-04T09:40:10.8247441Z + sudo yum remove -y nvidia-driver-latest-dkms
2025-12-04T09:40:11.5435209Z No match for argument: nvidia-driver-latest-dkms
2025-12-04T09:40:11.5435966Z No packages marked for removal.
2025-12-04T09:40:11.5514872Z Dependencies resolved.
2025-12-04T09:40:11.5527015Z Nothing to do.
2025-12-04T09:40:11.5527549Z Complete!
2025-12-04T09:40:11.6327011Z + install_nvidia_driver_common
2025-12-04T09:40:11.6334521Z + echo 'Before installing NVIDIA driver'
2025-12-04T09:40:11.6335693Z Before installing NVIDIA driver
2025-12-04T09:40:11.6337719Z + lspci
2025-12-04T09:40:11.6915484Z 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma]
2025-12-04T09:40:11.6916537Z 00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
2025-12-04T09:40:11.6917329Z 00:01.3 Non-VGA unclassified device: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08)
2025-12-04T09:40:11.6918004Z 00:03.0 VGA compatible controller: Amazon.com, Inc. Device 1111
2025-12-04T09:40:11.6918621Z 00:04.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe EBS Controller
2025-12-04T09:40:11.6919316Z 00:05.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA)
2025-12-04T09:40:11.6919956Z 00:1e.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
2025-12-04T09:40:11.6920592Z 00:1f.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe SSD Controller
2025-12-04T09:40:11.6921097Z + lsmod
2025-12-04T09:40:11.6961140Z Module                  Size  Used by
2025-12-04T09:40:11.6961851Z nvidia_uvm           1925120  0
2025-12-04T09:40:11.6962202Z nvidia              14286848  1 nvidia_uvm
2025-12-04T09:40:11.6962611Z drm                   602112  1 nvidia
2025-12-04T09:40:11.6963227Z drm_panel_orientation_quirks    32768  1 drm
2025-12-04T09:40:11.6963669Z backlight              24576  1 drm
2025-12-04T09:40:11.6964035Z i2c_core              110592  2 nvidia,drm
2025-12-04T09:40:11.6964402Z xt_conntrack           16384  1
2025-12-04T09:40:11.6964738Z nft_chain_nat          16384  3
2025-12-04T09:40:11.6965057Z xt_MASQUERADE          20480  1
2025-12-04T09:40:11.6965473Z nf_nat                 57344  2 nft_chain_nat,xt_MASQUERADE
2025-12-04T09:40:11.6966279Z nf_conntrack_netlink    57344  0
2025-12-04T09:40:11.6967035Z nf_conntrack          184320  4 xt_conntrack,nf_nat,nf_conntrack_netlink,xt_MASQUERADE
2025-12-04T09:40:11.6967596Z nf_defrag_ipv6         24576  1 nf_conntrack
2025-12-04T09:40:11.6967997Z nf_defrag_ipv4         16384  1 nf_conntrack
2025-12-04T09:40:11.6968416Z xfrm_user              57344  1
2025-12-04T09:40:11.6968739Z xfrm_algo              16384  1 xfrm_user
2025-12-04T09:40:11.6969109Z xt_addrtype            16384  2
2025-12-04T09:40:11.6969440Z nft_compat             20480  4
2025-12-04T09:40:11.6969810Z nf_tables             311296  57 nft_compat,nft_chain_nat
2025-12-04T09:40:11.6970333Z nfnetlink              20480  4 nft_compat,nf_conntrack_netlink,nf_tables
2025-12-04T09:40:11.6970806Z br_netfilter           36864  0
2025-12-04T09:40:11.6971630Z bridge                323584  1 br_netfilter
2025-12-04T09:40:11.6972001Z stp                    16384  1 bridge
2025-12-04T09:40:11.6972359Z llc                    16384  2 bridge,stp
2025-12-04T09:40:11.6972719Z overlay               167936  0
2025-12-04T09:40:11.6973026Z tls                   139264  0
2025-12-04T09:40:11.6973343Z nls_ascii              16384  1
2025-12-04T09:40:11.6973662Z nls_cp437              20480  1
2025-12-04T09:40:11.6973962Z vfat                   24576  1
2025-12-04T09:40:11.6974285Z fat                    86016  1 vfat
2025-12-04T09:40:11.6974632Z sunrpc                700416  1
2025-12-04T09:40:11.6974930Z i8042                  45056  0
2025-12-04T09:40:11.6975245Z ena                   184320  0
2025-12-04T09:40:11.6975561Z serio                  28672  3 i8042
2025-12-04T09:40:11.6975908Z skx_edac_common        28672  0
2025-12-04T09:40:11.6976278Z button                 24576  0
2025-12-04T09:40:11.6976606Z ghash_clmulni_intel    16384  0
2025-12-04T09:40:11.6976938Z sch_fq_codel           20480  17
2025-12-04T09:40:11.6977484Z fuse                  184320  1
2025-12-04T09:40:11.6977797Z dm_mod                188416  0
2025-12-04T09:40:11.6978117Z configfs               57344  1
2025-12-04T09:40:11.6978420Z loop                   36864  0
2025-12-04T09:40:11.6978740Z dmi_sysfs              20480  0
2025-12-04T09:40:11.6979062Z crc32_pclmul           16384  0
2025-12-04T09:40:11.6979374Z crc32c_intel           24576  0
2025-12-04T09:40:11.6979692Z efivarfs               24576  1
2025-12-04T09:40:11.6980014Z + modinfo nvidia
2025-12-04T09:40:11.6983617Z filename:       /lib/modules/6.1.150-174.273.amzn2023.x86_64/kernel/drivers/video/nvidia.ko
2025-12-04T09:40:11.6984644Z import_ns:      DMA_BUF
2025-12-04T09:40:11.6984959Z alias:          char-major-195-*
2025-12-04T09:40:11.6985295Z version:        580.82.07
2025-12-04T09:40:11.6985589Z supported:      external
2025-12-04T09:40:11.6985899Z license:        Dual MIT/GPL
2025-12-04T09:40:11.6986254Z firmware:       nvidia/580.82.07/gsp_tu10x.bin
2025-12-04T09:40:11.6986686Z firmware:       nvidia/580.82.07/gsp_ga10x.bin
2025-12-04T09:40:11.6987103Z srcversion:     BA7240A71DCF7DC6FE88C1D
2025-12-04T09:40:11.6987516Z alias:          of:N*T*Cnvidia,tegra264-displayC*
2025-12-04T09:40:11.6987959Z alias:          of:N*T*Cnvidia,tegra264-display
2025-12-04T09:40:11.6988379Z alias:          of:N*T*Cnvidia,tegra234-displayC*
2025-12-04T09:40:11.6988813Z alias:          of:N*T*Cnvidia,tegra234-display
2025-12-04T09:40:11.6989392Z alias:          pci:v000010DEd*sv*sd*bc06sc80i00*
2025-12-04T09:40:11.6989807Z alias:          pci:v000010DEd*sv*sd*bc03sc02i00*
2025-12-04T09:40:11.6990233Z alias:          pci:v000010DEd*sv*sd*bc03sc00i00*
2025-12-04T09:40:11.6990626Z depends:        i2c-core,drm
2025-12-04T09:40:11.6990933Z retpoline:      Y
2025-12-04T09:40:11.6991208Z name:           nvidia
2025-12-04T09:40:11.6991664Z vermagic:       6.1.150-174.273.amzn2023.x86_64 SMP preempt mod_unload modversions 
2025-12-04T09:40:11.6992261Z parm:           NvSwitchRegDwords:NvSwitch regkey (charp)
2025-12-04T09:40:11.6992819Z parm:           NvSwitchBlacklist:NvSwitchBlacklist=uuid[,uuid...] (charp)
2025-12-04T09:40:11.6993351Z parm:           NVreg_ResmanDebugLevel:int
2025-12-04T09:40:11.6993743Z parm:           NVreg_RmLogonRC:int
2025-12-04T09:40:11.6994109Z parm:           NVreg_ModifyDeviceFiles:int
2025-12-04T09:40:11.6994505Z parm:           NVreg_DeviceFileUID:int
2025-12-04T09:40:11.6994887Z parm:           NVreg_DeviceFileGID:int
2025-12-04T09:40:11.6995259Z parm:           NVreg_DeviceFileMode:int
2025-12-04T09:40:11.6995713Z parm:           NVreg_InitializeSystemMemoryAllocations:int
2025-12-04T09:40:11.6996199Z parm:           NVreg_UsePageAttributeTable:int
2025-12-04T09:40:11.6996618Z parm:           NVreg_EnablePCIeGen3:int
2025-12-04T09:40:11.6996991Z parm:           NVreg_EnableMSI:int
2025-12-04T09:40:11.6997376Z parm:           NVreg_EnableStreamMemOPs:int
2025-12-04T09:40:11.6997824Z parm:           NVreg_RestrictProfilingToAdminUsers:int
2025-12-04T09:40:11.6998306Z parm:           NVreg_PreserveVideoMemoryAllocations:int
2025-12-04T09:40:11.6998785Z parm:           NVreg_EnableS0ixPowerManagement:int
2025-12-04T09:40:11.6999294Z parm:           NVreg_S0ixPowerManagementVideoMemoryThreshold:int
2025-12-04T09:40:11.6999789Z parm:           NVreg_DynamicPowerManagement:int
2025-12-04T09:40:11.7000308Z parm:           NVreg_DynamicPowerManagementVideoMemoryThreshold:int
2025-12-04T09:40:11.7000820Z parm:           NVreg_EnableGpuFirmware:int
2025-12-04T09:40:11.7001247Z parm:           NVreg_EnableGpuFirmwareLogs:int
2025-12-04T09:40:11.7001693Z parm:           NVreg_OpenRmEnableUnsupportedGpus:int
2025-12-04T09:40:11.7002164Z parm:           NVreg_EnableUserNUMAManagement:int
2025-12-04T09:40:11.7002590Z parm:           NVreg_MemoryPoolSize:int
2025-12-04T09:40:11.7002981Z parm:           NVreg_KMallocHeapMaxSize:int
2025-12-04T09:40:11.7003402Z parm:           NVreg_VMallocHeapMaxSize:int
2025-12-04T09:40:11.7003807Z parm:           NVreg_IgnoreMMIOCheck:int
2025-12-04T09:40:11.7004185Z parm:           NVreg_NvLinkDisable:int
2025-12-04T09:40:11.7004723Z parm:           NVreg_EnablePCIERelaxedOrderingMode:int
2025-12-04T09:40:11.7005182Z parm:           NVreg_RegisterPCIDriver:int
2025-12-04T09:40:11.7005615Z parm:           NVreg_RegisterPlatformDeviceDriver:int
2025-12-04T09:40:11.7006070Z parm:           NVreg_EnableResizableBar:int
2025-12-04T09:40:11.7006495Z parm:           NVreg_EnableDbgBreakpoint:int
2025-12-04T09:40:11.7006934Z parm:           NVreg_EnableNonblockingOpen:int
2025-12-04T09:40:11.7007363Z parm:           NVreg_CoherentGPUMemoryMode:charp
2025-12-04T09:40:11.7007791Z parm:           NVreg_RegistryDwords:charp
2025-12-04T09:40:11.7008216Z parm:           NVreg_RegistryDwordsPerDevice:charp
2025-12-04T09:40:11.7008616Z parm:           NVreg_RmMsg:charp
2025-12-04T09:40:11.7008978Z parm:           NVreg_GpuBlacklist:charp
2025-12-04T09:40:11.7009385Z parm:           NVreg_TemporaryFilePath:charp
2025-12-04T09:40:11.7009781Z parm:           NVreg_ExcludedGpus:charp
2025-12-04T09:40:11.7010181Z parm:           NVreg_DmaRemapPeerMmio:int
2025-12-04T09:40:11.7010591Z parm:           NVreg_RmNvlinkBandwidth:charp
2025-12-04T09:40:11.7011037Z parm:           NVreg_RmNvlinkBandwidthLinkCount:int
2025-12-04T09:40:11.7011460Z parm:           NVreg_ImexChannelCount:int
2025-12-04T09:40:11.7011866Z parm:           NVreg_CreateImexChannel0:int
2025-12-04T09:40:11.7012300Z parm:           NVreg_GrdmaPciTopoCheckOverride:int
2025-12-04T09:40:11.7012786Z parm:           rm_firmware_active:charp
2025-12-04T09:40:11.7013156Z + HAS_NVIDIA_DRIVER=0
2025-12-04T09:40:11.7013474Z ++ command -v nvidia-smi
2025-12-04T09:40:11.7013781Z + '[' -x /usr/bin/nvidia-smi ']'
2025-12-04T09:40:11.7014106Z + set +e
2025-12-04T09:40:11.7014499Z ++ nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0
2025-12-04T09:40:13.2489648Z + INSTALLED_DRIVER_VERSION=580.82.07
2025-12-04T09:40:13.2490088Z + NVIDIA_SMI_STATUS=0
2025-12-04T09:40:13.2490637Z + '[' 0 -ne 0 ']'
2025-12-04T09:40:13.2490913Z + '[' 580.82.07 '!=' 525.105.17 ']'
2025-12-04T09:40:13.2491566Z + echo 'NVIDIA driver (580.82.07) has been installed, but we expect to have 525.105.17 instead. Continuing'
2025-12-04T09:40:13.2492218Z + sudo killall nvidia-persistenced
2025-12-04T09:40:13.2492825Z NVIDIA driver (580.82.07) has been installed, but we expect to have 525.105.17 instead. Continuing
2025-12-04T09:40:13.3983716Z nvidia-persistenced: no process found
2025-12-04T09:40:13.4003090Z + true
2025-12-04T09:40:13.4003439Z + set -e
2025-12-04T09:40:13.4003678Z + '[' 0 -eq 0 ']'
2025-12-04T09:40:13.4003977Z + '[' amzn2023 '!=' ubuntu20.04 ']'
2025-12-04T09:40:13.4004388Z + sudo yum groupinstall -y 'Development Tools'
2025-12-04T09:40:13.9483309Z Last metadata expiration check: 0:23:23 ago on Thu Dec  4 09:16:50 2025.
2025-12-04T09:40:13.9944015Z No match for group package "system-rpm-config"
2025-12-04T09:40:13.9963978Z No match for group package "rcs"
2025-12-04T09:40:13.9990371Z No match for group package "pkgconfig"
2025-12-04T09:40:14.0583796Z Dependencies resolved.
2025-12-04T09:40:14.0923056Z ================================================================================
2025-12-04T09:40:14.0923633Z  Package           Architecture     Version             Repository         Size
2025-12-04T09:40:14.0924161Z ================================================================================
2025-12-04T09:40:14.0924554Z Installing Groups:
2025-12-04T09:40:14.0924957Z  Development Tools                                                             
2025-12-04T09:40:14.0925311Z 
2025-12-04T09:40:14.0925419Z Transaction Summary
2025-12-04T09:40:14.0925726Z ================================================================================
2025-12-04T09:40:14.0926044Z 
2025-12-04T09:40:15.0086091Z ================================================================================
2025-12-04T09:40:15.0086609Z WARNING:
2025-12-04T09:40:15.0086920Z   A newer release of "Amazon Linux" is available.
2025-12-04T09:40:15.0087210Z 
2025-12-04T09:40:15.0087319Z   Available Versions:
2025-12-04T09:40:15.0087798Z 
2025-12-04T09:40:15.0087908Z   Version 2023.9.20250929:
2025-12-04T09:40:15.0088298Z     Run the following command to upgrade to 2023.9.20250929:
2025-12-04T09:40:15.0088622Z 
2025-12-04T09:40:15.0088784Z       dnf upgrade --releasever=2023.9.20250929
2025-12-04T09:40:15.0089049Z 
2025-12-04T09:40:15.0089153Z     Release notes:
2025-12-04T09:40:15.0089683Z      https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20250929.html
2025-12-04T09:40:15.0090156Z 
2025-12-04T09:40:15.0090283Z   Version 2023.9.20251014:
2025-12-04T09:40:15.0090755Z     Run the following command to upgrade to 2023.9.20251014:
2025-12-04T09:40:15.0091088Z 
2025-12-04T09:40:15.0091231Z       dnf upgrade --releasever=2023.9.20251014
2025-12-04T09:40:15.0091511Z 
2025-12-04T09:40:15.0091615Z     Release notes:
2025-12-04T09:40:15.0092120Z      https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251014.html
2025-12-04T09:40:15.0092589Z 
2025-12-04T09:40:15.0092700Z   Version 2023.9.20251020:
2025-12-04T09:40:15.0093089Z     Run the following command to upgrade to 2023.9.20251020:
2025-12-04T09:40:15.0093405Z 
2025-12-04T09:40:15.0093557Z       dnf upgrade --releasever=2023.9.20251020
2025-12-04T09:40:15.0093819Z 
2025-12-04T09:40:15.0093934Z     Release notes:
2025-12-04T09:40:15.0094415Z      https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251020.html
2025-12-04T09:40:15.0094895Z 
2025-12-04T09:40:15.0095160Z   Version 2023.9.20251027:
2025-12-04T09:40:15.0095550Z     Run the following command to upgrade to 2023.9.20251027:
2025-12-04T09:40:15.0095864Z 
2025-12-04T09:40:15.0096000Z       dnf upgrade --releasever=2023.9.20251027
2025-12-04T09:40:15.0096372Z 
2025-12-04T09:40:15.0096475Z     Release notes:
2025-12-04T09:40:15.0096970Z      https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251027.html
2025-12-04T09:40:15.0097436Z 
2025-12-04T09:40:15.0097561Z   Version 2023.9.20251105:
2025-12-04T09:40:15.0097935Z     Run the following command to upgrade to 2023.9.20251105:
2025-12-04T09:40:15.0098276Z 
2025-12-04T09:40:15.0098414Z       dnf upgrade --releasever=2023.9.20251105
2025-12-04T09:40:15.0098676Z 
2025-12-04T09:40:15.0098795Z     Release notes:
2025-12-04T09:40:15.0099281Z      https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251105.html
2025-12-04T09:40:15.0099762Z 
2025-12-04T09:40:15.0099869Z   Version 2023.9.20251110:
2025-12-04T09:40:15.0100255Z     Run the following command to upgrade to 2023.9.20251110:
2025-12-04T09:40:15.0100572Z 
2025-12-04T09:40:15.0100722Z       dnf upgrade --releasever=2023.9.20251110
2025-12-04T09:40:15.0100980Z 
2025-12-04T09:40:15.0101081Z     Release notes:
2025-12-04T09:40:15.0101571Z      https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251110.html
2025-12-04T09:40:15.0102032Z 
2025-12-04T09:40:15.0102151Z   Version 2023.9.20251117:
2025-12-04T09:40:15.0102518Z     Run the following command to upgrade to 2023.9.20251117:
2025-12-04T09:40:15.0102854Z 
2025-12-04T09:40:15.0102990Z       dnf upgrade --releasever=2023.9.20251117
2025-12-04T09:40:15.0103265Z 
2025-12-04T09:40:15.0103367Z     Release notes:
2025-12-04T09:40:15.0103855Z      https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251117.html
2025-12-04T09:40:15.0104316Z 
2025-12-04T09:40:15.0104449Z ================================================================================
2025-12-04T09:40:15.0104844Z Complete!
2025-12-04T09:40:15.1253255Z ++ uname -r
2025-12-04T09:40:15.1265676Z + sudo yum install -y 'kernel-devel-uname-r == 6.1.150-174.273.amzn2023.x86_64'
2025-12-04T09:40:15.7101409Z Last metadata expiration check: 0:23:25 ago on Thu Dec  4 09:16:50 2025.
2025-12-04T09:40:15.7418792Z Using '==' operator in reldeps can result in an undefined behavior. It is deprecated and the support will be dropped in future versions. Use '=' operator instead.
2025-12-04T09:40:15.7543608Z Package kernel-devel-1:6.1.150-174.273.amzn2023.x86_64 is already installed.
2025-12-04T09:40:15.8187446Z Dependencies resolved.
2025-12-04T09:40:15.8544721Z Nothing to do.
2025-12-04T09:40:15.8545385Z Complete!
2025-12-04T09:40:15.9691489Z + sudo modprobe backlight
2025-12-04T09:40:16.2362214Z + sudo curl -fsL -o /tmp/nvidia_driver https://s3.amazonaws.com/ossci-linux/nvidia_driver/NVIDIA-Linux-x86_64-525.105.17.run
2025-12-04T09:40:20.5959749Z + set +e
2025-12-04T09:40:20.5960225Z + sudo /bin/bash /tmp/nvidia_driver -s --no-drm
2025-12-04T09:40:22.1108168Z Verifying archive integrity... OK
2025-12-04T09:40:49.8207627Z Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 525.105.17...
2025-12-04T09:40:50.3739232Z 
2025-12-04T09:40:50.3742167Z WARNING: The nvidia-drm module will not be installed. As a result, DRM-KMS will not function with this installation of the NVIDIA driver.
2025-12-04T09:40:50.3742892Z 
2025-12-04T09:41:16.5245111Z 
2025-12-04T09:41:16.5247012Z WARNING: nvidia-installer was forced to guess the X library path '/usr/lib64' and X module path '/usr/lib64/xorg/modules'; these paths were not queryable from the system.  If X fails to find the NVIDIA X driver module, please install the `pkg-config` utility and the X.Org SDK/development package for your distribution and reinstall the driver.
2025-12-04T09:41:16.5248729Z 
2025-12-04T09:41:16.5264403Z 
2025-12-04T09:41:16.5265773Z WARNING: This NVIDIA driver package includes Vulkan components, but no Vulkan ICD loader was detected on this system. The NVIDIA Vulkan ICD will not function without the loader. Most distributions package the Vulkan loader; try installing the "vulkan-loader", "vulkan-icd-loader", or "libvulkan1" package.
2025-12-04T09:41:16.5267258Z 
2025-12-04T09:41:28.1302144Z + NVIDIA_INSTALLATION_STATUS=0
2025-12-04T09:41:28.1302557Z + RESET_GPU=0
2025-12-04T09:41:28.1302836Z + '[' 0 -ne 0 ']'
2025-12-04T09:41:28.1304638Z ++ command -v nvidia-smi
2025-12-04T09:41:28.1307569Z + '[' -x /usr/bin/nvidia-smi ']'
2025-12-04T09:41:28.1311266Z ++ nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0
2025-12-04T09:41:30.7139362Z + INSTALLED_DRIVER_VERSION=525.105.17
2025-12-04T09:41:30.7139821Z + NVIDIA_SMI_STATUS=0
2025-12-04T09:41:30.7140138Z + '[' 0 -ne 0 ']'
2025-12-04T09:41:30.7140389Z + '[' 0 -eq 1 ']'
2025-12-04T09:41:30.7140683Z + sudo rm -fv /tmp/nvidia_driver
2025-12-04T09:41:30.8987357Z removed '/tmp/nvidia_driver'
2025-12-04T09:41:30.9006505Z + set -e
2025-12-04T09:41:30.9008667Z + post_install_nvidia_driver_common
2025-12-04T09:41:30.9012478Z + sudo modprobe nvidia
2025-12-04T09:41:31.1454613Z + echo 'After installing NVIDIA driver'
2025-12-04T09:41:31.1455031Z + lspci
2025-12-04T09:41:31.1455286Z After installing NVIDIA driver
2025-12-04T09:41:31.1592554Z 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma]
2025-12-04T09:41:31.1593189Z 00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
2025-12-04T09:41:31.1594009Z 00:01.3 Non-VGA unclassified device: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08)
2025-12-04T09:41:31.1594664Z 00:03.0 VGA compatible controller: Amazon.com, Inc. Device 1111
2025-12-04T09:41:31.1595587Z 00:04.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe EBS Controller
2025-12-04T09:41:31.1596267Z 00:05.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA)
2025-12-04T09:41:31.1596902Z 00:1e.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
2025-12-04T09:41:31.1597521Z 00:1f.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe SSD Controller
2025-12-04T09:41:31.1598050Z + lsmod
2025-12-04T09:41:31.1625496Z Module                  Size  Used by
2025-12-04T09:41:31.1625918Z nvidia              56537088  0
2025-12-04T09:41:31.1626243Z drm                   602112  1 nvidia
2025-12-04T09:41:31.1626645Z drm_panel_orientation_quirks    32768  1 drm
2025-12-04T09:41:31.1627068Z backlight              24576  1 drm
2025-12-04T09:41:31.1627432Z i2c_core              110592  2 nvidia,drm
2025-12-04T09:41:31.1627797Z xt_conntrack           16384  1
2025-12-04T09:41:31.1628121Z nft_chain_nat          16384  3
2025-12-04T09:41:31.1628438Z xt_MASQUERADE          20480  1
2025-12-04T09:41:31.1628807Z nf_nat                 57344  2 nft_chain_nat,xt_MASQUERADE
2025-12-04T09:41:31.1629219Z nf_conntrack_netlink    57344  0
2025-12-04T09:41:31.1629707Z nf_conntrack          184320  4 xt_conntrack,nf_nat,nf_conntrack_netlink,xt_MASQUERADE
2025-12-04T09:41:31.1630261Z nf_defrag_ipv6         24576  1 nf_conntrack
2025-12-04T09:41:31.1630858Z nf_defrag_ipv4         16384  1 nf_conntrack
2025-12-04T09:41:31.1631223Z xfrm_user              57344  1
2025-12-04T09:41:31.1631579Z xfrm_algo              16384  1 xfrm_user
2025-12-04T09:41:31.1631947Z xt_addrtype            16384  2
2025-12-04T09:41:31.1632275Z nft_compat             20480  4
2025-12-04T09:41:31.1632646Z nf_tables             311296  57 nft_compat,nft_chain_nat
2025-12-04T09:41:31.1633167Z nfnetlink              20480  4 nft_compat,nf_conntrack_netlink,nf_tables
2025-12-04T09:41:31.1633647Z br_netfilter           36864  0
2025-12-04T09:41:31.1633984Z bridge                323584  1 br_netfilter
2025-12-04T09:41:31.1634366Z stp                    16384  1 bridge
2025-12-04T09:41:31.1634722Z llc                    16384  2 bridge,stp
2025-12-04T09:41:31.1635062Z overlay               167936  0
2025-12-04T09:41:31.1635376Z tls                   139264  0
2025-12-04T09:41:31.1635689Z nls_ascii              16384  1
2025-12-04T09:41:31.1636001Z nls_cp437              20480  1
2025-12-04T09:41:31.1636305Z vfat                   24576  1
2025-12-04T09:41:31.1636618Z fat                    86016  1 vfat
2025-12-04T09:41:31.1636952Z sunrpc                700416  1
2025-12-04T09:41:31.1637245Z i8042                  45056  0
2025-12-04T09:41:31.1637549Z ena                   184320  0
2025-12-04T09:41:31.1637863Z serio                  28672  3 i8042
2025-12-04T09:41:31.1638193Z skx_edac_common        28672  0
2025-12-04T09:41:31.1638510Z button                 24576  0
2025-12-04T09:41:31.1638832Z ghash_clmulni_intel    16384  0
2025-12-04T09:41:31.1639148Z sch_fq_codel           20480  17
2025-12-04T09:41:31.1639478Z fuse                  184320  1
2025-12-04T09:41:31.1639785Z dm_mod                188416  0
2025-12-04T09:41:31.1640083Z configfs               57344  1
2025-12-04T09:41:31.1640391Z loop                   36864  0
2025-12-04T09:41:31.1640699Z dmi_sysfs              20480  0
2025-12-04T09:41:31.1641012Z crc32_pclmul           16384  0
2025-12-04T09:41:31.1641313Z crc32c_intel           24576  0
2025-12-04T09:41:31.1641632Z efivarfs               24576  1
2025-12-04T09:41:31.1641944Z + modinfo nvidia
2025-12-04T09:41:31.1644323Z filename:       /lib/modules/6.1.150-174.273.amzn2023.x86_64/kernel/drivers/video/nvidia.ko
2025-12-04T09:41:31.1644944Z firmware:       nvidia/525.105.17/gsp_tu10x.bin
2025-12-04T09:41:31.1645380Z firmware:       nvidia/525.105.17/gsp_ad10x.bin
2025-12-04T09:41:31.1645771Z alias:          char-major-195-*
2025-12-04T09:41:31.1646109Z version:        525.105.17
2025-12-04T09:41:31.1646428Z supported:      external
2025-12-04T09:41:31.1646716Z license:        NVIDIA
2025-12-04T09:41:31.1647150Z srcversion:     98F82D76E0EF3952EEE57A7
2025-12-04T09:41:31.1647556Z alias:          pci:v000010DEd*sv*sd*bc06sc80i00*
2025-12-04T09:41:31.1648042Z alias:          pci:v000010DEd*sv*sd*bc03sc02i00*
2025-12-04T09:41:31.1648513Z alias:          pci:v000010DEd*sv*sd*bc03sc00i00*
2025-12-04T09:41:31.1648909Z depends:        i2c-core,drm
2025-12-04T09:41:31.1649230Z retpoline:      Y
2025-12-04T09:41:31.1649505Z name:           nvidia
2025-12-04T09:41:31.1649960Z vermagic:       6.1.150-174.273.amzn2023.x86_64 SMP preempt mod_unload modversions 
2025-12-04T09:41:31.1650558Z parm:           NvSwitchRegDwords:NvSwitch regkey (charp)
2025-12-04T09:41:31.1651105Z parm:           NvSwitchBlacklist:NvSwitchBlacklist=uuid[,uuid...] (charp)
2025-12-04T09:41:31.1651638Z parm:           NVreg_ResmanDebugLevel:int
2025-12-04T09:41:31.1652026Z parm:           NVreg_RmLogonRC:int
2025-12-04T09:41:31.1652393Z parm:           NVreg_ModifyDeviceFiles:int
2025-12-04T09:41:31.1652798Z parm:           NVreg_DeviceFileUID:int
2025-12-04T09:41:31.1653177Z parm:           NVreg_DeviceFileGID:int
2025-12-04T09:41:31.1653575Z parm:           NVreg_DeviceFileMode:int
2025-12-04T09:41:31.1654009Z parm:           NVreg_InitializeSystemMemoryAllocations:int
2025-12-04T09:41:31.1654489Z parm:           NVreg_UsePageAttributeTable:int
2025-12-04T09:41:31.1654901Z parm:           NVreg_EnablePCIeGen3:int
2025-12-04T09:41:31.1655360Z parm:           NVreg_EnableMSI:int
2025-12-04T09:41:31.1655724Z parm:           NVreg_TCEBypassMode:int
2025-12-04T09:41:31.1656120Z parm:           NVreg_EnableStreamMemOPs:int
2025-12-04T09:41:31.1656647Z parm:           NVreg_RestrictProfilingToAdminUsers:int
2025-12-04T09:41:31.1657126Z parm:           NVreg_PreserveVideoMemoryAllocations:int
2025-12-04T09:41:31.1657598Z parm:           NVreg_EnableS0ixPowerManagement:int
2025-12-04T09:41:31.1658109Z parm:           NVreg_S0ixPowerManagementVideoMemoryThreshold:int
2025-12-04T09:41:31.1658605Z parm:           NVreg_DynamicPowerManagement:int
2025-12-04T09:41:31.1659128Z parm:           NVreg_DynamicPowerManagementVideoMemoryThreshold:int
2025-12-04T09:41:31.1659639Z parm:           NVreg_EnableGpuFirmware:int
2025-12-04T09:41:31.1660046Z parm:           NVreg_EnableGpuFirmwareLogs:int
2025-12-04T09:41:31.1660506Z parm:           NVreg_OpenRmEnableUnsupportedGpus:int
2025-12-04T09:41:31.1660970Z parm:           NVreg_EnableUserNUMAManagement:int
2025-12-04T09:41:31.1661396Z parm:           NVreg_MemoryPoolSize:int
2025-12-04T09:41:31.1661784Z parm:           NVreg_KMallocHeapMaxSize:int
2025-12-04T09:41:31.1662195Z parm:           NVreg_VMallocHeapMaxSize:int
2025-12-04T09:41:31.1662596Z parm:           NVreg_IgnoreMMIOCheck:int
2025-12-04T09:41:31.1662973Z parm:           NVreg_NvLinkDisable:int
2025-12-04T09:41:31.1663399Z parm:           NVreg_EnablePCIERelaxedOrderingMode:int
2025-12-04T09:41:31.1663847Z parm:           NVreg_RegisterPCIDriver:int
2025-12-04T09:41:31.1664246Z parm:           NVreg_EnableDbgBreakpoint:int
2025-12-04T09:41:31.1664668Z parm:           NVreg_RegistryDwords:charp
2025-12-04T09:41:31.1665092Z parm:           NVreg_RegistryDwordsPerDevice:charp
2025-12-04T09:41:31.1665494Z parm:           NVreg_RmMsg:charp
2025-12-04T09:41:31.1665853Z parm:           NVreg_GpuBlacklist:charp
2025-12-04T09:41:31.1666257Z parm:           NVreg_TemporaryFilePath:charp
2025-12-04T09:41:31.1666666Z parm:           NVreg_ExcludedGpus:charp
2025-12-04T09:41:31.1667049Z parm:           NVreg_DmaRemapPeerMmio:int
2025-12-04T09:41:31.1667442Z parm:           rm_firmware_active:charp
2025-12-04T09:41:31.1667792Z + set +e
2025-12-04T09:41:31.1668011Z + nvidia-smi
2025-12-04T09:41:33.1429579Z Thu Dec  4 09:41:33 2025       
2025-12-04T09:41:33.1430085Z +-----------------------------------------------------------------------------+
2025-12-04T09:41:33.1430694Z | NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
2025-12-04T09:41:33.1431275Z |-------------------------------+----------------------+----------------------+
2025-12-04T09:41:33.1432271Z | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
2025-12-04T09:41:33.1432928Z | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
2025-12-04T09:41:33.1433460Z |                               |                      |               MIG M. |
2025-12-04T09:41:33.1433856Z |===============================+======================+======================|
2025-12-04T09:41:33.1509162Z |   0  Tesla T4            Off  | 00000000:00:1E.0 Off |                    0 |
2025-12-04T09:41:33.1509768Z | N/A   30C    P0    25W /  70W |      2MiB / 15360MiB |      4%      Default |
2025-12-04T09:41:33.1510324Z |                               |                      |                  N/A |
2025-12-04T09:41:33.1510781Z +-------------------------------+----------------------+----------------------+
2025-12-04T09:41:33.1511258Z                                                                                
2025-12-04T09:41:33.1511728Z +-----------------------------------------------------------------------------+
2025-12-04T09:41:33.1512229Z | Processes:                                                                  |
2025-12-04T09:41:33.1512893Z |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
2025-12-04T09:41:33.1513383Z |        ID   ID                                                   Usage      |
2025-12-04T09:41:33.1514039Z |=============================================================================|
2025-12-04T09:41:33.1514551Z |  No running processes found                                                 |
2025-12-04T09:41:33.1515122Z +-----------------------------------------------------------------------------+
2025-12-04T09:41:33.5994895Z + nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0
2025-12-04T09:41:35.5628166Z Tesla T4
2025-12-04T09:41:36.0004830Z + NVIDIA_SMI_STATUS=0
2025-12-04T09:41:36.0005216Z + '[' 0 -eq 0 ']'
2025-12-04T09:41:36.0005520Z + echo 'INFO: Ignoring allowed status 0'
2025-12-04T09:41:36.0005929Z + set -e
2025-12-04T09:41:36.0006178Z INFO: Ignoring allowed status 0
2025-12-04T09:41:36.0012658Z == Installing nvidia container toolkit for amzn2023 ==
2025-12-04T09:41:36.0016421Z + sudo yum install -y yum-utils
2025-12-04T09:41:36.5647087Z Last metadata expiration check: 0:24:46 ago on Thu Dec  4 09:16:50 2025.
2025-12-04T09:41:36.5992846Z Package dnf-utils-4.3.0-13.amzn2023.0.5.noarch is already installed.
2025-12-04T09:41:36.6620604Z Dependencies resolved.
2025-12-04T09:41:36.6974932Z Nothing to do.
2025-12-04T09:41:36.6975660Z Complete!
2025-12-04T09:41:36.7955566Z + [[ amzn2023 == \a\m\z\n\2\0\2\3 ]]
2025-12-04T09:41:36.7956327Z + YUM_REPO_URL=https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo
2025-12-04T09:41:36.7957442Z + sudo yum-config-manager --add-repo https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo
2025-12-04T09:41:37.1641007Z Adding repo from: https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo
2025-12-04T09:41:37.2235151Z + sudo yum install -y nvidia-container-toolkit-1.17.8 libnvidia-container-tools-1.17.8 libnvidia-container1-1.17.8 nvidia-container-toolkit-base-1.17.8
2025-12-04T09:41:37.8990156Z nvidia-container-toolkit                         19 kB/s | 833  B     00:00    
2025-12-04T09:41:38.0055711Z Dependencies resolved.
2025-12-04T09:41:38.0401147Z ================================================================================
2025-12-04T09:41:38.0401710Z  Package                       Arch   Version    Repository                Size
2025-12-04T09:41:38.0402199Z ================================================================================
2025-12-04T09:41:38.0402573Z Downgrading:
2025-12-04T09:41:38.0403037Z  libnvidia-container-tools     x86_64 1.17.8-1   nvidia-container-toolkit  40 k
2025-12-04T09:41:38.0403759Z  libnvidia-container1          x86_64 1.17.8-1   nvidia-container-toolkit 1.0 M
2025-12-04T09:41:38.0404482Z  nvidia-container-toolkit      x86_64 1.17.8-1   nvidia-container-toolkit 1.2 M
2025-12-04T09:41:38.0405495Z  nvidia-container-toolkit-base x86_64 1.17.8-1   nvidia-container-toolkit 5.8 M
2025-12-04T09:41:38.0406097Z 
2025-12-04T09:41:38.0406269Z Transaction Summary
2025-12-04T09:41:38.0406703Z ================================================================================
2025-12-04T09:41:38.0407094Z Downgrade  4 Packages
2025-12-04T09:41:38.0407291Z 
2025-12-04T09:41:38.0407437Z Total download size: 8.0 M
2025-12-04T09:41:38.0408249Z Downloading Packages:
2025-12-04T09:41:38.0790466Z (1/4): libnvidia-container-tools-1.17.8-1.x86_6 1.1 MB/s |  40 kB     00:00    
2025-12-04T09:41:38.1307652Z (2/4): libnvidia-container1-1.17.8-1.x86_64.rpm  11 MB/s | 1.0 MB     00:00    
2025-12-04T09:41:38.1826269Z (3/4): nvidia-container-toolkit-1.17.8-1.x86_64 8.8 MB/s | 1.2 MB     00:00    
2025-12-04T09:41:38.3246748Z (4/4): nvidia-container-toolkit-base-1.17.8-1.x  24 MB/s | 5.8 MB     00:00    
2025-12-04T09:41:38.3258902Z --------------------------------------------------------------------------------
2025-12-04T09:41:38.3263296Z Total                                            28 MB/s | 8.0 MB     00:00     
2025-12-04T09:41:38.3266656Z Running transaction check
2025-12-04T09:41:38.3428838Z Transaction check succeeded.
2025-12-04T09:41:38.3429406Z Running transaction test
2025-12-04T09:41:38.3998070Z Transaction test succeeded.
2025-12-04T09:41:38.4002155Z Running transaction
2025-12-04T09:41:39.4223420Z   Preparing        :                                                        1/1 
2025-12-04T09:41:39.5798207Z   Downgrading      : nvidia-container-toolkit-base-1.17.8-1.x86_64          1/8 
2025-12-04T09:41:39.6122889Z   Downgrading      : libnvidia-container1-1.17.8-1.x86_64                   2/8 
2025-12-04T09:41:39.6885917Z   Running scriptlet: libnvidia-container1-1.17.8-1.x86_64                   2/8 
2025-12-04T09:41:39.8453549Z   Downgrading      : libnvidia-container-tools-1.17.8-1.x86_64              3/8 
2025-12-04T09:41:39.8756895Z   Downgrading      : nvidia-container-toolkit-1.17.8-1.x86_64               4/8 
2025-12-04T09:41:39.9609196Z   Running scriptlet: nvidia-container-toolkit-1.17.8-1.x86_64               4/8 
2025-12-04T09:41:39.9679432Z   Running scriptlet: nvidia-container-toolkit-1.18.1-1.x86_64               5/8 
2025-12-04T09:41:39.9680393Z   Cleanup          : nvidia-container-toolkit-1.18.1-1.x86_64               5/8 
2025-12-04T09:41:40.0046739Z   Running scriptlet: nvidia-container-toolkit-1.18.1-1.x86_64               5/8 
2025-12-04T09:41:40.0115655Z   Running scriptlet: libnvidia-container-tools-1.18.1-1.x86_64              6/8 
2025-12-04T09:41:40.0117073Z   Cleanup          : libnvidia-container-tools-1.18.1-1.x86_64              6/8 
2025-12-04T09:41:40.0501390Z   Running scriptlet: libnvidia-container-tools-1.18.1-1.x86_64              6/8 
2025-12-04T09:41:40.0573901Z   Running scriptlet: libnvidia-container1-1.18.1-1.x86_64                   7/8 
2025-12-04T09:41:40.0575076Z   Cleanup          : libnvidia-container1-1.18.1-1.x86_64                   7/8 
2025-12-04T09:41:40.0949969Z   Running scriptlet: libnvidia-container1-1.18.1-1.x86_64                   7/8 
2025-12-04T09:41:40.1022592Z   Running scriptlet: nvidia-container-toolkit-base-1.18.1-1.x86_64          8/8 
2025-12-04T09:41:40.1023949Z   Cleanup          : nvidia-container-toolkit-base-1.18.1-1.x86_64          8/8 
2025-12-04T09:41:40.1357865Z   Running scriptlet: nvidia-container-toolkit-base-1.18.1-1.x86_64          8/8 
2025-12-04T09:41:40.1927045Z   Running scriptlet: nvidia-container-toolkit-1.17.8-1.x86_64               8/8 
2025-12-04T09:41:41.1002479Z   Running scriptlet: nvidia-container-toolkit-base-1.18.1-1.x86_64          8/8 
2025-12-04T09:41:41.1003951Z   Verifying        : libnvidia-container-tools-1.17.8-1.x86_64              1/8 
2025-12-04T09:41:41.1005258Z   Verifying        : libnvidia-container-tools-1.18.1-1.x86_64              2/8 
2025-12-04T09:41:41.1006216Z   Verifying        : libnvidia-container1-1.17.8-1.x86_64                   3/8 
2025-12-04T09:41:41.1007813Z   Verifying        : libnvidia-container1-1.18.1-1.x86_64                   4/8 
2025-12-04T09:41:41.1009101Z   Verifying        : nvidia-container-toolkit-1.17.8-1.x86_64               5/8 
2025-12-04T09:41:41.1010377Z   Verifying        : nvidia-container-toolkit-1.18.1-1.x86_64               6/8 
2025-12-04T09:41:41.1011669Z   Verifying        : nvidia-container-toolkit-base-1.17.8-1.x86_64          7/8 
2025-12-04T09:41:41.2634107Z   Verifying        : nvidia-container-toolkit-base-1.18.1-1.x86_64          8/8
2025-12-04T09:41:41.2634107Z ================================================================================
2025-12-04T09:41:41.2635314Z WARNING:
2025-12-04T09:41:41.2635749Z   A newer release of "Amazon Linux" is available.
2025-12-04T09:41:41.2636226Z 
2025-12-04T09:41:41.2636397Z   Available Versions:
2025-12-04T09:41:41.2636684Z 
2025-12-04T09:41:41.2636884Z   Version 2023.9.20250929:
2025-12-04T09:41:41.2637488Z     Run the following command to upgrade to 2023.9.20250929:
2025-12-04T09:41:41.2638022Z 
2025-12-04T09:41:41.2638263Z       dnf upgrade --releasever=2023.9.20250929
2025-12-04T09:41:41.2638698Z 
2025-12-04T09:41:41.2638880Z     Release notes:
2025-12-04T09:41:41.2639681Z      https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20250929.html
2025-12-04T09:41:41.2640469Z 
2025-12-04T09:41:41.2640646Z   Version 2023.9.20251014:
2025-12-04T09:41:41.2641603Z     Run the following command to upgrade to 2023.9.20251014:
2025-12-04T09:41:41.2642150Z 
2025-12-04T09:41:41.2642395Z       dnf upgrade --releasever=2023.9.20251014
2025-12-04T09:41:41.2642859Z 
2025-12-04T09:41:41.2643039Z     Release notes:
2025-12-04T09:41:41.2643850Z      https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251014.html
2025-12-04T09:41:41.2644640Z 
2025-12-04T09:41:41.2644826Z   Version 2023.9.20251020:
2025-12-04T09:41:41.2645449Z     Run the following command to upgrade to 2023.9.20251020:
2025-12-04T09:41:41.2645990Z 
2025-12-04T09:41:41.2646207Z       dnf upgrade --releasever=2023.9.20251020
2025-12-04T09:41:41.2646672Z 
2025-12-04T09:41:41.2646844Z     Release notes:
2025-12-04T09:41:41.2647590Z      https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251020.html
2025-12-04T09:41:41.2648338Z 
2025-12-04T09:41:41.2648530Z   Version 2023.9.20251027:
2025-12-04T09:41:41.2649141Z     Run the following command to upgrade to 2023.9.20251027:
2025-12-04T09:41:41.2649672Z 
2025-12-04T09:41:41.2649885Z       dnf upgrade --releasever=2023.9.20251027
2025-12-04T09:41:41.2650286Z 
2025-12-04T09:41:41.2650453Z     Release notes:
2025-12-04T09:41:41.2651200Z      https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251027.html
2025-12-04T09:41:41.2651949Z 
2025-12-04T09:41:41.2652111Z   Version 2023.9.20251105:
2025-12-04T09:41:41.2652731Z     Run the following command to upgrade to 2023.9.20251105:
2025-12-04T09:41:41.2653207Z 
2025-12-04T09:41:41.2653437Z       dnf upgrade --releasever=2023.9.20251105
2025-12-04T09:41:41.2653903Z 
2025-12-04T09:41:41.2654092Z     Release notes:
2025-12-04T09:41:41.2654924Z      https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251105.html
2025-12-04T09:41:41.2655688Z 
2025-12-04T09:41:41.2655874Z   Version 2023.9.20251110:
2025-12-04T09:41:41.2656563Z     Run the following command to upgrade to 2023.9.20251110:
2025-12-04T09:41:41.2657088Z 
2025-12-04T09:41:41.2657309Z       dnf upgrade --releasever=2023.9.20251110
2025-12-04T09:41:41.2657753Z 
2025-12-04T09:41:41.2657928Z     Release notes:
2025-12-04T09:41:41.2658796Z      https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251110.html
2025-12-04T09:41:41.2659672Z 
2025-12-04T09:41:41.2659883Z   Version 2023.9.20251117:
2025-12-04T09:41:41.2660605Z     Run the following command to upgrade to 2023.9.20251117:
2025-12-04T09:41:41.2661210Z 
2025-12-04T09:41:41.2661470Z       dnf upgrade --releasever=2023.9.20251117
2025-12-04T09:41:41.2662004Z 
2025-12-04T09:41:41.2662211Z     Release notes:
2025-12-04T09:41:41.2663570Z      https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251117.html
2025-12-04T09:41:41.2664489Z 
2025-12-04T09:41:41.2664723Z ================================================================================
2025-12-04T09:41:41.3341112Z  
2025-12-04T09:41:41.3341407Z 
2025-12-04T09:41:41.3341553Z Downgraded:
2025-12-04T09:41:41.3342301Z   libnvidia-container-tools-1.17.8-1.x86_64                                     
2025-12-04T09:41:41.3343424Z   libnvidia-container1-1.17.8-1.x86_64                                          
2025-12-04T09:41:41.3344496Z   nvidia-container-toolkit-1.17.8-1.x86_64                                      
2025-12-04T09:41:41.3345799Z   nvidia-container-toolkit-base-1.17.8-1.x86_64                                 
2025-12-04T09:41:41.3346498Z 
2025-12-04T09:41:41.3346663Z Complete!
2025-12-04T09:41:41.3922208Z + sudo systemctl restart docker
2025-12-04T09:41:48.1116974Z Thu Dec  4 09:41:48 2025       
2025-12-04T09:41:48.1117469Z +-----------------------------------------------------------------------------+
2025-12-04T09:41:48.1118099Z | NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
2025-12-04T09:41:48.1118691Z |-------------------------------+----------------------+----------------------+
2025-12-04T09:41:48.1119294Z | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
2025-12-04T09:41:48.1120251Z | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
2025-12-04T09:41:48.1120774Z |                               |                      |               MIG M. |
2025-12-04T09:41:48.1121181Z |===============================+======================+======================|
2025-12-04T09:41:48.1218257Z |   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
2025-12-04T09:41:48.1218827Z | N/A   30C    P0    25W /  70W |      2MiB / 15360MiB |      7%      Default |
2025-12-04T09:41:48.1219362Z |                               |                      |                  N/A |
2025-12-04T09:41:48.1219844Z +-------------------------------+----------------------+----------------------+
2025-12-04T09:41:48.1220326Z                                                                                
2025-12-04T09:41:48.1220777Z +-----------------------------------------------------------------------------+
2025-12-04T09:41:48.1221290Z | Processes:                                                                  |
2025-12-04T09:41:48.1221945Z |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
2025-12-04T09:41:48.1222439Z |        ID   ID                                                   Usage      |
2025-12-04T09:41:48.1222846Z |=============================================================================|
2025-12-04T09:41:48.1223364Z |  No running processes found                                                 |
2025-12-04T09:41:48.1223926Z +-----------------------------------------------------------------------------+
2025-12-04T09:41:48.2001394Z Unable to find image 'public.ecr.aws/docker/library/python:3.13' locally
2025-12-04T09:41:48.3711485Z 3.13: Pulling from docker/library/python
2025-12-04T09:41:48.4584206Z 53c88f1dfeb7: Pulling fs layer
2025-12-04T09:41:48.4584641Z eae668646f44: Pulling fs layer
2025-12-04T09:41:48.4584971Z ff2e6e687b6c: Pulling fs layer
2025-12-04T09:41:48.4585311Z 7c40a3faff76: Pulling fs layer
2025-12-04T09:41:48.4585642Z 967a3b1c8fef: Pulling fs layer
2025-12-04T09:41:48.4586006Z a64e1a44f22a: Pulling fs layer
2025-12-04T09:41:48.4586332Z 52655f8a5bcc: Pulling fs layer
2025-12-04T09:41:48.4586649Z 7c40a3faff76: Waiting
2025-12-04T09:41:48.4586972Z 52655f8a5bcc: Waiting
2025-12-04T09:41:48.4587236Z a64e1a44f22a: Waiting
2025-12-04T09:41:48.6231616Z eae668646f44: Verifying Checksum
2025-12-04T09:41:48.6232119Z eae668646f44: Download complete
2025-12-04T09:41:48.7283856Z 53c88f1dfeb7: Verifying Checksum
2025-12-04T09:41:48.7284298Z 53c88f1dfeb7: Download complete
2025-12-04T09:41:48.8283027Z 967a3b1c8fef: Verifying Checksum
2025-12-04T09:41:48.8283720Z 967a3b1c8fef: Download complete
2025-12-04T09:41:48.8316716Z ff2e6e687b6c: Verifying Checksum
2025-12-04T09:41:48.8317100Z ff2e6e687b6c: Download complete
2025-12-04T09:41:48.8628566Z 52655f8a5bcc: Download complete
2025-12-04T09:41:49.0308187Z a64e1a44f22a: Download complete
2025-12-04T09:41:49.8395332Z 7c40a3faff76: Verifying Checksum
2025-12-04T09:41:49.8395775Z 7c40a3faff76: Download complete
2025-12-04T09:41:50.2241971Z 53c88f1dfeb7: Pull complete
2025-12-04T09:41:50.8395146Z eae668646f44: Pull complete
2025-12-04T09:41:52.8700681Z ff2e6e687b6c: Pull complete
2025-12-04T09:41:58.7127817Z 7c40a3faff76: Pull complete
2025-12-04T09:41:58.9479790Z 967a3b1c8fef: Pull complete
2025-12-04T09:41:59.6374952Z a64e1a44f22a: Pull complete
2025-12-04T09:41:59.6601271Z 52655f8a5bcc: Pull complete
2025-12-04T09:41:59.6731897Z Digest: sha256:3f986299a7b8b44b0d8cf9bda2b22361ce5c3058ef5d7cb17fb7452506680ab0
2025-12-04T09:41:59.6773400Z Status: Downloaded newer image for public.ecr.aws/docker/library/python:3.13
2025-12-04T09:42:07.0261155Z Thu Dec  4 09:42:07 2025       
2025-12-04T09:42:07.0261831Z +-----------------------------------------------------------------------------+
2025-12-04T09:42:07.0262639Z | NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
2025-12-04T09:42:07.0263428Z |-------------------------------+----------------------+----------------------+
2025-12-04T09:42:07.0264634Z | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
2025-12-04T09:42:07.0265574Z | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
2025-12-04T09:42:07.0266342Z |                               |                      |               MIG M. |
2025-12-04T09:42:07.0266969Z |===============================+======================+======================|
2025-12-04T09:42:07.0420177Z |   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
2025-12-04T09:42:07.0421098Z | N/A   29C    P8     9W /  70W |      2MiB / 15360MiB |      0%      Default |
2025-12-04T09:42:07.0421859Z |                               |                      |                  N/A |
2025-12-04T09:42:07.0422608Z +-------------------------------+----------------------+----------------------+
2025-12-04T09:42:07.0423255Z                                                                                
2025-12-04T09:42:07.0423884Z +-----------------------------------------------------------------------------+
2025-12-04T09:42:07.0424572Z | Processes:                                                                  |
2025-12-04T09:42:07.0425264Z |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
2025-12-04T09:42:07.0425912Z |        ID   ID                                                   Usage      |
2025-12-04T09:42:07.0426460Z |=============================================================================|
2025-12-04T09:42:07.0427157Z |  No running processes found                                                 |
2025-12-04T09:42:07.0427923Z +-----------------------------------------------------------------------------+
2025-12-04T09:42:07.9120854Z Command completed after 1 attempt(s).
2025-12-04T09:42:07.9224746Z Prepare all required actions
2025-12-04T09:42:07.9260393Z ##[group]Run ./.github/actions/get-workflow-job-id
2025-12-04T09:42:07.9260805Z with:
2025-12-04T09:42:07.9261538Z   github-token: ***
2025-12-04T09:42:07.9261800Z env:
2025-12-04T09:42:07.9262054Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:42:07.9262368Z   HAS_NVIDIA_GPU: true
2025-12-04T09:42:07.9262725Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:42:07.9263150Z ##[endgroup]
2025-12-04T09:42:07.9281234Z ##[group]Run set -eux
2025-12-04T09:42:07.9281646Z set -eux
2025-12-04T09:42:07.9299557Z python3 .github/scripts/get_workflow_job_id.py "${GITHUB_RUN_ID}" "${RUNNER_NAME}"
2025-12-04T09:42:07.9311880Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:42:07.9312331Z env:
2025-12-04T09:42:07.9312818Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:42:07.9313122Z   HAS_NVIDIA_GPU: true
2025-12-04T09:42:07.9313557Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:42:07.9314203Z   GITHUB_TOKEN: ***
2025-12-04T09:42:07.9314466Z ##[endgroup]
2025-12-04T09:42:07.9351151Z + python3 .github/scripts/get_workflow_job_id.py 19922826259 i-00bb8650059fae3eb
2025-12-04T09:42:10.3245226Z Setting output job-id=57119749248
2025-12-04T09:42:10.3246186Z Setting output job-name=linux-jammy-cuda12.4-py3.10-gcc11 / test (legacy_nvidia_driver, 1, 5, linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check, unstable)
2025-12-04T09:42:10.3385605Z ##[group]Run python3 -m pip install psutil==5.9.8 dataclasses_json==0.6.7 nvidia-ml-py==11.525.84
2025-12-04T09:42:10.3386511Z python3 -m pip install psutil==5.9.8 dataclasses_json==0.6.7 nvidia-ml-py==11.525.84
2025-12-04T09:42:10.3387665Z python3 -m tools.stats.monitor --log-interval "$MONITOR_LOG_INTERVAL" --data-collect-interval "$MONITOR_DATA_COLLECT_INTERVAL" > usage_log.txt 2>&1 &
2025-12-04T09:42:10.3388707Z echo "monitor-script-pid=${!}" >> "${GITHUB_OUTPUT}"
2025-12-04T09:42:10.3395639Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:42:10.3396068Z env:
2025-12-04T09:42:10.3396326Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:42:10.3396641Z   HAS_NVIDIA_GPU: true
2025-12-04T09:42:10.3396994Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:42:10.3397412Z   JOB_ID: 57119749248
2025-12-04T09:42:10.3398197Z   JOB_NAME: linux-jammy-cuda12.4-py3.10-gcc11 / test (legacy_nvidia_driver, 1, 5, linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check, unstable)
2025-12-04T09:42:10.3398988Z   WORKFLOW_NAME: periodic
2025-12-04T09:42:10.3399305Z   WORKFLOW_RUN_ID: 19922826259
2025-12-04T09:42:10.3399639Z   MONITOR_LOG_INTERVAL: 5
2025-12-04T09:42:10.3399947Z   MONITOR_DATA_COLLECT_INTERVAL: 1
2025-12-04T09:42:10.3400288Z ##[endgroup]
2025-12-04T09:42:10.6583014Z Defaulting to user installation because normal site-packages is not writeable
2025-12-04T09:42:11.0738950Z Collecting psutil==5.9.8
2025-12-04T09:42:11.0935787Z   Downloading psutil-5.9.8-cp36-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (288 kB)
2025-12-04T09:42:11.1749841Z Collecting dataclasses_json==0.6.7
2025-12-04T09:42:11.1788039Z   Downloading dataclasses_json-0.6.7-py3-none-any.whl (28 kB)
2025-12-04T09:42:11.2105806Z Collecting nvidia-ml-py==11.525.84
2025-12-04T09:42:11.2142817Z   Downloading nvidia_ml_py-11.525.84-py3-none-any.whl (34 kB)
2025-12-04T09:42:11.3482074Z Collecting marshmallow<4.0.0,>=3.18.0
2025-12-04T09:42:11.3523998Z   Downloading marshmallow-3.26.1-py3-none-any.whl (50 kB)
2025-12-04T09:42:11.3781840Z Collecting typing-inspect<1,>=0.4.0
2025-12-04T09:42:11.3820490Z   Downloading typing_inspect-0.9.0-py3-none-any.whl (8.8 kB)
2025-12-04T09:42:11.4451680Z Collecting packaging>=17.0
2025-12-04T09:42:11.4489518Z   Downloading packaging-25.0-py3-none-any.whl (66 kB)
2025-12-04T09:42:11.4755715Z Collecting mypy-extensions>=0.3.0
2025-12-04T09:42:11.4792245Z   Downloading mypy_extensions-1.1.0-py3-none-any.whl (5.0 kB)
2025-12-04T09:42:11.5337970Z Collecting typing-extensions>=3.7.4
2025-12-04T09:42:11.5376870Z   Downloading typing_extensions-4.15.0-py3-none-any.whl (44 kB)
2025-12-04T09:42:11.6395058Z Installing collected packages: typing-extensions, packaging, mypy-extensions, typing-inspect, marshmallow, psutil, nvidia-ml-py, dataclasses-json
2025-12-04T09:42:11.9634855Z Successfully installed dataclasses-json-0.6.7 marshmallow-3.26.1 mypy-extensions-1.1.0 nvidia-ml-py-11.525.84 packaging-25.0 psutil-5.9.8 typing-extensions-4.15.0 typing-inspect-0.9.0
2025-12-04T09:42:12.1839988Z Prepare all required actions
2025-12-04T09:42:12.1840448Z Getting action download info
2025-12-04T09:42:12.3833948Z Download action repository 'seemethere/download-artifact-s3@v4' (SHA:1da556a7aa0a088e3153970611f6c432d58e80e6)
2025-12-04T09:42:12.6477573Z Download action repository 'actions/download-artifact@v4' (SHA:d3f86a106a0bac45b974a628896c90dbdf5c8093)
2025-12-04T09:42:13.0263440Z ##[group]Run ./.github/actions/download-build-artifacts
2025-12-04T09:42:13.0264197Z with:
2025-12-04T09:42:13.0264696Z   name: linux-jammy-cuda12.4-py3.10-gcc11
2025-12-04T09:42:13.0265376Z   s3-bucket: gha-artifacts
2025-12-04T09:42:13.0265899Z env:
2025-12-04T09:42:13.0266319Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:42:13.0266861Z   HAS_NVIDIA_GPU: true
2025-12-04T09:42:13.0267512Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:42:13.0268259Z ##[endgroup]
2025-12-04T09:42:13.0325414Z ##[group]Run seemethere/download-artifact-s3@v4
2025-12-04T09:42:13.0326177Z with:
2025-12-04T09:42:13.0326756Z   name: linux-jammy-cuda12.4-py3.10-gcc11
2025-12-04T09:42:13.0327422Z   s3-bucket: gha-artifacts
2025-12-04T09:42:13.0327936Z   region: us-east-1
2025-12-04T09:42:13.0328409Z env:
2025-12-04T09:42:13.0328822Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:42:13.0329370Z   HAS_NVIDIA_GPU: true
2025-12-04T09:42:13.0330038Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:42:13.0330787Z ##[endgroup]
2025-12-04T09:42:13.5540172Z (node:68798) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023.
2025-12-04T09:42:13.5540819Z 
2025-12-04T09:42:13.5541108Z Please migrate your code to use AWS SDK for JavaScript (v3).
2025-12-04T09:42:13.5541733Z For more information, check the migration guide at https://a.co/7PzMCcy
2025-12-04T09:42:13.5542396Z (Use `node --trace-warnings ...` to show where the warning was created)
2025-12-04T09:42:13.7659879Z Found 1 objects with prefix pytorch/pytorch/19922826259/linux-jammy-cuda12.4-py3.10-gcc11/
2025-12-04T09:42:13.7660776Z Starting download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/artifacts.zip
2025-12-04T09:42:20.4963200Z Finished download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/artifacts.zip
2025-12-04T09:42:20.4969636Z Artifact download has finished successfully
2025-12-04T09:42:20.5171981Z ##[group]Run unzip -o artifacts.zip
2025-12-04T09:42:20.5172382Z unzip -o artifacts.zip
2025-12-04T09:42:20.5179428Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:42:20.5179879Z env:
2025-12-04T09:42:20.5180126Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:42:20.5180445Z   HAS_NVIDIA_GPU: true
2025-12-04T09:42:20.5180814Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:42:20.5181229Z ##[endgroup]
2025-12-04T09:42:20.5250842Z Archive:  artifacts.zip
2025-12-04T09:42:20.5252375Z    creating: dist/
2025-12-04T09:42:22.5370331Z   inflating: dist/torch-2.10.0a0+gitffd9b0f-cp310-cp310-linux_x86_64.whl  
2025-12-04T09:42:22.5513992Z   inflating: dist/.ninja_log         
2025-12-04T09:42:22.5514746Z    creating: build/custom_test_artifacts/
2025-12-04T09:42:22.5515268Z    creating: build/custom_test_artifacts/custom-op-build/
2025-12-04T09:42:22.5515858Z    creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/
2025-12-04T09:42:22.5516589Z    creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/pkgRedirects/
2025-12-04T09:42:22.5524188Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeConfigureLog.yaml  
2025-12-04T09:42:22.5525008Z    creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/
2025-12-04T09:42:22.5525811Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeSystem.cmake  
2025-12-04T09:42:22.5526672Z    creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/
2025-12-04T09:42:22.5527497Z    creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/tmp/
2025-12-04T09:42:22.5528798Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/CMakeCCompilerId.c  
2025-12-04T09:42:22.5530079Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/a.out  
2025-12-04T09:42:22.5531007Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeCCompiler.cmake  
2025-12-04T09:42:22.5532042Z    creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/
2025-12-04T09:42:22.5532888Z    creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/tmp/
2025-12-04T09:42:22.5534285Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/CMakeCXXCompilerId.cpp  
2025-12-04T09:42:22.5535818Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/a.out  
2025-12-04T09:42:22.5536973Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeCXXCompiler.cmake  
2025-12-04T09:42:22.5538741Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_C.bin  
2025-12-04T09:42:22.5540682Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CXX.bin  
2025-12-04T09:42:22.5541655Z    creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/
2025-12-04T09:42:22.5542522Z    creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/
2025-12-04T09:42:22.5604344Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii  
2025-12-04T09:42:22.5669015Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp  
2025-12-04T09:42:22.5670292Z  extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id  
2025-12-04T09:42:22.5737805Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii  
2025-12-04T09:42:22.5739055Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c  
2025-12-04T09:42:22.5740303Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu  
2025-12-04T09:42:22.5741599Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c  
2025-12-04T09:42:22.5742847Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx  
2025-12-04T09:42:22.5744080Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin  
2025-12-04T09:42:22.5745303Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin  
2025-12-04T09:42:22.5746528Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c  
2025-12-04T09:42:22.5747722Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o  
2025-12-04T09:42:22.5748859Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin  
2025-12-04T09:42:22.5749953Z  extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.reg.c  
2025-12-04T09:42:22.5751016Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin  
2025-12-04T09:42:22.5752096Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin.c  
2025-12-04T09:42:22.5753145Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.o  
2025-12-04T09:42:22.5754454Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/CMakeCUDACompilerId.cu  
2025-12-04T09:42:22.5832755Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/a.out  
2025-12-04T09:42:22.5833739Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeCUDACompiler.cmake  
2025-12-04T09:42:22.5916743Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CUDA.bin  
2025-12-04T09:42:22.5917685Z    creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeScratch/
2025-12-04T09:42:22.5918417Z    creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeTmp/
2025-12-04T09:42:22.5919198Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/cmake.check_cache  
2025-12-04T09:42:22.5919997Z    creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/
2025-12-04T09:42:22.5920892Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/compiler_depend.ts  
2025-12-04T09:42:22.5921901Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/compiler_depend.make  
2025-12-04T09:42:22.5922873Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/depend.make  
2025-12-04T09:42:22.5923779Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/link.txt  
2025-12-04T09:42:22.5924705Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/cmake_clean.cmake  
2025-12-04T09:42:22.5925644Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/build.make  
2025-12-04T09:42:22.5926582Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/DependInfo.cmake  
2025-12-04T09:42:22.5927520Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/flags.make  
2025-12-04T09:42:22.5928447Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/progress.make  
2025-12-04T09:42:22.5946761Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/op.cpp.o.d  
2025-12-04T09:42:22.6165476Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/op.cpp.o  
2025-12-04T09:42:22.6166390Z    creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/
2025-12-04T09:42:22.6167345Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/compiler_depend.ts  
2025-12-04T09:42:22.6168402Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/compiler_depend.make  
2025-12-04T09:42:22.6169426Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/depend.make  
2025-12-04T09:42:22.6170377Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/link.txt  
2025-12-04T09:42:22.6171523Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/cmake_clean.cmake  
2025-12-04T09:42:22.6172511Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/build.make  
2025-12-04T09:42:22.6173501Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/DependInfo.cmake  
2025-12-04T09:42:22.6174494Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/flags.make  
2025-12-04T09:42:22.6175467Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/progress.make  
2025-12-04T09:42:22.6195126Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/test_custom_ops.cpp.o.d  
2025-12-04T09:42:22.6285320Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/test_custom_ops.cpp.o  
2025-12-04T09:42:22.6286616Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeDirectoryInformation.cmake  
2025-12-04T09:42:22.6287570Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/TargetDirectories.txt  
2025-12-04T09:42:22.6288408Z  extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/progress.marks  
2025-12-04T09:42:22.6289196Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/Makefile2  
2025-12-04T09:42:22.6290108Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/Makefile.cmake  
2025-12-04T09:42:22.6290889Z   inflating: build/custom_test_artifacts/custom-op-build/detect_cuda_version.cc  
2025-12-04T09:42:22.6293237Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeCache.txt  
2025-12-04T09:42:22.6294057Z   inflating: build/custom_test_artifacts/custom-op-build/Makefile  
2025-12-04T09:42:22.6294770Z   inflating: build/custom_test_artifacts/custom-op-build/cmake_install.cmake  
2025-12-04T09:42:22.6485923Z   inflating: build/custom_test_artifacts/custom-op-build/libcustom_ops.so  
2025-12-04T09:42:22.6548506Z   inflating: build/custom_test_artifacts/custom-op-build/test_custom_ops  
2025-12-04T09:42:22.6549119Z    creating: build/custom_test_artifacts/jit-hook-build/
2025-12-04T09:42:22.6549684Z    creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/
2025-12-04T09:42:22.6550369Z    creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/pkgRedirects/
2025-12-04T09:42:22.6557880Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeConfigureLog.yaml  
2025-12-04T09:42:22.6558678Z    creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/
2025-12-04T09:42:22.6559463Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeSystem.cmake  
2025-12-04T09:42:22.6560303Z    creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/
2025-12-04T09:42:22.6561109Z    creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/tmp/
2025-12-04T09:42:22.6562173Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/CMakeCCompilerId.c  
2025-12-04T09:42:22.6563717Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/a.out  
2025-12-04T09:42:22.6564966Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeCCompiler.cmake  
2025-12-04T09:42:22.6565898Z    creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/
2025-12-04T09:42:22.6566746Z    creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/tmp/
2025-12-04T09:42:22.6567980Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/CMakeCXXCompilerId.cpp  
2025-12-04T09:42:22.6569478Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/a.out  
2025-12-04T09:42:22.6570508Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeCXXCompiler.cmake  
2025-12-04T09:42:22.6572532Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_C.bin  
2025-12-04T09:42:22.6574446Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CXX.bin  
2025-12-04T09:42:22.6575528Z    creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/
2025-12-04T09:42:22.6576462Z    creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/
2025-12-04T09:42:22.6638544Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii  
2025-12-04T09:42:22.6703468Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp  
2025-12-04T09:42:22.6704731Z  extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id  
2025-12-04T09:42:22.6772259Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii  
2025-12-04T09:42:22.6773489Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c  
2025-12-04T09:42:22.6774737Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu  
2025-12-04T09:42:22.6776156Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c  
2025-12-04T09:42:22.6777450Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx  
2025-12-04T09:42:22.6778663Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin  
2025-12-04T09:42:22.6779891Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin  
2025-12-04T09:42:22.6781106Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c  
2025-12-04T09:42:22.6782293Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o  
2025-12-04T09:42:22.6783401Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin  
2025-12-04T09:42:22.6784494Z  extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.reg.c  
2025-12-04T09:42:22.6785557Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin  
2025-12-04T09:42:22.6786620Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin.c  
2025-12-04T09:42:22.6787645Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.o  
2025-12-04T09:42:22.6788701Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/CMakeCUDACompilerId.cu  
2025-12-04T09:42:22.6866770Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/a.out  
2025-12-04T09:42:22.6867932Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeCUDACompiler.cmake  
2025-12-04T09:42:22.6949330Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CUDA.bin  
2025-12-04T09:42:22.6950256Z    creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeScratch/
2025-12-04T09:42:22.6950986Z    creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeTmp/
2025-12-04T09:42:22.6951748Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/cmake.check_cache  
2025-12-04T09:42:22.6952555Z    creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/
2025-12-04T09:42:22.6953476Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/compiler_depend.ts  
2025-12-04T09:42:22.6954528Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/compiler_depend.make  
2025-12-04T09:42:22.6955525Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/depend.make  
2025-12-04T09:42:22.6956456Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/link.txt  
2025-12-04T09:42:22.6957405Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/cmake_clean.cmake  
2025-12-04T09:42:22.6958372Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/build.make  
2025-12-04T09:42:22.6959337Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/DependInfo.cmake  
2025-12-04T09:42:22.6960302Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/flags.make  
2025-12-04T09:42:22.6961441Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/progress.make  
2025-12-04T09:42:22.6979285Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/test_jit_hooks.cpp.o.d  
2025-12-04T09:42:22.7048525Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/test_jit_hooks.cpp.o  
2025-12-04T09:42:22.7049779Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeDirectoryInformation.cmake  
2025-12-04T09:42:22.7050701Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/TargetDirectories.txt  
2025-12-04T09:42:22.7051540Z  extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/progress.marks  
2025-12-04T09:42:22.7052304Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/Makefile2  
2025-12-04T09:42:22.7053058Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/Makefile.cmake  
2025-12-04T09:42:22.7053817Z   inflating: build/custom_test_artifacts/jit-hook-build/detect_cuda_version.cc  
2025-12-04T09:42:22.7056553Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeCache.txt  
2025-12-04T09:42:22.7057415Z   inflating: build/custom_test_artifacts/jit-hook-build/Makefile  
2025-12-04T09:42:22.7058156Z   inflating: build/custom_test_artifacts/jit-hook-build/cmake_install.cmake  
2025-12-04T09:42:22.7100857Z   inflating: build/custom_test_artifacts/jit-hook-build/test_jit_hooks  
2025-12-04T09:42:22.7101528Z    creating: build/custom_test_artifacts/custom-backend-build/
2025-12-04T09:42:22.7102163Z    creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/
2025-12-04T09:42:22.7102911Z    creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/pkgRedirects/
2025-12-04T09:42:22.7110824Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeConfigureLog.yaml  
2025-12-04T09:42:22.7111687Z    creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/
2025-12-04T09:42:22.7112548Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeSystem.cmake  
2025-12-04T09:42:22.7113468Z    creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/
2025-12-04T09:42:22.7114343Z    creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/tmp/
2025-12-04T09:42:22.7115372Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/CMakeCCompilerId.c  
2025-12-04T09:42:22.7116604Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/a.out  
2025-12-04T09:42:22.7117576Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeCCompiler.cmake  
2025-12-04T09:42:22.7118505Z    creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/
2025-12-04T09:42:22.7119419Z    creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/tmp/
2025-12-04T09:42:22.7120901Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/CMakeCXXCompilerId.cpp  
2025-12-04T09:42:22.7122460Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/a.out  
2025-12-04T09:42:22.7123537Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeCXXCompiler.cmake  
2025-12-04T09:42:22.7125162Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_C.bin  
2025-12-04T09:42:22.7127232Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CXX.bin  
2025-12-04T09:42:22.7128278Z    creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/
2025-12-04T09:42:22.7129212Z    creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/
2025-12-04T09:42:22.7191176Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii  
2025-12-04T09:42:22.7255140Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp  
2025-12-04T09:42:22.7256534Z  extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id  
2025-12-04T09:42:22.7324219Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii  
2025-12-04T09:42:22.7325525Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c  
2025-12-04T09:42:22.7326848Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu  
2025-12-04T09:42:22.7328220Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c  
2025-12-04T09:42:22.7329553Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx  
2025-12-04T09:42:22.7330846Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin  
2025-12-04T09:42:22.7332126Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin  
2025-12-04T09:42:22.7333438Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c  
2025-12-04T09:42:22.7334699Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o  
2025-12-04T09:42:22.7335895Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin  
2025-12-04T09:42:22.7337117Z  extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.reg.c  
2025-12-04T09:42:22.7338259Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin  
2025-12-04T09:42:22.7339411Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin.c  
2025-12-04T09:42:22.7340517Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.o  
2025-12-04T09:42:22.7341655Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/CMakeCUDACompilerId.cu  
2025-12-04T09:42:22.7419027Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/a.out  
2025-12-04T09:42:22.7420050Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeCUDACompiler.cmake  
2025-12-04T09:42:22.7501046Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CUDA.bin  
2025-12-04T09:42:22.7502037Z    creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeScratch/
2025-12-04T09:42:22.7502834Z    creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeTmp/
2025-12-04T09:42:22.7503664Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/cmake.check_cache  
2025-12-04T09:42:22.7504543Z    creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/
2025-12-04T09:42:22.7505523Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/compiler_depend.ts  
2025-12-04T09:42:22.7506667Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/compiler_depend.make  
2025-12-04T09:42:22.7507755Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/depend.make  
2025-12-04T09:42:22.7508761Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/link.txt  
2025-12-04T09:42:22.7509984Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/cmake_clean.cmake  
2025-12-04T09:42:22.7511039Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/build.make  
2025-12-04T09:42:22.7512096Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/DependInfo.cmake  
2025-12-04T09:42:22.7513233Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/flags.make  
2025-12-04T09:42:22.7514256Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/progress.make  
2025-12-04T09:42:22.7515364Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/custom_backend.cpp.o.d  
2025-12-04T09:42:22.7645098Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/custom_backend.cpp.o  
2025-12-04T09:42:22.7646150Z    creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/
2025-12-04T09:42:22.7647185Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/compiler_depend.ts  
2025-12-04T09:42:22.7648358Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/compiler_depend.make  
2025-12-04T09:42:22.7649499Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/depend.make  
2025-12-04T09:42:22.7650561Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/link.txt  
2025-12-04T09:42:22.7651646Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/cmake_clean.cmake  
2025-12-04T09:42:22.7652759Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/build.make  
2025-12-04T09:42:22.7653871Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/DependInfo.cmake  
2025-12-04T09:42:22.7654977Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/flags.make  
2025-12-04T09:42:22.7656066Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/progress.make  
2025-12-04T09:42:22.7674598Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/test_custom_backend.cpp.o.d  
2025-12-04T09:42:22.7735212Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/test_custom_backend.cpp.o  
2025-12-04T09:42:22.7736432Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeDirectoryInformation.cmake  
2025-12-04T09:42:22.7737446Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/TargetDirectories.txt  
2025-12-04T09:42:22.7738338Z  extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/progress.marks  
2025-12-04T09:42:22.7739183Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/Makefile2  
2025-12-04T09:42:22.7740001Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/Makefile.cmake  
2025-12-04T09:42:22.7797009Z   inflating: build/custom_test_artifacts/custom-backend-build/detect_cuda_version.cc  
2025-12-04T09:42:22.7797840Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeCache.txt  
2025-12-04T09:42:22.7798552Z   inflating: build/custom_test_artifacts/custom-backend-build/Makefile  
2025-12-04T09:42:22.7799269Z   inflating: build/custom_test_artifacts/custom-backend-build/cmake_install.cmake  
2025-12-04T09:42:22.7857997Z   inflating: build/custom_test_artifacts/custom-backend-build/libcustom_backend.so  
2025-12-04T09:42:22.7902113Z   inflating: build/custom_test_artifacts/custom-backend-build/test_custom_backend  
2025-12-04T09:42:22.7902685Z    creating: build/lib/
2025-12-04T09:42:22.7993629Z   inflating: build/lib/libprotobuf-lite.a  
2025-12-04T09:42:22.8482852Z   inflating: build/lib/libprotobuf.a  
2025-12-04T09:42:22.9030052Z   inflating: build/lib/libprotoc.a   
2025-12-04T09:42:22.9041096Z   inflating: build/lib/libpthreadpool.a  
2025-12-04T09:42:22.9050247Z   inflating: build/lib/libcpuinfo.a  
2025-12-04T09:42:22.9059104Z   inflating: build/lib/libcpuinfo_internals.a  
2025-12-04T09:42:22.9060119Z   inflating: build/lib/libclog.a     
2025-12-04T09:42:22.9081472Z   inflating: build/lib/libpytorch_qnnpack.a  
2025-12-04T09:42:22.9084173Z   inflating: build/lib/libnnpack_reference_layers.a  
2025-12-04T09:42:22.9104149Z   inflating: build/lib/libnnpack.a   
2025-12-04T09:42:22.9310599Z   inflating: build/lib/libmicrokernels-prod.a  
2025-12-04T09:42:23.0281747Z   inflating: build/lib/libmicrokernels-all.a  
2025-12-04T09:42:23.0359030Z   inflating: build/lib/libgtest.a    
2025-12-04T09:42:23.0378583Z   inflating: build/lib/libgmock.a    
2025-12-04T09:42:23.0379454Z   inflating: build/lib/libgtest_main.a  
2025-12-04T09:42:23.0380307Z   inflating: build/lib/libgmock_main.a  
2025-12-04T09:42:23.0480453Z   inflating: build/lib/libXNNPACK.a  
2025-12-04T09:42:23.0563954Z   inflating: build/lib/libbenchmark.a  
2025-12-04T09:42:23.0564865Z   inflating: build/lib/libbenchmark_main.a  
2025-12-04T09:42:23.0573852Z   inflating: build/lib/libittnotify.a  
2025-12-04T09:42:23.0646856Z   inflating: build/lib/libasmjit.a   
2025-12-04T09:42:23.0647771Z   inflating: build/lib/libjitprofiling.a  
2025-12-04T09:42:23.1931774Z   inflating: build/lib/libfbgemm.a   
2025-12-04T09:42:23.1965328Z   inflating: build/lib/libtensorpipe_uv.a  
2025-12-04T09:42:23.2561927Z   inflating: build/lib/libtensorpipe.a  
2025-12-04T09:42:23.2830949Z   inflating: build/lib/libtensorpipe_cuda.a  
2025-12-04T09:42:23.2980104Z   inflating: build/lib/libgloo.a     
2025-12-04T09:42:23.3031933Z   inflating: build/lib/libonnx_proto.a  
2025-12-04T09:42:23.3502561Z   inflating: build/lib/libgloo_cuda.a  
2025-12-04T09:42:23.4285990Z   inflating: build/lib/libonnx.a     
2025-12-04T09:42:24.5376582Z   inflating: build/lib/libdnnl.a     
2025-12-04T09:42:24.5397945Z   inflating: build/lib/libfmt.a      
2025-12-04T09:42:24.5927159Z   inflating: build/lib/libkineto.a   
2025-12-04T09:42:24.6056523Z   inflating: build/lib/libc10.so     
2025-12-04T09:42:24.6111632Z   inflating: build/lib/libc10_cuda.so  
2025-12-04T09:42:24.6113225Z   inflating: build/lib/libtorch_global_deps.so  
2025-12-04T09:42:24.6115200Z   inflating: build/lib/libcaffe2_nvrtc.so  
2025-12-04T09:42:28.0275579Z   inflating: build/lib/libtorch_cpu.so  
2025-12-04T09:42:29.8261881Z   inflating: build/lib/libtorch_cuda.so  
2025-12-04T09:42:29.8266535Z   inflating: build/lib/libshm.so     
2025-12-04T09:42:29.8268162Z   inflating: build/lib/libtorch.so   
2025-12-04T09:42:29.8321744Z   inflating: build/lib/libtorch_cuda_linalg.so  
2025-12-04T09:42:29.8324463Z   inflating: build/lib/libc10d_cuda_test.so  
2025-12-04T09:42:29.8403501Z   inflating: build/lib/libtorchbind_test.so  
2025-12-04T09:42:29.8424908Z   inflating: build/lib/libjitbackend_test.so  
2025-12-04T09:42:29.8451314Z   inflating: build/lib/libbackend_with_compiler.so  
2025-12-04T09:42:29.8480641Z   inflating: build/lib/libaoti_custom_ops.so  
2025-12-04T09:42:30.1114245Z   inflating: build/lib/libtorch_python.so  
2025-12-04T09:42:30.1155123Z   inflating: build/lib/libnnapi_backend.so  
2025-12-04T09:42:30.1155572Z    creating: build/bin/
2025-12-04T09:42:30.1664713Z   inflating: build/bin/protoc-3.13.0.0  
2025-12-04T09:42:30.2173889Z   inflating: build/bin/protoc        
2025-12-04T09:42:30.2240208Z   inflating: build/bin/c10_AllocatorConfig_test  
2025-12-04T09:42:30.2302411Z   inflating: build/bin/c10_CompileTimeFunctionPointer_test  
2025-12-04T09:42:30.2366028Z   inflating: build/bin/c10_DeviceGuard_test  
2025-12-04T09:42:30.2430186Z   inflating: build/bin/c10_Device_test  
2025-12-04T09:42:30.2503390Z   inflating: build/bin/c10_DispatchKeySet_test  
2025-12-04T09:42:30.2564156Z   inflating: build/bin/c10_StreamGuard_test  
2025-12-04T09:42:30.2631215Z   inflating: build/bin/c10_Scalar_test  
2025-12-04T09:42:30.2700232Z   inflating: build/bin/c10_SymInt_test  
2025-12-04T09:42:30.2768876Z   inflating: build/bin/c10_InlineStreamGuard_test  
2025-12-04T09:42:30.2835921Z   inflating: build/bin/c10_InlineDeviceGuard_test  
2025-12-04T09:42:30.2904974Z   inflating: build/bin/c10_SizesAndStrides_test  
2025-12-04T09:42:30.2966509Z   inflating: build/bin/c10_ArrayRef_test  
2025-12-04T09:42:30.3027304Z   inflating: build/bin/c10_ConstexprCrc_test  
2025-12-04T09:42:30.3112908Z   inflating: build/bin/c10_cow_test  
2025-12-04T09:42:30.3178010Z   inflating: build/bin/c10_Bitset_test  
2025-12-04T09:42:30.3239707Z   inflating: build/bin/c10_DeadlockDetection_test  
2025-12-04T09:42:30.3309497Z   inflating: build/bin/c10_Enumerate_test  
2025-12-04T09:42:30.3372370Z   inflating: build/bin/c10_Half_test  
2025-12-04T09:42:30.3437895Z   inflating: build/bin/c10_IntrusiveList_test  
2025-12-04T09:42:30.3503876Z   inflating: build/bin/c10_NetworkFlow_test  
2025-12-04T09:42:30.3572592Z   inflating: build/bin/c10_LeftRight_test  
2025-12-04T09:42:30.3634346Z   inflating: build/bin/c10_Synchronized_test  
2025-12-04T09:42:30.3696710Z   inflating: build/bin/c10_Semaphore_test  
2025-12-04T09:42:30.3764684Z   inflating: build/bin/c10_ThreadLocal_test  
2025-12-04T09:42:30.3828842Z   inflating: build/bin/c10_TypeIndex_test  
2025-12-04T09:42:30.3892856Z   inflating: build/bin/c10_accumulate_test  
2025-12-04T09:42:30.3961325Z   inflating: build/bin/c10_bfloat16_test  
2025-12-04T09:42:30.4030936Z   inflating: build/bin/c10_complex_math_test  
2025-12-04T09:42:30.4093219Z   inflating: build/bin/c10_bit_cast_test  
2025-12-04T09:42:30.4154691Z   inflating: build/bin/c10_error_test  
2025-12-04T09:42:30.4222562Z   inflating: build/bin/c10_complex_test  
2025-12-04T09:42:30.4287506Z   inflating: build/bin/c10_exception_test  
2025-12-04T09:42:30.4349440Z   inflating: build/bin/c10_flags_test  
2025-12-04T09:42:30.4411800Z   inflating: build/bin/c10_generic_math_test  
2025-12-04T09:42:30.4596277Z   inflating: build/bin/c10_intrusive_ptr_test  
2025-12-04T09:42:30.4659325Z   inflating: build/bin/c10_irange_test  
2025-12-04T09:42:30.4725386Z   inflating: build/bin/c10_lazy_test  
2025-12-04T09:42:30.4795644Z   inflating: build/bin/c10_logging_test  
2025-12-04T09:42:30.4857457Z   inflating: build/bin/c10_nofatal_test  
2025-12-04T09:42:30.4948125Z   inflating: build/bin/c10_optional_test  
2025-12-04T09:42:30.5023936Z   inflating: build/bin/c10_ordered_preserving_dict_test  
2025-12-04T09:42:30.5089548Z   inflating: build/bin/c10_registry_test  
2025-12-04T09:42:30.5268849Z   inflating: build/bin/c10_small_vector_test  
2025-12-04T09:42:30.5332450Z   inflating: build/bin/c10_ssize_test  
2025-12-04T09:42:30.5402594Z   inflating: build/bin/c10_string_util_test  
2025-12-04T09:42:30.5456418Z   inflating: build/bin/c10_intrusive_ptr_benchmark  
2025-12-04T09:42:30.5519086Z   inflating: build/bin/c10_tempfile_test  
2025-12-04T09:42:30.5579765Z   inflating: build/bin/c10_string_view_test  
2025-12-04T09:42:30.5648848Z   inflating: build/bin/c10_typeid_test  
2025-12-04T09:42:30.5714255Z   inflating: build/bin/c10_cuda_CUDAAssertionsTest_catches_thread_and_block_and_device  
2025-12-04T09:42:30.5779815Z   inflating: build/bin/c10_cuda_CUDAAssertionsTest_multiple_writes_from_multiple_blocks  
2025-12-04T09:42:30.5843761Z   inflating: build/bin/c10_cuda_CUDAAssertionsTest_from_2_processes  
2025-12-04T09:42:30.5911750Z   inflating: build/bin/c10_cuda_CUDAAssertionsTest_multiple_writes_from_blocks_and_threads  
2025-12-04T09:42:30.5972934Z   inflating: build/bin/c10_cuda_CUDATest  
2025-12-04T09:42:30.6038580Z   inflating: build/bin/c10_cuda_CUDAAssertionsTest_catches_stream  
2025-12-04T09:42:30.6104071Z   inflating: build/bin/c10_cuda_CUDAAssertionsTest_1_var_test  
2025-12-04T09:42:30.6169548Z   inflating: build/bin/c10_cuda_CUDAAssertionsTest_multiple_writes_from_same_block  
2025-12-04T09:42:30.6837972Z   inflating: build/bin/vec_test_all_types_DEFAULT  
2025-12-04T09:42:30.7525938Z   inflating: build/bin/vec_test_all_types_AVX512  
2025-12-04T09:42:30.8223455Z   inflating: build/bin/vec_test_all_types_AVX2  
2025-12-04T09:42:30.8284792Z   inflating: build/bin/test_vec_half_DEFAULT  
2025-12-04T09:42:30.8401717Z   inflating: build/bin/test_aoti_abi_check  
2025-12-04T09:42:30.8463418Z   inflating: build/bin/test_vec_half_AVX512  
2025-12-04T09:42:30.8525879Z   inflating: build/bin/test_vec_half_AVX2  
2025-12-04T09:42:30.8591123Z   inflating: build/bin/BackoffTest   
2025-12-04T09:42:30.8656703Z   inflating: build/bin/FileStoreTest  
2025-12-04T09:42:30.8726755Z   inflating: build/bin/TCPStoreTest  
2025-12-04T09:42:30.8793262Z   inflating: build/bin/HashStoreTest  
2025-12-04T09:42:30.8808848Z   inflating: build/bin/ProcessGroupMPITest  
2025-12-04T09:42:30.8812976Z   inflating: build/bin/torch_shm_manager  
2025-12-04T09:42:30.8902298Z   inflating: build/bin/Dict_test     
2025-12-04T09:42:30.8966966Z   inflating: build/bin/Dimname_test  
2025-12-04T09:42:30.9046178Z   inflating: build/bin/MaybeOwned_test  
2025-12-04T09:42:30.9116081Z   inflating: build/bin/NamedTensor_test  
2025-12-04T09:42:30.9188351Z   inflating: build/bin/apply_utils_test  
2025-12-04T09:42:30.9260214Z   inflating: build/bin/atest         
2025-12-04T09:42:30.9338353Z   inflating: build/bin/basic         
2025-12-04T09:42:30.9405257Z   inflating: build/bin/broadcast_test  
2025-12-04T09:42:30.9467807Z   inflating: build/bin/cpu_allocator_test  
2025-12-04T09:42:30.9539214Z   inflating: build/bin/cpu_generator_test  
2025-12-04T09:42:30.9604387Z   inflating: build/bin/cpu_profiling_allocator_test  
2025-12-04T09:42:30.9714582Z   inflating: build/bin/cpu_rng_test  
2025-12-04T09:42:30.9777726Z   inflating: build/bin/dlconvertor_test  
2025-12-04T09:42:30.9848400Z   inflating: build/bin/extension_backend_test  
2025-12-04T09:42:30.9917004Z   inflating: build/bin/half_test     
2025-12-04T09:42:31.0033791Z   inflating: build/bin/ivalue_test   
2025-12-04T09:42:31.0096236Z   inflating: build/bin/lazy_tensor_test  
2025-12-04T09:42:31.0161674Z   inflating: build/bin/math_kernel_test  
2025-12-04T09:42:31.0227599Z   inflating: build/bin/memory_format_test  
2025-12-04T09:42:31.0293730Z   inflating: build/bin/memory_overlapping_test  
2025-12-04T09:42:31.0359618Z   inflating: build/bin/mobile_memory_cleanup  
2025-12-04T09:42:31.0428563Z   inflating: build/bin/native_test   
2025-12-04T09:42:31.0491375Z   inflating: build/bin/operator_name_test  
2025-12-04T09:42:31.0554152Z   inflating: build/bin/operators_test  
2025-12-04T09:42:31.0618844Z   inflating: build/bin/packedtensoraccessor_test  
2025-12-04T09:42:31.0701304Z   inflating: build/bin/pow_test      
2025-12-04T09:42:31.0770518Z   inflating: build/bin/quantized_test  
2025-12-04T09:42:31.0832565Z   inflating: build/bin/reduce_ops_test  
2025-12-04T09:42:31.0896011Z   inflating: build/bin/reportMemoryUsage_test  
2025-12-04T09:42:31.0964480Z   inflating: build/bin/scalar_tensor_test  
2025-12-04T09:42:31.1035407Z   inflating: build/bin/scalar_test   
2025-12-04T09:42:31.1099183Z   inflating: build/bin/StorageUtils_test  
2025-12-04T09:42:31.1163540Z   inflating: build/bin/stride_properties_test  
2025-12-04T09:42:31.1258596Z   inflating: build/bin/tensor_iterator_test  
2025-12-04T09:42:31.1325652Z   inflating: build/bin/test_parallel  
2025-12-04T09:42:31.1388249Z   inflating: build/bin/thread_init_test  
2025-12-04T09:42:31.1455735Z   inflating: build/bin/type_ptr_test  
2025-12-04T09:42:31.1528669Z   inflating: build/bin/type_test     
2025-12-04T09:42:31.1594156Z   inflating: build/bin/undefined_tensor_test  
2025-12-04T09:42:31.1655282Z   inflating: build/bin/verify_api_visibility  
2025-12-04T09:42:31.1741495Z   inflating: build/bin/legacy_vmap_test  
2025-12-04T09:42:31.1804744Z   inflating: build/bin/weakref_test  
2025-12-04T09:42:31.1868141Z   inflating: build/bin/wrapdim_test  
2025-12-04T09:42:31.1931495Z   inflating: build/bin/xla_tensor_test  
2025-12-04T09:42:31.2004652Z   inflating: build/bin/IListRef_test  
2025-12-04T09:42:31.2130089Z   inflating: build/bin/List_test     
2025-12-04T09:42:31.2210630Z   inflating: build/bin/KernelFunction_test  
2025-12-04T09:42:31.2352871Z   inflating: build/bin/kernel_function_legacy_test  
2025-12-04T09:42:31.2466884Z   inflating: build/bin/kernel_function_test  
2025-12-04T09:42:31.2615606Z   inflating: build/bin/kernel_lambda_legacy_test  
2025-12-04T09:42:31.2736703Z   inflating: build/bin/kernel_lambda_test  
2025-12-04T09:42:31.2810453Z   inflating: build/bin/kernel_stackbased_test  
2025-12-04T09:42:31.2924380Z   inflating: build/bin/make_boxed_from_unboxed_functor_test  
2025-12-04T09:42:31.2987533Z   inflating: build/bin/CppSignature_test  
2025-12-04T09:42:31.3055127Z   inflating: build/bin/backend_fallback_test  
2025-12-04T09:42:31.3116108Z   inflating: build/bin/op_allowlist_test  
2025-12-04T09:42:31.3472689Z   inflating: build/bin/op_registration_test  
2025-12-04T09:42:31.3554037Z   inflating: build/bin/inline_container_test  
2025-12-04T09:42:31.3620090Z   inflating: build/bin/cuda_allocator_test  
2025-12-04T09:42:31.3685827Z   inflating: build/bin/cuda_apply_test  
2025-12-04T09:42:31.3758887Z   inflating: build/bin/cuda_atomic_ops_test  
2025-12-04T09:42:31.3828520Z   inflating: build/bin/cuda_caching_host_allocator_test  
2025-12-04T09:42:31.3913052Z   inflating: build/bin/cuda_complex_math_test  
2025-12-04T09:42:31.3985733Z   inflating: build/bin/cuda_complex_test  
2025-12-04T09:42:31.4057520Z   inflating: build/bin/cuda_cub_test  
2025-12-04T09:42:31.4122841Z   inflating: build/bin/cuda_cublas_handle_pool_test  
2025-12-04T09:42:31.4184484Z   inflating: build/bin/cuda_device_test  
2025-12-04T09:42:31.4263648Z   inflating: build/bin/cuda_distributions_test  
2025-12-04T09:42:31.4327853Z   inflating: build/bin/cuda_dlconvertor_test  
2025-12-04T09:42:31.4393964Z   inflating: build/bin/cuda_event_test  
2025-12-04T09:42:31.4455490Z   inflating: build/bin/cuda_exchange_device_test  
2025-12-04T09:42:31.4525388Z   inflating: build/bin/cuda_generator_test  
2025-12-04T09:42:31.4587165Z   inflating: build/bin/cuda_half_test  
2025-12-04T09:42:31.4650653Z   inflating: build/bin/cuda_integer_divider_test  
2025-12-04T09:42:31.4712306Z   inflating: build/bin/cuda_optional_test  
2025-12-04T09:42:31.4776722Z   inflating: build/bin/cuda_packedtensoraccessor_test  
2025-12-04T09:42:31.4841672Z   inflating: build/bin/cuda_reportMemoryUsage_test  
2025-12-04T09:42:31.4903519Z   inflating: build/bin/cuda_allocatorTraceTracker_test  
2025-12-04T09:42:31.4978640Z   inflating: build/bin/cuda_stream_test  
2025-12-04T09:42:31.5043306Z   inflating: build/bin/cuda_vectorized_test  
2025-12-04T09:42:31.5105705Z   inflating: build/bin/cuda_cudnn_test  
2025-12-04T09:42:31.5508498Z   inflating: build/bin/test_lazy     
2025-12-04T09:42:31.5590524Z   inflating: build/bin/ProcessGroupGlooTest  
2025-12-04T09:42:31.5660520Z   inflating: build/bin/ProcessGroupGlooAsyncTest  
2025-12-04T09:42:31.6913569Z   inflating: build/bin/test_jit      
2025-12-04T09:42:31.6991984Z   inflating: build/bin/ProcessGroupNCCLTest  
2025-12-04T09:42:31.7067536Z   inflating: build/bin/ProcessGroupNCCLErrorsTest  
2025-12-04T09:42:31.7070758Z   inflating: build/bin/example_allreduce  
2025-12-04T09:42:31.7139246Z   inflating: build/bin/test_dist_autograd  
2025-12-04T09:42:31.7223007Z   inflating: build/bin/test_cpp_rpc  
2025-12-04T09:42:31.7225907Z   inflating: build/bin/parallel_benchmark  
2025-12-04T09:42:31.8566699Z   inflating: build/bin/test_api      
2025-12-04T09:42:31.8567474Z    creating: .additional_ci_files/
2025-12-04T09:42:31.8639237Z   inflating: .additional_ci_files/test-times.json  
2025-12-04T09:42:31.8902211Z   inflating: .additional_ci_files/test-class-times.json  
2025-12-04T09:42:31.8947340Z ##[group]Run rm artifacts.zip
2025-12-04T09:42:31.8947735Z rm artifacts.zip
2025-12-04T09:42:31.8955171Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:42:31.8955629Z env:
2025-12-04T09:42:31.8955898Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:42:31.8956206Z   HAS_NVIDIA_GPU: true
2025-12-04T09:42:31.8956810Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:42:31.8957253Z ##[endgroup]
2025-12-04T09:42:31.9550018Z ##[group]Run df -H
2025-12-04T09:42:31.9550349Z df -H
2025-12-04T09:42:31.9557321Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:42:31.9557775Z env:
2025-12-04T09:42:31.9558037Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:42:31.9558528Z   HAS_NVIDIA_GPU: true
2025-12-04T09:42:31.9558897Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:42:31.9559321Z ##[endgroup]
2025-12-04T09:42:31.9610755Z Filesystem        Size  Used Avail Use% Mounted on
2025-12-04T09:42:31.9611220Z devtmpfs          4.2M     0  4.2M   0% /dev
2025-12-04T09:42:31.9611610Z tmpfs              34G     0   34G   0% /dev/shm
2025-12-04T09:42:31.9612007Z tmpfs              14G  562k   14G   1% /run
2025-12-04T09:42:31.9612393Z /dev/nvme0n1p1    161G   51G  111G  32% /
2025-12-04T09:42:31.9612775Z tmpfs              34G   17k   34G   1% /tmp
2025-12-04T09:42:31.9613370Z /dev/nvme0n1p128   11M  1.4M  9.2M  13% /boot/efi
2025-12-04T09:42:31.9613801Z tmpfs             6.7G     0  6.7G   0% /run/user/0
2025-12-04T09:42:31.9653074Z Prepare all required actions
2025-12-04T09:42:31.9654122Z Getting action download info
2025-12-04T09:42:32.1237741Z ##[group]Run ./.github/actions/download-td-artifacts
2025-12-04T09:42:32.1238193Z with:
2025-12-04T09:42:32.1238440Z env:
2025-12-04T09:42:32.1238676Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:42:32.1238995Z   HAS_NVIDIA_GPU: true
2025-12-04T09:42:32.1239368Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:42:32.1239774Z ##[endgroup]
2025-12-04T09:42:32.1272750Z ##[group]Run seemethere/download-artifact-s3@v4
2025-12-04T09:42:32.1273174Z with:
2025-12-04T09:42:32.1273409Z   name: td_results
2025-12-04T09:42:32.1273720Z   s3-bucket: gha-artifacts
2025-12-04T09:42:32.1274033Z   region: us-east-1
2025-12-04T09:42:32.1274283Z env:
2025-12-04T09:42:32.1274528Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:42:32.1274834Z   HAS_NVIDIA_GPU: true
2025-12-04T09:42:32.1275201Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:42:32.1275683Z ##[endgroup]
2025-12-04T09:42:32.7814762Z (node:68824) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023.
2025-12-04T09:42:32.7815353Z 
2025-12-04T09:42:32.7815591Z Please migrate your code to use AWS SDK for JavaScript (v3).
2025-12-04T09:42:32.7816232Z For more information, check the migration guide at https://a.co/7PzMCcy
2025-12-04T09:42:32.7817011Z (Use `node --trace-warnings ...` to show where the warning was created)
2025-12-04T09:42:32.8934156Z Found 1 objects with prefix pytorch/pytorch/19922826259/td_results/
2025-12-04T09:42:32.8934917Z Starting download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/td_results.json
2025-12-04T09:42:32.9817745Z Finished download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/td_results.json
2025-12-04T09:42:32.9823886Z Artifact download has finished successfully
2025-12-04T09:42:33.0001408Z ##[group]Run mkdir -p .additional_ci_files
2025-12-04T09:42:33.0001867Z mkdir -p .additional_ci_files
2025-12-04T09:42:33.0002379Z mv td_results.json .additional_ci_files/td_results.json || true
2025-12-04T09:42:33.0010056Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:42:33.0010506Z env:
2025-12-04T09:42:33.0010764Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:42:33.0011079Z   HAS_NVIDIA_GPU: true
2025-12-04T09:42:33.0011432Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:42:33.0011858Z ##[endgroup]
2025-12-04T09:42:33.0144192Z ##[group]Run .github/scripts/parse_ref.py
2025-12-04T09:42:33.0144667Z .github/scripts/parse_ref.py
2025-12-04T09:42:33.0151278Z shell: /usr/bin/bash -e {0}
2025-12-04T09:42:33.0151604Z env:
2025-12-04T09:42:33.0151858Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:42:33.0152174Z   HAS_NVIDIA_GPU: true
2025-12-04T09:42:33.0152531Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:42:33.0152958Z ##[endgroup]
2025-12-04T09:42:33.0399015Z Setting output branch=main
2025-12-04T09:42:33.0563865Z Prepare all required actions
2025-12-04T09:42:33.0564310Z Getting action download info
2025-12-04T09:42:33.1985589Z ##[group]Run ./.github/actions/filter-test-configs
2025-12-04T09:42:33.1986009Z with:
2025-12-04T09:42:33.1986650Z   github-token: ***
2025-12-04T09:42:33.1993668Z   test-matrix: {"include": [{"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}]}
2025-12-04T09:42:33.2001374Z   job-name: linux-jammy-cuda12.4-py3.10-gcc11 / test (legacy_nvidia_driver, 1, 5, linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check, unstable)
2025-12-04T09:42:33.2002155Z env:
2025-12-04T09:42:33.2002410Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:42:33.2002711Z   HAS_NVIDIA_GPU: true
2025-12-04T09:42:33.2003080Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:42:33.2003499Z ##[endgroup]
2025-12-04T09:42:33.2045027Z ##[group]Run nick-fields/retry@v3.0.0
2025-12-04T09:42:33.2045374Z with:
2025-12-04T09:42:33.2045617Z   shell: bash
2025-12-04T09:42:33.2045878Z   timeout_minutes: 10
2025-12-04T09:42:33.2046165Z   max_attempts: 5
2025-12-04T09:42:33.2046429Z   retry_wait_seconds: 30
2025-12-04T09:42:33.2047373Z   command: set -eux
# PyYAML 6.0 doesn't work with MacOS x86 anymore
# This must run on Python-3.7 (AmazonLinux2) so can't use request=3.32.2
python3 -m pip install requests==2.27.1 pyyaml==6.0.2

2025-12-04T09:42:33.2048381Z   polling_interval_seconds: 1
2025-12-04T09:42:33.2048699Z   warning_on_retry: true
2025-12-04T09:42:33.2049006Z   continue_on_error: false
2025-12-04T09:42:33.2049305Z env:
2025-12-04T09:42:33.2049539Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:42:33.2049859Z   HAS_NVIDIA_GPU: true
2025-12-04T09:42:33.2050224Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:42:33.2050836Z   GITHUB_TOKEN: ***
2025-12-04T09:42:33.2051095Z ##[endgroup]
2025-12-04T09:42:33.3166150Z + python3 -m pip install requests==2.27.1 pyyaml==6.0.2
2025-12-04T09:42:33.5963797Z Defaulting to user installation because normal site-packages is not writeable
2025-12-04T09:42:33.7293956Z Collecting requests==2.27.1
2025-12-04T09:42:33.7489429Z   Downloading requests-2.27.1-py2.py3-none-any.whl (63 kB)
2025-12-04T09:42:33.9526432Z Collecting pyyaml==6.0.2
2025-12-04T09:42:33.9581732Z   Downloading PyYAML-6.0.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (737 kB)
2025-12-04T09:42:34.4326186Z Collecting charset-normalizer~=2.0.0
2025-12-04T09:42:34.4370809Z   Downloading charset_normalizer-2.0.12-py3-none-any.whl (39 kB)
2025-12-04T09:42:34.4435092Z Requirement already satisfied: idna<4,>=2.5 in /usr/lib/python3.9/site-packages (from requests==2.27.1) (2.10)
2025-12-04T09:42:34.4439416Z Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/lib/python3.9/site-packages (from requests==2.27.1) (1.25.10)
2025-12-04T09:42:34.4978566Z Collecting certifi>=2017.4.17
2025-12-04T09:42:34.5025346Z   Downloading certifi-2025.11.12-py3-none-any.whl (159 kB)
2025-12-04T09:42:34.6081533Z Installing collected packages: charset-normalizer, certifi, requests, pyyaml
2025-12-04T09:42:34.7450941Z Successfully installed certifi-2025.11.12 charset-normalizer-2.0.12 pyyaml-6.0.2 requests-2.27.1
2025-12-04T09:42:35.2951354Z Command completed after 1 attempt(s).
2025-12-04T09:42:35.3002790Z ##[group]Run set -x
2025-12-04T09:42:35.3003090Z set -x
2025-12-04T09:42:35.3003360Z 
2025-12-04T09:42:35.3003842Z # Use relative path here as this could be checked out anywhere, not necessarily
2025-12-04T09:42:35.3004405Z # in runner workspace
2025-12-04T09:42:35.3004864Z python3 "${GITHUB_ACTION_PATH}/../../scripts/parse_ref.py"
2025-12-04T09:42:35.3012661Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:42:35.3013111Z env:
2025-12-04T09:42:35.3013354Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:42:35.3013674Z   HAS_NVIDIA_GPU: true
2025-12-04T09:42:35.3014046Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:42:35.3014455Z ##[endgroup]
2025-12-04T09:42:35.3045808Z + python3 /home/ec2-user/actions-runner/_work/pytorch/pytorch/./.github/actions/filter-test-configs/../../scripts/parse_ref.py
2025-12-04T09:42:35.3255165Z Setting output branch=main
2025-12-04T09:42:35.3315731Z ##[group]Run echo "Workflow: ${GITHUB_WORKFLOW}"
2025-12-04T09:42:35.3316228Z echo "Workflow: ${GITHUB_WORKFLOW}"
2025-12-04T09:42:35.3316633Z echo "Job name: ${JOB_NAME}"
2025-12-04T09:42:35.3316977Z 
2025-12-04T09:42:35.3317419Z # Use relative path here as this could be checked out anywhere, not necessarily
2025-12-04T09:42:35.3317970Z # in runner workspace
2025-12-04T09:42:35.3318489Z python3 "${GITHUB_ACTION_PATH}/../../scripts/filter_test_configs.py" \
2025-12-04T09:42:35.3319059Z   --workflow "${GITHUB_WORKFLOW}" \
2025-12-04T09:42:35.3319455Z   --job-name "${JOB_NAME}" \
2025-12-04T09:42:35.3326719Z [36;1m  --test-matrix "{"include": [{"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}]}" \[0m
2025-12-04T09:42:35.3334255Z   --selected-test-configs "" \
2025-12-04T09:42:35.3334656Z   --pr-number "${PR_NUMBER}" \
2025-12-04T09:42:35.3335025Z   --tag "${TAG}" \
2025-12-04T09:42:35.3335366Z   --event-name "${EVENT_NAME}" \
2025-12-04T09:42:35.3335730Z   --schedule "${SCHEDULE}" \
2025-12-04T09:42:35.3336095Z   --branch "${HEAD_BRANCH}"
2025-12-04T09:42:35.3342972Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:42:35.3343406Z env:
2025-12-04T09:42:35.3343660Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:42:35.3343978Z   HAS_NVIDIA_GPU: true
2025-12-04T09:42:35.3344330Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:42:35.3345047Z   GITHUB_TOKEN: ***
2025-12-04T09:42:35.3345767Z   JOB_NAME: linux-jammy-cuda12.4-py3.10-gcc11 / test (legacy_nvidia_driver, 1, 5, linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check, unstable)
2025-12-04T09:42:35.3346562Z   PR_NUMBER: 
2025-12-04T09:42:35.3346801Z   TAG: 
2025-12-04T09:42:35.3347058Z   EVENT_NAME: schedule
2025-12-04T09:42:35.3347348Z   SCHEDULE: 29 8 * * *
2025-12-04T09:42:35.3347617Z   HEAD_BRANCH: main
2025-12-04T09:42:35.3347884Z ##[endgroup]
2025-12-04T09:42:35.3375234Z Workflow: periodic
2025-12-04T09:42:35.3375999Z Job name: linux-jammy-cuda12.4-py3.10-gcc11 / test (legacy_nvidia_driver, 1, 5, linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check, unstable)
2025-12-04T09:42:35.5382421Z Setting output keep-going=True
2025-12-04T09:42:35.5382852Z Setting output ci-verbose-test-logs=False
2025-12-04T09:42:35.5383257Z Setting output ci-test-showlocals=False
2025-12-04T09:42:35.5383652Z Setting output ci-no-test-timeout=False
2025-12-04T09:42:35.5384033Z Setting output ci-no-td=False
2025-12-04T09:42:35.5384416Z Setting output ci-td-distributed=False
2025-12-04T09:42:35.5384797Z Setting output is-unstable=True
2025-12-04T09:42:35.5385150Z Setting output reenabled-issues=
2025-12-04T09:42:35.5400908Z Setting output test-matrix={"include": [{"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": 
"rerun_disabled_tests"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}]}
2025-12-04T09:42:35.5416917Z Setting output is-test-matrix-empty=False
2025-12-04T09:42:35.5537286Z ##[group]Run echo "Filtered matrix:"
2025-12-04T09:42:35.5537756Z echo "Filtered matrix:"
2025-12-04T09:42:35.5553429Z [36;1mecho "{"include": [{"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, 
{"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}]}"[0m
2025-12-04T09:42:35.5569414Z 
2025-12-04T09:42:35.5569666Z echo
2025-12-04T09:42:35.5569986Z echo "Is the current job unstable? True"
2025-12-04T09:42:35.5570378Z 
2025-12-04T09:42:35.5570618Z echo
2025-12-04T09:42:35.5570919Z echo "Is keep-going label set? True"
2025-12-04T09:42:35.5571643Z 
2025-12-04T09:42:35.5571892Z echo
2025-12-04T09:42:35.5572173Z echo "Reenabled issues? "
2025-12-04T09:42:35.5579238Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:42:35.5579691Z env:
2025-12-04T09:42:35.5579951Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:42:35.5580254Z   HAS_NVIDIA_GPU: true
2025-12-04T09:42:35.5580625Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:42:35.5581050Z ##[endgroup]
2025-12-04T09:42:35.5608647Z Filtered matrix:
2025-12-04T09:42:35.5628017Z {include: [{config: legacy_nvidia_driver, shard: 1, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, unstable: unstable}, {config: legacy_nvidia_driver, shard: 1, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, unstable: unstable, rerun_disabled_tests: rerun_disabled_tests}, {config: legacy_nvidia_driver, shard: 1, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable, mem_leak_check: mem_leak_check}, {config: legacy_nvidia_driver, shard: 1, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable}, {config: legacy_nvidia_driver, shard: 2, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, unstable: unstable}, {config: legacy_nvidia_driver, shard: 2, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, unstable: unstable, rerun_disabled_tests: rerun_disabled_tests}, {config: legacy_nvidia_driver, shard: 2, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable, mem_leak_check: mem_leak_check}, {config: legacy_nvidia_driver, shard: 2, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable}, {config: legacy_nvidia_driver, shard: 3, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, unstable: unstable}, {config: legacy_nvidia_driver, shard: 3, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, unstable: unstable, rerun_disabled_tests: rerun_disabled_tests}, {config: legacy_nvidia_driver, shard: 3, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable, mem_leak_check: mem_leak_check}, {config: legacy_nvidia_driver, shard: 3, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable}, {config: legacy_nvidia_driver, shard: 4, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, unstable: unstable}, {config: legacy_nvidia_driver, shard: 4, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, unstable: unstable, rerun_disabled_tests: rerun_disabled_tests}, {config: legacy_nvidia_driver, shard: 4, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable, mem_leak_check: mem_leak_check}, {config: legacy_nvidia_driver, shard: 4, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable}, {config: legacy_nvidia_driver, shard: 5, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, unstable: unstable}, {config: legacy_nvidia_driver, shard: 5, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, unstable: unstable, rerun_disabled_tests: rerun_disabled_tests}, {config: legacy_nvidia_driver, shard: 5, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable, mem_leak_check: mem_leak_check}, {config: legacy_nvidia_driver, shard: 5, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable}]}
2025-12-04T09:42:35.5643534Z 
2025-12-04T09:42:35.5643686Z Is the current job unstable? True
2025-12-04T09:42:35.5643938Z 
2025-12-04T09:42:35.5644067Z Is keep-going label set? True
2025-12-04T09:42:35.5644285Z 
2025-12-04T09:42:35.5644414Z Reenabled issues? 
2025-12-04T09:42:35.5681867Z ##[group]Run echo "timeout=$((JOB_TIMEOUT-30))" >> "${GITHUB_OUTPUT}"
2025-12-04T09:42:35.5682500Z echo "timeout=$((JOB_TIMEOUT-30))" >> "${GITHUB_OUTPUT}"
2025-12-04T09:42:35.5689300Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:42:35.5689734Z env:
2025-12-04T09:42:35.5689991Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:42:35.5690310Z   HAS_NVIDIA_GPU: true
2025-12-04T09:42:35.5690668Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:42:35.5691091Z   JOB_TIMEOUT: 600
2025-12-04T09:42:35.5691363Z ##[endgroup]
2025-12-04T09:42:35.5745465Z ##[group]Run env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}"
2025-12-04T09:42:35.5746099Z env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}"
2025-12-04T09:42:35.5746641Z env | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}"
2025-12-04T09:42:35.5753070Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:42:35.5753520Z env:
2025-12-04T09:42:35.5753771Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:42:35.5754073Z   HAS_NVIDIA_GPU: true
2025-12-04T09:42:35.5754443Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:42:35.5754864Z ##[endgroup]
2025-12-04T09:42:35.5864726Z ##[group]Run set -x
2025-12-04T09:42:35.5865126Z set -x
2025-12-04T09:42:35.5865392Z 
2025-12-04T09:42:35.5865687Z if [[ $TEST_CONFIG == 'multigpu' ]]; then
2025-12-04T09:42:35.5866137Z   TEST_COMMAND=.ci/pytorch/multigpu-test.sh
2025-12-04T09:42:35.5866610Z elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then
2025-12-04T09:42:35.5867193Z   TEST_COMMAND=.ci/onnx/test.sh
2025-12-04T09:42:35.5867536Z else
2025-12-04T09:42:35.5867836Z   TEST_COMMAND=.ci/pytorch/test.sh
2025-12-04T09:42:35.5868204Z fi
2025-12-04T09:42:35.5868431Z 
2025-12-04T09:42:35.5868737Z # Leaving 1GB for the runner and other things
2025-12-04T09:42:35.5869424Z TOTAL_AVAILABLE_MEMORY_IN_GB=$(awk '/MemTotal/ { printf "%.3f \n", $2/1024/1024 - 1 }' /proc/meminfo)
2025-12-04T09:42:35.5870453Z # https://docs.docker.com/engine/containers/resource_constraints/#--memory-swap-details, the 3GB swap
2025-12-04T09:42:35.5871572Z # comes from https://github.com/pytorch/test-infra/pull/6058
2025-12-04T09:42:35.5872209Z TOTAL_MEMORY_WITH_SWAP=$(("${TOTAL_AVAILABLE_MEMORY_IN_GB%.*}" + 3))
2025-12-04T09:42:35.5872710Z 
2025-12-04T09:42:35.5873012Z if [[ ${BUILD_ENVIRONMENT} == *"s390x"* ]]; then
2025-12-04T09:42:35.5873425Z   SHM_OPTS=
2025-12-04T09:42:35.5873724Z   JENKINS_USER=
2025-12-04T09:42:35.5874121Z   # ensure that docker container cleanly exits in 12 hours
2025-12-04T09:42:35.5874687Z   # if for some reason cleanup action doesn't stop container
2025-12-04T09:42:35.5875164Z   # when job is cancelled
2025-12-04T09:42:35.5875529Z   DOCKER_SHELL_CMD="sleep 12h"
2025-12-04T09:42:35.5875911Z   USED_IMAGE="${DOCKER_IMAGE_S390X}"
2025-12-04T09:42:35.5876283Z else
2025-12-04T09:42:35.5876573Z   SHM_OPTS="--shm-size=${SHM_SIZE}"
2025-12-04T09:42:35.5876961Z   JENKINS_USER="--user jenkins"
2025-12-04T09:42:35.5877332Z   DOCKER_SHELL_CMD=
2025-12-04T09:42:35.5877671Z   USED_IMAGE="${DOCKER_IMAGE}"
2025-12-04T09:42:35.5878008Z fi
2025-12-04T09:42:35.5878251Z 
2025-12-04T09:42:35.5878650Z # detached container should get cleaned up by teardown_ec2_linux
2025-12-04T09:42:35.5879293Z # TODO: Stop building test binaries as part of the build phase
2025-12-04T09:42:35.5880012Z # Used for GPU_FLAG, SHM_OPTS, JENKINS_USER and DOCKER_SHELL_CMD since that doesn't play nice
2025-12-04T09:42:35.5880654Z # shellcheck disable=SC2086,SC2090
2025-12-04T09:42:35.5881047Z container_name=$(docker run \
2025-12-04T09:42:35.5881401Z   ${GPU_FLAG:-} \
2025-12-04T09:42:35.5881756Z   ${SCCACHE_SERVER_PORT_DOCKER_FLAG:-} \
2025-12-04T09:42:35.5882164Z   -e BUILD_ENVIRONMENT \
2025-12-04T09:42:35.5882517Z   -e PR_NUMBER \
2025-12-04T09:42:35.5882826Z   -e GITHUB_ACTIONS \
2025-12-04T09:42:35.5883168Z   -e GITHUB_REPOSITORY \
2025-12-04T09:42:35.5883520Z   -e GITHUB_WORKFLOW \
2025-12-04T09:42:35.5883845Z   -e GITHUB_JOB \
2025-12-04T09:42:35.5884164Z   -e GITHUB_RUN_ID \
2025-12-04T09:42:35.5884495Z   -e GITHUB_RUN_NUMBER \
2025-12-04T09:42:35.5884834Z   -e GITHUB_RUN_ATTEMPT \
2025-12-04T09:42:35.5885189Z   -e JOB_ID \
2025-12-04T09:42:35.5885494Z   -e JOB_NAME \
2025-12-04T09:42:35.5885799Z   -e BASE_SHA \
2025-12-04T09:42:35.5886085Z   -e BRANCH \
2025-12-04T09:42:35.5886375Z   -e SHA1 \
2025-12-04T09:42:35.5886670Z   -e AWS_DEFAULT_REGION \
2025-12-04T09:42:35.5887003Z   -e IN_WHEEL_TEST \
2025-12-04T09:42:35.5887331Z   -e SHARD_NUMBER \
2025-12-04T09:42:35.5887656Z   -e TEST_CONFIG \
2025-12-04T09:42:35.5887968Z   -e NUM_TEST_SHARDS \
2025-12-04T09:42:35.5888484Z   -e REENABLED_ISSUES \
2025-12-04T09:42:35.5888858Z   -e CONTINUE_THROUGH_ERROR \
2025-12-04T09:42:35.5889215Z   -e VERBOSE_TEST_LOGS \
2025-12-04T09:42:35.5889566Z   -e TEST_SHOWLOCALS \
2025-12-04T09:42:35.5889908Z   -e NO_TEST_TIMEOUT \
2025-12-04T09:42:35.5890239Z   -e NO_TD \
2025-12-04T09:42:35.5904781Z   -e TD_DISTRIBUTED \
2025-12-04T09:42:35.5905355Z   -e PR_LABELS \
2025-12-04T09:42:35.5905719Z   -e MAX_JOBS="$(nproc --ignore=2)" \
2025-12-04T09:42:35.5906107Z   -e SCCACHE_BUCKET \
2025-12-04T09:42:35.5906446Z   -e SCCACHE_REGION \
2025-12-04T09:42:35.5906773Z   -e XLA_CUDA \
2025-12-04T09:42:35.5907102Z   -e XLA_CLANG_CACHE_S3_BUCKET_NAME \
2025-12-04T09:42:35.5907534Z   -e PYTORCH_TEST_CUDA_MEM_LEAK_CHECK \
2025-12-04T09:42:35.5907972Z   -e PYTORCH_TEST_RERUN_DISABLED_TESTS \
2025-12-04T09:42:35.5908426Z   -e SKIP_SCCACHE_INITIALIZATION=1 \
2025-12-04T09:42:35.5908819Z   -e HUGGING_FACE_HUB_TOKEN \
2025-12-04T09:42:35.5909211Z   -e VLLM_TEST_HUGGING_FACE_TOKEN \
2025-12-04T09:42:35.5909618Z   -e SCRIBE_GRAPHQL_ACCESS_TOKEN \
2025-12-04T09:42:35.5909987Z   -e DASHBOARD_TAG \
2025-12-04T09:42:35.5910327Z   -e ARTIFACTS_FILE_SUFFIX \
2025-12-04T09:42:35.5910764Z   --memory="${TOTAL_AVAILABLE_MEMORY_IN_GB%.*}g" \
2025-12-04T09:42:35.5911262Z   --memory-swap="${TOTAL_MEMORY_WITH_SWAP}g" \
2025-12-04T09:42:35.5911737Z   --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
2025-12-04T09:42:35.5912207Z   --security-opt seccomp=unconfined \
2025-12-04T09:42:35.5912608Z   --cap-add=SYS_PTRACE \
2025-12-04T09:42:35.5912939Z   --ipc=host \
2025-12-04T09:42:35.5913239Z   ${SHM_OPTS} \
2025-12-04T09:42:35.5913535Z   --tty \
2025-12-04T09:42:35.5913796Z   --detach \
2025-12-04T09:42:35.5914116Z   --name="${container_name}" \
2025-12-04T09:42:35.5914492Z   ${JENKINS_USER} \
2025-12-04T09:42:35.5914896Z   -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
2025-12-04T09:42:35.5915353Z   -w /var/lib/jenkins/workspace \
2025-12-04T09:42:35.5915728Z   "${USED_IMAGE}" \
2025-12-04T09:42:35.5916052Z   ${DOCKER_SHELL_CMD}
2025-12-04T09:42:35.5916356Z )
2025-12-04T09:42:35.5916751Z echo "DOCKER_CONTAINER_ID=${container_name}" >> "${GITHUB_ENV}"
2025-12-04T09:42:35.5917236Z 
2025-12-04T09:42:35.5917529Z if [[ ${BUILD_ENVIRONMENT} == *"s390x"* ]]; then
2025-12-04T09:42:35.5918220Z   docker exec -t "${container_name}" sh -c "python3 -m pip install -r .ci/docker/requirements-ci.txt"
2025-12-04T09:42:35.5918834Z fi
2025-12-04T09:42:35.5919077Z 
2025-12-04T09:42:35.5919652Z docker exec -t "${container_name}" sh -c "python3 -m pip install $(echo dist/*.whl)[opt-einsum] && ${TEST_COMMAND}"
2025-12-04T09:42:35.5926673Z shell: /usr/bin/bash -e {0}
2025-12-04T09:42:35.5927001Z env:
2025-12-04T09:42:35.5927242Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:42:35.5927557Z   HAS_NVIDIA_GPU: true
2025-12-04T09:42:35.5927935Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:42:35.5928443Z   BUILD_ENVIRONMENT: linux-jammy-cuda12.4-py3.10-gcc11
2025-12-04T09:42:35.5928859Z   PR_NUMBER: 
2025-12-04T09:42:35.5929142Z   GITHUB_REPOSITORY: pytorch/pytorch
2025-12-04T09:42:35.5929502Z   GITHUB_WORKFLOW: periodic
2025-12-04T09:42:35.5929800Z   GITHUB_JOB: test
2025-12-04T09:42:35.5930074Z   GITHUB_RUN_ID: 19922826259
2025-12-04T09:42:35.5930390Z   GITHUB_RUN_NUMBER: 19107
2025-12-04T09:42:35.5930681Z   GITHUB_RUN_ATTEMPT: 1
2025-12-04T09:42:35.5930962Z   JOB_ID: 57119749248
2025-12-04T09:42:35.5931676Z   JOB_NAME: linux-jammy-cuda12.4-py3.10-gcc11 / test (legacy_nvidia_driver, 1, 5, linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check, unstable)
2025-12-04T09:42:35.5932571Z   BRANCH: main
2025-12-04T09:42:35.5932886Z   SHA1: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:42:35.5933344Z   BASE_SHA: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:42:35.5933767Z   TEST_CONFIG: legacy_nvidia_driver
2025-12-04T09:42:35.5934098Z   SHARD_NUMBER: 1
2025-12-04T09:42:35.5934366Z   NUM_TEST_SHARDS: 5
2025-12-04T09:42:35.5934644Z   EXTRA_FLAGS: 
2025-12-04T09:42:35.5934974Z   OP_BENCHMARK_TESTS: 
2025-12-04T09:42:35.5935262Z   REENABLED_ISSUES: 
2025-12-04T09:42:35.5935559Z   CONTINUE_THROUGH_ERROR: True
2025-12-04T09:42:35.5935879Z   VERBOSE_TEST_LOGS: False
2025-12-04T09:42:35.5936189Z   TEST_SHOWLOCALS: False
2025-12-04T09:42:35.5936601Z   NO_TEST_TIMEOUT: False
2025-12-04T09:42:35.5936878Z   NO_TD: False
2025-12-04T09:42:35.5937143Z   TD_DISTRIBUTED: False
2025-12-04T09:42:35.5937504Z   SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
2025-12-04T09:42:35.5937928Z   SCCACHE_REGION: us-east-1
2025-12-04T09:42:35.5938218Z   SHM_SIZE: 2g
2025-12-04T09:42:35.5939144Z   DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:42:35.5940835Z   DOCKER_IMAGE_S390X: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:42:35.5941855Z   XLA_CUDA: 
2025-12-04T09:42:35.5942260Z   XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla
2025-12-04T09:42:35.5942796Z   PYTORCH_TEST_CUDA_MEM_LEAK_CHECK: 1
2025-12-04T09:42:35.5943172Z   PYTORCH_TEST_RERUN_DISABLED_TESTS: 0
2025-12-04T09:42:35.5943512Z   DASHBOARD_TAG: 
2025-12-04T09:42:35.5944031Z   VLLM_TEST_HUGGING_FACE_TOKEN: ***
2025-12-04T09:42:35.5944521Z   HUGGING_FACE_HUB_TOKEN: ***
2025-12-04T09:42:35.5945002Z   SCRIBE_GRAPHQL_ACCESS_TOKEN: ***
2025-12-04T09:42:35.5945622Z   ARTIFACTS_FILE_SUFFIX: test-legacy_nvidia_driver-1-5-linux.g4dn.4xlarge.nvidia.gpu_57119749248
2025-12-04T09:42:35.5946250Z ##[endgroup]
2025-12-04T09:42:35.5974814Z + [[ legacy_nvidia_driver == \m\u\l\t\i\g\p\u ]]
2025-12-04T09:42:35.5975312Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *onnx* ]]
2025-12-04T09:42:35.5975735Z + TEST_COMMAND=.ci/pytorch/test.sh
2025-12-04T09:42:35.5978687Z ++ awk '/MemTotal/ { printf "%.3f \n", $2/1024/1024 - 1 }' /proc/meminfo
2025-12-04T09:42:35.6000500Z + TOTAL_AVAILABLE_MEMORY_IN_GB='61.094 '
2025-12-04T09:42:35.6000932Z + TOTAL_MEMORY_WITH_SWAP=64
2025-12-04T09:42:35.6001316Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *\s\3\9\0\x* ]]
2025-12-04T09:42:35.6001747Z + SHM_OPTS=--shm-size=2g
2025-12-04T09:42:35.6002060Z + JENKINS_USER='--user jenkins'
2025-12-04T09:42:35.6002371Z + DOCKER_SHELL_CMD=
2025-12-04T09:42:35.6003302Z + USED_IMAGE=308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:42:35.6009881Z +++ nproc --ignore=2
2025-12-04T09:42:35.6042406Z ++ docker run --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all -e BUILD_ENVIRONMENT -e PR_NUMBER -e GITHUB_ACTIONS -e GITHUB_REPOSITORY -e GITHUB_WORKFLOW -e GITHUB_JOB -e GITHUB_RUN_ID -e GITHUB_RUN_NUMBER -e GITHUB_RUN_ATTEMPT -e JOB_ID -e JOB_NAME -e BASE_SHA -e BRANCH -e SHA1 -e AWS_DEFAULT_REGION -e IN_WHEEL_TEST -e SHARD_NUMBER -e TEST_CONFIG -e NUM_TEST_SHARDS -e REENABLED_ISSUES -e CONTINUE_THROUGH_ERROR -e VERBOSE_TEST_LOGS -e TEST_SHOWLOCALS -e NO_TEST_TIMEOUT -e NO_TD -e TD_DISTRIBUTED -e PR_LABELS -e MAX_JOBS=14 -e SCCACHE_BUCKET -e SCCACHE_REGION -e XLA_CUDA -e XLA_CLANG_CACHE_S3_BUCKET_NAME -e PYTORCH_TEST_CUDA_MEM_LEAK_CHECK -e PYTORCH_TEST_RERUN_DISABLED_TESTS -e SKIP_SCCACHE_INITIALIZATION=1 -e HUGGING_FACE_HUB_TOKEN -e VLLM_TEST_HUGGING_FACE_TOKEN -e SCRIBE_GRAPHQL_ACCESS_TOKEN -e DASHBOARD_TAG -e ARTIFACTS_FILE_SUFFIX --memory=61g --memory-swap=64g --env-file=/tmp/github_env_19922826259 --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --ipc=host --shm-size=2g --tty --detach --name= --user jenkins -v /home/ec2-user/actions-runner/_work/pytorch/pytorch:/var/lib/jenkins/workspace -w /var/lib/jenkins/workspace 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:42:43.9244569Z + container_name=764ff984146fd3268e049644ccb47d7d8238fae8138055ae0a6928cb5da435ad
2025-12-04T09:42:43.9245467Z + echo DOCKER_CONTAINER_ID=764ff984146fd3268e049644ccb47d7d8238fae8138055ae0a6928cb5da435ad
2025-12-04T09:42:43.9246987Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *\s\3\9\0\x* ]]
2025-12-04T09:42:43.9251539Z ++ echo dist/torch-2.10.0a0+gitffd9b0f-cp310-cp310-linux_x86_64.whl
2025-12-04T09:42:43.9254332Z + docker exec -t 764ff984146fd3268e049644ccb47d7d8238fae8138055ae0a6928cb5da435ad sh -c 'python3 -m pip install dist/torch-2.10.0a0+gitffd9b0f-cp310-cp310-linux_x86_64.whl[opt-einsum] && .ci/pytorch/test.sh'
2025-12-04T09:42:44.4999894Z Processing ./dist/torch-2.10.0a0+gitffd9b0f-cp310-cp310-linux_x86_64.whl (from torch==2.10.0a0+gitffd9b0f)
2025-12-04T09:42:45.3792517Z Requirement already satisfied: filelock in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (3.18.0)
2025-12-04T09:42:45.3796700Z Requirement already satisfied: typing-extensions>=4.10.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (4.12.2)
2025-12-04T09:42:45.3801911Z Requirement already satisfied: sympy>=1.13.3 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (1.13.3)
2025-12-04T09:42:45.3807092Z Requirement already satisfied: networkx>=2.5.1 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (2.8.8)
2025-12-04T09:42:45.3811244Z Requirement already satisfied: jinja2 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (3.1.6)
2025-12-04T09:42:45.3816729Z Requirement already satisfied: fsspec>=0.8.5 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (2025.10.0)
2025-12-04T09:42:45.3832292Z Requirement already satisfied: opt-einsum>=3.3 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (3.3.0)
2025-12-04T09:42:45.4271393Z Requirement already satisfied: numpy>=1.7 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from opt-einsum>=3.3->torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (1.22.4)
2025-12-04T09:42:45.4294710Z Requirement already satisfied: mpmath<1.4,>=1.1.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from sympy>=1.13.3->torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (1.3.0)
2025-12-04T09:42:45.4362796Z Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from jinja2->torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (3.0.3)
2025-12-04T09:42:45.8660017Z Installing collected packages: torch
2025-12-04T09:42:58.4856658Z Successfully installed torch-2.10.0a0+gitffd9b0f
2025-12-04T09:42:58.5780059Z + export TERM=vt100
2025-12-04T09:42:58.5780403Z + TERM=vt100
2025-12-04T09:42:58.5782655Z ++ dirname .ci/pytorch/test.sh
2025-12-04T09:42:58.5791310Z + source .ci/pytorch/common.sh
2025-12-04T09:42:58.5794913Z +++ dirname .ci/pytorch/common.sh
2025-12-04T09:42:58.5802641Z ++ source .ci/pytorch/common_utils.sh
2025-12-04T09:42:58.5804358Z +++ declare -f -t trap_add
2025-12-04T09:42:58.5810381Z ++ set -ex -o pipefail
2025-12-04T09:42:58.5810754Z ++ [[ linux-jammy-cuda12.4-py3.10-gcc11 == *rocm* ]]
2025-12-04T09:42:58.5811164Z ++ BUILD_TEST_LIBTORCH=0
2025-12-04T09:42:58.5815609Z ++ dirname .ci/pytorch/test.sh
2025-12-04T09:42:58.5823597Z + source .ci/pytorch/common-build.sh
2025-12-04T09:42:58.5825344Z ++ [[ linux-jammy-cuda12.4-py3.10-gcc11 != *win-* ]]
2025-12-04T09:42:58.5832079Z ++++ dirname .ci/pytorch/common-build.sh
2025-12-04T09:42:58.5840553Z +++ cd .ci/pytorch
2025-12-04T09:42:58.5840865Z +++ pwd -P
2025-12-04T09:42:58.5842867Z ++ script_dir=/var/lib/jenkins/workspace/.ci/pytorch
2025-12-04T09:42:58.5843471Z ++ [[ linux-jammy-cuda12.4-py3.10-gcc11 == *-pch* ]]
2025-12-04T09:42:58.5843880Z ++ which sccache
2025-12-04T09:42:58.5862248Z ++ [[ -z ossci-compiler-cache-circleci-v2 ]]
2025-12-04T09:42:58.5862862Z ++ sccache --stop-server
2025-12-04T09:42:58.5893382Z ++ true
2025-12-04T09:42:58.5893836Z ++ rm -f /var/lib/jenkins/sccache_error.log
2025-12-04T09:42:58.5904021Z ++ trap_add sccache_epilogue EXIT
2025-12-04T09:42:58.5904422Z ++ trap_add_cmd=sccache_epilogue
2025-12-04T09:42:58.5904748Z ++ shift
2025-12-04T09:42:58.5904995Z ++ for trap_add_name in "$@"
2025-12-04T09:42:58.5911151Z ++++ trap -p EXIT
2025-12-04T09:42:58.5913791Z +++ eval 'extract_trap_cmd '
2025-12-04T09:42:58.5914098Z ++++ extract_trap_cmd
2025-12-04T09:42:58.5914392Z ++++ printf '%s\n' ''
2025-12-04T09:42:58.5914832Z +++ printf '%s\n' sccache_epilogue
2025-12-04T09:42:58.5916845Z ++ trap -- '
2025-12-04T09:42:58.5917103Z sccache_epilogue' EXIT
2025-12-04T09:42:58.5917394Z ++ [[ -n 1 ]]
2025-12-04T09:42:58.5917847Z ++ echo 'Skipping sccache server initialization, setting environment variables'
2025-12-04T09:42:58.5918540Z Skipping sccache server initialization, setting environment variables
2025-12-04T09:42:58.5919078Z ++ export SCCACHE_IDLE_TIMEOUT=0
2025-12-04T09:42:58.5919435Z ++ SCCACHE_IDLE_TIMEOUT=0
2025-12-04T09:42:58.5919845Z ++ export SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log
2025-12-04T09:42:58.5920356Z ++ SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log
2025-12-04T09:42:58.5929113Z ++ export RUST_LOG=sccache::server=error
2025-12-04T09:42:58.5929554Z ++ RUST_LOG=sccache::server=error
2025-12-04T09:42:58.5929898Z ++ sccache --zero-stats
2025-12-04T09:42:58.7150924Z Statistics zeroed.
2025-12-04T09:42:58.7156126Z ++ which ccache
2025-12-04T09:42:58.7180321Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 != *rocm* ]]
2025-12-04T09:42:58.7180891Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 != *s390x* ]]
2025-12-04T09:42:58.7181342Z + [[ -d /var/lib/jenkins/workspace ]]
2025-12-04T09:42:58.7183293Z ++ stat -c %u /var/lib/jenkins/workspace
2025-12-04T09:42:58.7197930Z + WORKSPACE_ORIGINAL_OWNER_ID=1000
2025-12-04T09:42:58.7198315Z + trap_add cleanup_workspace EXIT
2025-12-04T09:42:58.7198748Z + trap_add_cmd=cleanup_workspace
2025-12-04T09:42:58.7199174Z + shift
2025-12-04T09:42:58.7199614Z + for trap_add_name in "$@"
2025-12-04T09:42:58.7206745Z +++ trap -p EXIT
2025-12-04T09:42:58.7209850Z ++ eval 'extract_trap_cmd trap -- '\''
2025-12-04T09:42:58.7210364Z sccache_epilogue'\'' EXIT'
2025-12-04T09:42:58.7210720Z +++ extract_trap_cmd trap -- '
2025-12-04T09:42:58.7211056Z sccache_epilogue' EXIT
2025-12-04T09:42:58.7211340Z +++ printf '%s\n' '
2025-12-04T09:42:58.7211623Z sccache_epilogue'
2025-12-04T09:42:58.7211922Z ++ printf '%s\n' cleanup_workspace
2025-12-04T09:42:58.7212828Z + trap -- '
2025-12-04T09:42:58.7213168Z sccache_epilogue
2025-12-04T09:42:58.7213485Z cleanup_workspace' EXIT
2025-12-04T09:42:58.7213836Z + sudo chown -R jenkins /var/lib/jenkins/workspace
2025-12-04T09:42:59.4586627Z + git config --global --add safe.directory /var/lib/jenkins/workspace
2025-12-04T09:42:59.4606215Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *cuda* ]]
2025-12-04T09:42:59.4609383Z ++ python -c 'import os;import numba.cuda; print(os.path.dirname(numba.cuda.__file__))'
2025-12-04T09:42:59.9614820Z + NUMBA_CUDA_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda
2025-12-04T09:42:59.9615578Z + '[' -n /opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda ']'
2025-12-04T09:42:59.9621095Z +++ realpath .ci/pytorch/test.sh
2025-12-04T09:42:59.9631140Z ++ dirname /var/lib/jenkins/workspace/.ci/pytorch/test.sh
2025-12-04T09:42:59.9649483Z + NUMBA_PATCH=/var/lib/jenkins/workspace/.ci/pytorch/numba-cuda-13.patch
2025-12-04T09:42:59.9650254Z + pushd /opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda
2025-12-04T09:42:59.9651256Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda ~/workspace
2025-12-04T09:42:59.9651814Z + patch -p4
2025-12-04T09:42:59.9665244Z patching file cudadrv/driver.py
2025-12-04T09:42:59.9665617Z Hunk #1 succeeded at 357 (offset -8 lines).
2025-12-04T09:42:59.9674795Z + popd
2025-12-04T09:42:59.9675081Z ~/workspace
2025-12-04T09:42:59.9675401Z + echo 'Environment variables:'
2025-12-04T09:42:59.9675921Z Environment variables:
2025-12-04T09:42:59.9676206Z + env
2025-12-04T09:42:59.9686014Z GITHUB_WORKSPACE=/home/ec2-user/actions-runner/_work/pytorch/pytorch
2025-12-04T09:42:59.9686791Z CONTINUE_THROUGH_ERROR=True
2025-12-04T09:42:59.9687352Z BUILD_ENVIRONMENT=linux-jammy-cuda12.4-py3.10-gcc11
2025-12-04T09:42:59.9688037Z VLLM_TEST_HUGGING_FACE_TOKEN=***
2025-12-04T09:42:59.9688383Z HOSTNAME=764ff984146f
2025-12-04T09:42:59.9689081Z GITHUB_PATH=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/add_path_685e94f7-4594-411d-afae-acf4a383301b
2025-12-04T09:42:59.9689849Z GITHUB_ACTION=__run_3
2025-12-04T09:42:59.9690169Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1
2025-12-04T09:42:59.9690533Z GITHUB_RUN_NUMBER=19107
2025-12-04T09:42:59.9690849Z TEST_CONFIG=legacy_nvidia_driver
2025-12-04T09:42:59.9691195Z GITHUB_REPOSITORY_OWNER_ID=21003710
2025-12-04T09:42:59.9691586Z TORCH_NVCC_FLAGS=-Xfatbin -compress-all
2025-12-04T09:42:59.9691962Z SCCACHE_IDLE_TIMEOUT=0
2025-12-04T09:42:59.9692454Z SCRIBE_GRAPHQL_ACCESS_TOKEN=***
2025-12-04T09:42:59.9692818Z GITHUB_TRIGGERING_ACTOR=huydhn
2025-12-04T09:42:59.9693155Z GITHUB_REF_TYPE=branch
2025-12-04T09:42:59.9693502Z BASE_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:42:59.9693887Z XLA_CUDA=
2025-12-04T09:42:59.9694153Z NCCL_LIB_DIR=/usr/local/cuda/lib64/
2025-12-04T09:42:59.9694630Z HUGGING_FACE_HUB_TOKEN=***
2025-12-04T09:42:59.9695128Z ***
2025-12-04T09:42:59.9695377Z GITHUB_REPOSITORY_ID=65600975
2025-12-04T09:42:59.9695705Z GITHUB_ACTIONS=true
2025-12-04T09:42:59.9695998Z NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:42:59.9696465Z SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log
2025-12-04T09:42:59.9696937Z SHA1=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:42:59.9697386Z GITHUB_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:42:59.9698002Z GITHUB_WORKFLOW_REF=pytorch/pytorch/.github/workflows/periodic.yml@refs/heads/main
2025-12-04T09:42:59.9698573Z UCC_HOME=/usr
2025-12-04T09:42:59.9698846Z VERBOSE_TEST_LOGS=False
2025-12-04T09:42:59.9699150Z GITHUB_REF=refs/heads/main
2025-12-04T09:42:59.9699455Z SHARD_NUMBER=1
2025-12-04T09:42:59.9699729Z GITHUB_REF_PROTECTED=true
2025-12-04T09:42:59.9700028Z HOME=/var/lib/jenkins
2025-12-04T09:42:59.9700353Z GITHUB_API_URL=https://api.github.com
2025-12-04T09:42:59.9700742Z PYTORCH_TEST_RERUN_DISABLED_TESTS=0
2025-12-04T09:42:59.9701135Z UCX_COMMIT=7836b165abdbe468a2f607e7254011c07d788152
2025-12-04T09:42:59.9701537Z USE_SYSTEM_NCCL=1
2025-12-04T09:42:59.9701802Z NUM_TEST_SHARDS=5
2025-12-04T09:42:59.9702064Z UCX_HOME=/usr
2025-12-04T09:42:59.9702730Z GITHUB_STATE=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/save_state_685e94f7-4594-411d-afae-acf4a383301b
2025-12-04T09:42:59.9703922Z JOB_NAME=linux-jammy-cuda12.4-py3.10-gcc11 / test (legacy_nvidia_driver, 1, 5, linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check, unstable)
2025-12-04T09:42:59.9705090Z GITHUB_ENV=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_env_685e94f7-4594-411d-afae-acf4a383301b
2025-12-04T09:42:59.9706056Z GITHUB_EVENT_PATH=/home/ec2-user/actions-runner/_work/_temp/_github_workflow/event.json
2025-12-04T09:42:59.9706660Z GITHUB_EVENT_NAME=schedule
2025-12-04T09:42:59.9706968Z DASHBOARD_TAG=
2025-12-04T09:42:59.9707236Z GITHUB_RUN_ID=19922826259
2025-12-04T09:42:59.9707529Z INSTALLED_OPENBLAS=
2025-12-04T09:42:59.9708267Z GITHUB_STEP_SUMMARY=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/step_summary_685e94f7-4594-411d-afae-acf4a383301b
2025-12-04T09:42:59.9709068Z GITHUB_ACTOR=huydhn
2025-12-04T09:42:59.9709325Z PR_NUMBER=
2025-12-04T09:42:59.9709572Z DESIRED_CUDA=12.4
2025-12-04T09:42:59.9710074Z GITHUB_RUN_ATTEMPT=1
2025-12-04T09:42:59.9710369Z ANACONDA_PYTHON_VERSION=3.10
2025-12-04T09:42:59.9710767Z GITHUB_GRAPHQL_URL=https://api.github.com/graphql
2025-12-04T09:42:59.9711185Z TERM=vt100
2025-12-04T09:42:59.9711424Z INSTALLED_VISION=yes
2025-12-04T09:42:59.9711706Z BRANCH=main
2025-12-04T09:42:59.9711968Z SCCACHE_REGION=us-east-1
2025-12-04T09:42:59.9712270Z OPENSSL_ROOT_DIR=/opt/openssl
2025-12-04T09:42:59.9712693Z BUILD_AOT_INDUCTOR_TEST=
2025-12-04T09:42:59.9712997Z CUDA_PATH=/usr/local/cuda
2025-12-04T09:42:59.9713610Z GITHUB_ACTION_PATH=/home/ec2-user/actions-runner/_work/pytorch/pytorch/./.github/actions/setup-linux
2025-12-04T09:42:59.9714286Z GITHUB_SERVER_URL=https://github.com
2025-12-04T09:42:59.9714702Z UCC_COMMIT=430e241bf5d38cbc73fc7a6b89155397232e3f96
2025-12-04T09:42:59.9715101Z REENABLED_ISSUES=
2025-12-04T09:42:59.9715350Z DOCS=
2025-12-04T09:42:59.9715575Z SHLVL=1
2025-12-04T09:42:59.9715805Z MAX_JOBS=14
2025-12-04T09:42:59.9716042Z GITHUB_ACTOR_ID=475357
2025-12-04T09:42:59.9716441Z GITHUB_WORKFLOW_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:42:59.9716897Z GITHUB_REF_NAME=main
2025-12-04T09:42:59.9717353Z XLA_CLANG_CACHE_S3_BUCKET_NAME=ossci-compiler-clang-cache-circleci-xla
2025-12-04T09:42:59.9717854Z GITHUB_JOB=test
2025-12-04T09:42:59.9718123Z NO_TEST_TIMEOUT=False
2025-12-04T09:42:59.9718415Z TD_DISTRIBUTED=False
2025-12-04T09:42:59.9718722Z GITHUB_REPOSITORY=pytorch/pytorch
2025-12-04T09:42:59.9719074Z GITHUB_RETENTION_DAYS=90
2025-12-04T09:42:59.9719379Z OPENSSL_DIR=/opt/openssl
2025-12-04T09:42:59.9719676Z GITHUB_ACTION_REPOSITORY=
2025-12-04T09:42:59.9720603Z PATH=/opt/cache/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
2025-12-04T09:42:59.9721572Z GITHUB_BASE_REF=
2025-12-04T09:42:59.9721824Z INSTALLED_ACL=
2025-12-04T09:42:59.9722373Z ARTIFACTS_FILE_SUFFIX=test-legacy_nvidia_driver-1-5-linux.g4dn.4xlarge.nvidia.gpu_57119749248
2025-12-04T09:42:59.9722999Z CI=true
2025-12-04T09:42:59.9723246Z GITHUB_REPOSITORY_OWNER=pytorch
2025-12-04T09:42:59.9723628Z RUST_LOG=sccache::server=error
2025-12-04T09:42:59.9723947Z JOB_ID=57119749248
2025-12-04T09:42:59.9724211Z GITHUB_HEAD_REF=
2025-12-04T09:42:59.9724463Z GITHUB_ACTION_REF=
2025-12-04T09:42:59.9724795Z SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2
2025-12-04T09:42:59.9725207Z TEST_SHOWLOCALS=False
2025-12-04T09:42:59.9725495Z GITHUB_WORKFLOW=periodic
2025-12-04T09:42:59.9725811Z DEBIAN_FRONTEND=noninteractive
2025-12-04T09:42:59.9726567Z GITHUB_OUTPUT=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_output_685e94f7-4594-411d-afae-acf4a383301b
2025-12-04T09:42:59.9727311Z NO_TD=False
2025-12-04T09:42:59.9727584Z SKIP_SCCACHE_INITIALIZATION=1
2025-12-04T09:42:59.9727944Z NCCL_INCLUDE_DIR=/usr/local/cuda/include/
2025-12-04T09:42:59.9728297Z _=/usr/bin/env
2025-12-04T09:42:59.9728718Z OLDPWD=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda
2025-12-04T09:42:59.9729345Z ++ python -c 'import site; print(site.getsitepackages()[0])'
2025-12-04T09:42:59.9837988Z + TORCH_INSTALL_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch
2025-12-04T09:42:59.9838701Z + TORCH_BIN_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/bin
2025-12-04T09:42:59.9839408Z + TORCH_LIB_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib
2025-12-04T09:42:59.9840135Z + TORCH_TEST_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/test
2025-12-04T09:42:59.9840658Z + BUILD_DIR=build
2025-12-04T09:42:59.9840952Z + BUILD_RENAMED_DIR=build_renamed
2025-12-04T09:42:59.9841316Z + BUILD_BIN_DIR=build/bin
2025-12-04T09:42:59.9841619Z + SHARD_NUMBER=1
2025-12-04T09:42:59.9841876Z + NUM_TEST_SHARDS=5
2025-12-04T09:42:59.9842178Z + export TORCH_SERIALIZATION_DEBUG=1
2025-12-04T09:42:59.9842549Z + TORCH_SERIALIZATION_DEBUG=1
2025-12-04T09:42:59.9842861Z + export VALGRIND=ON
2025-12-04T09:42:59.9843137Z + VALGRIND=ON
2025-12-04T09:42:59.9843697Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *clang9* ]]
2025-12-04T09:42:59.9844156Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *xpu* ]]
2025-12-04T09:42:59.9844558Z + detect_cuda_arch
2025-12-04T09:42:59.9844885Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *cuda* ]]
2025-12-04T09:42:59.9845281Z + command -v nvidia-smi
2025-12-04T09:42:59.9845580Z /usr/bin/nvidia-smi
2025-12-04T09:42:59.9849919Z ++ nvidia-smi --query-gpu=compute_cap --format=csv
2025-12-04T09:42:59.9850795Z ++ tail -n 1
2025-12-04T09:43:00.0081815Z + TORCH_CUDA_ARCH_LIST=7.5
2025-12-04T09:43:00.0082233Z + export TORCH_CUDA_ARCH_LIST
2025-12-04T09:43:00.0082629Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *s390x* ]]
2025-12-04T09:43:00.0083027Z + [[ 0 == \1 ]]
2025-12-04T09:43:00.0083288Z + [[ True == \1 ]]
2025-12-04T09:43:00.0083620Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 != *bazel* ]]
2025-12-04T09:43:00.0085760Z ++ realpath build/custom_test_artifacts
2025-12-04T09:43:00.0105830Z + CUSTOM_TEST_ARTIFACT_BUILD_DIR=/var/lib/jenkins/workspace/build/custom_test_artifacts
2025-12-04T09:43:00.0106434Z + [[ -n '' ]]
2025-12-04T09:43:00.0106718Z + echo 'Environment variables'
2025-12-04T09:43:00.0107040Z Environment variables
2025-12-04T09:43:00.0107319Z + env
2025-12-04T09:43:00.0126233Z GITHUB_WORKSPACE=/home/ec2-user/actions-runner/_work/pytorch/pytorch
2025-12-04T09:43:00.0126841Z CONTINUE_THROUGH_ERROR=True
2025-12-04T09:43:00.0127361Z BUILD_ENVIRONMENT=linux-jammy-cuda12.4-py3.10-gcc11
2025-12-04T09:43:00.0128076Z VLLM_TEST_HUGGING_FACE_TOKEN=***
2025-12-04T09:43:00.0128464Z HOSTNAME=764ff984146f
2025-12-04T09:43:00.0129155Z GITHUB_PATH=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/add_path_685e94f7-4594-411d-afae-acf4a383301b
2025-12-04T09:43:00.0129904Z GITHUB_ACTION=__run_3
2025-12-04T09:43:00.0130219Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1
2025-12-04T09:43:00.0130576Z GITHUB_RUN_NUMBER=19107
2025-12-04T09:43:00.0130874Z TEST_CONFIG=legacy_nvidia_driver
2025-12-04T09:43:00.0131228Z GITHUB_REPOSITORY_OWNER_ID=21003710
2025-12-04T09:43:00.0131620Z TORCH_NVCC_FLAGS=-Xfatbin -compress-all
2025-12-04T09:43:00.0131980Z SCCACHE_IDLE_TIMEOUT=0
2025-12-04T09:43:00.0132440Z SCRIBE_GRAPHQL_ACCESS_TOKEN=***
2025-12-04T09:43:00.0132789Z GITHUB_TRIGGERING_ACTOR=huydhn
2025-12-04T09:43:00.0133159Z GITHUB_REF_TYPE=branch
2025-12-04T09:43:00.0133479Z TORCH_CUDA_ARCH_LIST=7.5
2025-12-04T09:43:00.0133925Z BASE_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:43:00.0134486Z XLA_CUDA=
2025-12-04T09:43:00.0134800Z NCCL_LIB_DIR=/usr/local/cuda/lib64/
2025-12-04T09:43:00.0135509Z HUGGING_FACE_HUB_TOKEN=***
2025-12-04T09:43:00.0135950Z ***
2025-12-04T09:43:00.0136199Z GITHUB_REPOSITORY_ID=65600975
2025-12-04T09:43:00.0136702Z GITHUB_ACTIONS=true
2025-12-04T09:43:00.0136997Z NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:43:00.0137395Z SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log
2025-12-04T09:43:00.0137846Z SHA1=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:43:00.0138297Z GITHUB_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:43:00.0138938Z GITHUB_WORKFLOW_REF=pytorch/pytorch/.github/workflows/periodic.yml@refs/heads/main
2025-12-04T09:43:00.0139502Z UCC_HOME=/usr
2025-12-04T09:43:00.0139774Z TORCH_SERIALIZATION_DEBUG=1
2025-12-04T09:43:00.0140097Z VERBOSE_TEST_LOGS=False
2025-12-04T09:43:00.0140399Z GITHUB_REF=refs/heads/main
2025-12-04T09:43:00.0140689Z SHARD_NUMBER=1
2025-12-04T09:43:00.0140961Z GITHUB_REF_PROTECTED=true
2025-12-04T09:43:00.0141275Z HOME=/var/lib/jenkins
2025-12-04T09:43:00.0141585Z GITHUB_API_URL=https://api.github.com
2025-12-04T09:43:00.0141970Z PYTORCH_TEST_RERUN_DISABLED_TESTS=0
2025-12-04T09:43:00.0142374Z UCX_COMMIT=7836b165abdbe468a2f607e7254011c07d788152
2025-12-04T09:43:00.0142762Z USE_SYSTEM_NCCL=1
2025-12-04T09:43:00.0143033Z NUM_TEST_SHARDS=5
2025-12-04T09:43:00.0143295Z UCX_HOME=/usr
2025-12-04T09:43:00.0143954Z GITHUB_STATE=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/save_state_685e94f7-4594-411d-afae-acf4a383301b
2025-12-04T09:43:00.0145426Z JOB_NAME=linux-jammy-cuda12.4-py3.10-gcc11 / test (legacy_nvidia_driver, 1, 5, linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check, unstable)
2025-12-04T09:43:00.0146598Z GITHUB_ENV=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_env_685e94f7-4594-411d-afae-acf4a383301b
2025-12-04T09:43:00.0147560Z GITHUB_EVENT_PATH=/home/ec2-user/actions-runner/_work/_temp/_github_workflow/event.json
2025-12-04T09:43:00.0148146Z GITHUB_EVENT_NAME=schedule
2025-12-04T09:43:00.0148567Z DASHBOARD_TAG=
2025-12-04T09:43:00.0148835Z GITHUB_RUN_ID=19922826259
2025-12-04T09:43:00.0149125Z INSTALLED_OPENBLAS=
2025-12-04T09:43:00.0149858Z GITHUB_STEP_SUMMARY=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/step_summary_685e94f7-4594-411d-afae-acf4a383301b
2025-12-04T09:43:00.0150657Z GITHUB_ACTOR=huydhn
2025-12-04T09:43:00.0150916Z PR_NUMBER=
2025-12-04T09:43:00.0151165Z DESIRED_CUDA=12.4
2025-12-04T09:43:00.0151438Z GITHUB_RUN_ATTEMPT=1
2025-12-04T09:43:00.0151721Z VALGRIND=ON
2025-12-04T09:43:00.0151978Z ANACONDA_PYTHON_VERSION=3.10
2025-12-04T09:43:00.0152380Z GITHUB_GRAPHQL_URL=https://api.github.com/graphql
2025-12-04T09:43:00.0152794Z TERM=vt100
2025-12-04T09:43:00.0153030Z INSTALLED_VISION=yes
2025-12-04T09:43:00.0153313Z BRANCH=main
2025-12-04T09:43:00.0153579Z SCCACHE_REGION=us-east-1
2025-12-04T09:43:00.0153888Z OPENSSL_ROOT_DIR=/opt/openssl
2025-12-04T09:43:00.0154231Z BUILD_AOT_INDUCTOR_TEST=
2025-12-04T09:43:00.0154540Z CUDA_PATH=/usr/local/cuda
2025-12-04T09:43:00.0155155Z GITHUB_ACTION_PATH=/home/ec2-user/actions-runner/_work/pytorch/pytorch/./.github/actions/setup-linux
2025-12-04T09:43:00.0155863Z GITHUB_SERVER_URL=https://github.com
2025-12-04T09:43:00.0156287Z UCC_COMMIT=430e241bf5d38cbc73fc7a6b89155397232e3f96
2025-12-04T09:43:00.0156681Z REENABLED_ISSUES=
2025-12-04T09:43:00.0156947Z DOCS=
2025-12-04T09:43:00.0157176Z SHLVL=1
2025-12-04T09:43:00.0157398Z MAX_JOBS=14
2025-12-04T09:43:00.0157649Z GITHUB_ACTOR_ID=475357
2025-12-04T09:43:00.0158044Z GITHUB_WORKFLOW_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:43:00.0158512Z GITHUB_REF_NAME=main
2025-12-04T09:43:00.0158946Z XLA_CLANG_CACHE_S3_BUCKET_NAME=ossci-compiler-clang-cache-circleci-xla
2025-12-04T09:43:00.0159446Z GITHUB_JOB=test
2025-12-04T09:43:00.0159710Z NO_TEST_TIMEOUT=False
2025-12-04T09:43:00.0159985Z TD_DISTRIBUTED=False
2025-12-04T09:43:00.0160288Z GITHUB_REPOSITORY=pytorch/pytorch
2025-12-04T09:43:00.0160641Z GITHUB_RETENTION_DAYS=90
2025-12-04T09:43:00.0160942Z OPENSSL_DIR=/opt/openssl
2025-12-04T09:43:00.0161255Z GITHUB_ACTION_REPOSITORY=
2025-12-04T09:43:00.0162184Z PATH=/opt/cache/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
2025-12-04T09:43:00.0163138Z GITHUB_BASE_REF=
2025-12-04T09:43:00.0163405Z INSTALLED_ACL=
2025-12-04T09:43:00.0163945Z ARTIFACTS_FILE_SUFFIX=test-legacy_nvidia_driver-1-5-linux.g4dn.4xlarge.nvidia.gpu_57119749248
2025-12-04T09:43:00.0164569Z CI=true
2025-12-04T09:43:00.0164814Z GITHUB_REPOSITORY_OWNER=pytorch
2025-12-04T09:43:00.0165202Z RUST_LOG=sccache::server=error
2025-12-04T09:43:00.0165522Z JOB_ID=57119749248
2025-12-04T09:43:00.0165781Z GITHUB_HEAD_REF=
2025-12-04T09:43:00.0166047Z GITHUB_ACTION_REF=
2025-12-04T09:43:00.0166381Z SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2
2025-12-04T09:43:00.0166779Z TEST_SHOWLOCALS=False
2025-12-04T09:43:00.0167074Z GITHUB_WORKFLOW=periodic
2025-12-04T09:43:00.0167392Z DEBIAN_FRONTEND=noninteractive
2025-12-04T09:43:00.0168125Z GITHUB_OUTPUT=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_output_685e94f7-4594-411d-afae-acf4a383301b
2025-12-04T09:43:00.0168887Z NO_TD=False
2025-12-04T09:43:00.0169154Z SKIP_SCCACHE_INITIALIZATION=1
2025-12-04T09:43:00.0169501Z NCCL_INCLUDE_DIR=/usr/local/cuda/include/
2025-12-04T09:43:00.0170032Z OLDPWD=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda
2025-12-04T09:43:00.0170535Z _=/usr/bin/env
2025-12-04T09:43:00.0170828Z + echo 'Testing pytorch'
2025-12-04T09:43:00.0171381Z Testing pytorch
2025-12-04T09:43:00.0171833Z + export LANG=C.UTF-8
2025-12-04T09:43:00.0172133Z + LANG=C.UTF-8
2025-12-04T09:43:00.0172378Z + PR_NUMBER=
2025-12-04T09:43:00.0172671Z + [[ legacy_nvidia_driver == \d\e\f\a\u\l\t ]]
2025-12-04T09:43:00.0173111Z + [[ legacy_nvidia_driver == \d\i\s\t\r\i\b\u\t\e\d ]]
2025-12-04T09:43:00.0173522Z + [[ legacy_nvidia_driver == \s\l\o\w ]]
2025-12-04T09:43:00.0173990Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *slow-gradcheck* ]]
2025-12-04T09:43:00.0174611Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *cuda* ]]
2025-12-04T09:43:00.0175066Z + export PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda
2025-12-04T09:43:00.0175469Z + PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda
2025-12-04T09:43:00.0175848Z + [[ legacy_nvidia_driver == *crossref* ]]
2025-12-04T09:43:00.0176380Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *rocm* ]]
2025-12-04T09:43:00.0176822Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *xpu* ]]
2025-12-04T09:43:00.0177291Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 != *-bazel-* ]]
2025-12-04T09:43:00.0177713Z + pip_install ninja==1.10.2
2025-12-04T09:43:00.0178141Z + pip_install_pkg='python3 -m pip install --progress-bar off'
2025-12-04T09:43:00.0178695Z + python3 -m pip install --progress-bar off ninja==1.10.2
2025-12-04T09:43:00.5129125Z Collecting ninja==1.10.2
2025-12-04T09:43:00.5481268Z   Downloading ninja-1.10.2-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl.metadata (5.0 kB)
2025-12-04T09:43:00.5598120Z Downloading ninja-1.10.2-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl (108 kB)
2025-12-04T09:43:00.9938755Z Installing collected packages: ninja
2025-12-04T09:43:00.9939416Z   Attempting uninstall: ninja
2025-12-04T09:43:00.9948456Z     Found existing installation: ninja 1.11.1.4
2025-12-04T09:43:00.9973054Z     Uninstalling ninja-1.11.1.4:
2025-12-04T09:43:01.0040520Z       Successfully uninstalled ninja-1.11.1.4
2025-12-04T09:43:01.0425193Z Successfully installed ninja-1.10.2
2025-12-04T09:43:01.1171965Z + export PATH=/var/lib/jenkins/.local/bin:/opt/cache/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
2025-12-04T09:43:01.1173914Z + PATH=/var/lib/jenkins/.local/bin:/opt/cache/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
2025-12-04T09:43:01.1175178Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *aarch64* ]]
2025-12-04T09:43:01.1175662Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *asan* ]]
2025-12-04T09:43:01.1176212Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *-debug* ]]
2025-12-04T09:43:01.1176757Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 != *-bazel-* ]]
2025-12-04T09:43:01.1177409Z + echo 'We are not in debug mode: linux-jammy-cuda12.4-py3.10-gcc11. Expect the assertion to pass'
2025-12-04T09:43:01.1178229Z We are not in debug mode: linux-jammy-cuda12.4-py3.10-gcc11. Expect the assertion to pass
2025-12-04T09:43:01.1178796Z + cd test
2025-12-04T09:43:01.1179225Z + python -c 'import torch; torch._C._crash_if_debug_asserts_fail(424242)'
2025-12-04T09:43:02.9360036Z + [[ legacy_nvidia_driver == \n\o\g\p\u\_\N\O\_\A\V\X\2 ]]
2025-12-04T09:43:02.9360912Z + [[ legacy_nvidia_driver == \n\o\g\p\u\_\A\V\X\5\1\2 ]]
2025-12-04T09:43:02.9361791Z + [[ legacy_nvidia_driver == \l\e\g\a\c\y\_\n\v\i\d\i\a\_\d\r\i\v\e\r ]]
2025-12-04T09:43:02.9362647Z + cd test
2025-12-04T09:43:02.9363270Z + python -c 'import torch; torch.rand(2, 2, device='\''cuda'\'')'
2025-12-04T09:43:07.9735167Z + export USE_LEGACY_DRIVER=1
2025-12-04T09:43:07.9735567Z + USE_LEGACY_DRIVER=1
2025-12-04T09:43:07.9741445Z + DYNAMO_BENCHMARK_FLAGS=()
2025-12-04T09:43:07.9742377Z + [[ legacy_nvidia_driver == *pr_time_benchmarks* ]]
2025-12-04T09:43:07.9742838Z + [[ legacy_nvidia_driver == *dynamo_eager* ]]
2025-12-04T09:43:07.9743252Z + [[ legacy_nvidia_driver == *aot_eager* ]]
2025-12-04T09:43:07.9743655Z + [[ legacy_nvidia_driver == *aot_inductor* ]]
2025-12-04T09:43:07.9744092Z + [[ legacy_nvidia_driver == *max_autotune_inductor* ]]
2025-12-04T09:43:07.9744854Z + [[ legacy_nvidia_driver == *inductor* ]]
2025-12-04T09:43:07.9745246Z + [[ legacy_nvidia_driver == *dynamic* ]]
2025-12-04T09:43:07.9745634Z + [[ legacy_nvidia_driver == *cpu* ]]
2025-12-04T09:43:07.9745991Z + [[ legacy_nvidia_driver == *xpu* ]]
2025-12-04T09:43:07.9746381Z + DYNAMO_BENCHMARK_FLAGS+=(--device cuda)
2025-12-04T09:43:07.9779713Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *libtorch* ]]
2025-12-04T09:43:07.9780400Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *-bazel-* ]]
2025-12-04T09:43:07.9783248Z + cd test
2025-12-04T09:43:07.9783976Z + python -c 'import torch; print(torch.__config__.show())'
2025-12-04T09:43:10.7580655Z PyTorch built with:
2025-12-04T09:43:10.7581020Z   - GCC 11.4
2025-12-04T09:43:10.7581269Z   - C++ Version: 201703
2025-12-04T09:43:10.7581955Z   - Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications
2025-12-04T09:43:10.7582815Z   - Intel(R) MKL-DNN v3.7.1 (Git Hash 8d263e693366ef8db40acc569cc7d8edf644556d)
2025-12-04T09:43:10.7583368Z   - OpenMP 201511 (a.k.a. OpenMP 4.5)
2025-12-04T09:43:10.7583763Z   - LAPACK is enabled (usually provided by MKL)
2025-12-04T09:43:10.7584162Z   - NNPACK is enabled
2025-12-04T09:43:10.7584473Z   - CPU capability usage: AVX512
2025-12-04T09:43:10.7584804Z   - CUDA Runtime 12.4
2025-12-04T09:43:10.7585213Z   - NVCC architecture flags: -gencode;arch=compute_75,code=sm_75
2025-12-04T09:43:10.7585686Z   - CuDNN 90.1
2025-12-04T09:43:10.7591398Z   - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, COMMIT_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32, CUDA_VERSION=12.4, CUDNN_VERSION=9.1.0, CXX_COMPILER=/opt/cache/bin/c++, CXX_FLAGS= -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -DC10_NODEPRECATED -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -faligned-new -Werror -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, FORCE_FALLBACK_CUDA_MPI=1, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, TORCH_VERSION=2.10.0, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=ON, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=ON, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, USE_XCCL=OFF, USE_XPU=OFF, 
2025-12-04T09:43:10.7597396Z 
2025-12-04T09:43:11.2179114Z + cd test
2025-12-04T09:43:11.2179588Z + python -c 'import torch; print(torch.__config__.parallel_info())'
2025-12-04T09:43:12.7055366Z ATen/Parallel:
2025-12-04T09:43:12.7055741Z 	at::get_num_threads() : 8
2025-12-04T09:43:12.7056100Z 	at::get_num_interop_threads() : 8
2025-12-04T09:43:12.7056543Z OpenMP 201511 (a.k.a. OpenMP 4.5)
2025-12-04T09:43:12.7056908Z 	omp_get_max_threads() : 8
2025-12-04T09:43:12.7057572Z Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications
2025-12-04T09:43:12.7058288Z 	mkl_get_max_threads() : 8
2025-12-04T09:43:12.7058731Z Intel(R) MKL-DNN v3.7.1 (Git Hash 8d263e693366ef8db40acc569cc7d8edf644556d)
2025-12-04T09:43:12.7059271Z std::thread::hardware_concurrency() : 16
2025-12-04T09:43:12.7059659Z Environment variables:
2025-12-04T09:43:12.7059962Z 	OMP_NUM_THREADS : [not set]
2025-12-04T09:43:12.7060272Z 	MKL_NUM_THREADS : [not set]
2025-12-04T09:43:12.7060600Z ATen parallel backend: OpenMP
2025-12-04T09:43:12.7060814Z 
2025-12-04T09:43:13.0284302Z + [[ legacy_nvidia_driver == *numpy_2* ]]
2025-12-04T09:43:13.0285238Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *aarch64* ]]
2025-12-04T09:43:13.0286081Z + [[ legacy_nvidia_driver == *backward* ]]
2025-12-04T09:43:13.0286645Z + [[ legacy_nvidia_driver == *libtorch_agnostic_targetting* ]]
2025-12-04T09:43:13.0287411Z + [[ legacy_nvidia_driver == *xla* ]]
2025-12-04T09:43:13.0287770Z + [[ legacy_nvidia_driver == *vllm* ]]
2025-12-04T09:43:13.0288153Z + [[ legacy_nvidia_driver == *executorch* ]]
2025-12-04T09:43:13.0288577Z + [[ legacy_nvidia_driver == \j\i\t\_\l\e\g\a\c\y ]]
2025-12-04T09:43:13.0289016Z + [[ legacy_nvidia_driver == \q\u\a\n\t\i\z\a\t\i\o\n ]]
2025-12-04T09:43:13.0289488Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *libtorch* ]]
2025-12-04T09:43:13.0290072Z + [[ legacy_nvidia_driver == distributed ]]
2025-12-04T09:43:13.0290501Z + [[ legacy_nvidia_driver == *operator_benchmark* ]]
2025-12-04T09:43:13.0290973Z + [[ legacy_nvidia_driver == *operator_microbenchmark* ]]
2025-12-04T09:43:13.0291475Z + [[ legacy_nvidia_driver == *attention_microbenchmark* ]]
2025-12-04T09:43:13.0291964Z + [[ legacy_nvidia_driver == *inductor_distributed* ]]
2025-12-04T09:43:13.0292405Z + [[ legacy_nvidia_driver == *inductor-halide* ]]
2025-12-04T09:43:13.0292848Z + [[ legacy_nvidia_driver == *inductor-pallas* ]]
2025-12-04T09:43:13.0293307Z + [[ legacy_nvidia_driver == *inductor-triton-cpu* ]]
2025-12-04T09:43:13.0293783Z + [[ legacy_nvidia_driver == *inductor-micro-benchmark* ]]
2025-12-04T09:43:13.0294312Z + [[ legacy_nvidia_driver == *aoti_cross_compile_for_windows* ]]
2025-12-04T09:43:13.0294796Z + [[ legacy_nvidia_driver == *huggingface* ]]
2025-12-04T09:43:13.0295186Z + [[ legacy_nvidia_driver == *timm* ]]
2025-12-04T09:43:13.0295558Z + [[ legacy_nvidia_driver == cachebench ]]
2025-12-04T09:43:13.0295964Z + [[ legacy_nvidia_driver == verify_cachebench ]]
2025-12-04T09:43:13.0296466Z + [[ legacy_nvidia_driver == *torchbench* ]]
2025-12-04T09:43:13.0296891Z + [[ legacy_nvidia_driver == *inductor_cpp_wrapper* ]]
2025-12-04T09:43:13.0297339Z + [[ legacy_nvidia_driver == *inductor_core* ]]
2025-12-04T09:43:13.0297757Z + [[ legacy_nvidia_driver == *inductor* ]]
2025-12-04T09:43:13.0298130Z + [[ legacy_nvidia_driver == *einops* ]]
2025-12-04T09:43:13.0298523Z + [[ legacy_nvidia_driver == *dynamo_core* ]]
2025-12-04T09:43:13.0298938Z + [[ legacy_nvidia_driver == *dynamo_wrapped* ]]
2025-12-04T09:43:13.0299377Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *rocm* ]]
2025-12-04T09:43:13.0299753Z + [[ 1 == 1 ]]
2025-12-04T09:43:13.0300008Z + [[ 5 -gt 1 ]]
2025-12-04T09:43:13.0300306Z + test_lazy_tensor_meta_reference_disabled
2025-12-04T09:43:13.0300770Z + export TORCH_DISABLE_FUNCTIONALIZATION_META_REFERENCE=1
2025-12-04T09:43:13.0301280Z + TORCH_DISABLE_FUNCTIONALIZATION_META_REFERENCE=1
2025-12-04T09:43:13.0301796Z + echo 'Testing lazy tensor operations without meta reference'
2025-12-04T09:43:13.0302314Z Testing lazy tensor operations without meta reference
2025-12-04T09:43:13.0302879Z + python test/run_test.py --include lazy/test_ts_opinfo.py --verbose
2025-12-04T09:43:20.0796294Z Downloading https://ossci-metrics.s3.amazonaws.com/disabled-tests-condensed.json to /var/lib/jenkins/workspace/test/.pytorch-disabled-tests.json
2025-12-04T09:43:20.1337892Z Ignoring disabled issues:  ['']
2025-12-04T09:43:20.1456221Z Found test times from artifacts
2025-12-04T09:43:20.1916896Z Found test times from artifacts
2025-12-04T09:43:20.1932548Z Running all tests
2025-12-04T09:43:20.1936032Z Running parallel tests on 1 processes
2025-12-04T09:43:20.1936887Z Name: tests to run (est. time: 0.01min)
2025-12-04T09:43:20.1937293Z   Serial tests (1):
2025-12-04T09:43:20.1937613Z     lazy/test_ts_opinfo 1/1
2025-12-04T09:43:20.1937922Z   Parallel tests (0):
2025-12-04T09:43:20.1938266Z Name: excluded (est. time: 0.0min)
2025-12-04T09:43:20.1938606Z   Serial tests (0):
2025-12-04T09:43:20.1938869Z   Parallel tests (0):
2025-12-04T09:43:20.1939474Z Running lazy/test_ts_opinfo 1/1 ... [2025-12-04 09:43:20.193787][1828.576685955]
2025-12-04T09:43:20.1940011Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T09:43:20.1945360Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'lazy/test_ts_opinfo.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 09:43:20.194284]
2025-12-04T09:43:27.1191829Z 
2025-12-04T09:43:27.1192939Z lazy/test_ts_opinfo 1/1 was successful, full logs can be found in artifacts with path test/test-reports/lazy.test_ts_opinfo_1.1_4d268f7078430bdf_.log
2025-12-04T09:43:27.1195547Z Running 5 items in this shard: test/lazy/test_ts_opinfo.py::TestLazyTensor::testConvolutionBackward, test/lazy/test_ts_opinfo.py::TestLazyTensor::test_tensor_ctr, test/lazy/test_ts_opinfo.py::TestLazyTensor::test_view_mark_step_preserved, test/lazy/test_ts_opinfo.py::TestLazyDynamicOps::test_adaptiveavgpool3d_dynamic, test/lazy/test_ts_opinfo.py::TestLazyDynamicOps::test_nonzero_dynamic
2025-12-04T09:43:27.1197851Z 
2025-12-04T09:43:27.1198185Z Finished lazy/test_ts_opinfo 1/1 ... [2025-12-04 09:43:27.119110][1835.502006988], took 0.12min
2025-12-04T09:43:27.1199894Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/lazy.test_ts_opinfo/lazy.test_ts_opinfo-8eadd60536af3632.xml
2025-12-04T09:43:27.5569574Z Uploading artifacts took 0.12 seconds
2025-12-04T09:43:34.8761968Z Running test batch 'tests to run' cost 14.68 seconds
2025-12-04T09:43:35.7681941Z 
2025-12-04T09:43:35.7682611Z real	0m22.739s
2025-12-04T09:43:35.7682923Z user	0m21.851s
2025-12-04T09:43:35.7683181Z sys	0m7.066s
2025-12-04T09:43:35.7683569Z + export -n TORCH_DISABLE_FUNCTIONALIZATION_META_REFERENCE
2025-12-04T09:43:35.7684019Z + test_without_numpy
2025-12-04T09:43:35.7686931Z ++ dirname .ci/pytorch/test.sh
2025-12-04T09:43:35.7699930Z + pushd .ci/pytorch
2025-12-04T09:43:35.7700328Z ~/workspace/.ci/pytorch ~/workspace
2025-12-04T09:43:35.7701314Z + python -c 'import sys;sys.path.insert(0, '\''fake_numpy'\'');from unittest import TestCase;import torch;x=torch.randn(3,3);TestCase().assertRaises(RuntimeError, lambda: x.numpy())'
2025-12-04T09:43:36.6233326Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_subclasses/functional_tensor.py:283: UserWarning: Failed to initialize NumPy: Sorry PyTorch, but our NumPy is in the other folder (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/utils/tensor_numpy.cpp:84.)
2025-12-04T09:43:36.6235039Z   cpu = _conversion_method_template(device=torch.device("cpu"))
2025-12-04T09:43:37.4980412Z + python -c 'import sys;sys.path.insert(0, '\''fake_numpy'\'');import torch;print(torch.tensor([torch.tensor(0.), torch.tensor(1.)]))'
2025-12-04T09:43:38.3685354Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_subclasses/functional_tensor.py:283: UserWarning: Failed to initialize NumPy: Sorry PyTorch, but our NumPy is in the other folder (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/utils/tensor_numpy.cpp:84.)
2025-12-04T09:43:38.3687111Z   cpu = _conversion_method_template(device=torch.device("cpu"))
2025-12-04T09:43:38.8912295Z tensor([0., 1.])
2025-12-04T09:43:39.1956603Z + [[ legacy_nvidia_driver == *dynamo_wrapped* ]]
2025-12-04T09:43:39.1957266Z + python -c 'import sys;sys.path.insert(0, '\''fake_numpy'\'');import torch; import torch.onnx'
2025-12-04T09:43:40.0496488Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_subclasses/functional_tensor.py:283: UserWarning: Failed to initialize NumPy: Sorry PyTorch, but our NumPy is in the other folder (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/utils/tensor_numpy.cpp:84.)
2025-12-04T09:43:40.0498192Z   cpu = _conversion_method_template(device=torch.device("cpu"))
2025-12-04T09:43:40.9395881Z + popd
2025-12-04T09:43:40.9396204Z ~/workspace
2025-12-04T09:43:40.9396471Z + install_torchvision
2025-12-04T09:43:40.9396797Z + local orig_preload
2025-12-04T09:43:40.9397168Z + local commit
2025-12-04T09:43:40.9400756Z ++ get_pinned_commit vision
2025-12-04T09:43:40.9401165Z ++ cat .github/ci_commit_pins/vision.txt
2025-12-04T09:43:40.9416620Z + commit=617079d944b0e72632311c30ae2bbdf1168b901e
2025-12-04T09:43:40.9417032Z + orig_preload=
2025-12-04T09:43:40.9417315Z + '[' -n '' ']'
2025-12-04T09:43:40.9417628Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *cuda* ]]
2025-12-04T09:43:40.9418048Z + export FORCE_CUDA=1
2025-12-04T09:43:40.9419389Z + FORCE_CUDA=1
2025-12-04T09:43:40.9419682Z + export WITH_CUDA=1
2025-12-04T09:43:40.9419948Z + WITH_CUDA=1
2025-12-04T09:43:40.9420631Z + pip_build_and_install git+https://github.com/pytorch/vision.git@617079d944b0e72632311c30ae2bbdf1168b901e dist/vision
2025-12-04T09:43:40.9421691Z + local build_target=git+https://github.com/pytorch/vision.git@617079d944b0e72632311c30ae2bbdf1168b901e
2025-12-04T09:43:40.9422495Z + local wheel_dir=dist/vision
2025-12-04T09:43:40.9422826Z + local found_whl=0
2025-12-04T09:43:40.9423118Z + for file in "${wheel_dir}"/*.whl
2025-12-04T09:43:40.9423454Z + [[ -f dist/vision/*.whl ]]
2025-12-04T09:43:40.9423766Z + '[' 0 == 0 ']'
2025-12-04T09:43:40.9424557Z + python3 -m pip wheel --no-build-isolation --no-deps -w dist/vision git+https://github.com/pytorch/vision.git@617079d944b0e72632311c30ae2bbdf1168b901e
2025-12-04T09:43:41.3131189Z Collecting git+https://github.com/pytorch/vision.git@617079d944b0e72632311c30ae2bbdf1168b901e
2025-12-04T09:43:41.3137280Z   Cloning https://github.com/pytorch/vision.git (to revision 617079d944b0e72632311c30ae2bbdf1168b901e) to /tmp/pip-req-build-rqa6hlff
2025-12-04T09:43:41.3319124Z   Running command git clone --filter=blob:none --quiet https://github.com/pytorch/vision.git /tmp/pip-req-build-rqa6hlff
2025-12-04T09:43:43.0192050Z   Running command git rev-parse -q --verify 'sha^617079d944b0e72632311c30ae2bbdf1168b901e'
2025-12-04T09:43:43.0215382Z   Running command git fetch -q https://github.com/pytorch/vision.git 617079d944b0e72632311c30ae2bbdf1168b901e
2025-12-04T09:43:43.1436052Z   Resolved https://github.com/pytorch/vision.git to commit 617079d944b0e72632311c30ae2bbdf1168b901e
2025-12-04T09:43:46.8213661Z   Preparing metadata (pyproject.toml) ... done
2025-12-04T09:43:46.8253188Z Building wheels for collected packages: torchvision
2025-12-04T09:45:19.2915843Z   Building wheel for torchvision (pyproject.toml) ... done
2025-12-04T09:45:19.2982303Z   Created wheel for torchvision: filename=torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl size=1821704 sha256=219a7f84513fcaa1896571ae4e982082b0713184231b39ddfb0215ffbe02c5c6
2025-12-04T09:45:19.2984543Z   Stored in directory: /var/lib/jenkins/.cache/pip/wheels/12/b2/29/1f82685c5b5173629e1f36a9b93989ce92ce563e5fb91d27ac
2025-12-04T09:45:19.3027148Z Successfully built torchvision
2025-12-04T09:45:19.3976411Z + for file in "${wheel_dir}"/*.whl
2025-12-04T09:45:19.3977081Z + pip_install_whl dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl
2025-12-04T09:45:19.3977879Z + args=('dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl')
2025-12-04T09:45:19.3978410Z + local args
2025-12-04T09:45:19.3978877Z + [[ dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl == *\ * ]]
2025-12-04T09:45:19.3979497Z + for path in "${args[@]}"
2025-12-04T09:45:19.3980048Z + echo 'Installing dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl'
2025-12-04T09:45:19.3980867Z Installing dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl
2025-12-04T09:45:19.3981807Z + python3 -mpip install --no-index --no-deps dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl
2025-12-04T09:45:19.7764321Z Processing ./dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl
2025-12-04T09:45:19.7906948Z Installing collected packages: torchvision
2025-12-04T09:45:20.3296105Z Successfully installed torchvision-0.25.0a0+617079d
2025-12-04T09:45:20.3989952Z + '[' -n '' ']'
2025-12-04T09:45:20.3990322Z + test_python_shard 1
2025-12-04T09:45:20.3990613Z + [[ -z 5 ]]
2025-12-04T09:45:20.3991580Z + python test/run_test.py --exclude-jit-executor --exclude-distributed-tests --exclude-quantization-tests --shard 1 5 --verbose --upload-artifacts-while-running
2025-12-04T09:45:27.6312028Z Downloading https://ossci-metrics.s3.amazonaws.com/disabled-tests-condensed.json to /var/lib/jenkins/workspace/test/.pytorch-disabled-tests.json
2025-12-04T09:45:27.6426136Z Found test times from artifacts
2025-12-04T09:45:27.6872583Z Found test times from artifacts
2025-12-04T09:45:27.6888120Z Running all tests
2025-12-04T09:45:27.7780345Z Running parallel tests on 1 processes
2025-12-04T09:45:27.7792908Z Name: tests to run (est. time: 288.55min)
2025-12-04T09:45:27.7794086Z   Serial tests (146):
2025-12-04T09:45:27.7794403Z     inductor/test_aot_inductor 1/6
2025-12-04T09:45:27.7794761Z     inductor/test_aot_inductor 6/6
2025-12-04T09:45:27.7795201Z     inductor/test_torchinductor_codegen_dynamic_shapes 2/4
2025-12-04T09:45:27.7795735Z     inductor/test_torchinductor_opinfo 2/17
2025-12-04T09:45:27.7796141Z     inductor/test_torchinductor_opinfo 7/17
2025-12-04T09:45:27.7796537Z     inductor/test_torchinductor_opinfo 12/17
2025-12-04T09:45:27.7796946Z     inductor/test_torchinductor_opinfo 17/17
2025-12-04T09:45:27.7797366Z     inductor/test_cuda_select_algorithm 3/5
2025-12-04T09:45:27.7797749Z     inductor/test_compile_subprocess 3/3
2025-12-04T09:45:27.7798137Z     inductor/test_flex_decoding 1/1
2025-12-04T09:45:27.7798500Z     inductor/test_deterministic 5/8
2025-12-04T09:45:27.7798853Z     inductor/test_fp8 1/1
2025-12-04T09:45:27.7799158Z     dynamo/test_model_output 1/1
2025-12-04T09:45:27.7799502Z     inductor/test_triton_kernels 1/1
2025-12-04T09:45:27.7799870Z     inductor/test_loop_ordering 1/1
2025-12-04T09:45:27.7800207Z     export/test_serdes 1/1
2025-12-04T09:45:27.7800522Z     dynamo/test_backends 1/1
2025-12-04T09:45:27.7811874Z     inductor/test_aot_inductor_package 1/1
2025-12-04T09:45:27.7812630Z     inductor/test_padding 1/1
2025-12-04T09:45:27.7813240Z     dynamo/test_aot_compile 1/1
2025-12-04T09:45:27.7813577Z     dynamo/test_sets 1/1
2025-12-04T09:45:27.7813921Z     dynamo/test_wrap_inductor_compiled_regions 1/1
2025-12-04T09:45:27.7814330Z     test_sparse 2/2
2025-12-04T09:45:27.7814606Z     test_decomp 3/17
2025-12-04T09:45:27.7814905Z     test_decomp 8/17
2025-12-04T09:45:27.7815174Z     test_decomp 13/17
2025-12-04T09:45:27.7815476Z     test_ops_fwd_gradients 1/2
2025-12-04T09:45:27.7815796Z     test_meta 2/5
2025-12-04T09:45:27.7816050Z     test_ops_jit 2/2
2025-12-04T09:45:27.7816422Z     test_nestedtensor 3/4
2025-12-04T09:45:27.7816726Z     test_ops 2/11
2025-12-04T09:45:27.7816976Z     test_ops 7/11
2025-12-04T09:45:27.7817271Z     functorch/test_dims 1/1
2025-12-04T09:45:27.7817597Z     functorch/test_ops 1/7
2025-12-04T09:45:27.7817899Z     functorch/test_ops 6/7
2025-12-04T09:45:27.7818229Z     inductor/test_select_algorithm 1/1
2025-12-04T09:45:27.7818599Z     inductor/test_cpu_repro 1/3
2025-12-04T09:45:27.7818935Z     inductor/test_custom_lowering 1/1
2025-12-04T09:45:27.7819295Z     inductor/test_perf 1/1
2025-12-04T09:45:27.7819620Z     inductor/test_binary_folding 1/1
2025-12-04T09:45:27.7820045Z     inductor/test_mkldnn_pattern_matcher 3/3
2025-12-04T09:45:27.7820444Z     inductor/test_cutlass_backend 1/1
2025-12-04T09:45:27.7820809Z     inductor/test_ck_backend 1/1
2025-12-04T09:45:27.7821165Z     inductor/test_gpu_cpp_wrapper 1/1
2025-12-04T09:45:27.7821528Z     inductor/test_cutedsl_template 1/1
2025-12-04T09:45:27.7821906Z     inductor/test_benchmark_fusion 1/1
2025-12-04T09:45:27.7822267Z     dynamo/test_modules 1/1
2025-12-04T09:45:27.7822580Z     dynamo/test_recompiles 1/1
2025-12-04T09:45:27.7822921Z     export/test_tree_utils 1/1
2025-12-04T09:45:27.7823263Z     inductor/test_triton_wrapper 1/1
2025-12-04T09:45:27.7823629Z     inductor/test_static_cuda_launcher 1/1
2025-12-04T09:45:27.7824017Z     export/test_dynamic_shapes 1/1
2025-12-04T09:45:27.7824368Z     dynamo/test_sdpa 1/1
2025-12-04T09:45:27.7824661Z     dynamo/test_utils 1/1
2025-12-04T09:45:27.7824986Z     inductor/test_codegen_triton 1/1
2025-12-04T09:45:27.7825343Z     dynamo/test_frame_init 1/1
2025-12-04T09:45:27.7825668Z     inductor/test_device_assert 1/1
2025-12-04T09:45:27.7826027Z     dynamo/test_skip_non_tensor 1/1
2025-12-04T09:45:27.7826603Z     dynamo/test_skip_guard_eval_unsafe 1/1
2025-12-04T09:45:27.7826991Z     inductor/test_control_deps 1/1
2025-12-04T09:45:27.7827335Z     inductor/test_benchmarking 1/1
2025-12-04T09:45:27.7827697Z     inductor/test_helion_kernels 1/1
2025-12-04T09:45:27.7828059Z     inductor/test_quantization 1/1
2025-12-04T09:45:27.7828392Z     export/test_tools 1/1
2025-12-04T09:45:27.7828817Z     inductor/test_compiled_optimizers 1/3
2025-12-04T09:45:27.7829211Z     inductor/test_aot_inductor_utils 1/1
2025-12-04T09:45:27.7829576Z     inductor/test_control_flow 3/4
2025-12-04T09:45:27.7829938Z     inductor/test_minifier_isolate 1/1
2025-12-04T09:45:27.7830307Z     dynamo/test_error_messages 1/1
2025-12-04T09:45:27.7830653Z     dynamo/test_fake_distributed 1/1
2025-12-04T09:45:27.7831011Z     dynamo/test_tree_map 1/1
2025-12-04T09:45:27.7831341Z     dynamo/test_minifier 1/1
2025-12-04T09:45:27.7831656Z     dynamo/test_guard_manager 1/1
2025-12-04T09:45:27.7832000Z     export/test_schema 1/1
2025-12-04T09:45:27.7832325Z     export/test_pass_infra 1/1
2025-12-04T09:45:27.7832670Z     dynamo/test_recompile_ux 1/1
2025-12-04T09:45:27.7833003Z     export/test_experimental 1/1
2025-12-04T09:45:27.7833348Z     export/test_converter 1/1
2025-12-04T09:45:27.7833683Z     dynamo/test_reorder_logs 1/1
2025-12-04T09:45:27.7834010Z     dynamo/test_subclasses 1/1
2025-12-04T09:45:27.7834350Z     dynamo/test_python_autograd 1/1
2025-12-04T09:45:27.7834717Z     export/test_draft_export 1/1
2025-12-04T09:45:27.7835028Z     test_package 1/1
2025-12-04T09:45:27.7835313Z     test_mkl_verbose 1/1
2025-12-04T09:45:27.7835625Z     test_comparison_utils 1/1
2025-12-04T09:45:27.7835947Z     functorch/test_ac_logging 1/1
2025-12-04T09:45:27.7836289Z     test_mkldnn_verbose 1/1
2025-12-04T09:45:27.7836609Z     test_cpp_api_parity 1/1
2025-12-04T09:45:27.7836906Z     test_autoload 1/1
2025-12-04T09:45:27.7837216Z     nn/attention/test_open_registry 1/1
2025-12-04T09:45:27.7837578Z     test_as_strided 1/1
2025-12-04T09:45:27.7837856Z     test_foreach 1/1
2025-12-04T09:45:27.7838139Z     xpu/test_gemm 1/1
2025-12-04T09:45:27.7838433Z     test_numpy_interop 1/1
2025-12-04T09:45:27.7838763Z     profiler/test_cpp_thread 1/1
2025-12-04T09:45:27.7839079Z     test_hub 1/1
2025-12-04T09:45:27.7839360Z     test_segment_reductions 1/1
2025-12-04T09:45:27.7839699Z     test_autograd_fallback 1/1
2025-12-04T09:45:27.7840010Z     test_type_hints 1/1
2025-12-04T09:45:27.7840362Z     functorch/test_aot_joint_with_descriptors 1/1
2025-12-04T09:45:27.7840767Z     test_fx_reinplace_pass 1/1
2025-12-04T09:45:27.7841095Z     functorch/test_control_flow 2/2
2025-12-04T09:45:27.7841449Z     test_subclass 1/1
2025-12-04T09:45:27.7841761Z     functorch/test_vmap_registrations 1/1
2025-12-04T09:45:27.7842137Z     nn/test_parametrization 1/1
2025-12-04T09:45:27.7842475Z     test_dynamic_shapes 1/1
2025-12-04T09:45:27.7842789Z     test_dispatch 1/1
2025-12-04T09:45:27.7843072Z     test_numba_integration 1/1
2025-12-04T09:45:27.7843413Z     test_functional_optim 1/1
2025-12-04T09:45:27.7843743Z     test_maskedtensor 1/1
2025-12-04T09:45:27.7844071Z     benchmark_utils/test_benchmark_utils 1/1
2025-12-04T09:45:27.7844447Z     test_scaled_matmul_cuda 1/1
2025-12-04T09:45:27.7844817Z     torch_np/numpy_tests/core/test_shape_base 1/1
2025-12-04T09:45:27.7845208Z     test_vulkan 1/1
2025-12-04T09:45:27.7845491Z     lazy/test_generator 1/1
2025-12-04T09:45:27.7845836Z     torch_np/numpy_tests/linalg/test_linalg 1/1
2025-12-04T09:45:27.7846255Z     torch_np/numpy_tests/core/test_dtype 1/1
2025-12-04T09:45:27.7846638Z     lazy/test_debug_util 1/1
2025-12-04T09:45:27.7846949Z     nn/test_load_state_dict 1/1
2025-12-04T09:45:27.7847273Z     test_shape_ops 1/1
2025-12-04T09:45:27.7847613Z     nn/test_module_hooks 1/1
2025-12-04T09:45:27.7847962Z     torch_np/numpy_tests/lib/test_twodim_base 1/1
2025-12-04T09:45:27.7848381Z     profiler/test_memory_profiler 1/1
2025-12-04T09:45:27.7848796Z     test_jit_llga_fuser 1/1
2025-12-04T09:45:27.7849096Z     optim/test_optim 1/1
2025-12-04T09:45:27.7849535Z     torch_np/numpy_tests/core/test_getlimits 1/1
2025-12-04T09:45:27.7849958Z     torch_np/test_ndarray_methods 1/1
2025-12-04T09:45:27.7850296Z     test_view_ops 1/1
2025-12-04T09:45:27.7850588Z     test_type_info 1/1
2025-12-04T09:45:27.7850898Z     functorch/test_aotdispatch 1/1
2025-12-04T09:45:27.7851257Z     test_scatter_gather_ops 1/1
2025-12-04T09:45:27.7851572Z     test_cuda_multigpu 1/1
2025-12-04T09:45:27.7852000Z     torch_np/numpy_tests/lib/test_index_tricks 1/1
2025-12-04T09:45:27.7852401Z     test_jit_autocast 1/1
2025-12-04T09:45:27.7852690Z     nn/test_pooling 1/1
2025-12-04T09:45:27.7852987Z     nn/test_embedding 1/1
2025-12-04T09:45:27.7853303Z     test_xnnpack_integration 1/1
2025-12-04T09:45:27.7853622Z     test_cuda_trace 1/1
2025-12-04T09:45:27.7853926Z     torch_np/test_reductions 1/1
2025-12-04T09:45:27.7854309Z     torch_np/numpy_tests/core/test_scalar_ctors 1/1
2025-12-04T09:45:27.7854734Z     torch_np/numpy_tests/lib/test_arraypad 1/1
2025-12-04T09:45:27.7855113Z     test_prims 1/1
2025-12-04T09:45:27.7855388Z     test_spectral_ops 1/1
2025-12-04T09:45:27.7855693Z     test_autoload_disable 1/1
2025-12-04T09:45:27.7856037Z     test_cpp_extensions_aot_ninja 1/1
2025-12-04T09:45:27.7856508Z     test_cpp_extensions_aot_no_ninja 1/1
2025-12-04T09:45:27.7856880Z   Parallel tests (0):
2025-12-04T09:45:27.7857174Z Name: excluded (est. time: 0.0min)
2025-12-04T09:45:27.7857519Z   Serial tests (0):
2025-12-04T09:45:27.7857796Z   Parallel tests (0):
2025-12-04T09:45:27.7858283Z Running inductor/test_aot_inductor 1/6 ... [2025-12-04 09:45:27.780266][1956.163164587]
2025-12-04T09:45:27.7858857Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T09:45:27.7860116Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_aot_inductor.py', '--shard-id=1', '--num-shards=6', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 09:45:27.780728]
2025-12-04T09:54:33.0179084Z 
2025-12-04T09:54:33.0180332Z PRINTING LOG FILE of inductor/test_aot_inductor 1/6 (test/test-reports/inductor.test_aot_inductor_1.6_cf1c969272c5d084_.log)
2025-12-04T09:54:33.0181602Z W1204 09:45:41.040000 1815 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T09:54:33.0183041Z Test results will be stored in test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-f2c58a9dfc31919e.xml
2025-12-04T09:54:33.0183959Z ============================= test session starts ==============================
2025-12-04T09:54:33.0184650Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T09:54:33.0185253Z cachedir: .pytest_cache
2025-12-04T09:54:33.0185975Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T09:54:33.0186774Z rootdir: /var/lib/jenkins/workspace
2025-12-04T09:54:33.0187124Z configfile: pytest.ini
2025-12-04T09:54:33.0187873Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T09:54:33.0188754Z collecting ... collected 934 items
2025-12-04T09:54:33.0189180Z stepcurrent: Cannot find last run test, not skipping
2025-12-04T09:54:33.0277144Z Running 154 items in this shard: test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test__weight_int4pack_mm_with_scales_and_zeros_m_32_n_64_q_group_64_num_groups_1_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_add_complex_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_addmm_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_constant_tensor_name_collision_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_debug_printer_cpp_kernel_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_boolean_indexing_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_buffer_mutation_1_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_mismatched_branch_output_dynamic_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_non_tensor_predicates_dynamic_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_predicate_on_cpu_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_simple_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_symint_input_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_with_replace_view_ops_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_device_moved_constant_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_duplicated_params_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_fake_tensor_device_validation_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_free_inactive_buffer_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_int_list_input_cpu, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_large_mmaped_weights_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_large_weight_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_masked_select_dynamic_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_misaligned_input_1_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_pad_non_zero_memory_leak_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_poi_multiple_dynamic_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_proxy_executor_squeeze_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_pytree_inputs_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_quanatized_int8_linear_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_quantized_linear_bias_none_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_repeated_user_defined_triton_kernel_embed_kernel_binary_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_replace_unbacked_symbol_with_backed_expr_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_reuse_kernel_dynamic_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_runtime_checks_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_simple_multi_arch_embed_kernel_binary_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_stft_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_equal_to_1_float_arg_dynamic_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_1_num_dims_1_dynamic_True_autotune_True_cpu, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_1_num_dims_2_dynamic_False_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_2_num_dims_1_dynamic_False_autotune_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_2_num_dims_1_dynamic_True_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_3_num_dims_1_dynamic_True_autotune_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_on_device_tma_dynamic_False_tma_version_old_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_on_device_tma_dynamic_True_tma_version_new_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_reinterpret_view_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_tma_descriptor_1d_dynamic_True_tma_version_new_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_tma_descriptor_1d_dynamic_True_tma_version_old_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_with_none_input_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_mutated_autotuning_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbacked_expr_replacements_shift_k_0_use_static_size_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbounded_expr_substitutions_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_update_constant_buffer_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_simple_cpu, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_with_conv_dynamic_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_with_outer_code_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_with_unbacked_symint_closure_dynamic_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_with_cudagraphs_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_add_complex_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_addmm_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_amp_fallback_random_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_aoti_constant_tensor_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_aoti_debug_printer_codegen_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_assert_async_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_buffer_mutation_3_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_nested_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_non_tensor_predicates_dynamic_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_use_buffers_from_outer_scope_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_constant_original_fqn_and_dtype_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_device_moved_constant_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_dynamic_smem_above_default_limit_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_fill__fallback_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_fqn_cuda, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_free_inactive_buffer_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_freezing_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_linear_dynamic_maxautotune_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_non_default_gpu_device_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_on_gpu_device1_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_proxy_executor_permute_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_pytree_inputs_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_runtime_checks_complex_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_runtime_checks_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_and_mul_expr_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_symint_item_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_extern_kernel_arg_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_1_num_dims_1_dynamic_True_autotune_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_1_num_dims_2_dynamic_False_autotune_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_1_num_dims_2_dynamic_True_autotune_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_2_num_dims_1_dynamic_True_autotune_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_2_num_dims_2_dynamic_False_autotune_True_cuda, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_2_num_dims_2_dynamic_True_autotune_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_on_device_tma_dynamic_True_tma_version_new_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_reinterpret_view_mem_leak_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_sympy_expr_arg_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_tma_descriptor_2d_dynamic_False_tma_version_old_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_unbacked_expr_replacements_shift_k_0_use_static_size_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_unbacked_expr_replacements_shift_k_2_use_static_size_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_simple_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_sym_expr_cond_dynamic_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_zero_grid_with_backed_symbols_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_zero_size_weight_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_amp_fallback_random_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aot_inductor_consts_cpp_build_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aoti_debug_printer_codegen_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aoti_debug_printer_fp8_dtype_mps, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aoti_user_defined_triton_kernel_profiling_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_autotuning_args_reuse_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_buffer_mutation_4_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_codegen_int_array_var_fix_memory_leak_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_cpu_predicate_cuda_operands_max_autotune_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_nested_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_with_multiple_outputs_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_with_replace_view_ops_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_constant_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_constant_original_fqn_and_dtype_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_constant_type_propagation_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_dynamic_smem_above_default_limit_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fallback_mem_leak_fix_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_foreach_multiple_dynamic_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fp8_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fqn_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_large_grid_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_masked_select_dynamic_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_missing_output_mps, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_mixed_device_1_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_nested_tensor_from_jagged_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_non_contiguous_output_alias_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_none_args_aot_codegen_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_output_misaligned_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_pad_non_zero_memory_leak_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_proxy_executor_permute_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_pytree_inputs_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_repeated_user_defined_triton_kernel_embed_kernel_binary_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_replace_unbacked_symbol_with_backed_expr_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_reuse_kernel_dynamic_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_runtime_checks_fp8_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_simple_embed_kernel_binary_False_max_autotune_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_small_constant_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_stride_with_unbacked_expr_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_sympy_cpp_printer_min_max_minmax1_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_autotuning_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_1_num_dims_2_dynamic_True_autotune_True_mps, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_2_num_dims_1_dynamic_True_autotune_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_unbacked_symint_in_grid_dynamic_False_autotuning_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_unbacked_symint_in_grid_dynamic_True_autotuning_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_with_none_input_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_with_none_inputs_and_equal_to_1_arg_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_next_power_of_2_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbacked_expr_replacements_shift_k_2_use_static_size_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbounded_expr_substitutions_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_using_model_name_for_files_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_mixed_device_dynamic_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_mixed_device_dynamic_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_outer_buffers_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_parameters_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_with_no_triton_profiler_mps
2025-12-04T09:54:33.0364232Z 
2025-12-04T09:54:33.0365106Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test__weight_int4pack_mm_with_scales_and_zeros_m_32_n_64_q_group_64_num_groups_1_cpu SKIPPED [0.0041s] (requires Intel GPU) [  0%]
2025-12-04T09:54:33.0366762Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_add_complex_cpu <- test/inductor/test_torchinductor.py PASSED [14.4244s] [  1%]
2025-12-04T09:54:33.0368280Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_addmm_cpu <- test/inductor/test_torchinductor.py PASSED [7.2112s] [  1%]
2025-12-04T09:54:33.0369806Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_constant_tensor_name_collision_cpu <- test/inductor/test_torchinductor.py PASSED [6.6210s] [  2%]
2025-12-04T09:54:33.0371641Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_debug_printer_cpp_kernel_cpu <- test/inductor/test_torchinductor.py PASSED [5.2311s] [  3%]
2025-12-04T09:54:33.0373207Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_boolean_indexing_cpu <- test/inductor/test_torchinductor.py PASSED [6.2206s] [  3%]
2025-12-04T09:54:33.0374714Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_buffer_mutation_1_cpu <- test/inductor/test_torchinductor.py PASSED [5.2596s] [  4%]
2025-12-04T09:54:33.0376860Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_mismatched_branch_output_dynamic_True_cpu W1204 09:46:28.042000 1815 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead
2025-12-04T09:54:33.0378899Z W1204 09:46:28.042000 1815 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead
2025-12-04T09:54:33.0380358Z W1204 09:46:28.043000 1815 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead
2025-12-04T09:54:33.0381258Z PASSED [6.1420s] [  5%]
2025-12-04T09:54:33.0382052Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_non_tensor_predicates_dynamic_False_cpu PASSED [5.2505s] [  5%]
2025-12-04T09:54:33.0383488Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_predicate_on_cpu_cpu <- test/inductor/test_torchinductor.py PASSED [5.8065s] [  6%]
2025-12-04T09:54:33.0385631Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_simple_cpu <- test/inductor/test_torchinductor.py W1204 09:46:45.223000 1815 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead
2025-12-04T09:54:33.0387712Z W1204 09:46:45.224000 1815 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead
2025-12-04T09:54:33.0388615Z PASSED [5.5656s] [  7%]
2025-12-04T09:54:33.0389475Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_symint_input_cpu <- test/inductor/test_torchinductor.py PASSED [5.5206s] [  7%]
2025-12-04T09:54:33.0391092Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_with_replace_view_ops_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0032s] (requires GPU) [  8%]
2025-12-04T09:54:33.0392737Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_device_moved_constant_cpu <- test/inductor/test_torchinductor.py PASSED [10.8606s] [  9%]
2025-12-04T09:54:33.0394265Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_duplicated_params_cpu <- test/inductor/test_torchinductor.py PASSED [5.3319s] [  9%]
2025-12-04T09:54:33.0395916Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_fake_tensor_device_validation_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0032s] (requires GPU) [ 10%]
2025-12-04T09:54:33.0397798Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_free_inactive_buffer_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0029s] (requires GPU) [ 11%]
2025-12-04T09:54:33.0399388Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_int_list_input_cpu <- test/inductor/test_torchinductor.py PASSED [5.1502s] [ 11%]
2025-12-04T09:54:33.0400903Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_large_mmaped_weights_cpu <- test/inductor/test_torchinductor.py PASSED [13.8336s] [ 12%]
2025-12-04T09:54:33.0402721Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_large_weight_cpu SKIPPED [0.0003s] (install_free_tensors leads to OOM - https://github.com/pytorch/pytorch/issues/164062) [ 12%]
2025-12-04T09:54:33.0404444Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_masked_select_dynamic_cpu <- test/inductor/test_torchinductor.py PASSED [5.6881s] [ 13%]
2025-12-04T09:54:33.0406081Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_misaligned_input_1_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0037s] (CUDA/XPU test only) [ 14%]
2025-12-04T09:54:33.0408013Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_pad_non_zero_memory_leak_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0031s] (test is only for GPU_TYPE) [ 14%]
2025-12-04T09:54:33.0409693Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_poi_multiple_dynamic_cpu <- test/inductor/test_torchinductor.py PASSED [5.3685s] [ 15%]
2025-12-04T09:54:33.0411249Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_proxy_executor_squeeze_cpu <- test/inductor/test_torchinductor.py PASSED [5.2204s] [ 16%]
2025-12-04T09:54:33.0412752Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_pytree_inputs_cpu <- test/inductor/test_torchinductor.py PASSED [5.3006s] [ 16%]
2025-12-04T09:54:33.0414088Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_quanatized_int8_linear_cpu PASSED [5.3603s] [ 17%]
2025-12-04T09:54:33.0415957Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_quantized_linear_bias_none_cpu [W1204 09:47:58.896568605 QuantizedLinear.cpp:379] Warning: fbgemm_pack_gemm_matrix_fp16 is deprecated and will be removed in a future PyTorch release. (function operator())
2025-12-04T09:54:33.0418123Z [W1204 09:48:03.075802721 QuantizedLinear.cpp:415] Warning: fbgemm_linear_fp16_weight_fp32_activation is deprecated and will be removed in a future PyTorch release. (function operator())
2025-12-04T09:54:33.0419140Z PASSED [5.3023s] [ 18%]
2025-12-04T09:54:33.0420096Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_repeated_user_defined_triton_kernel_embed_kernel_binary_True_cpu SKIPPED [0.0032s] (requires GPU) [ 18%]
2025-12-04T09:54:33.0421699Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_replace_unbacked_symbol_with_backed_expr_cpu SKIPPED [0.0030s] (requires triton) [ 19%]
2025-12-04T09:54:33.0423242Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_reuse_kernel_dynamic_cpu <- test/inductor/test_torchinductor.py PASSED [6.8573s] [ 20%]
2025-12-04T09:54:33.0424564Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_runtime_checks_cpu PASSED [11.2020s] [ 20%]
2025-12-04T09:54:33.0425967Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_simple_multi_arch_embed_kernel_binary_False_cpu SKIPPED [0.0003s] (Test is only supported on CUDA 12.8+) [ 21%]
2025-12-04T09:54:33.0427522Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_stft_cpu <- test/inductor/test_torchinductor.py PASSED [5.7993s] [ 22%]
2025-12-04T09:54:33.0428980Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_equal_to_1_float_arg_dynamic_True_cpu SKIPPED [0.0033s] (requires GPU) [ 22%]
2025-12-04T09:54:33.0430673Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_1_num_dims_1_dynamic_True_autotune_True_cpu SKIPPED [0.0031s] (requires GPU) [ 23%]
2025-12-04T09:54:33.0432369Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_1_num_dims_2_dynamic_False_autotune_True_cpu SKIPPED [0.0034s] (requires GPU) [ 24%]
2025-12-04T09:54:33.0434066Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_2_num_dims_1_dynamic_False_autotune_False_cpu SKIPPED [0.0031s] (requires GPU) [ 24%]
2025-12-04T09:54:33.0435833Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_2_num_dims_1_dynamic_True_autotune_True_cpu SKIPPED [0.0029s] (requires GPU) [ 25%]
2025-12-04T09:54:33.0437530Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_3_num_dims_1_dynamic_True_autotune_False_cpu SKIPPED [0.0029s] (requires GPU) [ 25%]
2025-12-04T09:54:33.0439205Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_on_device_tma_dynamic_False_tma_version_old_cpu SKIPPED [0.0029s] (requires GPU) [ 26%]
2025-12-04T09:54:33.0440845Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_on_device_tma_dynamic_True_tma_version_new_cpu SKIPPED [0.0029s] (requires GPU) [ 27%]
2025-12-04T09:54:33.0442564Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_reinterpret_view_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0030s] (requires GPU) [ 27%]
2025-12-04T09:54:33.0444302Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_tma_descriptor_1d_dynamic_True_tma_version_new_cpu SKIPPED [0.0029s] (requires GPU) [ 28%]
2025-12-04T09:54:33.0445971Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_tma_descriptor_1d_dynamic_True_tma_version_old_cpu SKIPPED [0.0033s] (requires GPU) [ 29%]
2025-12-04T09:54:33.0447688Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_with_none_input_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0029s] (requires GPU) [ 29%]
2025-12-04T09:54:33.0449261Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_mutated_autotuning_cpu SKIPPED [0.0029s] (requires GPU) [ 30%]
2025-12-04T09:54:33.0450899Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbacked_expr_replacements_shift_k_0_use_static_size_True_cpu SKIPPED [0.0030s] (Need triton for user-defined triton kernel) [ 31%]
2025-12-04T09:54:33.0453035Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbounded_expr_substitutions_cpu <- test/inductor/test_torchinductor.py W1204 09:48:33.031000 1815 site-packages/torch/_export/__init__.py:71] +============================+
2025-12-04T09:54:33.0454621Z W1204 09:48:33.031000 1815 site-packages/torch/_export/__init__.py:72] |     !!!   WARNING   !!!    |
2025-12-04T09:54:33.0455454Z W1204 09:48:33.031000 1815 site-packages/torch/_export/__init__.py:73] +============================+
2025-12-04T09:54:33.0457232Z W1204 09:48:33.031000 1815 site-packages/torch/_export/__init__.py:74] torch._export.aot_compile()/torch._export.aot_load() is being deprecated, please switch to directly calling torch._inductor.aoti_compile_and_package(torch.export.export())/torch._inductor.aoti_load_package() instead.
2025-12-04T09:54:33.0458694Z PASSED [5.4016s] [ 31%]
2025-12-04T09:54:33.0459615Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_update_constant_buffer_cpu <- test/inductor/test_torchinductor.py PASSED [5.2432s] [ 32%]
2025-12-04T09:54:33.0461800Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_simple_cpu <- test/inductor/test_torchinductor.py W1204 09:48:38.463000 1815 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead
2025-12-04T09:54:33.0463902Z W1204 09:48:38.463000 1815 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead
2025-12-04T09:54:33.0464899Z PASSED [5.8470s] [ 33%]
2025-12-04T09:54:33.0465670Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_with_conv_dynamic_False_cpu PASSED [6.0207s] [ 33%]
2025-12-04T09:54:33.0467748Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_with_outer_code_cpu <- test/inductor/test_torchinductor.py W1204 09:48:50.334000 1815 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead
2025-12-04T09:54:33.0469941Z W1204 09:48:50.334000 1815 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead
2025-12-04T09:54:33.0470838Z PASSED [5.9070s] [ 34%]
2025-12-04T09:54:33.0472514Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_with_unbacked_symint_closure_dynamic_True_cpu W1204 09:48:56.545000 1815 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead
2025-12-04T09:54:33.0474592Z W1204 09:48:56.545000 1815 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead
2025-12-04T09:54:33.0475492Z PASSED [6.2509s] [ 35%]
2025-12-04T09:54:33.0476509Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_with_cudagraphs_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0033s] (requires CUDA) [ 35%]
2025-12-04T09:54:33.0478081Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_add_complex_cuda <- test/inductor/test_torchinductor.py PASSED [11.3485s] [ 36%]
2025-12-04T09:54:33.0479520Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_addmm_cuda <- test/inductor/test_torchinductor.py PASSED [7.1668s] [ 37%]
2025-12-04T09:54:33.0481433Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_amp_fallback_random_cuda <- test/inductor/test_torchinductor.py W1204 09:49:21.138000 1815 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T09:54:33.0482798Z PASSED [6.3023s] [ 37%]
2025-12-04T09:54:33.0483693Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_aoti_constant_tensor_cuda <- test/inductor/test_torchinductor.py PASSED [5.4165s] [ 38%]
2025-12-04T09:54:33.0485269Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_aoti_debug_printer_codegen_cuda <- test/inductor/test_torchinductor.py PASSED [11.8141s] [ 38%]
2025-12-04T09:54:33.0486710Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_assert_async_cuda PASSED [5.6555s] [ 39%]
2025-12-04T09:54:33.0487824Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_buffer_mutation_3_cuda PASSED [12.6210s] [ 40%]
2025-12-04T09:54:33.0489777Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_nested_cuda <- test/inductor/test_torchinductor.py W1204 09:50:02.858000 1815 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead
2025-12-04T09:54:33.0491872Z W1204 09:50:02.858000 1815 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead
2025-12-04T09:54:33.0493311Z W1204 09:50:02.859000 1815 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead
2025-12-04T09:54:33.0494212Z PASSED [8.7682s] [ 40%]
2025-12-04T09:54:33.0494997Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_non_tensor_predicates_dynamic_False_cuda PASSED [6.0350s] [ 41%]
2025-12-04T09:54:33.0497341Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_use_buffers_from_outer_scope_cuda <- test/inductor/test_torchinductor.py W1204 09:50:17.550000 1815 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead
2025-12-04T09:54:33.0499540Z W1204 09:50:17.550000 1815 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead
2025-12-04T09:54:33.0500992Z W1204 09:50:17.551000 1815 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead
2025-12-04T09:54:33.0501958Z PASSED [6.6609s] [ 42%]
2025-12-04T09:54:33.0502908Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_constant_original_fqn_and_dtype_cuda <- test/inductor/test_torchinductor.py PASSED [5.9463s] [ 42%]
2025-12-04T09:54:33.0504513Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_device_moved_constant_cuda <- test/inductor/test_torchinductor.py PASSED [10.5693s] [ 43%]
2025-12-04T09:54:33.0506255Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_dynamic_smem_above_default_limit_cuda SKIPPED [0.0004s] (Skipping triton backend only since not big GPU (not enough SM)) [ 44%]
2025-12-04T09:54:33.0507950Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_fill__fallback_cuda <- test/inductor/test_torchinductor.py PASSED [5.7898s] [ 44%]
2025-12-04T09:54:33.0509368Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_fqn_cuda <- test/inductor/test_torchinductor.py PASSED [5.9852s] [ 45%]
2025-12-04T09:54:33.0510844Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_free_inactive_buffer_cuda <- test/inductor/test_torchinductor.py PASSED [5.8639s] [ 46%]
2025-12-04T09:54:33.0512329Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_freezing_cuda <- test/inductor/test_torchinductor.py PASSED [5.6392s] [ 46%]
2025-12-04T09:54:33.0513969Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_linear_dynamic_maxautotune_cuda SKIPPED [0.0004s] (Skipping triton backend only since not big GPU (not enough SM)) [ 47%]
2025-12-04T09:54:33.0515639Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_non_default_gpu_device_cuda SKIPPED [0.0002s] (requires multiple cuda devices) [ 48%]
2025-12-04T09:54:33.0517103Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_on_gpu_device1_cuda SKIPPED [0.0002s] (requires multiple cuda devices) [ 48%]
2025-12-04T09:54:33.0518595Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_proxy_executor_permute_cuda <- test/inductor/test_torchinductor.py PASSED [5.4209s] [ 49%]
2025-12-04T09:54:33.0520124Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_pytree_inputs_cuda <- test/inductor/test_torchinductor.py PASSED [5.8990s] [ 50%]
2025-12-04T09:54:33.0521639Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_runtime_checks_complex_cuda <- test/inductor/test_torchinductor.py PASSED [5.7714s] [ 50%]
2025-12-04T09:54:33.0522972Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_runtime_checks_cuda PASSED [11.4735s] [ 51%]
2025-12-04T09:54:33.0524260Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_and_mul_expr_cuda ('RERUN', {'yellow': True}) [1.1028s] [ 51%]
2025-12-04T09:54:33.0525722Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_and_mul_expr_cuda ('RERUN', {'yellow': True}) [0.6033s] [ 51%]
2025-12-04T09:54:33.0527098Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_and_mul_expr_cuda FAILED [0.6080s] [ 51%]
2025-12-04T09:54:33.0527817Z 
2025-12-04T09:54:33.0527963Z ==================================== RERUNS ====================================
2025-12-04T09:54:33.0528679Z _ AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_and_mul_expr_cuda _
2025-12-04T09:54:33.0529292Z Traceback (most recent call last):
2025-12-04T09:54:33.0530165Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 2019, in test_size_with_unbacked_add_and_mul_expr
2025-12-04T09:54:33.0531068Z     self.check_model(Repro(), example_inputs, dynamic_shapes=spec)
2025-12-04T09:54:33.0531862Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model
2025-12-04T09:54:33.0532559Z     actual = AOTIRunnerUtil.run(
2025-12-04T09:54:33.0533164Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run
2025-12-04T09:54:33.0533902Z     package_path = AOTIRunnerUtil.compile(
2025-12-04T09:54:33.0534581Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile
2025-12-04T09:54:33.0535336Z     package_path = torch._inductor.aoti_compile_and_package(
2025-12-04T09:54:33.0536199Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package
2025-12-04T09:54:33.0537070Z     return aot_inductor_minifier_wrapper(
2025-12-04T09:54:33.0537884Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper
2025-12-04T09:54:33.0538648Z     raise e
2025-12-04T09:54:33.0539339Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper
2025-12-04T09:54:33.0540113Z     return func(
2025-12-04T09:54:33.0541080Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner
2025-12-04T09:54:33.0541999Z     aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs)
2025-12-04T09:54:33.0542841Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile
2025-12-04T09:54:33.0543557Z     return compile_fx_aot(
2025-12-04T09:54:33.0544257Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot
2025-12-04T09:54:33.0545004Z     compiled_artifacts = compile_fx(
2025-12-04T09:54:33.0545732Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx
2025-12-04T09:54:33.0546448Z     return compile_fx(
2025-12-04T09:54:33.0547090Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx
2025-12-04T09:54:33.0547845Z     return _maybe_wrap_and_compile_fx_main(
2025-12-04T09:54:33.0548689Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main
2025-12-04T09:54:33.0549516Z     return _compile_fx_main(
2025-12-04T09:54:33.0550228Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main
2025-12-04T09:54:33.0551076Z     return inference_compiler(unlifted_gm, example_inputs_)
2025-12-04T09:54:33.0551938Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__
2025-12-04T09:54:33.0552761Z     return self.compiler_fn(gm, example_inputs)
2025-12-04T09:54:33.0553543Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base
2025-12-04T09:54:33.0554314Z     return compile_fx_forward(
2025-12-04T09:54:33.0555057Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward
2025-12-04T09:54:33.0555830Z     return inner_compile(
2025-12-04T09:54:33.0556317Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T09:54:33.0556858Z     return func(*args, **kwds)
2025-12-04T09:54:33.0557574Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner
2025-12-04T09:54:33.0558476Z     return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
2025-12-04T09:54:33.0559486Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper
2025-12-04T09:54:33.0560307Z     inner_compiled_fn = compiler_fn(gm, example_inputs)
2025-12-04T09:54:33.0561114Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T09:54:33.0561960Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T09:54:33.0562870Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T09:54:33.0563669Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T09:54:33.0564473Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T09:54:33.0565477Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T09:54:33.0566470Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T09:54:33.0567262Z     _check_triton_bf16_support(graph)
2025-12-04T09:54:33.0568050Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T09:54:33.0568863Z     warn_and_skip(node.get_device())
2025-12-04T09:54:33.0569595Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T09:54:33.0570354Z     raise SkipFrame("BF16 is not supported")
2025-12-04T09:54:33.0570876Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T09:54:33.0571430Z 
2025-12-04T09:54:33.0571650Z To execute this test, run the following from the base repo dir:
2025-12-04T09:54:33.0572708Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_and_mul_expr_cuda
2025-12-04T09:54:33.0573542Z 
2025-12-04T09:54:33.0573816Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:54:33.0574465Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T09:54:33.0574936Z unimplemented []
2025-12-04T09:54:33.0575274Z stats [('calls_captured', 22), ('unique_graphs', 1)]
2025-12-04T09:54:33.0575849Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2)]
2025-12-04T09:54:33.0576408Z graph_break []
2025-12-04T09:54:33.0576792Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T09:54:33.0577960Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T09:54:33.0579028Z   return cls.__new__(cls, *args)
2025-12-04T09:54:33.0579993Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T09:54:33.0580960Z   warnings.warn(
2025-12-04T09:54:33.0581463Z _ AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_and_mul_expr_cuda _
2025-12-04T09:54:33.0582075Z Traceback (most recent call last):
2025-12-04T09:54:33.0582866Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 2019, in test_size_with_unbacked_add_and_mul_expr
2025-12-04T09:54:33.0583781Z     self.check_model(Repro(), example_inputs, dynamic_shapes=spec)
2025-12-04T09:54:33.0584562Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model
2025-12-04T09:54:33.0585260Z     actual = AOTIRunnerUtil.run(
2025-12-04T09:54:33.0585876Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run
2025-12-04T09:54:33.0586535Z     package_path = AOTIRunnerUtil.compile(
2025-12-04T09:54:33.0587213Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile
2025-12-04T09:54:33.0588134Z     package_path = torch._inductor.aoti_compile_and_package(
2025-12-04T09:54:33.0589006Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package
2025-12-04T09:54:33.0589793Z     return aot_inductor_minifier_wrapper(
2025-12-04T09:54:33.0590608Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper
2025-12-04T09:54:33.0591483Z     raise e
2025-12-04T09:54:33.0592159Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper
2025-12-04T09:54:33.0592944Z     return func(
2025-12-04T09:54:33.0593664Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner
2025-12-04T09:54:33.0594595Z     aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs)
2025-12-04T09:54:33.0595424Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile
2025-12-04T09:54:33.0596141Z     return compile_fx_aot(
2025-12-04T09:54:33.0596844Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot
2025-12-04T09:54:33.0597592Z     compiled_artifacts = compile_fx(
2025-12-04T09:54:33.0598316Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx
2025-12-04T09:54:33.0599038Z     return compile_fx(
2025-12-04T09:54:33.0599697Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx
2025-12-04T09:54:33.0600434Z     return _maybe_wrap_and_compile_fx_main(
2025-12-04T09:54:33.0601278Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main
2025-12-04T09:54:33.0602110Z     return _compile_fx_main(
2025-12-04T09:54:33.0602829Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main
2025-12-04T09:54:33.0603667Z     return inference_compiler(unlifted_gm, example_inputs_)
2025-12-04T09:54:33.0604529Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__
2025-12-04T09:54:33.0605351Z     return self.compiler_fn(gm, example_inputs)
2025-12-04T09:54:33.0606130Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base
2025-12-04T09:54:33.0606896Z     return compile_fx_forward(
2025-12-04T09:54:33.0607636Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward
2025-12-04T09:54:33.0608407Z     return inner_compile(
2025-12-04T09:54:33.0608877Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T09:54:33.0609418Z     return func(*args, **kwds)
2025-12-04T09:54:33.0610136Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner
2025-12-04T09:54:33.0611031Z     return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
2025-12-04T09:54:33.0611939Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper
2025-12-04T09:54:33.0612756Z     inner_compiled_fn = compiler_fn(gm, example_inputs)
2025-12-04T09:54:33.0613573Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T09:54:33.0614401Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T09:54:33.0615235Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T09:54:33.0616031Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T09:54:33.0617022Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T09:54:33.0618009Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T09:54:33.0618994Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T09:54:33.0619856Z     _check_triton_bf16_support(graph)
2025-12-04T09:54:33.0620642Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T09:54:33.0621456Z     warn_and_skip(node.get_device())
2025-12-04T09:54:33.0622184Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T09:54:33.0622975Z     raise SkipFrame("BF16 is not supported")
2025-12-04T09:54:33.0623488Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T09:54:33.0623894Z 
2025-12-04T09:54:33.0624113Z To execute this test, run the following from the base repo dir:
2025-12-04T09:54:33.0625170Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_and_mul_expr_cuda
2025-12-04T09:54:33.0626008Z 
2025-12-04T09:54:33.0626293Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:54:33.0626924Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T09:54:33.0627401Z unimplemented []
2025-12-04T09:54:33.0627740Z stats [('calls_captured', 22), ('unique_graphs', 1)]
2025-12-04T09:54:33.0628277Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2)]
2025-12-04T09:54:33.0628765Z graph_break []
2025-12-04T09:54:33.0629149Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T09:54:33.0630338Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T09:54:33.0631393Z   return cls.__new__(cls, *args)
2025-12-04T09:54:33.0632358Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T09:54:33.0633331Z   warnings.warn(
2025-12-04T09:54:33.0633720Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T09:54:33.0634179Z unimplemented []
2025-12-04T09:54:33.0634516Z stats [('calls_captured', 22), ('unique_graphs', 1)]
2025-12-04T09:54:33.0635058Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2)]
2025-12-04T09:54:33.0635535Z graph_break []
2025-12-04T09:54:33.0635907Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T09:54:33.0637088Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T09:54:33.0638145Z   return cls.__new__(cls, *args)
2025-12-04T09:54:33.0639081Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T09:54:33.0640047Z   warnings.warn(
2025-12-04T09:54:33.0640369Z =================================== FAILURES ===================================
2025-12-04T09:54:33.0641002Z _ AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_and_mul_expr_cuda _
2025-12-04T09:54:33.0641614Z Traceback (most recent call last):
2025-12-04T09:54:33.0642409Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 2019, in test_size_with_unbacked_add_and_mul_expr
2025-12-04T09:54:33.0643323Z     self.check_model(Repro(), example_inputs, dynamic_shapes=spec)
2025-12-04T09:54:33.0644178Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model
2025-12-04T09:54:33.0644879Z     actual = AOTIRunnerUtil.run(
2025-12-04T09:54:33.0645493Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run
2025-12-04T09:54:33.0646154Z     package_path = AOTIRunnerUtil.compile(
2025-12-04T09:54:33.0646835Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile
2025-12-04T09:54:33.0647660Z     package_path = torch._inductor.aoti_compile_and_package(
2025-12-04T09:54:33.0648527Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package
2025-12-04T09:54:33.0649313Z     return aot_inductor_minifier_wrapper(
2025-12-04T09:54:33.0650120Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper
2025-12-04T09:54:33.0650901Z     raise e
2025-12-04T09:54:33.0651577Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper
2025-12-04T09:54:33.0652359Z     return func(
2025-12-04T09:54:33.0653071Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner
2025-12-04T09:54:33.0653991Z     aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs)
2025-12-04T09:54:33.0654818Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile
2025-12-04T09:54:33.0655536Z     return compile_fx_aot(
2025-12-04T09:54:33.0656234Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot
2025-12-04T09:54:33.0657071Z     compiled_artifacts = compile_fx(
2025-12-04T09:54:33.0657789Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx
2025-12-04T09:54:33.0658514Z     return compile_fx(
2025-12-04T09:54:33.0659175Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx
2025-12-04T09:54:33.0659911Z     return _maybe_wrap_and_compile_fx_main(
2025-12-04T09:54:33.0660754Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main
2025-12-04T09:54:33.0661590Z     return _compile_fx_main(
2025-12-04T09:54:33.0662309Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main
2025-12-04T09:54:33.0663155Z     return inference_compiler(unlifted_gm, example_inputs_)
2025-12-04T09:54:33.0664015Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__
2025-12-04T09:54:33.0664828Z     return self.compiler_fn(gm, example_inputs)
2025-12-04T09:54:33.0665609Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base
2025-12-04T09:54:33.0666372Z     return compile_fx_forward(
2025-12-04T09:54:33.0667109Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward
2025-12-04T09:54:33.0667880Z     return inner_compile(
2025-12-04T09:54:33.0668349Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T09:54:33.0668893Z     return func(*args, **kwds)
2025-12-04T09:54:33.0669610Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner
2025-12-04T09:54:33.0670509Z     return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
2025-12-04T09:54:33.0671584Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper
2025-12-04T09:54:33.0672399Z     inner_compiled_fn = compiler_fn(gm, example_inputs)
2025-12-04T09:54:33.0673350Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T09:54:33.0674178Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T09:54:33.0675012Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T09:54:33.0675804Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T09:54:33.0676739Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T09:54:33.0677723Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T09:54:33.0678716Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T09:54:33.0679511Z     _check_triton_bf16_support(graph)
2025-12-04T09:54:33.0680318Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T09:54:33.0681119Z     warn_and_skip(node.get_device())
2025-12-04T09:54:33.0681850Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T09:54:33.0682619Z     raise SkipFrame("BF16 is not supported")
2025-12-04T09:54:33.0683135Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T09:54:33.0683538Z 
2025-12-04T09:54:33.0683756Z To execute this test, run the following from the base repo dir:
2025-12-04T09:54:33.0684808Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_and_mul_expr_cuda
2025-12-04T09:54:33.0685643Z 
2025-12-04T09:54:33.0685929Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:54:33.0686555Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T09:54:33.0687033Z unimplemented []
2025-12-04T09:54:33.0687368Z stats [('calls_captured', 22), ('unique_graphs', 1)]
2025-12-04T09:54:33.0687913Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2)]
2025-12-04T09:54:33.0688384Z graph_break []
2025-12-04T09:54:33.0688762Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T09:54:33.0689944Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T09:54:33.0690997Z   return cls.__new__(cls, *args)
2025-12-04T09:54:33.0691949Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T09:54:33.0692917Z   warnings.warn(
2025-12-04T09:54:33.0693303Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T09:54:33.0693762Z unimplemented []
2025-12-04T09:54:33.0694099Z stats [('calls_captured', 22), ('unique_graphs', 1)]
2025-12-04T09:54:33.0694645Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2)]
2025-12-04T09:54:33.0695125Z graph_break []
2025-12-04T09:54:33.0695503Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T09:54:33.0696774Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T09:54:33.0697850Z   return cls.__new__(cls, *args)
2025-12-04T09:54:33.0698790Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T09:54:33.0699763Z   warnings.warn(
2025-12-04T09:54:33.0700154Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T09:54:33.0700691Z unimplemented []
2025-12-04T09:54:33.0701029Z stats [('calls_captured', 22), ('unique_graphs', 1)]
2025-12-04T09:54:33.0701575Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2)]
2025-12-04T09:54:33.0702064Z graph_break []
2025-12-04T09:54:33.0702431Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T09:54:33.0703616Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T09:54:33.0704826Z   return cls.__new__(cls, *args)
2025-12-04T09:54:33.0705767Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T09:54:33.0706739Z   warnings.warn(
2025-12-04T09:54:33.0707662Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-f2c58a9dfc31919e.xml -
2025-12-04T09:54:33.0708716Z =========================== short test summary info ============================
2025-12-04T09:54:33.0709916Z FAILED [0.6080s] inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_and_mul_expr_cuda - torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T09:54:33.0710930Z 
2025-12-04T09:54:33.0711146Z To execute this test, run the following from the base repo dir:
2025-12-04T09:54:33.0712209Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_and_mul_expr_cuda
2025-12-04T09:54:33.0713043Z 
2025-12-04T09:54:33.0713323Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:54:33.0713906Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T09:54:33.0714467Z ======== 1 failed, 50 passed, 29 skipped, 2 rerun in 351.97s (0:05:51) =========
2025-12-04T09:54:33.0714940Z Got exit code 1
2025-12-04T09:54:33.0715215Z Retrying single test...
2025-12-04T09:54:33.0715841Z W1204 09:51:47.151000 8209 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T09:54:33.0716987Z Test results will be stored in test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-a793ea186f6e0edb.xml
2025-12-04T09:54:33.0717865Z ============================= test session starts ==============================
2025-12-04T09:54:33.0718516Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T09:54:33.0719120Z cachedir: .pytest_cache
2025-12-04T09:54:33.0719834Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T09:54:33.0720619Z rootdir: /var/lib/jenkins/workspace
2025-12-04T09:54:33.0720963Z configfile: pytest.ini
2025-12-04T09:54:33.0721697Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T09:54:33.0722605Z collecting ... collected 934 items / 153 deselected / 781 selected
2025-12-04T09:54:33.0723755Z stepcurrent: skipping 79 already run items. Running only test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_and_mul_expr_cuda
2025-12-04T09:54:33.0724777Z Running 1 items in this shard
2025-12-04T09:54:33.0724999Z 
2025-12-04T09:54:33.0725660Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_and_mul_expr_cuda ('RERUN', {'yellow': True}) [4.1418s] [100%]
2025-12-04T09:54:33.0727128Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_and_mul_expr_cuda ('RERUN', {'yellow': True}) [0.5839s] [100%]
2025-12-04T09:54:33.0728605Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_and_mul_expr_cuda FAILED [0.5984s] [100%]
2025-12-04T09:54:33.0729322Z 
2025-12-04T09:54:33.0729468Z ==================================== RERUNS ====================================
2025-12-04T09:54:33.0730111Z _ AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_and_mul_expr_cuda _
2025-12-04T09:54:33.0730726Z Traceback (most recent call last):
2025-12-04T09:54:33.0731525Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 2019, in test_size_with_unbacked_add_and_mul_expr
2025-12-04T09:54:33.0732481Z     self.check_model(Repro(), example_inputs, dynamic_shapes=spec)
2025-12-04T09:54:33.0733282Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model
2025-12-04T09:54:33.0733980Z     actual = AOTIRunnerUtil.run(
2025-12-04T09:54:33.0734588Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run
2025-12-04T09:54:33.0735264Z     package_path = AOTIRunnerUtil.compile(
2025-12-04T09:54:33.0735953Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile
2025-12-04T09:54:33.0736793Z     package_path = torch._inductor.aoti_compile_and_package(
2025-12-04T09:54:33.0737652Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package
2025-12-04T09:54:33.0738455Z     return aot_inductor_minifier_wrapper(
2025-12-04T09:54:33.0739271Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper
2025-12-04T09:54:33.0740041Z     raise e
2025-12-04T09:54:33.0740725Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper
2025-12-04T09:54:33.0741509Z     return func(
2025-12-04T09:54:33.0742223Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner
2025-12-04T09:54:33.0743135Z     aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs)
2025-12-04T09:54:33.0743968Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile
2025-12-04T09:54:33.0744685Z     return compile_fx_aot(
2025-12-04T09:54:33.0745374Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot
2025-12-04T09:54:33.0746141Z     compiled_artifacts = compile_fx(
2025-12-04T09:54:33.0746864Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx
2025-12-04T09:54:33.0747588Z     return compile_fx(
2025-12-04T09:54:33.0748234Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx
2025-12-04T09:54:33.0748980Z     return _maybe_wrap_and_compile_fx_main(
2025-12-04T09:54:33.0749826Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main
2025-12-04T09:54:33.0750656Z     return _compile_fx_main(
2025-12-04T09:54:33.0751359Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main
2025-12-04T09:54:33.0752209Z     return inference_compiler(unlifted_gm, example_inputs_)
2025-12-04T09:54:33.0753066Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__
2025-12-04T09:54:33.0753869Z     return self.compiler_fn(gm, example_inputs)
2025-12-04T09:54:33.0754661Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base
2025-12-04T09:54:33.0755427Z     return compile_fx_forward(
2025-12-04T09:54:33.0756165Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward
2025-12-04T09:54:33.0756921Z     return inner_compile(
2025-12-04T09:54:33.0757494Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T09:54:33.0758039Z     return func(*args, **kwds)
2025-12-04T09:54:33.0758740Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner
2025-12-04T09:54:33.0759650Z     return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
2025-12-04T09:54:33.0760622Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper
2025-12-04T09:54:33.0761433Z     inner_compiled_fn = compiler_fn(gm, example_inputs)
2025-12-04T09:54:33.0762234Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T09:54:33.0763076Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T09:54:33.0763914Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T09:54:33.0764706Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T09:54:33.0765516Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T09:54:33.0766526Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T09:54:33.0767518Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T09:54:33.0768313Z     _check_triton_bf16_support(graph)
2025-12-04T09:54:33.0769101Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T09:54:33.0769917Z     warn_and_skip(node.get_device())
2025-12-04T09:54:33.0770652Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T09:54:33.0771610Z     raise SkipFrame("BF16 is not supported")
2025-12-04T09:54:33.0772143Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T09:54:33.0772543Z 
2025-12-04T09:54:33.0772763Z To execute this test, run the following from the base repo dir:
2025-12-04T09:54:33.0773826Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_and_mul_expr_cuda
2025-12-04T09:54:33.0774673Z 
2025-12-04T09:54:33.0774940Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:54:33.0775587Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T09:54:33.0776061Z unimplemented []
2025-12-04T09:54:33.0776474Z stats [('calls_captured', 22), ('unique_graphs', 1)]
2025-12-04T09:54:33.0777014Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2)]
2025-12-04T09:54:33.0777502Z graph_break []
2025-12-04T09:54:33.0777888Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T09:54:33.0779058Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T09:54:33.0780125Z   return cls.__new__(cls, *args)
2025-12-04T09:54:33.0781086Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T09:54:33.0782064Z   warnings.warn(
2025-12-04T09:54:33.0782569Z _ AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_and_mul_expr_cuda _
2025-12-04T09:54:33.0783179Z Traceback (most recent call last):
2025-12-04T09:54:33.0783971Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 2019, in test_size_with_unbacked_add_and_mul_expr
2025-12-04T09:54:33.0784867Z     self.check_model(Repro(), example_inputs, dynamic_shapes=spec)
2025-12-04T09:54:33.0785792Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model
2025-12-04T09:54:33.0786500Z     actual = AOTIRunnerUtil.run(
2025-12-04T09:54:33.0787117Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run
2025-12-04T09:54:33.0787777Z     package_path = AOTIRunnerUtil.compile(
2025-12-04T09:54:33.0788456Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile
2025-12-04T09:54:33.0789321Z     package_path = torch._inductor.aoti_compile_and_package(
2025-12-04T09:54:33.0790171Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package
2025-12-04T09:54:33.0790971Z     return aot_inductor_minifier_wrapper(
2025-12-04T09:54:33.0791776Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper
2025-12-04T09:54:33.0792560Z     raise e
2025-12-04T09:54:33.0793239Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper
2025-12-04T09:54:33.0794021Z     return func(
2025-12-04T09:54:33.0794734Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner
2025-12-04T09:54:33.0795656Z     aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs)
2025-12-04T09:54:33.0796482Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile
2025-12-04T09:54:33.0797195Z     return compile_fx_aot(
2025-12-04T09:54:33.0797898Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot
2025-12-04T09:54:33.0798648Z     compiled_artifacts = compile_fx(
2025-12-04T09:54:33.0799366Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx
2025-12-04T09:54:33.0800088Z     return compile_fx(
2025-12-04T09:54:33.0800744Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx
2025-12-04T09:54:33.0801482Z     return _maybe_wrap_and_compile_fx_main(
2025-12-04T09:54:33.0802327Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main
2025-12-04T09:54:33.0803160Z     return _compile_fx_main(
2025-12-04T09:54:33.0803865Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main
2025-12-04T09:54:33.0804719Z     return inference_compiler(unlifted_gm, example_inputs_)
2025-12-04T09:54:33.0805578Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__
2025-12-04T09:54:33.0806396Z     return self.compiler_fn(gm, example_inputs)
2025-12-04T09:54:33.0807181Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base
2025-12-04T09:54:33.0807941Z     return compile_fx_forward(
2025-12-04T09:54:33.0808674Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward
2025-12-04T09:54:33.0809439Z     return inner_compile(
2025-12-04T09:54:33.0809909Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T09:54:33.0810453Z     return func(*args, **kwds)
2025-12-04T09:54:33.0811168Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner
2025-12-04T09:54:33.0812071Z     return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
2025-12-04T09:54:33.0812974Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper
2025-12-04T09:54:33.0813792Z     inner_compiled_fn = compiler_fn(gm, example_inputs)
2025-12-04T09:54:33.0814680Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T09:54:33.0815511Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T09:54:33.0816420Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T09:54:33.0817301Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T09:54:33.0818110Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T09:54:33.0819109Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T09:54:33.0820101Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T09:54:33.0820893Z     _check_triton_bf16_support(graph)
2025-12-04T09:54:33.0821689Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T09:54:33.0822502Z     warn_and_skip(node.get_device())
2025-12-04T09:54:33.0823230Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T09:54:33.0823998Z     raise SkipFrame("BF16 is not supported")
2025-12-04T09:54:33.0824513Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T09:54:33.0824910Z 
2025-12-04T09:54:33.0825130Z To execute this test, run the following from the base repo dir:
2025-12-04T09:54:33.0826181Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_and_mul_expr_cuda
2025-12-04T09:54:33.0827011Z 
2025-12-04T09:54:33.0827289Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:54:33.0827919Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T09:54:33.0828392Z unimplemented []
2025-12-04T09:54:33.0828732Z stats [('calls_captured', 22), ('unique_graphs', 1)]
2025-12-04T09:54:33.0829265Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2)]
2025-12-04T09:54:33.0829750Z graph_break []
2025-12-04T09:54:33.0830131Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T09:54:33.0831315Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T09:54:33.0832361Z   return cls.__new__(cls, *args)
2025-12-04T09:54:33.0833323Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T09:54:33.0834294Z   warnings.warn(
2025-12-04T09:54:33.0834670Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T09:54:33.0835138Z unimplemented []
2025-12-04T09:54:33.0835470Z stats [('calls_captured', 22), ('unique_graphs', 1)]
2025-12-04T09:54:33.0836012Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2)]
2025-12-04T09:54:33.0836482Z graph_break []
2025-12-04T09:54:33.0850149Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T09:54:33.0851392Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T09:54:33.0852465Z   return cls.__new__(cls, *args)
2025-12-04T09:54:33.0853419Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T09:54:33.0854403Z   warnings.warn(
2025-12-04T09:54:33.0854729Z =================================== FAILURES ===================================
2025-12-04T09:54:33.0855510Z _ AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_and_mul_expr_cuda _
2025-12-04T09:54:33.0856137Z Traceback (most recent call last):
2025-12-04T09:54:33.0857018Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 2019, in test_size_with_unbacked_add_and_mul_expr
2025-12-04T09:54:33.0857944Z     self.check_model(Repro(), example_inputs, dynamic_shapes=spec)
2025-12-04T09:54:33.0858808Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model
2025-12-04T09:54:33.0859510Z     actual = AOTIRunnerUtil.run(
2025-12-04T09:54:33.0860132Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run
2025-12-04T09:54:33.0860800Z     package_path = AOTIRunnerUtil.compile(
2025-12-04T09:54:33.0861491Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile
2025-12-04T09:54:33.0862256Z     package_path = torch._inductor.aoti_compile_and_package(
2025-12-04T09:54:33.0863133Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package
2025-12-04T09:54:33.0863920Z     return aot_inductor_minifier_wrapper(
2025-12-04T09:54:33.0864735Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper
2025-12-04T09:54:33.0865550Z     raise e
2025-12-04T09:54:33.0866236Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper
2025-12-04T09:54:33.0867010Z     return func(
2025-12-04T09:54:33.0867729Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner
2025-12-04T09:54:33.0868661Z     aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs)
2025-12-04T09:54:33.0869498Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile
2025-12-04T09:54:33.0870220Z     return compile_fx_aot(
2025-12-04T09:54:33.0870927Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot
2025-12-04T09:54:33.0871882Z     compiled_artifacts = compile_fx(
2025-12-04T09:54:33.0872592Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx
2025-12-04T09:54:33.0873331Z     return compile_fx(
2025-12-04T09:54:33.0874000Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx
2025-12-04T09:54:33.0874743Z     return _maybe_wrap_and_compile_fx_main(
2025-12-04T09:54:33.0875590Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main
2025-12-04T09:54:33.0876426Z     return _compile_fx_main(
2025-12-04T09:54:33.0877151Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main
2025-12-04T09:54:33.0877993Z     return inference_compiler(unlifted_gm, example_inputs_)
2025-12-04T09:54:33.0878852Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__
2025-12-04T09:54:33.0879670Z     return self.compiler_fn(gm, example_inputs)
2025-12-04T09:54:33.0880471Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base
2025-12-04T09:54:33.0881228Z     return compile_fx_forward(
2025-12-04T09:54:33.0881969Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward
2025-12-04T09:54:33.0882746Z     return inner_compile(
2025-12-04T09:54:33.0883231Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T09:54:33.0883776Z     return func(*args, **kwds)
2025-12-04T09:54:33.0884678Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner
2025-12-04T09:54:33.0885601Z     return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
2025-12-04T09:54:33.0886508Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper
2025-12-04T09:54:33.0887434Z     inner_compiled_fn = compiler_fn(gm, example_inputs)
2025-12-04T09:54:33.0888255Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T09:54:33.0889097Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T09:54:33.0889942Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T09:54:33.0890746Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T09:54:33.0891569Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T09:54:33.0892552Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T09:54:33.0893549Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T09:54:33.0894340Z     _check_triton_bf16_support(graph)
2025-12-04T09:54:33.0895148Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T09:54:33.0895950Z     warn_and_skip(node.get_device())
2025-12-04T09:54:33.0896767Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T09:54:33.0897547Z     raise SkipFrame("BF16 is not supported")
2025-12-04T09:54:33.0898059Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T09:54:33.0898465Z 
2025-12-04T09:54:33.0898694Z To execute this test, run the following from the base repo dir:
2025-12-04T09:54:33.0899765Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_and_mul_expr_cuda
2025-12-04T09:54:33.0900601Z 
2025-12-04T09:54:33.0900883Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:54:33.0901554Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T09:54:33.0902038Z unimplemented []
2025-12-04T09:54:33.0902377Z stats [('calls_captured', 22), ('unique_graphs', 1)]
2025-12-04T09:54:33.0902933Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2)]
2025-12-04T09:54:33.0903406Z graph_break []
2025-12-04T09:54:33.0903789Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T09:54:33.0904986Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T09:54:33.0906046Z   return cls.__new__(cls, *args)
2025-12-04T09:54:33.0907011Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T09:54:33.0907985Z   warnings.warn(
2025-12-04T09:54:33.0908382Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T09:54:33.0908846Z unimplemented []
2025-12-04T09:54:33.0909197Z stats [('calls_captured', 22), ('unique_graphs', 1)]
2025-12-04T09:54:33.0909743Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2)]
2025-12-04T09:54:33.0910230Z graph_break []
2025-12-04T09:54:33.0910595Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T09:54:33.0911901Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T09:54:33.0912973Z   return cls.__new__(cls, *args)
2025-12-04T09:54:33.0913935Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T09:54:33.0914894Z   warnings.warn(
2025-12-04T09:54:33.0915286Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T09:54:33.0915832Z unimplemented []
2025-12-04T09:54:33.0916154Z stats [('calls_captured', 22), ('unique_graphs', 1)]
2025-12-04T09:54:33.0916698Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2)]
2025-12-04T09:54:33.0917183Z graph_break []
2025-12-04T09:54:33.0917562Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T09:54:33.0918724Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T09:54:33.0919790Z   return cls.__new__(cls, *args)
2025-12-04T09:54:33.0920745Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T09:54:33.0921704Z   warnings.warn(
2025-12-04T09:54:33.0922623Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-a793ea186f6e0edb.xml -
2025-12-04T09:54:33.0923686Z =========================== short test summary info ============================
2025-12-04T09:54:33.0924903Z FAILED [0.5984s] inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_and_mul_expr_cuda - torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T09:54:33.0925908Z 
2025-12-04T09:54:33.0926141Z To execute this test, run the following from the base repo dir:
2025-12-04T09:54:33.0927189Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_and_mul_expr_cuda
2025-12-04T09:54:33.0928032Z 
2025-12-04T09:54:33.0928302Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:54:33.0928898Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T09:54:33.0929435Z ================== 1 failed, 153 deselected, 2 rerun in 5.41s ==================
2025-12-04T09:54:33.0929874Z Got exit code 1
2025-12-04T09:54:33.0930147Z Retrying single test...
2025-12-04T09:54:33.0930778Z W1204 09:52:09.969000 8437 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T09:54:33.0931912Z Test results will be stored in test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-fcd1db8f24799401.xml
2025-12-04T09:54:33.0932790Z ============================= test session starts ==============================
2025-12-04T09:54:33.0933458Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T09:54:33.0934065Z cachedir: .pytest_cache
2025-12-04T09:54:33.0934770Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T09:54:33.0935562Z rootdir: /var/lib/jenkins/workspace
2025-12-04T09:54:33.0935927Z configfile: pytest.ini
2025-12-04T09:54:33.0936739Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T09:54:33.0937657Z collecting ... collected 934 items / 153 deselected / 781 selected
2025-12-04T09:54:33.0938812Z stepcurrent: skipping 79 already run items. Running only test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_and_mul_expr_cuda
2025-12-04T09:54:33.0939843Z Running 1 items in this shard
2025-12-04T09:54:33.0940056Z 
2025-12-04T09:54:33.0940836Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_and_mul_expr_cuda ('RERUN', {'yellow': True}) [4.0766s] [100%]
2025-12-04T09:54:33.0942313Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_and_mul_expr_cuda ('RERUN', {'yellow': True}) [0.5983s] [100%]
2025-12-04T09:54:33.0943702Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_and_mul_expr_cuda FAILED [0.6080s] [100%]
2025-12-04T09:54:33.0944527Z 
2025-12-04T09:54:33.0944687Z ==================================== RERUNS ====================================
2025-12-04T09:54:33.0945312Z _ AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_and_mul_expr_cuda _
2025-12-04T09:54:33.0945931Z Traceback (most recent call last):
2025-12-04T09:54:33.0946725Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 2019, in test_size_with_unbacked_add_and_mul_expr
2025-12-04T09:54:33.0947644Z     self.check_model(Repro(), example_inputs, dynamic_shapes=spec)
2025-12-04T09:54:33.0948430Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model
2025-12-04T09:54:33.0949130Z     actual = AOTIRunnerUtil.run(
2025-12-04T09:54:33.0949752Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run
2025-12-04T09:54:33.0950421Z     package_path = AOTIRunnerUtil.compile(
2025-12-04T09:54:33.0951105Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile
2025-12-04T09:54:33.0951860Z     package_path = torch._inductor.aoti_compile_and_package(
2025-12-04T09:54:33.0952725Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package
2025-12-04T09:54:33.0955908Z     return aot_inductor_minifier_wrapper(
2025-12-04T09:54:33.0956736Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper
2025-12-04T09:54:33.0957516Z     raise e
2025-12-04T09:54:33.0958189Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper
2025-12-04T09:54:33.0958981Z     return func(
2025-12-04T09:54:33.0959693Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner
2025-12-04T09:54:33.0960622Z     aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs)
2025-12-04T09:54:33.0961451Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile
2025-12-04T09:54:33.0962172Z     return compile_fx_aot(
2025-12-04T09:54:33.0962873Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot
2025-12-04T09:54:33.0963634Z     compiled_artifacts = compile_fx(
2025-12-04T09:54:33.0964349Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx
2025-12-04T09:54:33.0965073Z     return compile_fx(
2025-12-04T09:54:33.0965730Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx
2025-12-04T09:54:33.0966470Z     return _maybe_wrap_and_compile_fx_main(
2025-12-04T09:54:33.0967313Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main
2025-12-04T09:54:33.0968149Z     return _compile_fx_main(
2025-12-04T09:54:33.0968868Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main
2025-12-04T09:54:33.0969703Z     return inference_compiler(unlifted_gm, example_inputs_)
2025-12-04T09:54:33.0970567Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__
2025-12-04T09:54:33.0971791Z     return self.compiler_fn(gm, example_inputs)
2025-12-04T09:54:33.0972573Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base
2025-12-04T09:54:33.0973340Z     return compile_fx_forward(
2025-12-04T09:54:33.0974084Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward
2025-12-04T09:54:33.0974955Z     return inner_compile(
2025-12-04T09:54:33.0975425Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T09:54:33.0975967Z     return func(*args, **kwds)
2025-12-04T09:54:33.0976774Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner
2025-12-04T09:54:33.0977685Z     return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
2025-12-04T09:54:33.0978592Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper
2025-12-04T09:54:33.0979413Z     inner_compiled_fn = compiler_fn(gm, example_inputs)
2025-12-04T09:54:33.0980233Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T09:54:33.0981071Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T09:54:33.0981934Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T09:54:33.0982723Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T09:54:33.0983547Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T09:54:33.0984552Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T09:54:33.0985548Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T09:54:33.0986333Z     _check_triton_bf16_support(graph)
2025-12-04T09:54:33.0987142Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T09:54:33.0987964Z     warn_and_skip(node.get_device())
2025-12-04T09:54:33.0988688Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T09:54:33.0989475Z     raise SkipFrame("BF16 is not supported")
2025-12-04T09:54:33.0990006Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T09:54:33.0990400Z 
2025-12-04T09:54:33.0990637Z To execute this test, run the following from the base repo dir:
2025-12-04T09:54:33.0991704Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_and_mul_expr_cuda
2025-12-04T09:54:33.0992566Z 
2025-12-04T09:54:33.0992842Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:54:33.0993496Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T09:54:33.0993978Z unimplemented []
2025-12-04T09:54:33.0994310Z stats [('calls_captured', 22), ('unique_graphs', 1)]
2025-12-04T09:54:33.0994868Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2)]
2025-12-04T09:54:33.0995361Z graph_break []
2025-12-04T09:54:33.0995736Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T09:54:33.0996927Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T09:54:33.0998008Z   return cls.__new__(cls, *args)
2025-12-04T09:54:33.0998965Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T09:54:33.1000032Z   warnings.warn(
2025-12-04T09:54:33.1000555Z _ AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_and_mul_expr_cuda _
2025-12-04T09:54:33.1001170Z Traceback (most recent call last):
2025-12-04T09:54:33.1001945Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 2019, in test_size_with_unbacked_add_and_mul_expr
2025-12-04T09:54:33.1002859Z     self.check_model(Repro(), example_inputs, dynamic_shapes=spec)
2025-12-04T09:54:33.1003724Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model
2025-12-04T09:54:33.1004431Z     actual = AOTIRunnerUtil.run(
2025-12-04T09:54:33.1005039Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run
2025-12-04T09:54:33.1005715Z     package_path = AOTIRunnerUtil.compile(
2025-12-04T09:54:33.1006402Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile
2025-12-04T09:54:33.1007145Z     package_path = torch._inductor.aoti_compile_and_package(
2025-12-04T09:54:33.1008016Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package
2025-12-04T09:54:33.1008817Z     return aot_inductor_minifier_wrapper(
2025-12-04T09:54:33.1009628Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper
2025-12-04T09:54:33.1010401Z     raise e
2025-12-04T09:54:33.1011092Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper
2025-12-04T09:54:33.1011880Z     return func(
2025-12-04T09:54:33.1012593Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner
2025-12-04T09:54:33.1013507Z     aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs)
2025-12-04T09:54:33.1014349Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile
2025-12-04T09:54:33.1015062Z     return compile_fx_aot(
2025-12-04T09:54:33.1015749Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot
2025-12-04T09:54:33.1016618Z     compiled_artifacts = compile_fx(
2025-12-04T09:54:33.1017344Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx
2025-12-04T09:54:33.1018080Z     return compile_fx(
2025-12-04T09:54:33.1018730Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx
2025-12-04T09:54:33.1019492Z     return _maybe_wrap_and_compile_fx_main(
2025-12-04T09:54:33.1020341Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main
2025-12-04T09:54:33.1021159Z     return _compile_fx_main(
2025-12-04T09:54:33.1021886Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main
2025-12-04T09:54:33.1022738Z     return inference_compiler(unlifted_gm, example_inputs_)
2025-12-04T09:54:33.1023598Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__
2025-12-04T09:54:33.1024400Z     return self.compiler_fn(gm, example_inputs)
2025-12-04T09:54:33.1025196Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base
2025-12-04T09:54:33.1025959Z     return compile_fx_forward(
2025-12-04T09:54:33.1026699Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward
2025-12-04T09:54:33.1027461Z     return inner_compile(
2025-12-04T09:54:33.1027943Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T09:54:33.1028486Z     return func(*args, **kwds)
2025-12-04T09:54:33.1029310Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner
2025-12-04T09:54:33.1030221Z     return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
2025-12-04T09:54:33.1031131Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper
2025-12-04T09:54:33.1032014Z     inner_compiled_fn = compiler_fn(gm, example_inputs)
2025-12-04T09:54:33.1032820Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T09:54:33.1033663Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T09:54:33.1034498Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T09:54:33.1035295Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T09:54:33.1036105Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T09:54:33.1037102Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T09:54:33.1038097Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T09:54:33.1038875Z     _check_triton_bf16_support(graph)
2025-12-04T09:54:33.1039706Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T09:54:33.1040518Z     warn_and_skip(node.get_device())
2025-12-04T09:54:33.1041251Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T09:54:33.1042009Z     raise SkipFrame("BF16 is not supported")
2025-12-04T09:54:33.1042534Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T09:54:33.1042920Z 
2025-12-04T09:54:33.1043154Z To execute this test, run the following from the base repo dir:
2025-12-04T09:54:33.1044214Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_and_mul_expr_cuda
2025-12-04T09:54:33.1045052Z 
2025-12-04T09:54:33.1045325Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:54:33.1045968Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T09:54:33.1046442Z unimplemented []
2025-12-04T09:54:33.1046764Z stats [('calls_captured', 22), ('unique_graphs', 1)]
2025-12-04T09:54:33.1047308Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2)]
2025-12-04T09:54:33.1047795Z graph_break []
2025-12-04T09:54:33.1048171Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T09:54:33.1049344Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T09:54:33.1050413Z   return cls.__new__(cls, *args)
2025-12-04T09:54:33.1051365Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T09:54:33.1052336Z   warnings.warn(
2025-12-04T09:54:33.1052716Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T09:54:33.1053184Z unimplemented []
2025-12-04T09:54:33.1053519Z stats [('calls_captured', 22), ('unique_graphs', 1)]
2025-12-04T09:54:33.1054051Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2)]
2025-12-04T09:54:33.1054544Z graph_break []
2025-12-04T09:54:33.1054922Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T09:54:33.1056206Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T09:54:33.1057378Z   return cls.__new__(cls, *args)
2025-12-04T09:54:33.1058339Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T09:54:33.1059313Z   warnings.warn(
2025-12-04T09:54:33.1059691Z =================================== FAILURES ===================================
2025-12-04T09:54:33.1060348Z _ AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_and_mul_expr_cuda _
2025-12-04T09:54:33.1060969Z Traceback (most recent call last):
2025-12-04T09:54:33.1061755Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 2019, in test_size_with_unbacked_add_and_mul_expr
2025-12-04T09:54:33.1062676Z     self.check_model(Repro(), example_inputs, dynamic_shapes=spec)
2025-12-04T09:54:33.1063479Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model
2025-12-04T09:54:33.1064183Z     actual = AOTIRunnerUtil.run(
2025-12-04T09:54:33.1064790Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run
2025-12-04T09:54:33.1065472Z     package_path = AOTIRunnerUtil.compile(
2025-12-04T09:54:33.1066154Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile
2025-12-04T09:54:33.1066909Z     package_path = torch._inductor.aoti_compile_and_package(
2025-12-04T09:54:33.1067769Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package
2025-12-04T09:54:33.1068573Z     return aot_inductor_minifier_wrapper(
2025-12-04T09:54:33.1069387Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper
2025-12-04T09:54:33.1070159Z     raise e
2025-12-04T09:54:33.1070846Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper
2025-12-04T09:54:33.1071863Z     return func(
2025-12-04T09:54:33.1072578Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner
2025-12-04T09:54:33.1073492Z     aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs)
2025-12-04T09:54:33.1074335Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile
2025-12-04T09:54:33.1075053Z     return compile_fx_aot(
2025-12-04T09:54:33.1075747Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot
2025-12-04T09:54:33.1076508Z     compiled_artifacts = compile_fx(
2025-12-04T09:54:33.1077231Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx
2025-12-04T09:54:33.1077946Z     return compile_fx(
2025-12-04T09:54:33.1078596Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx
2025-12-04T09:54:33.1079342Z     return _maybe_wrap_and_compile_fx_main(
2025-12-04T09:54:33.1080187Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main
2025-12-04T09:54:33.1081011Z     return _compile_fx_main(
2025-12-04T09:54:33.1081721Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main
2025-12-04T09:54:33.1082574Z     return inference_compiler(unlifted_gm, example_inputs_)
2025-12-04T09:54:33.1083440Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__
2025-12-04T09:54:33.1084242Z     return self.compiler_fn(gm, example_inputs)
2025-12-04T09:54:33.1085209Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base
2025-12-04T09:54:33.1085985Z     return compile_fx_forward(
2025-12-04T09:54:33.1086730Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward
2025-12-04T09:54:33.1087488Z     return inner_compile(
2025-12-04T09:54:33.1087974Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T09:54:33.1088614Z     return func(*args, **kwds)
2025-12-04T09:54:33.1089329Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner
2025-12-04T09:54:33.1090252Z     return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
2025-12-04T09:54:33.1091168Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper
2025-12-04T09:54:33.1091997Z     inner_compiled_fn = compiler_fn(gm, example_inputs)
2025-12-04T09:54:33.1092815Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T09:54:33.1093671Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T09:54:33.1094516Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T09:54:33.1095323Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T09:54:33.1096138Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T09:54:33.1097320Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T09:54:33.1098313Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T09:54:33.1099114Z     _check_triton_bf16_support(graph)
2025-12-04T09:54:33.1099932Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T09:54:33.1100753Z     warn_and_skip(node.get_device())
2025-12-04T09:54:33.1101502Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T09:54:33.1102298Z     raise SkipFrame("BF16 is not supported")
2025-12-04T09:54:33.1102838Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T09:54:33.1103243Z 
2025-12-04T09:54:33.1103462Z To execute this test, run the following from the base repo dir:
2025-12-04T09:54:33.1104556Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_and_mul_expr_cuda
2025-12-04T09:54:33.1105438Z 
2025-12-04T09:54:33.1105714Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:54:33.1106384Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T09:54:33.1106870Z unimplemented []
2025-12-04T09:54:33.1107225Z stats [('calls_captured', 22), ('unique_graphs', 1)]
2025-12-04T09:54:33.1107793Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2)]
2025-12-04T09:54:33.1108277Z graph_break []
2025-12-04T09:54:33.1108662Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T09:54:33.1109905Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T09:54:33.1111003Z   return cls.__new__(cls, *args)
2025-12-04T09:54:33.1112013Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T09:54:33.1113016Z   warnings.warn(
2025-12-04T09:54:33.1113420Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T09:54:33.1113890Z unimplemented []
2025-12-04T09:54:33.1114377Z stats [('calls_captured', 22), ('unique_graphs', 1)]
2025-12-04T09:54:33.1114947Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2)]
2025-12-04T09:54:33.1115429Z graph_break []
2025-12-04T09:54:33.1115811Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T09:54:33.1117005Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T09:54:33.1118149Z   return cls.__new__(cls, *args)
2025-12-04T09:54:33.1119118Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T09:54:33.1120111Z   warnings.warn(
2025-12-04T09:54:33.1120503Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T09:54:33.1120964Z unimplemented []
2025-12-04T09:54:33.1121314Z stats [('calls_captured', 22), ('unique_graphs', 1)]
2025-12-04T09:54:33.1121865Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2)]
2025-12-04T09:54:33.1122343Z graph_break []
2025-12-04T09:54:33.1122722Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T09:54:33.1123913Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T09:54:33.1124989Z   return cls.__new__(cls, *args)
2025-12-04T09:54:33.1125930Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T09:54:33.1126900Z   warnings.warn(
2025-12-04T09:54:33.1127830Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-fcd1db8f24799401.xml -
2025-12-04T09:54:33.1128900Z =========================== short test summary info ============================
2025-12-04T09:54:33.1130105Z FAILED [0.6080s] inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_and_mul_expr_cuda - torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T09:54:33.1131130Z 
2025-12-04T09:54:33.1131358Z To execute this test, run the following from the base repo dir:
2025-12-04T09:54:33.1132418Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_and_mul_expr_cuda
2025-12-04T09:54:33.1133252Z 
2025-12-04T09:54:33.1133535Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:54:33.1134124Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T09:54:33.1134661Z ================== 1 failed, 153 deselected, 2 rerun in 5.37s ==================
2025-12-04T09:54:33.1135123Z Got exit code 1
2025-12-04T09:54:33.1135905Z FAILED CONSISTENTLY: test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_and_mul_expr_cuda
2025-12-04T09:54:33.1137168Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T09:54:33.1138180Z W1204 09:52:32.750000 8665 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T09:54:33.1139341Z Test results will be stored in test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-bb08f25297cc596b.xml
2025-12-04T09:54:33.1140216Z ============================= test session starts ==============================
2025-12-04T09:54:33.1140877Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T09:54:33.1141491Z cachedir: .pytest_cache
2025-12-04T09:54:33.1142343Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T09:54:33.1143123Z rootdir: /var/lib/jenkins/workspace
2025-12-04T09:54:33.1143485Z configfile: pytest.ini
2025-12-04T09:54:33.1144215Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T09:54:33.1145131Z collecting ... collected 934 items / 80 deselected / 854 selected
2025-12-04T09:54:33.1145707Z stepcurrent: skipping 80 already run items.
2025-12-04T09:54:33.1146103Z Running 74 items in this shard
2025-12-04T09:54:33.1146314Z 
2025-12-04T09:54:33.1147000Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_symint_item_cuda <- test/inductor/test_torchinductor.py PASSED [6.5275s] [  1%]
2025-12-04T09:54:33.1148797Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_extern_kernel_arg_cuda W1204 09:52:44.647000 8665 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T09:54:33.1150076Z PASSED [9.2830s] [  2%]
2025-12-04T09:54:33.1150978Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_1_num_dims_1_dynamic_True_autotune_False_cuda PASSED [6.5493s] [  4%]
2025-12-04T09:54:33.1152532Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_1_num_dims_2_dynamic_False_autotune_False_cuda PASSED [6.4522s] [  5%]
2025-12-04T09:54:33.1154090Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_1_num_dims_2_dynamic_True_autotune_True_cuda PASSED [9.9408s] [  6%]
2025-12-04T09:54:33.1155638Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_2_num_dims_1_dynamic_True_autotune_False_cuda PASSED [6.4978s] [  8%]
2025-12-04T09:54:33.1157207Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_2_num_dims_2_dynamic_False_autotune_True_cuda PASSED [7.8103s] [  9%]
2025-12-04T09:54:33.1158757Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_2_num_dims_2_dynamic_True_autotune_True_cuda PASSED [8.1735s] [ 10%]
2025-12-04T09:54:33.1160539Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_on_device_tma_dynamic_True_tma_version_new_cuda SKIPPED [0.0033s] (requires triton.tools.tensor_descriptor TMA support) [ 12%]
2025-12-04T09:54:33.1162389Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_reinterpret_view_mem_leak_cuda <- test/inductor/test_torchinductor.py PASSED [6.6981s] [ 13%]
2025-12-04T09:54:33.1164059Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_sympy_expr_arg_cuda <- test/inductor/test_torchinductor.py PASSED [6.4184s] [ 14%]
2025-12-04T09:54:33.1165940Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_tma_descriptor_2d_dynamic_False_tma_version_old_cuda SKIPPED [0.0034s] (requires triton.tools.experimental_descriptor TMA support) [ 16%]
2025-12-04T09:54:33.1167771Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_unbacked_expr_replacements_shift_k_0_use_static_size_False_cuda PASSED [7.8875s] [ 17%]
2025-12-04T09:54:33.1169304Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_unbacked_expr_replacements_shift_k_2_use_static_size_False_cuda PASSED [7.5621s] [ 18%]
2025-12-04T09:54:33.1171692Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_simple_cuda <- test/inductor/test_torchinductor.py W1204 09:54:04.717000 8665 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead
2025-12-04T09:54:33.1173826Z W1204 09:54:04.718000 8665 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead
2025-12-04T09:54:33.1174730Z PASSED [6.7229s] [ 20%]
2025-12-04T09:54:33.1175739Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_sym_expr_cond_dynamic_False_cuda PASSED [7.1835s] [ 21%]
2025-12-04T09:54:33.1177323Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_zero_grid_with_backed_symbols_cuda <- test/inductor/test_torchinductor.py PASSED [5.8985s] [ 22%]
2025-12-04T09:54:33.1178875Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_zero_size_weight_cuda <- test/inductor/test_torchinductor.py PASSED [6.0453s] [ 24%]
2025-12-04T09:54:33.1180637Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_mps SKIPPED [0.0004s] (No MPS backend available) [ 25%]
2025-12-04T09:54:33.1182365Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_amp_fallback_random_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (No MPS backend available) [ 27%]
2025-12-04T09:54:33.1184194Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aot_inductor_consts_cpp_build_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 28%]
2025-12-04T09:54:33.1186054Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aoti_debug_printer_codegen_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 29%]
2025-12-04T09:54:33.1187718Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aoti_debug_printer_fp8_dtype_mps SKIPPED [0.0007s] (No MPS backend available) [ 31%]
2025-12-04T09:54:33.1189473Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aoti_user_defined_triton_kernel_profiling_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 32%]
2025-12-04T09:54:33.1191369Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_autotuning_args_reuse_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 33%]
2025-12-04T09:54:33.1193154Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_buffer_mutation_4_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 35%]
2025-12-04T09:54:33.1194996Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_codegen_int_array_var_fix_memory_leak_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 36%]
2025-12-04T09:54:33.1196815Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_cpu_predicate_cuda_operands_max_autotune_True_mps SKIPPED [0.0002s] (No MPS backend available) [ 37%]
2025-12-04T09:54:33.1198522Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_nested_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 39%]
2025-12-04T09:54:33.1200305Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_with_multiple_outputs_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 40%]
2025-12-04T09:54:33.1202146Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_with_replace_view_ops_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 41%]
2025-12-04T09:54:33.1203711Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_constant_mps SKIPPED [0.0002s] (No MPS backend available) [ 43%]
2025-12-04T09:54:33.1205335Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_constant_original_fqn_and_dtype_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 44%]
2025-12-04T09:54:33.1207204Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_constant_type_propagation_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 45%]
2025-12-04T09:54:33.1208992Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_dynamic_smem_above_default_limit_mps SKIPPED [0.0002s] (No MPS backend available) [ 47%]
2025-12-04T09:54:33.1210479Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fallback_mem_leak_fix_mps SKIPPED [0.0002s] (No MPS backend available) [ 48%]
2025-12-04T09:54:33.1211913Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_foreach_multiple_dynamic_mps SKIPPED [0.0002s] (No MPS backend available) [ 50%]
2025-12-04T09:54:33.1213270Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fp8_mps SKIPPED [0.0002s] (No MPS backend available) [ 51%]
2025-12-04T09:54:33.1214777Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fqn_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 52%]
2025-12-04T09:54:33.1216520Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_large_grid_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 54%]
2025-12-04T09:54:33.1218281Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_masked_select_dynamic_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 55%]
2025-12-04T09:54:33.1220047Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_missing_output_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 56%]
2025-12-04T09:54:33.1221787Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_mixed_device_1_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 58%]
2025-12-04T09:54:33.1223575Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_nested_tensor_from_jagged_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 59%]
2025-12-04T09:54:33.1225434Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_non_contiguous_output_alias_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 60%]
2025-12-04T09:54:33.1227266Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_none_args_aot_codegen_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 62%]
2025-12-04T09:54:33.1229033Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_output_misaligned_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 63%]
2025-12-04T09:54:33.1230828Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_pad_non_zero_memory_leak_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0004s] (No MPS backend available) [ 64%]
2025-12-04T09:54:33.1232642Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_proxy_executor_permute_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 66%]
2025-12-04T09:54:33.1234418Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_pytree_inputs_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 67%]
2025-12-04T09:54:33.1236182Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_repeated_user_defined_triton_kernel_embed_kernel_binary_True_mps SKIPPED [0.0002s] (No MPS backend available) [ 68%]
2025-12-04T09:54:33.1237880Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_replace_unbacked_symbol_with_backed_expr_mps SKIPPED [0.0002s] (No MPS backend available) [ 70%]
2025-12-04T09:54:33.1239595Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_reuse_kernel_dynamic_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 71%]
2025-12-04T09:54:33.1241193Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_runtime_checks_fp8_mps SKIPPED [0.0002s] (No MPS backend available) [ 72%]
2025-12-04T09:54:33.1242748Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_simple_embed_kernel_binary_False_max_autotune_False_mps SKIPPED [0.0002s] (No MPS backend available) [ 74%]
2025-12-04T09:54:33.1244547Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_small_constant_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 75%]
2025-12-04T09:54:33.1246157Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_stride_with_unbacked_expr_mps SKIPPED [0.0002s] (No MPS backend available) [ 77%]
2025-12-04T09:54:33.1247657Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_sympy_cpp_printer_min_max_minmax1_mps SKIPPED [0.0003s] (No MPS backend available) [ 78%]
2025-12-04T09:54:33.1249199Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_autotuning_mps SKIPPED [0.0002s] (No MPS backend available) [ 79%]
2025-12-04T09:54:33.1250801Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_1_num_dims_2_dynamic_True_autotune_True_mps SKIPPED [0.0002s] (No MPS backend available) [ 81%]
2025-12-04T09:54:33.1252593Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_2_num_dims_1_dynamic_True_autotune_False_mps SKIPPED [0.0002s] (No MPS backend available) [ 82%]
2025-12-04T09:54:33.1254438Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_unbacked_symint_in_grid_dynamic_False_autotuning_False_mps SKIPPED [0.0002s] (No MPS backend available) [ 83%]
2025-12-04T09:54:33.1256373Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_unbacked_symint_in_grid_dynamic_True_autotuning_True_mps SKIPPED [0.0002s] (No MPS backend available) [ 85%]
2025-12-04T09:54:33.1258241Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_with_none_input_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 86%]
2025-12-04T09:54:33.1260194Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_with_none_inputs_and_equal_to_1_arg_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 87%]
2025-12-04T09:54:33.1262086Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_next_power_of_2_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 89%]
2025-12-04T09:54:33.1263859Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbacked_expr_replacements_shift_k_2_use_static_size_True_mps SKIPPED [0.0002s] (No MPS backend available) [ 90%]
2025-12-04T09:54:33.1265693Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbounded_expr_substitutions_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 91%]
2025-12-04T09:54:33.1267556Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_using_model_name_for_files_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 93%]
2025-12-04T09:54:33.1269277Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_mixed_device_dynamic_False_mps SKIPPED [0.0002s] (No MPS backend available) [ 94%]
2025-12-04T09:54:33.1270874Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_mixed_device_dynamic_True_mps SKIPPED [0.0002s] (No MPS backend available) [ 95%]
2025-12-04T09:54:33.1272806Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_outer_buffers_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 97%]
2025-12-04T09:54:33.1274497Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_parameters_mps SKIPPED [0.0002s] (No MPS backend available) [ 98%]
2025-12-04T09:54:33.1276150Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_with_no_triton_profiler_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [100%]
2025-12-04T09:54:33.1277126Z 
2025-12-04T09:54:33.1278061Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-bb08f25297cc596b.xml -
2025-12-04T09:54:33.1279179Z ========== 16 passed, 58 skipped, 80 deselected in 115.87s (0:01:55) ===========
2025-12-04T09:54:33.1280280Z The following tests failed consistently: ['test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_and_mul_expr_cuda']
2025-12-04T09:54:33.1281247Z 
2025-12-04T09:54:33.1281796Z FINISHED PRINTING LOG FILE of inductor/test_aot_inductor 1/6 (test/test-reports/inductor.test_aot_inductor_1.6_cf1c969272c5d084_.log)
2025-12-04T09:54:33.1282482Z 
2025-12-04T09:54:33.1282855Z Finished inductor/test_aot_inductor 1/6 ... [2025-12-04 09:54:33.018077][2501.400968487], took 9.09min
2025-12-04T09:54:33.1284145Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-f2c58a9dfc31919e.xml
2025-12-04T09:54:33.2058559Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-a793ea186f6e0edb.xml
2025-12-04T09:54:33.2385367Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-fcd1db8f24799401.xml
2025-12-04T09:54:33.2727794Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-bb08f25297cc596b.xml
2025-12-04T09:54:33.4879247Z Uploading logs for 57119749248 to S3
2025-12-04T09:54:33.5185831Z Uploading artifacts took 0.20 seconds
2025-12-04T09:54:33.5186305Z inductor/test_aot_inductor 1/6 failed!
2025-12-04T09:54:33.5190545Z Running inductor/test_aot_inductor 6/6 ... [2025-12-04 09:54:33.518876][2501.90177057]
2025-12-04T09:54:33.5191136Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T09:54:33.5195973Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_aot_inductor.py', '--shard-id=6', '--num-shards=6', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 09:54:33.519363]
2025-12-04T10:08:41.9506470Z 
2025-12-04T10:08:41.9507421Z PRINTING LOG FILE of inductor/test_aot_inductor 6/6 (test/test-reports/inductor.test_aot_inductor_6.6_462385258b0b1d27_.log)
2025-12-04T10:08:41.9508985Z W1204 09:54:42.970000 11895 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:08:41.9510481Z Test results will be stored in test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-bf15e775351f3d84.xml
2025-12-04T10:08:41.9511502Z ============================= test session starts ==============================
2025-12-04T10:08:41.9512478Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:08:41.9513339Z cachedir: .pytest_cache
2025-12-04T10:08:41.9514338Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:08:41.9515295Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:08:41.9515729Z configfile: pytest.ini
2025-12-04T10:08:41.9516620Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:08:41.9517816Z collecting ... collected 934 items
2025-12-04T10:08:41.9518403Z stepcurrent: Cannot find last run test, not skipping
2025-12-04T10:08:41.9619919Z Running 158 items in this shard: test/inductor/test_aot_inductor.py::TestAOTInductorConfig::test_compile_standalone_cross_compile_windows_package_format, test/inductor/test_aot_inductor.py::TestAOTInductorConfig::test_compile_standalone_explicit_set, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test__int_mm_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test__weight_int4pack_mm_with_scales_and_zeros_m_32_n_64_q_group_32_num_groups_2_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_amp_fallback_random_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_constant_tensor_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_debug_printing_model_inputs_codegen_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_profiler_enable_kernel_profile_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_profiler_enable_kernel_profile_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_autotune_with_constant_folding_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_autotuning_args_reuse_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_buffer_mutation_2_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_buffer_mutation_and_force_mmap_weights_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_mismatched_branch_output_dynamic_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_symint_input_disable_one_pass_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_with_outer_code_before_after_cpu, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_with_parameters_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_constant_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_conv_freezing_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_copy_non_blocking_is_pinned_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_dup_unbacked_sym_decl_with_refinement_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_empty_cat_dtype_promotion_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_fallback_kernel_with_symexpr_output_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_index_put_fallback_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_index_put_with_none_index_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_large_mmaped_weights_on_disk_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_misc_1_max_autotune_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_mixed_device_1_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_non_tensor_input_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_on_gpu_device1_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_pad_fallback_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_quantized_linear_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_repeated_user_defined_triton_kernel_embed_kernel_binary_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_replicate_on_devices_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_run_with_grad_enabled_cpu, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_runtime_checks_shape_failed_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_scaled_grouped_mm_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_seq_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_so_without_weight_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_stride_with_unbacked_expr_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_sym_expr_indexing_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_torchvision_transforms_functional_tensor_resize_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_dynamic_launcher_grid_infer_from_tensor_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_dynamic_shape_with_div_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_equal_to_1_arg_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_1_num_dims_1_dynamic_False_autotune_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_1_num_dims_2_dynamic_True_autotune_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_2_num_dims_2_dynamic_False_autotune_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_reinterpret_view_mem_leak_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_tma_descriptor_2d_dynamic_True_tma_version_old_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_with_none_inputs_and_equal_to_1_arg_cpu, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbacked_expr_replacements_shift_k_2_use_static_size_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbacked_expr_replacements_shift_k_3_use_static_size_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_with_unbacked_symint_closure_dynamic_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_with_offset_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_with_profiler_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_zero_size_buffer_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_with_scales_and_zeros_m_32_n_64_q_group_32_num_groups_1_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_addmm_multiple_dynamic_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_aoti_constant_tensor_name_collision_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_aoti_debug_printer_cpp_kernel_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_assert_tensor_meta_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_autotuning_args_reuse_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_buffer_mutation_1_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_buffer_mutation_and_force_mmap_weights_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_with_parameters_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_conv3d_cuda, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_copy_non_blocking_is_pinned_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_deconv_freezing_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_duplicate_constant_folding_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_dynamic_cat_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_extract_constants_map_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_fake_tensor_device_validation_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_fp8_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_inf_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_input_codegen_with_sympy_expr_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_masked_select_dynamic_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_misaligned_input_1_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_misaligned_input_2_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_multi_device_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_non_tensor_input_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_none_args_aot_codegen_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_normal_functional_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_pad_non_zero_memory_leak_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_proxy_executor_squeeze_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_repeated_calling_cuda, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_replace_unbacked_symbol_with_backed_expr_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_reuse_kernel_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_rocm_triton_autotuning_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_run_with_grad_enabled_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_runtime_checks_device_type_failed_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_scatter_fallback_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_seq_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_expr_transitive_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_sym_expr_indexing_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_torchvision_transforms_functional_tensor_resize_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_dynamic_shape_with_div_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_equal_to_1_float_arg_dynamic_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_3_num_dims_1_dynamic_True_autotune_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_on_device_tma_dynamic_False_tma_version_new_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_reinterpret_view_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_sympy_fn_like_arg_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_unbacked_expr_replacements_shift_k_1_use_static_size_False_cuda, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_upper_bound_i64_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_weight_on_disk_legacy_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_mixed_device_dynamic_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_parameters_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_sym_expr_cond_dynamic_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_with_profiler_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_zero_size_buffer_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test__int_mm_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_addmm_multiple_dynamic_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aoti_constant_tensor_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aoti_debug_printing_model_inputs_codegen_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aoti_runtime_asserts_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_assert_tensor_meta_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_backward_no_op_logging_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_buffer_mutation_1_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_buffer_mutation_3_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_predicate_on_cpu_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_unbacked_symint_closure_dynamic_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_use_buffers_from_outer_scope_mps, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_with_parameters_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_with_reinterpret_view_inputs_outputs_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_d2h_copy_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_dynamic_cat_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fp8_view_of_param_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_index_put_fallback_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_index_put_with_none_index_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_large_mmaped_weights_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_libtorch_free_so_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_misc_1_max_autotune_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_output_path_2_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_poi_multiple_dynamic_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_quanatized_int8_linear_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_repeat_interleave_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_repeated_calling_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_runtime_checks_device_type_failed_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_sdpa_2_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_shifted_constraint_ranges_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_size_from_multi_output_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_subclasses_mps, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_symbool_item_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_symint_item_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_equal_to_1_arg_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_1_num_dims_1_dynamic_False_autotune_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_3_num_dims_2_dynamic_False_autotune_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_3_num_dims_2_dynamic_True_autotune_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_sympy_expr_arg_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_unbacked_symint_in_grid_dynamic_True_autotuning_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbacked_expr_replacements_shift_k_0_use_static_size_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbacked_expr_replacements_shift_k_1_use_static_size_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_sym_expr_cond_dynamic_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_unbacked_symint_closure_dynamic_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_unbacked_symint_closure_dynamic_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_zero_size_buffer_mps
2025-12-04T10:08:41.9708414Z 
2025-12-04T10:08:41.9709063Z inductor/test_aot_inductor.py::TestAOTInductorConfig::test_compile_standalone_cross_compile_windows_package_format PASSED [0.0040s] [  0%]
2025-12-04T10:08:41.9710302Z inductor/test_aot_inductor.py::TestAOTInductorConfig::test_compile_standalone_explicit_set PASSED [0.0037s] [  1%]
2025-12-04T10:08:41.9711552Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test__int_mm_cpu <- test/inductor/test_torchinductor.py PASSED [8.1261s] [  1%]
2025-12-04T10:08:41.9713021Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cpu SKIPPED [0.0032s] (requires GPU) [  2%]
2025-12-04T10:08:41.9714697Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test__weight_int4pack_mm_with_scales_and_zeros_m_32_n_64_q_group_32_num_groups_2_cpu SKIPPED [0.0032s] (requires Intel GPU) [  3%]
2025-12-04T10:08:41.9716354Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_amp_fallback_random_cpu <- test/inductor/test_torchinductor.py PASSED [5.3726s] [  3%]
2025-12-04T10:08:41.9718667Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_constant_tensor_cpu <- test/inductor/test_torchinductor.py PASSED [5.2548s] [  4%]
2025-12-04T10:08:41.9720391Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_debug_printing_model_inputs_codegen_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0032s] (requires CUDA/XPU) [  5%]
2025-12-04T10:08:41.9722058Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_profiler_enable_kernel_profile_False_cpu PASSED [6.5363s] [  5%]
2025-12-04T10:08:41.9723423Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_profiler_enable_kernel_profile_True_cpu PASSED [6.5733s] [  6%]
2025-12-04T10:08:41.9724924Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_autotune_with_constant_folding_cpu <- test/inductor/test_torchinductor.py PASSED [5.5134s] [  6%]
2025-12-04T10:08:41.9726603Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_autotuning_args_reuse_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0033s] (requires GPU) [  7%]
2025-12-04T10:08:41.9728208Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_buffer_mutation_2_cpu <- test/inductor/test_torchinductor.py PASSED [5.3971s] [  8%]
2025-12-04T10:08:41.9729627Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_buffer_mutation_and_force_mmap_weights_cpu PASSED [7.0598s] [  8%]
2025-12-04T10:08:41.9731003Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_mismatched_branch_output_dynamic_False_cpu PASSED [5.9240s] [  9%]
2025-12-04T10:08:41.9732524Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_symint_input_disable_one_pass_cpu <- test/inductor/test_torchinductor.py PASSED [5.6302s] [ 10%]
2025-12-04T10:08:41.9734835Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_with_outer_code_before_after_cpu <- test/inductor/test_torchinductor.py W1204 09:55:46.405000 11895 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead
2025-12-04T10:08:41.9737121Z W1204 09:55:46.406000 11895 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead
2025-12-04T10:08:41.9756384Z PASSED [5.5997s] [ 10%]
2025-12-04T10:08:41.9758291Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_with_parameters_cpu <- test/inductor/test_torchinductor.py W1204 09:55:52.059000 11895 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead
2025-12-04T10:08:41.9759920Z PASSED [5.9527s] [ 11%]
2025-12-04T10:08:41.9760558Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_constant_cpu PASSED [5.2606s] [ 12%]
2025-12-04T10:08:41.9761829Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_conv_freezing_cpu <- test/inductor/test_torchinductor.py PASSED [11.1031s] [ 12%]
2025-12-04T10:08:41.9763699Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_copy_non_blocking_is_pinned_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0032s] (only matters for device-to-cpu copy) [ 13%]
2025-12-04T10:08:41.9765528Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_dup_unbacked_sym_decl_with_refinement_cpu <- test/inductor/test_torchinductor.py PASSED [5.6299s] [ 13%]
2025-12-04T10:08:41.9767167Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_empty_cat_dtype_promotion_cpu <- test/inductor/test_torchinductor.py PASSED [5.2371s] [ 14%]
2025-12-04T10:08:41.9768788Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_fallback_kernel_with_symexpr_output_cpu SKIPPED [0.0003s] (Some archs don't support flash SDPA) [ 15%]
2025-12-04T10:08:41.9770384Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_index_put_fallback_cpu <- test/inductor/test_torchinductor.py PASSED [5.2788s] [ 15%]
2025-12-04T10:08:41.9771987Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_index_put_with_none_index_cpu PASSED [5.5379s] [ 16%]
2025-12-04T10:08:41.9773401Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_large_mmaped_weights_on_disk_cpu <- test/inductor/test_torchinductor.py PASSED [13.6671s] [ 17%]
2025-12-04T10:08:41.9774802Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_misc_1_max_autotune_False_cpu PASSED [5.9795s] [ 17%]
2025-12-04T10:08:41.9776396Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_mixed_device_1_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0031s] (Mixed-device test requires GPU) [ 18%]
2025-12-04T10:08:41.9778406Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_non_tensor_input_cpu <- test/inductor/test_torchinductor.py W1204 09:56:55.603000 11895 site-packages/torch/_export/__init__.py:71] +============================+
2025-12-04T10:08:41.9779934Z W1204 09:56:55.604000 11895 site-packages/torch/_export/__init__.py:72] |     !!!   WARNING   !!!    |
2025-12-04T10:08:41.9780790Z W1204 09:56:55.604000 11895 site-packages/torch/_export/__init__.py:73] +============================+
2025-12-04T10:08:41.9782502Z W1204 09:56:55.604000 11895 site-packages/torch/_export/__init__.py:74] torch._export.aot_compile()/torch._export.aot_load() is being deprecated, please switch to directly calling torch._inductor.aoti_compile_and_package(torch.export.export())/torch._inductor.aoti_load_package() instead.
2025-12-04T10:08:41.9783981Z PASSED [17.7328s] [ 18%]
2025-12-04T10:08:41.9784826Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_on_gpu_device1_cpu SKIPPED [0.0003s] (requires multiple cuda devices) [ 19%]
2025-12-04T10:08:41.9786271Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_pad_fallback_cpu <- test/inductor/test_torchinductor.py PASSED [5.4528s] [ 20%]
2025-12-04T10:08:41.9788230Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_quantized_linear_cpu [W1204 09:57:18.298046110 QuantizedLinear.cpp:379] Warning: fbgemm_pack_gemm_matrix_fp16 is deprecated and will be removed in a future PyTorch release. (function operator())
2025-12-04T10:08:41.9790299Z [W1204 09:57:24.435700574 QuantizedLinear.cpp:415] Warning: fbgemm_linear_fp16_weight_fp32_activation is deprecated and will be removed in a future PyTorch release. (function operator())
2025-12-04T10:08:41.9791312Z PASSED [5.2632s] [ 20%]
2025-12-04T10:08:41.9792409Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_repeated_user_defined_triton_kernel_embed_kernel_binary_False_cpu SKIPPED [0.0032s] (requires GPU) [ 21%]
2025-12-04T10:08:41.9794019Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_replicate_on_devices_cpu SKIPPED [0.0003s] (requires multiple cuda devices) [ 22%]
2025-12-04T10:08:41.9795608Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_run_with_grad_enabled_cpu <- test/inductor/test_torchinductor.py PASSED [5.2609s] [ 22%]
2025-12-04T10:08:41.9797226Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_runtime_checks_shape_failed_cpu Error: input_handles[0]: unmatched dim value at 1, expected: 4, but got: 8
2025-12-04T10:08:41.9798155Z 
2025-12-04T10:08:41.9798429Z Error: input_handles[0]: unmatched stride value at 1, expected: 4, but got: 1
2025-12-04T10:08:41.9798841Z 
2025-12-04T10:08:41.9799173Z Error: input_handles[0]: dim value is too large at 0, expected to be <= 1024, but got: 2048
2025-12-04T10:08:41.9799631Z 
2025-12-04T10:08:41.9799937Z Error: input_handles[0]: dim value is too large at 0, expected to be <= 1024, but got: 2048
2025-12-04T10:08:41.9800397Z 
2025-12-04T10:08:41.9800505Z PASSED [5.2302s] [ 23%]
2025-12-04T10:08:41.9801407Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_scaled_grouped_mm_cpu SKIPPED [0.0003s] (scaled_grouped_mm is only supported on SM90) [ 24%]
2025-12-04T10:08:41.9802878Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_seq_cpu <- test/inductor/test_torchinductor.py PASSED [5.3141s] [ 24%]
2025-12-04T10:08:41.9804309Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_so_without_weight_cpu <- test/inductor/test_torchinductor.py PASSED [10.7425s] [ 25%]
2025-12-04T10:08:41.9805675Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_stride_with_unbacked_expr_cpu PASSED [5.2698s] [ 25%]
2025-12-04T10:08:41.9807146Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_sym_expr_indexing_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0032s] (requires CUDA/XPU) [ 26%]
2025-12-04T10:08:41.9808930Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_torchvision_transforms_functional_tensor_resize_cpu <- test/inductor/test_torchinductor.py PASSED [6.9432s] [ 27%]
2025-12-04T10:08:41.9810629Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_dynamic_launcher_grid_infer_from_tensor_cpu SKIPPED [0.0040s] (requires GPU) [ 27%]
2025-12-04T10:08:41.9812316Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_dynamic_shape_with_div_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0031s] (requires GPU) [ 28%]
2025-12-04T10:08:41.9814116Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_equal_to_1_arg_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0033s] (requires GPU) [ 29%]
2025-12-04T10:08:41.9815848Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_1_num_dims_1_dynamic_False_autotune_False_cpu SKIPPED [0.0030s] (requires GPU) [ 29%]
2025-12-04T10:08:41.9817614Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_1_num_dims_2_dynamic_True_autotune_False_cpu SKIPPED [0.0029s] (requires GPU) [ 30%]
2025-12-04T10:08:41.9819313Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_2_num_dims_2_dynamic_False_autotune_False_cpu SKIPPED [0.0033s] (requires GPU) [ 31%]
2025-12-04T10:08:41.9821095Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_reinterpret_view_mem_leak_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0028s] (requires GPU) [ 31%]
2025-12-04T10:08:41.9822866Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_tma_descriptor_2d_dynamic_True_tma_version_old_cpu SKIPPED [0.0028s] (requires GPU) [ 32%]
2025-12-04T10:08:41.9824800Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_with_none_inputs_and_equal_to_1_arg_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0028s] (requires GPU) [ 32%]
2025-12-04T10:08:41.9826735Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbacked_expr_replacements_shift_k_2_use_static_size_False_cpu SKIPPED [0.0028s] (Need triton for user-defined triton kernel) [ 33%]
2025-12-04T10:08:41.9828711Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbacked_expr_replacements_shift_k_3_use_static_size_True_cpu SKIPPED [0.0029s] (Need triton for user-defined triton kernel) [ 34%]
2025-12-04T10:08:41.9830415Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_with_unbacked_symint_closure_dynamic_False_cpu PASSED [5.8309s] [ 34%]
2025-12-04T10:08:41.9831889Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_with_offset_cpu <- test/inductor/test_torchinductor.py PASSED [5.2391s] [ 35%]
2025-12-04T10:08:41.9833356Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_with_profiler_cpu <- test/inductor/test_torchinductor.py PASSED [5.2525s] [ 36%]
2025-12-04T10:08:41.9834832Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_zero_size_buffer_cpu <- test/inductor/test_torchinductor.py PASSED [5.1696s] [ 36%]
2025-12-04T10:08:41.9836355Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda ('RERUN', {'yellow': True}) [0.0362s] [ 37%]
2025-12-04T10:08:41.9837950Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda ('RERUN', {'yellow': True}) [0.0061s] [ 37%]
2025-12-04T10:08:41.9839451Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda FAILED [0.0061s] [ 37%]
2025-12-04T10:08:41.9840227Z 
2025-12-04T10:08:41.9840394Z ==================================== RERUNS ====================================
2025-12-04T10:08:41.9841107Z _ AOTInductorTestABICompatibleGpu.test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda _
2025-12-04T10:08:41.9841805Z Traceback (most recent call last):
2025-12-04T10:08:41.9842532Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 6893, in test__weight_int4pack_mm
2025-12-04T10:08:41.9843282Z     self.check_model(model, (a,))
2025-12-04T10:08:41.9843939Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 247, in check_model
2025-12-04T10:08:41.9844643Z     ref_model = copy.deepcopy(model)
2025-12-04T10:08:41.9845172Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 172, in deepcopy
2025-12-04T10:08:41.9845694Z     y = _reconstruct(x, memo, *rv)
2025-12-04T10:08:41.9846219Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 271, in _reconstruct
2025-12-04T10:08:41.9846774Z     state = deepcopy(state, memo)
2025-12-04T10:08:41.9847278Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 146, in deepcopy
2025-12-04T10:08:41.9847782Z     y = copier(x, memo)
2025-12-04T10:08:41.9848267Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 231, in _deepcopy_dict
2025-12-04T10:08:41.9848872Z     y[deepcopy(key, memo)] = deepcopy(value, memo)
2025-12-04T10:08:41.9849415Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 153, in deepcopy
2025-12-04T10:08:41.9849931Z     y = copier(memo)
2025-12-04T10:08:41.9850523Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_tensor.py", line 180, in __deepcopy__
2025-12-04T10:08:41.9851256Z     new_storage = self._typed_storage()._deepcopy(memo)
2025-12-04T10:08:41.9851948Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 1139, in _deepcopy
2025-12-04T10:08:41.9852775Z     return self._new_wrapped_storage(copy.deepcopy(self._untyped_storage, memo))
2025-12-04T10:08:41.9853547Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 153, in deepcopy
2025-12-04T10:08:41.9854053Z     y = copier(memo)
2025-12-04T10:08:41.9854643Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 243, in __deepcopy__
2025-12-04T10:08:41.9855314Z     new_storage = self.clone()
2025-12-04T10:08:41.9855900Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 257, in clone
2025-12-04T10:08:41.9856798Z     return type(self)(self.nbytes(), device=self.device).copy_(self)
2025-12-04T10:08:41.9857388Z torch.AcceleratorError: CUDA error: invalid device function
2025-12-04T10:08:41.9858396Z Search for `cudaErrorInvalidDeviceFunction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
2025-12-04T10:08:41.9859640Z CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
2025-12-04T10:08:41.9860455Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1
2025-12-04T10:08:41.9861016Z Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
2025-12-04T10:08:41.9861383Z 
2025-12-04T10:08:41.9861389Z 
2025-12-04T10:08:41.9861617Z To execute this test, run the following from the base repo dir:
2025-12-04T10:08:41.9862748Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda
2025-12-04T10:08:41.9863687Z 
2025-12-04T10:08:41.9863955Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:08:41.9864791Z _ AOTInductorTestABICompatibleGpu.test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda _
2025-12-04T10:08:41.9865489Z Traceback (most recent call last):
2025-12-04T10:08:41.9866192Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 6893, in test__weight_int4pack_mm
2025-12-04T10:08:41.9866935Z     self.check_model(model, (a,))
2025-12-04T10:08:41.9867606Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 247, in check_model
2025-12-04T10:08:41.9868311Z     ref_model = copy.deepcopy(model)
2025-12-04T10:08:41.9868815Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 172, in deepcopy
2025-12-04T10:08:41.9869347Z     y = _reconstruct(x, memo, *rv)
2025-12-04T10:08:41.9869870Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 271, in _reconstruct
2025-12-04T10:08:41.9870415Z     state = deepcopy(state, memo)
2025-12-04T10:08:41.9870913Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 146, in deepcopy
2025-12-04T10:08:41.9871650Z     y = copier(x, memo)
2025-12-04T10:08:41.9872127Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 231, in _deepcopy_dict
2025-12-04T10:08:41.9872735Z     y[deepcopy(key, memo)] = deepcopy(value, memo)
2025-12-04T10:08:41.9873295Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 153, in deepcopy
2025-12-04T10:08:41.9873811Z     y = copier(memo)
2025-12-04T10:08:41.9874397Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_tensor.py", line 180, in __deepcopy__
2025-12-04T10:08:41.9875127Z     new_storage = self._typed_storage()._deepcopy(memo)
2025-12-04T10:08:41.9875823Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 1139, in _deepcopy
2025-12-04T10:08:41.9876632Z     return self._new_wrapped_storage(copy.deepcopy(self._untyped_storage, memo))
2025-12-04T10:08:41.9877330Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 153, in deepcopy
2025-12-04T10:08:41.9877844Z     y = copier(memo)
2025-12-04T10:08:41.9878429Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 243, in __deepcopy__
2025-12-04T10:08:41.9879089Z     new_storage = self.clone()
2025-12-04T10:08:41.9879673Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 257, in clone
2025-12-04T10:08:41.9880417Z     return type(self)(self.nbytes(), device=self.device).copy_(self)
2025-12-04T10:08:41.9881183Z torch.AcceleratorError: CUDA error: invalid device function
2025-12-04T10:08:41.9882189Z Search for `cudaErrorInvalidDeviceFunction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
2025-12-04T10:08:41.9883445Z CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
2025-12-04T10:08:41.9884361Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1
2025-12-04T10:08:41.9884906Z Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
2025-12-04T10:08:41.9885287Z 
2025-12-04T10:08:41.9885292Z 
2025-12-04T10:08:41.9885507Z To execute this test, run the following from the base repo dir:
2025-12-04T10:08:41.9886653Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda
2025-12-04T10:08:41.9887565Z 
2025-12-04T10:08:41.9887852Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:08:41.9888403Z =================================== FAILURES ===================================
2025-12-04T10:08:41.9889130Z _ AOTInductorTestABICompatibleGpu.test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda _
2025-12-04T10:08:41.9889824Z Traceback (most recent call last):
2025-12-04T10:08:41.9890544Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 6893, in test__weight_int4pack_mm
2025-12-04T10:08:41.9891276Z     self.check_model(model, (a,))
2025-12-04T10:08:41.9891943Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 247, in check_model
2025-12-04T10:08:41.9892644Z     ref_model = copy.deepcopy(model)
2025-12-04T10:08:41.9893152Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 172, in deepcopy
2025-12-04T10:08:41.9893689Z     y = _reconstruct(x, memo, *rv)
2025-12-04T10:08:41.9894215Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 271, in _reconstruct
2025-12-04T10:08:41.9894784Z     state = deepcopy(state, memo)
2025-12-04T10:08:41.9895283Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 146, in deepcopy
2025-12-04T10:08:41.9895789Z     y = copier(x, memo)
2025-12-04T10:08:41.9896343Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 231, in _deepcopy_dict
2025-12-04T10:08:41.9896955Z     y[deepcopy(key, memo)] = deepcopy(value, memo)
2025-12-04T10:08:41.9897524Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 153, in deepcopy
2025-12-04T10:08:41.9898032Z     y = copier(memo)
2025-12-04T10:08:41.9898631Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_tensor.py", line 180, in __deepcopy__
2025-12-04T10:08:41.9899361Z     new_storage = self._typed_storage()._deepcopy(memo)
2025-12-04T10:08:41.9900054Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 1139, in _deepcopy
2025-12-04T10:08:41.9900889Z     return self._new_wrapped_storage(copy.deepcopy(self._untyped_storage, memo))
2025-12-04T10:08:41.9901592Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 153, in deepcopy
2025-12-04T10:08:41.9902117Z     y = copier(memo)
2025-12-04T10:08:41.9902695Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 243, in __deepcopy__
2025-12-04T10:08:41.9903368Z     new_storage = self.clone()
2025-12-04T10:08:41.9903963Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 257, in clone
2025-12-04T10:08:41.9904693Z     return type(self)(self.nbytes(), device=self.device).copy_(self)
2025-12-04T10:08:41.9905286Z torch.AcceleratorError: CUDA error: invalid device function
2025-12-04T10:08:41.9906282Z Search for `cudaErrorInvalidDeviceFunction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
2025-12-04T10:08:41.9907532Z CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
2025-12-04T10:08:41.9908418Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1
2025-12-04T10:08:41.9908978Z Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
2025-12-04T10:08:41.9909346Z 
2025-12-04T10:08:41.9909351Z 
2025-12-04T10:08:41.9909581Z To execute this test, run the following from the base repo dir:
2025-12-04T10:08:41.9910721Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda
2025-12-04T10:08:41.9911706Z 
2025-12-04T10:08:41.9911972Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:08:41.9913135Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-bf15e775351f3d84.xml -
2025-12-04T10:08:41.9914197Z =========================== short test summary info ============================
2025-12-04T10:08:41.9915438Z FAILED [0.0061s] inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda - torch.AcceleratorError: CUDA error: invalid device function
2025-12-04T10:08:41.9917084Z Search for `cudaErrorInvalidDeviceFunction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
2025-12-04T10:08:41.9918334Z CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
2025-12-04T10:08:41.9919145Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1
2025-12-04T10:08:41.9919699Z Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
2025-12-04T10:08:41.9920068Z 
2025-12-04T10:08:41.9920074Z 
2025-12-04T10:08:41.9920288Z To execute this test, run the following from the base repo dir:
2025-12-04T10:08:41.9921427Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda
2025-12-04T10:08:41.9922350Z 
2025-12-04T10:08:41.9922618Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:08:41.9923217Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:08:41.9923761Z ======== 1 failed, 35 passed, 23 skipped, 2 rerun in 219.63s (0:03:39) =========
2025-12-04T10:08:41.9924239Z Got exit code 1
2025-12-04T10:08:41.9924517Z Retrying single test...
2025-12-04T10:08:41.9925136Z W1204 09:58:35.463000 15634 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:08:41.9926291Z Test results will be stored in test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-cd1c50b62bb47a1b.xml
2025-12-04T10:08:41.9927168Z ============================= test session starts ==============================
2025-12-04T10:08:41.9927834Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:08:41.9928434Z cachedir: .pytest_cache
2025-12-04T10:08:41.9929152Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:08:41.9929938Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:08:41.9930293Z configfile: pytest.ini
2025-12-04T10:08:41.9931017Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:08:41.9931927Z collecting ... collected 934 items / 157 deselected / 777 selected
2025-12-04T10:08:41.9933158Z stepcurrent: skipping 58 already run items. Running only test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda
2025-12-04T10:08:41.9934270Z Running 1 items in this shard
2025-12-04T10:08:41.9934483Z 
2025-12-04T10:08:41.9935703Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda [W1204 09:58:37.753737745 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:41.9937072Z 
2025-12-04T10:08:41.9937213Z ('RERUN', {'yellow': True}) [15.7512s] [100%]
2025-12-04T10:08:41.9938619Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda [W1204 09:58:53.513212722 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:41.9939944Z 
2025-12-04T10:08:41.9940089Z ('RERUN', {'yellow': True}) [0.0070s] [100%]
2025-12-04T10:08:41.9941457Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda [W1204 09:58:53.520790272 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:41.9942722Z 
2025-12-04T10:08:41.9942830Z FAILED [0.0054s] [100%]
2025-12-04T10:08:41.9943022Z 
2025-12-04T10:08:41.9943168Z ==================================== RERUNS ====================================
2025-12-04T10:08:41.9943888Z _ AOTInductorTestABICompatibleGpu.test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda _
2025-12-04T10:08:41.9944582Z Traceback (most recent call last):
2025-12-04T10:08:41.9945294Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 6893, in test__weight_int4pack_mm
2025-12-04T10:08:41.9946043Z     self.check_model(model, (a,))
2025-12-04T10:08:41.9946713Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 247, in check_model
2025-12-04T10:08:41.9947491Z     ref_model = copy.deepcopy(model)
2025-12-04T10:08:41.9948080Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 172, in deepcopy
2025-12-04T10:08:41.9948617Z     y = _reconstruct(x, memo, *rv)
2025-12-04T10:08:41.9949135Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 271, in _reconstruct
2025-12-04T10:08:41.9949686Z     state = deepcopy(state, memo)
2025-12-04T10:08:41.9950188Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 146, in deepcopy
2025-12-04T10:08:41.9950711Z     y = copier(x, memo)
2025-12-04T10:08:41.9951184Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 231, in _deepcopy_dict
2025-12-04T10:08:41.9951790Z     y[deepcopy(key, memo)] = deepcopy(value, memo)
2025-12-04T10:08:41.9952349Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 153, in deepcopy
2025-12-04T10:08:41.9952851Z     y = copier(memo)
2025-12-04T10:08:41.9953439Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_tensor.py", line 180, in __deepcopy__
2025-12-04T10:08:41.9954162Z     new_storage = self._typed_storage()._deepcopy(memo)
2025-12-04T10:08:41.9954863Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 1139, in _deepcopy
2025-12-04T10:08:41.9955675Z     return self._new_wrapped_storage(copy.deepcopy(self._untyped_storage, memo))
2025-12-04T10:08:41.9956363Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 153, in deepcopy
2025-12-04T10:08:41.9956878Z     y = copier(memo)
2025-12-04T10:08:41.9957451Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 243, in __deepcopy__
2025-12-04T10:08:41.9958169Z     new_storage = self.clone()
2025-12-04T10:08:41.9958942Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 257, in clone
2025-12-04T10:08:41.9959684Z     return type(self)(self.nbytes(), device=self.device).copy_(self)
2025-12-04T10:08:41.9960261Z torch.AcceleratorError: CUDA error: invalid device function
2025-12-04T10:08:41.9961259Z Search for `cudaErrorInvalidDeviceFunction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
2025-12-04T10:08:41.9962513Z CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
2025-12-04T10:08:41.9963459Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1
2025-12-04T10:08:41.9964003Z Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
2025-12-04T10:08:41.9964386Z 
2025-12-04T10:08:41.9964998Z Exception raised from copy_device_to_device at /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/Copy.cu:337 (most recent call first):
2025-12-04T10:08:41.9965860Z C++ CapturedTraceback:
2025-12-04T10:08:41.9967462Z #4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0
2025-12-04T10:08:41.9969387Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0
2025-12-04T10:08:41.9970671Z #6 c10::AcceleratorError::AcceleratorError(c10::SourceLocation, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) from :0
2025-12-04T10:08:41.9972049Z #7 c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, unsigned int, bool) from ??:0
2025-12-04T10:08:41.9972839Z #8 at::native::copy_device_to_device(at::TensorIterator&, bool, bool) from ??:0
2025-12-04T10:08:41.9973555Z #9 at::native::copy_impl(at::Tensor&, at::Tensor const&, bool) [clone .isra.0] from Copy.cpp:0
2025-12-04T10:08:41.9974237Z #10 at::native::copy_(at::Tensor&, at::Tensor const&, bool) from ??:0
2025-12-04T10:08:41.9976806Z #11 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor& (c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool), &torch::ADInplaceOrView::copy_>, at::Tensor&, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool> >, at::Tensor& (c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool) from VariableTypeManual.cpp:0
2025-12-04T10:08:41.9979697Z #12 torch::autograd::VariableType::(anonymous namespace)::copy_(c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool) from VariableTypeManual.cpp:0
2025-12-04T10:08:41.9980681Z #13 at::_ops::copy_::call(at::Tensor&, at::Tensor const&, bool) from ??:0
2025-12-04T10:08:41.9981286Z #14 at::storage_copy(c10::Storage&, c10::Storage const&, bool) from ??:0
2025-12-04T10:08:41.9981915Z #15 THPStorage_copy_(_object*, _object*, _object*) from StorageMethods.cpp:0
2025-12-04T10:08:41.9982754Z #16 method_vectorcall_VARARGS_KEYWORDS from /usr/local/src/conda/python-3.10.14/Objects/descrobject.c:344
2025-12-04T10:08:41.9983737Z #17 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:41.9984663Z #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:41.9985605Z #19 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:41.9986541Z #20 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:41.9987476Z #21 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:41.9988385Z #22 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:41.9989324Z #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:41.9990254Z #24 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:41.9991182Z #25 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:41.9992094Z #26 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:41.9993219Z #27 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:41.9994150Z #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:41.9995076Z #29 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:41.9996082Z #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:41.9997009Z #31 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:41.9997938Z #32 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:41.9998866Z #33 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:41.9999777Z #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0000571Z #35 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0001362Z #36 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0002273Z #37 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0003201Z #38 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0004131Z #39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0005056Z #40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0005855Z #41 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T10:08:42.0006556Z #42 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0007339Z #43 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0008156Z #44 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T10:08:42.0008842Z #45 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0009624Z #46 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0010441Z #47 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T10:08:42.0011125Z #48 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0011908Z #49 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0012838Z #50 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0013767Z #51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0014535Z #52 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0015316Z #53 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0016243Z #54 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0017249Z #55 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0018172Z #56 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0019108Z #57 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0020045Z #58 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0020983Z #59 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0021980Z #60 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0022913Z #61 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0023730Z #62 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T10:08:42.0024422Z #63 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0025363Z #64 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0026240Z #65 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:08:42.0027051Z #66 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:08:42.0027789Z #67 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:08:42.0028541Z #68 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:08:42.0029401Z #69 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:08:42.0030324Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0031237Z #71 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0032170Z #72 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0032952Z #73 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0033730Z #74 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0034642Z #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0035574Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0036501Z #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0037411Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0038279Z #79 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:08:42.0039085Z #80 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:08:42.0039835Z #81 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:08:42.0040538Z #82 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T10:08:42.0041214Z #83 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0041999Z #84 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0042933Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0043849Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0044781Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0045702Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0046472Z #89 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0047250Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0048175Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0049096Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0050072Z #93 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0050997Z #94 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0051780Z #95 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0052562Z #96 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0053541Z #97 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0054469Z #98 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0055413Z #99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0056427Z #100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0057318Z #101 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:08:42.0058144Z #102 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:08:42.0058914Z #103 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:08:42.0059666Z #104 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:08:42.0060551Z #105 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:08:42.0061498Z #106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0062298Z #107 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0063084Z #108 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0064033Z #109 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0064988Z #110 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0065936Z #111 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0066869Z #112 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0067770Z #113 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:08:42.0068594Z #114 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:08:42.0069356Z #115 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:08:42.0070099Z #116 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:08:42.0071205Z #117 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:08:42.0072310Z #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0073242Z #119 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0074185Z #120 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0075132Z #121 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0076081Z #122 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0076863Z #123 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0077651Z #124 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0078593Z #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0079719Z #126 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0080651Z #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0081595Z #128 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0082480Z #129 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:08:42.0083390Z #130 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:08:42.0084141Z #131 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:08:42.0084906Z #132 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:08:42.0085784Z #133 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:08:42.0086719Z #134 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0087663Z #135 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0088602Z #136 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0089545Z #137 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0090481Z #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0091418Z #139 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0092356Z #140 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0093294Z #141 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0094222Z #142 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0095042Z #143 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134
2025-12-04T10:08:42.0095789Z #144 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291
2025-12-04T10:08:42.0096600Z #145 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312
2025-12-04T10:08:42.0097307Z #146 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208
2025-12-04T10:08:42.0098103Z #147 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456
2025-12-04T10:08:42.0098937Z #148 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90
2025-12-04T10:08:42.0099696Z #149 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357
2025-12-04T10:08:42.0100422Z #150 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090
2025-12-04T10:08:42.0101119Z #151 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58
2025-12-04T10:08:42.0101740Z #152 __libc_start_main_impl from ./csu/../csu/libc-start.c:392
2025-12-04T10:08:42.0102172Z #153 _start from ??:0
2025-12-04T10:08:42.0102478Z #154 <unwind unsupported> from ??:0
2025-12-04T10:08:42.0102715Z 
2025-12-04T10:08:42.0102720Z 
2025-12-04T10:08:42.0102949Z To execute this test, run the following from the base repo dir:
2025-12-04T10:08:42.0104104Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda
2025-12-04T10:08:42.0105017Z 
2025-12-04T10:08:42.0105288Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:08:42.0105931Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:08:42.0107498Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py:257: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:08:42.0109012Z   return type(self)(self.nbytes(), device=self.device).copy_(self)
2025-12-04T10:08:42.0109792Z _ AOTInductorTestABICompatibleGpu.test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda _
2025-12-04T10:08:42.0110491Z Traceback (most recent call last):
2025-12-04T10:08:42.0111280Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 6893, in test__weight_int4pack_mm
2025-12-04T10:08:42.0112006Z     self.check_model(model, (a,))
2025-12-04T10:08:42.0112671Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 247, in check_model
2025-12-04T10:08:42.0113369Z     ref_model = copy.deepcopy(model)
2025-12-04T10:08:42.0113889Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 172, in deepcopy
2025-12-04T10:08:42.0114407Z     y = _reconstruct(x, memo, *rv)
2025-12-04T10:08:42.0114938Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 271, in _reconstruct
2025-12-04T10:08:42.0115487Z     state = deepcopy(state, memo)
2025-12-04T10:08:42.0115975Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 146, in deepcopy
2025-12-04T10:08:42.0116492Z     y = copier(x, memo)
2025-12-04T10:08:42.0116979Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 231, in _deepcopy_dict
2025-12-04T10:08:42.0117584Z     y[deepcopy(key, memo)] = deepcopy(value, memo)
2025-12-04T10:08:42.0118129Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 153, in deepcopy
2025-12-04T10:08:42.0118638Z     y = copier(memo)
2025-12-04T10:08:42.0119232Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_tensor.py", line 180, in __deepcopy__
2025-12-04T10:08:42.0119945Z     new_storage = self._typed_storage()._deepcopy(memo)
2025-12-04T10:08:42.0120650Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 1139, in _deepcopy
2025-12-04T10:08:42.0121481Z     return self._new_wrapped_storage(copy.deepcopy(self._untyped_storage, memo))
2025-12-04T10:08:42.0122175Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 153, in deepcopy
2025-12-04T10:08:42.0122677Z     y = copier(memo)
2025-12-04T10:08:42.0123265Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 243, in __deepcopy__
2025-12-04T10:08:42.0123944Z     new_storage = self.clone()
2025-12-04T10:08:42.0124518Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 257, in clone
2025-12-04T10:08:42.0125260Z     return type(self)(self.nbytes(), device=self.device).copy_(self)
2025-12-04T10:08:42.0125845Z torch.AcceleratorError: CUDA error: invalid device function
2025-12-04T10:08:42.0126844Z Search for `cudaErrorInvalidDeviceFunction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
2025-12-04T10:08:42.0128092Z CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
2025-12-04T10:08:42.0128900Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1
2025-12-04T10:08:42.0129456Z Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
2025-12-04T10:08:42.0129824Z 
2025-12-04T10:08:42.0130433Z Exception raised from copy_device_to_device at /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/Copy.cu:337 (most recent call first):
2025-12-04T10:08:42.0131300Z C++ CapturedTraceback:
2025-12-04T10:08:42.0132808Z #4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0
2025-12-04T10:08:42.0134734Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0
2025-12-04T10:08:42.0136094Z #6 c10::AcceleratorError::AcceleratorError(c10::SourceLocation, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) from :0
2025-12-04T10:08:42.0137332Z #7 c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, unsigned int, bool) from ??:0
2025-12-04T10:08:42.0138122Z #8 at::native::copy_device_to_device(at::TensorIterator&, bool, bool) from ??:0
2025-12-04T10:08:42.0138912Z #9 at::native::copy_impl(at::Tensor&, at::Tensor const&, bool) [clone .isra.0] from Copy.cpp:0
2025-12-04T10:08:42.0139585Z #10 at::native::copy_(at::Tensor&, at::Tensor const&, bool) from ??:0
2025-12-04T10:08:42.0142078Z #11 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor& (c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool), &torch::ADInplaceOrView::copy_>, at::Tensor&, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool> >, at::Tensor& (c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool) from VariableTypeManual.cpp:0
2025-12-04T10:08:42.0144962Z #12 torch::autograd::VariableType::(anonymous namespace)::copy_(c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool) from VariableTypeManual.cpp:0
2025-12-04T10:08:42.0145949Z #13 at::_ops::copy_::call(at::Tensor&, at::Tensor const&, bool) from ??:0
2025-12-04T10:08:42.0146556Z #14 at::storage_copy(c10::Storage&, c10::Storage const&, bool) from ??:0
2025-12-04T10:08:42.0147202Z #15 THPStorage_copy_(_object*, _object*, _object*) from StorageMethods.cpp:0
2025-12-04T10:08:42.0148027Z #16 method_vectorcall_VARARGS_KEYWORDS from /usr/local/src/conda/python-3.10.14/Objects/descrobject.c:344
2025-12-04T10:08:42.0149009Z #17 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0149947Z #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0150873Z #19 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0151804Z #20 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0152730Z #21 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0153662Z #22 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0154579Z #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0155503Z #24 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0156433Z #25 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0157363Z #26 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0158274Z #27 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0159198Z #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0160121Z #29 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0161049Z #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0161962Z #31 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0162891Z #32 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0163822Z #33 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0164825Z #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0165601Z #35 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0166391Z #36 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0167320Z #37 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0168333Z #38 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0169262Z #39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0170191Z #40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0171190Z #41 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T10:08:42.0171884Z #42 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0172670Z #43 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0173485Z #44 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T10:08:42.0174186Z #45 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0174950Z #46 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0175771Z #47 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T10:08:42.0176548Z #48 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0177320Z #49 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0178257Z #50 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0179190Z #51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0179976Z #52 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0180745Z #53 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0181680Z #54 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0182614Z #55 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0183540Z #56 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0184450Z #57 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0185375Z #58 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0186306Z #59 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0187228Z #60 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0188135Z #61 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0188948Z #62 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T10:08:42.0189652Z #63 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0190420Z #64 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0191288Z #65 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:08:42.0192097Z #66 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:08:42.0192847Z #67 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:08:42.0193729Z #68 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:08:42.0194591Z #69 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:08:42.0195520Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0196453Z #71 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0197455Z #72 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0198238Z #73 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0199018Z #74 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0199929Z #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0200861Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0201786Z #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0202708Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0203567Z #79 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:08:42.0204380Z #80 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:08:42.0205134Z #81 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:08:42.0205856Z #82 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T10:08:42.0206524Z #83 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0207312Z #84 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0208251Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0209178Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0210095Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0211024Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0211803Z #89 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0212572Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0213496Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0214422Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0215344Z #93 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0216319Z #94 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0217105Z #95 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0217902Z #96 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0218843Z #97 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0219760Z #98 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0220693Z #99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0221633Z #100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0222616Z #101 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:08:42.0223425Z #102 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:08:42.0224199Z #103 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:08:42.0224966Z #104 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:08:42.0225892Z #105 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:08:42.0226842Z #106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0227636Z #107 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0228440Z #108 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0229384Z #109 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0230331Z #110 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0231280Z #111 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0232223Z #112 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0233096Z #113 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:08:42.0233912Z #114 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:08:42.0234679Z #115 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:08:42.0235435Z #116 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:08:42.0236296Z #117 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:08:42.0237239Z #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0238187Z #119 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0239110Z #120 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0240058Z #121 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0240996Z #122 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0241785Z #123 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0242574Z #124 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0243518Z #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0244459Z #126 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0245403Z #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0246329Z #128 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0247216Z #129 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:08:42.0248034Z #130 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:08:42.0248793Z #131 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:08:42.0249540Z #132 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:08:42.0250414Z #133 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:08:42.0251428Z #134 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0252358Z #135 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0253295Z #136 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0254235Z #137 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0255239Z #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0256167Z #139 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0257177Z #140 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0258126Z #141 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0259079Z #142 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0259892Z #143 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134
2025-12-04T10:08:42.0260652Z #144 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291
2025-12-04T10:08:42.0261384Z #145 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312
2025-12-04T10:08:42.0262094Z #146 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208
2025-12-04T10:08:42.0262870Z #147 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456
2025-12-04T10:08:42.0263704Z #148 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90
2025-12-04T10:08:42.0264478Z #149 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357
2025-12-04T10:08:42.0265190Z #150 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090
2025-12-04T10:08:42.0265877Z #151 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58
2025-12-04T10:08:42.0266494Z #152 __libc_start_main_impl from ./csu/../csu/libc-start.c:392
2025-12-04T10:08:42.0266937Z #153 _start from ??:0
2025-12-04T10:08:42.0267233Z #154 <unwind unsupported> from ??:0
2025-12-04T10:08:42.0267482Z 
2025-12-04T10:08:42.0267487Z 
2025-12-04T10:08:42.0267708Z To execute this test, run the following from the base repo dir:
2025-12-04T10:08:42.0268853Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda
2025-12-04T10:08:42.0269767Z 
2025-12-04T10:08:42.0270048Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:08:42.0270673Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:08:42.0272329Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py:257: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:08:42.0273820Z   return type(self)(self.nbytes(), device=self.device).copy_(self)
2025-12-04T10:08:42.0274338Z =================================== FAILURES ===================================
2025-12-04T10:08:42.0275060Z _ AOTInductorTestABICompatibleGpu.test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda _
2025-12-04T10:08:42.0275756Z Traceback (most recent call last):
2025-12-04T10:08:42.0276471Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 6893, in test__weight_int4pack_mm
2025-12-04T10:08:42.0277200Z     self.check_model(model, (a,))
2025-12-04T10:08:42.0277866Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 247, in check_model
2025-12-04T10:08:42.0278561Z     ref_model = copy.deepcopy(model)
2025-12-04T10:08:42.0279217Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 172, in deepcopy
2025-12-04T10:08:42.0279737Z     y = _reconstruct(x, memo, *rv)
2025-12-04T10:08:42.0280264Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 271, in _reconstruct
2025-12-04T10:08:42.0280818Z     state = deepcopy(state, memo)
2025-12-04T10:08:42.0281305Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 146, in deepcopy
2025-12-04T10:08:42.0281956Z     y = copier(x, memo)
2025-12-04T10:08:42.0282447Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 231, in _deepcopy_dict
2025-12-04T10:08:42.0283043Z     y[deepcopy(key, memo)] = deepcopy(value, memo)
2025-12-04T10:08:42.0283603Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 153, in deepcopy
2025-12-04T10:08:42.0284117Z     y = copier(memo)
2025-12-04T10:08:42.0284709Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_tensor.py", line 180, in __deepcopy__
2025-12-04T10:08:42.0285432Z     new_storage = self._typed_storage()._deepcopy(memo)
2025-12-04T10:08:42.0286135Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 1139, in _deepcopy
2025-12-04T10:08:42.0286958Z     return self._new_wrapped_storage(copy.deepcopy(self._untyped_storage, memo))
2025-12-04T10:08:42.0287649Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 153, in deepcopy
2025-12-04T10:08:42.0288145Z     y = copier(memo)
2025-12-04T10:08:42.0288734Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 243, in __deepcopy__
2025-12-04T10:08:42.0289409Z     new_storage = self.clone()
2025-12-04T10:08:42.0289984Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 257, in clone
2025-12-04T10:08:42.0290725Z     return type(self)(self.nbytes(), device=self.device).copy_(self)
2025-12-04T10:08:42.0291310Z torch.AcceleratorError: CUDA error: invalid device function
2025-12-04T10:08:42.0292298Z Search for `cudaErrorInvalidDeviceFunction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
2025-12-04T10:08:42.0293553Z CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
2025-12-04T10:08:42.0294358Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1
2025-12-04T10:08:42.0294917Z Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
2025-12-04T10:08:42.0295289Z 
2025-12-04T10:08:42.0295901Z Exception raised from copy_device_to_device at /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/Copy.cu:337 (most recent call first):
2025-12-04T10:08:42.0296831Z C++ CapturedTraceback:
2025-12-04T10:08:42.0298354Z #4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0
2025-12-04T10:08:42.0300288Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0
2025-12-04T10:08:42.0301573Z #6 c10::AcceleratorError::AcceleratorError(c10::SourceLocation, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) from :0
2025-12-04T10:08:42.0302729Z #7 c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, unsigned int, bool) from ??:0
2025-12-04T10:08:42.0303533Z #8 at::native::copy_device_to_device(at::TensorIterator&, bool, bool) from ??:0
2025-12-04T10:08:42.0304258Z #9 at::native::copy_impl(at::Tensor&, at::Tensor const&, bool) [clone .isra.0] from Copy.cpp:0
2025-12-04T10:08:42.0304933Z #10 at::native::copy_(at::Tensor&, at::Tensor const&, bool) from ??:0
2025-12-04T10:08:42.0307499Z #11 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor& (c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool), &torch::ADInplaceOrView::copy_>, at::Tensor&, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool> >, at::Tensor& (c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool) from VariableTypeManual.cpp:0
2025-12-04T10:08:42.0310369Z #12 torch::autograd::VariableType::(anonymous namespace)::copy_(c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool) from VariableTypeManual.cpp:0
2025-12-04T10:08:42.0311412Z #13 at::_ops::copy_::call(at::Tensor&, at::Tensor const&, bool) from ??:0
2025-12-04T10:08:42.0312010Z #14 at::storage_copy(c10::Storage&, c10::Storage const&, bool) from ??:0
2025-12-04T10:08:42.0312654Z #15 THPStorage_copy_(_object*, _object*, _object*) from StorageMethods.cpp:0
2025-12-04T10:08:42.0313474Z #16 method_vectorcall_VARARGS_KEYWORDS from /usr/local/src/conda/python-3.10.14/Objects/descrobject.c:344
2025-12-04T10:08:42.0314455Z #17 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0315387Z #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0316310Z #19 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0317247Z #20 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0318180Z #21 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0319106Z #22 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0320019Z #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0320945Z #24 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0321874Z #25 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0322795Z #26 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0323712Z #27 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0324645Z #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0325576Z #29 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0326499Z #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0327413Z #31 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0328337Z #32 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0329276Z #33 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0330190Z #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0330976Z #35 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0331760Z #36 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0332694Z #37 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0333607Z #38 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0334534Z #39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0335461Z #40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0336427Z #41 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T10:08:42.0337124Z #42 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0337904Z #43 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0338722Z #44 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T10:08:42.0339490Z #45 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0340259Z #46 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0341073Z #47 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T10:08:42.0341777Z #48 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0342543Z #49 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0343480Z #50 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0344411Z #51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0345190Z #52 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0345952Z #53 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0346883Z #54 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0347809Z #55 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0348734Z #56 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0349643Z #57 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0350575Z #58 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0351494Z #59 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0352413Z #60 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0353320Z #61 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0354142Z #62 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T10:08:42.0354840Z #63 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0355608Z #64 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0356481Z #65 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:08:42.0357287Z #66 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:08:42.0358035Z #67 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:08:42.0358763Z #68 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:08:42.0359623Z #69 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:08:42.0360554Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0361478Z #71 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0362386Z #72 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0363165Z #73 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0363941Z #74 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0364919Z #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0365849Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0366777Z #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0367701Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0368630Z #79 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:08:42.0369438Z #80 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:08:42.0370186Z #81 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:08:42.0370902Z #82 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T10:08:42.0371723Z #83 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0372503Z #84 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0373432Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0374358Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0375276Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0376201Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0377086Z #89 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0377854Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0378793Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0379734Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0380666Z #93 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0381587Z #94 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0382377Z #95 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0383155Z #96 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0384085Z #97 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0385000Z #98 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0385940Z #99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0386881Z #100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0387770Z #101 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:08:42.0388583Z #102 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:08:42.0389357Z #103 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:08:42.0390121Z #104 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:08:42.0390988Z #105 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:08:42.0391938Z #106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0392741Z #107 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0393713Z #108 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0394651Z #109 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0395599Z #110 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0396541Z #111 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0397565Z #112 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0398437Z #113 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:08:42.0399256Z #114 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:08:42.0400021Z #115 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:08:42.0400771Z #116 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:08:42.0401644Z #117 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:08:42.0402589Z #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0403531Z #119 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0404463Z #120 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0405405Z #121 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0406343Z #122 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0407137Z #123 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0407922Z #124 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0408868Z #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0409809Z #126 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0410751Z #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0411687Z #128 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0412570Z #129 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:08:42.0413390Z #130 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:08:42.0414149Z #131 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:08:42.0414906Z #132 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:08:42.0415776Z #133 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:08:42.0416786Z #134 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0417716Z #135 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0418669Z #136 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0419616Z #137 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0420566Z #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0421498Z #139 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0422514Z #140 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0423461Z #141 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0424401Z #142 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0425215Z #143 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134
2025-12-04T10:08:42.0426039Z #144 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291
2025-12-04T10:08:42.0426778Z #145 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312
2025-12-04T10:08:42.0427472Z #146 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208
2025-12-04T10:08:42.0428262Z #147 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456
2025-12-04T10:08:42.0429095Z #148 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90
2025-12-04T10:08:42.0429875Z #149 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357
2025-12-04T10:08:42.0430583Z #150 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090
2025-12-04T10:08:42.0431271Z #151 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58
2025-12-04T10:08:42.0431887Z #152 __libc_start_main_impl from ./csu/../csu/libc-start.c:392
2025-12-04T10:08:42.0432333Z #153 _start from ??:0
2025-12-04T10:08:42.0432631Z #154 <unwind unsupported> from ??:0
2025-12-04T10:08:42.0432878Z 
2025-12-04T10:08:42.0432883Z 
2025-12-04T10:08:42.0433101Z To execute this test, run the following from the base repo dir:
2025-12-04T10:08:42.0434259Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda
2025-12-04T10:08:42.0435316Z 
2025-12-04T10:08:42.0435597Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:08:42.0436226Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:08:42.0437709Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py:257: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:08:42.0439195Z   return type(self)(self.nbytes(), device=self.device).copy_(self)
2025-12-04T10:08:42.0440319Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-cd1c50b62bb47a1b.xml -
2025-12-04T10:08:42.0441368Z =========================== short test summary info ============================
2025-12-04T10:08:42.0442605Z FAILED [0.0054s] inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda - torch.AcceleratorError: CUDA error: invalid device function
2025-12-04T10:08:42.0444272Z Search for `cudaErrorInvalidDeviceFunction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
2025-12-04T10:08:42.0445526Z CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
2025-12-04T10:08:42.0446319Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1
2025-12-04T10:08:42.0446872Z Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
2025-12-04T10:08:42.0447245Z 
2025-12-04T10:08:42.0447871Z Exception raised from copy_device_to_device at /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/Copy.cu:337 (most recent call first):
2025-12-04T10:08:42.0448734Z C++ CapturedTraceback:
2025-12-04T10:08:42.0450333Z #4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0
2025-12-04T10:08:42.0452268Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0
2025-12-04T10:08:42.0453556Z #6 c10::AcceleratorError::AcceleratorError(c10::SourceLocation, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) from :0
2025-12-04T10:08:42.0454796Z #7 c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, unsigned int, bool) from ??:0
2025-12-04T10:08:42.0455571Z #8 at::native::copy_device_to_device(at::TensorIterator&, bool, bool) from ??:0
2025-12-04T10:08:42.0456353Z #9 at::native::copy_impl(at::Tensor&, at::Tensor const&, bool) [clone .isra.0] from Copy.cpp:0
2025-12-04T10:08:42.0457033Z #10 at::native::copy_(at::Tensor&, at::Tensor const&, bool) from ??:0
2025-12-04T10:08:42.0459536Z #11 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor& (c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool), &torch::ADInplaceOrView::copy_>, at::Tensor&, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool> >, at::Tensor& (c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool) from VariableTypeManual.cpp:0
2025-12-04T10:08:42.0462403Z #12 torch::autograd::VariableType::(anonymous namespace)::copy_(c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool) from VariableTypeManual.cpp:0
2025-12-04T10:08:42.0463360Z #13 at::_ops::copy_::call(at::Tensor&, at::Tensor const&, bool) from ??:0
2025-12-04T10:08:42.0463960Z #14 at::storage_copy(c10::Storage&, c10::Storage const&, bool) from ??:0
2025-12-04T10:08:42.0464598Z #15 THPStorage_copy_(_object*, _object*, _object*) from StorageMethods.cpp:0
2025-12-04T10:08:42.0465427Z #16 method_vectorcall_VARARGS_KEYWORDS from /usr/local/src/conda/python-3.10.14/Objects/descrobject.c:344
2025-12-04T10:08:42.0466396Z #17 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0467334Z #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0468266Z #19 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0469204Z #20 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0470120Z #21 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0471253Z #22 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0472191Z #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0473115Z #24 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0474048Z #25 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0474978Z #26 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0475908Z #27 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0476832Z #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0477771Z #29 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0478700Z #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0479633Z #31 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0480785Z #32 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0481720Z #33 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0482661Z #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0483453Z #35 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0484304Z #36 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0485232Z #37 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0486165Z #38 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0487093Z #39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0488012Z #40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0488827Z #41 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T10:08:42.0489527Z #42 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0490294Z #43 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0491108Z #44 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T10:08:42.0491815Z #45 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0492595Z #46 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0493397Z #47 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T10:08:42.0494100Z #48 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0494881Z #49 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0495821Z #50 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0496805Z #51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0497591Z #52 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0498386Z #53 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0499303Z #54 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0500235Z #55 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0501158Z #56 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0502080Z #57 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0502994Z #58 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0503920Z #59 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0504844Z #60 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0505776Z #61 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0506576Z #62 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T10:08:42.0524741Z #63 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0525644Z #64 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0526521Z #65 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:08:42.0527553Z #66 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:08:42.0528309Z #67 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:08:42.0529040Z #68 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:08:42.0529900Z #69 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:08:42.0530921Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0531836Z #71 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0532748Z #72 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0533515Z #73 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0534285Z #74 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0535195Z #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0536099Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0537096Z #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0538011Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0538859Z #79 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:08:42.0539643Z #80 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:08:42.0540373Z #81 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:08:42.0541061Z #82 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T10:08:42.0541726Z #83 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0542486Z #84 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0543393Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0544289Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0545197Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0546101Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0546863Z #89 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0547620Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0548531Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0549443Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0550349Z #93 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0551247Z #94 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0552014Z #95 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0552771Z #96 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0553673Z #97 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0554581Z #98 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0555557Z #99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0556479Z #100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0557337Z #101 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:08:42.0558138Z #102 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:08:42.0558941Z #103 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:08:42.0559679Z #104 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:08:42.0560535Z #105 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:08:42.0561456Z #106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0562233Z #107 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0563017Z #108 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0563943Z #109 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0564882Z #110 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0565817Z #111 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0566740Z #112 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0567607Z #113 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:08:42.0568424Z #114 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:08:42.0569188Z #115 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:08:42.0569941Z #116 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:08:42.0570813Z #117 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:08:42.0571972Z #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0572919Z #119 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0573853Z #120 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0574797Z #121 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0575742Z #122 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0576618Z #123 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0577414Z #124 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0578361Z #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0579310Z #126 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0580240Z #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0581189Z #128 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0582073Z #129 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:08:42.0582888Z #130 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:08:42.0583640Z #131 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:08:42.0584552Z #132 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:08:42.0585429Z #133 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:08:42.0586374Z #134 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0587302Z #135 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0588333Z #136 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0589279Z #137 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0590225Z #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0591153Z #139 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0592106Z #140 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0593049Z #141 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0593979Z #142 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0594804Z #143 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134
2025-12-04T10:08:42.0595563Z #144 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291
2025-12-04T10:08:42.0596304Z #145 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312
2025-12-04T10:08:42.0596997Z #146 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208
2025-12-04T10:08:42.0597792Z #147 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456
2025-12-04T10:08:42.0598627Z #148 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90
2025-12-04T10:08:42.0599398Z #149 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357
2025-12-04T10:08:42.0600106Z #150 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090
2025-12-04T10:08:42.0600800Z #151 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58
2025-12-04T10:08:42.0601416Z #152 __libc_start_main_impl from ./csu/../csu/libc-start.c:392
2025-12-04T10:08:42.0601855Z #153 _start from ??:0
2025-12-04T10:08:42.0602167Z #154 <unwind unsupported> from ??:0
2025-12-04T10:08:42.0602407Z 
2025-12-04T10:08:42.0602426Z 
2025-12-04T10:08:42.0602648Z To execute this test, run the following from the base repo dir:
2025-12-04T10:08:42.0603802Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda
2025-12-04T10:08:42.0604723Z 
2025-12-04T10:08:42.0605001Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:08:42.0605602Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:08:42.0606138Z ================= 1 failed, 157 deselected, 2 rerun in 15.85s ==================
2025-12-04T10:08:42.0606585Z Got exit code 1
2025-12-04T10:08:42.0606874Z Retrying single test...
2025-12-04T10:08:42.0607520Z W1204 09:59:04.172000 15755 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:08:42.0608668Z Test results will be stored in test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-3e5313e420476f15.xml
2025-12-04T10:08:42.0609546Z ============================= test session starts ==============================
2025-12-04T10:08:42.0610219Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:08:42.0610831Z cachedir: .pytest_cache
2025-12-04T10:08:42.0611607Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:08:42.0612397Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:08:42.0612760Z configfile: pytest.ini
2025-12-04T10:08:42.0613486Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:08:42.0614398Z collecting ... collected 934 items / 157 deselected / 777 selected
2025-12-04T10:08:42.0615732Z stepcurrent: skipping 58 already run items. Running only test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda
2025-12-04T10:08:42.0616924Z Running 1 items in this shard
2025-12-04T10:08:42.0617139Z 
2025-12-04T10:08:42.0618284Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda [W1204 09:59:06.480206481 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.0619565Z 
2025-12-04T10:08:42.0619705Z ('RERUN', {'yellow': True}) [15.9940s] [100%]
2025-12-04T10:08:42.0621101Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda [W1204 09:59:22.482459322 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.0622371Z 
2025-12-04T10:08:42.0622506Z ('RERUN', {'yellow': True}) [0.0070s] [100%]
2025-12-04T10:08:42.0623890Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda [W1204 09:59:22.490028905 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.0625145Z 
2025-12-04T10:08:42.0625250Z FAILED [0.0054s] [100%]
2025-12-04T10:08:42.0625442Z 
2025-12-04T10:08:42.0625597Z ==================================== RERUNS ====================================
2025-12-04T10:08:42.0626317Z _ AOTInductorTestABICompatibleGpu.test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda _
2025-12-04T10:08:42.0627008Z Traceback (most recent call last):
2025-12-04T10:08:42.0627713Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 6893, in test__weight_int4pack_mm
2025-12-04T10:08:42.0628451Z     self.check_model(model, (a,))
2025-12-04T10:08:42.0629117Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 247, in check_model
2025-12-04T10:08:42.0629802Z     ref_model = copy.deepcopy(model)
2025-12-04T10:08:42.0630323Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 172, in deepcopy
2025-12-04T10:08:42.0630857Z     y = _reconstruct(x, memo, *rv)
2025-12-04T10:08:42.0631381Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 271, in _reconstruct
2025-12-04T10:08:42.0631922Z     state = deepcopy(state, memo)
2025-12-04T10:08:42.0632427Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 146, in deepcopy
2025-12-04T10:08:42.0632949Z     y = copier(x, memo)
2025-12-04T10:08:42.0633427Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 231, in _deepcopy_dict
2025-12-04T10:08:42.0634036Z     y[deepcopy(key, memo)] = deepcopy(value, memo)
2025-12-04T10:08:42.0634599Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 153, in deepcopy
2025-12-04T10:08:42.0635116Z     y = copier(memo)
2025-12-04T10:08:42.0635699Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_tensor.py", line 180, in __deepcopy__
2025-12-04T10:08:42.0636434Z     new_storage = self._typed_storage()._deepcopy(memo)
2025-12-04T10:08:42.0637137Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 1139, in _deepcopy
2025-12-04T10:08:42.0637948Z     return self._new_wrapped_storage(copy.deepcopy(self._untyped_storage, memo))
2025-12-04T10:08:42.0638736Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 153, in deepcopy
2025-12-04T10:08:42.0639253Z     y = copier(memo)
2025-12-04T10:08:42.0639842Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 243, in __deepcopy__
2025-12-04T10:08:42.0640502Z     new_storage = self.clone()
2025-12-04T10:08:42.0641093Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 257, in clone
2025-12-04T10:08:42.0641920Z     return type(self)(self.nbytes(), device=self.device).copy_(self)
2025-12-04T10:08:42.0642499Z torch.AcceleratorError: CUDA error: invalid device function
2025-12-04T10:08:42.0643501Z Search for `cudaErrorInvalidDeviceFunction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
2025-12-04T10:08:42.0644763Z CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
2025-12-04T10:08:42.0645580Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1
2025-12-04T10:08:42.0646134Z Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
2025-12-04T10:08:42.0646517Z 
2025-12-04T10:08:42.0647130Z Exception raised from copy_device_to_device at /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/Copy.cu:337 (most recent call first):
2025-12-04T10:08:42.0647995Z C++ CapturedTraceback:
2025-12-04T10:08:42.0649506Z #4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0
2025-12-04T10:08:42.0651413Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0
2025-12-04T10:08:42.0652688Z #6 c10::AcceleratorError::AcceleratorError(c10::SourceLocation, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) from :0
2025-12-04T10:08:42.0653860Z #7 c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, unsigned int, bool) from ??:0
2025-12-04T10:08:42.0654639Z #8 at::native::copy_device_to_device(at::TensorIterator&, bool, bool) from ??:0
2025-12-04T10:08:42.0655342Z #9 at::native::copy_impl(at::Tensor&, at::Tensor const&, bool) [clone .isra.0] from Copy.cpp:0
2025-12-04T10:08:42.0656017Z #10 at::native::copy_(at::Tensor&, at::Tensor const&, bool) from ??:0
2025-12-04T10:08:42.0658605Z #11 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor& (c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool), &torch::ADInplaceOrView::copy_>, at::Tensor&, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool> >, at::Tensor& (c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool) from VariableTypeManual.cpp:0
2025-12-04T10:08:42.0661479Z #12 torch::autograd::VariableType::(anonymous namespace)::copy_(c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool) from VariableTypeManual.cpp:0
2025-12-04T10:08:42.0662453Z #13 at::_ops::copy_::call(at::Tensor&, at::Tensor const&, bool) from ??:0
2025-12-04T10:08:42.0663035Z #14 at::storage_copy(c10::Storage&, c10::Storage const&, bool) from ??:0
2025-12-04T10:08:42.0663322Z #15 THPStorage_copy_(_object*, _object*, _object*) from StorageMethods.cpp:0
2025-12-04T10:08:42.0663747Z #16 method_vectorcall_VARARGS_KEYWORDS from /usr/local/src/conda/python-3.10.14/Objects/descrobject.c:344
2025-12-04T10:08:42.0664171Z #17 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0664545Z #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0664953Z #19 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0665412Z #20 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0665818Z #21 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0666204Z #22 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0666672Z #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0667045Z #24 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0667465Z #25 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0667837Z #26 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0668252Z #27 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0668628Z #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0669032Z #29 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0669412Z #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0669825Z #31 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0670194Z #32 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0670614Z #33 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0671181Z #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0671462Z #35 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0671838Z #36 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0672243Z #37 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0672628Z #38 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0673037Z #39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0673420Z #40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0673715Z #41 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T10:08:42.0673974Z #42 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0674360Z #43 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0674659Z #44 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T10:08:42.0674919Z #45 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0675308Z #46 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0675599Z #47 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T10:08:42.0675873Z #48 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0676249Z #49 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0676654Z #50 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0677037Z #51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0677296Z #52 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0677809Z #53 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0678217Z #54 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0678589Z #55 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0679012Z #56 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0679463Z #57 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0679882Z #58 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0680256Z #59 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0680661Z #60 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0681049Z #61 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0681341Z #62 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T10:08:42.0681617Z #63 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0681986Z #64 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0682352Z #65 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:08:42.0682659Z #66 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:08:42.0682955Z #67 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:08:42.0683271Z #68 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:08:42.0683678Z #69 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:08:42.0684066Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0684472Z #71 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0684842Z #72 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0685124Z #73 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0685503Z #74 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0685920Z #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0686289Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0686703Z #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0687091Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0687442Z #79 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:08:42.0687760Z #80 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:08:42.0688065Z #81 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:08:42.0688336Z #82 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T10:08:42.0688611Z #83 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0688982Z #84 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0689391Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0689846Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0690254Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0690641Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0690905Z #89 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0691356Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0691776Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0692146Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0692560Z #93 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0692936Z #94 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0693193Z #95 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0693578Z #96 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0693980Z #97 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0694367Z #98 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0694772Z #99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0695157Z #100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0695527Z #101 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:08:42.0695842Z #102 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:08:42.0696146Z #103 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:08:42.0696530Z #104 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:08:42.0696947Z #105 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:08:42.0697348Z #106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0697616Z #107 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0697994Z #108 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0698423Z #109 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0698801Z #110 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0699229Z #111 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0699607Z #112 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0699961Z #113 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:08:42.0700291Z #114 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:08:42.0700587Z #115 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:08:42.0700893Z #116 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:08:42.0701323Z #117 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:08:42.0701699Z #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0702201Z #119 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0702585Z #120 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0702998Z #121 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0703388Z #122 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0703710Z #123 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0704097Z #124 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0704508Z #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0704883Z #126 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0705311Z #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0705689Z #128 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0706060Z #129 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:08:42.0706371Z #130 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:08:42.0706674Z #131 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:08:42.0706994Z #132 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:08:42.0707406Z #133 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:08:42.0707781Z #134 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0708209Z #135 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0708585Z #136 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0709011Z #137 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0709390Z #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0709804Z #139 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0710196Z #140 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0710608Z #141 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0710995Z #142 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0711289Z #143 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134
2025-12-04T10:08:42.0711596Z #144 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291
2025-12-04T10:08:42.0711875Z #145 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312
2025-12-04T10:08:42.0712163Z #146 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208
2025-12-04T10:08:42.0712532Z #147 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456
2025-12-04T10:08:42.0712857Z #148 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90
2025-12-04T10:08:42.0713148Z #149 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357
2025-12-04T10:08:42.0713428Z #150 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090
2025-12-04T10:08:42.0713697Z #151 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58
2025-12-04T10:08:42.0713957Z #152 __libc_start_main_impl from ./csu/../csu/libc-start.c:392
2025-12-04T10:08:42.0714075Z #153 _start from ??:0
2025-12-04T10:08:42.0714200Z #154 <unwind unsupported> from ??:0
2025-12-04T10:08:42.0714206Z 
2025-12-04T10:08:42.0714211Z 
2025-12-04T10:08:42.0714443Z To execute this test, run the following from the base repo dir:
2025-12-04T10:08:42.0715237Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda
2025-12-04T10:08:42.0715306Z 
2025-12-04T10:08:42.0715577Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:08:42.0715817Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:08:42.0716936Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py:257: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:08:42.0717184Z   return type(self)(self.nbytes(), device=self.device).copy_(self)
2025-12-04T10:08:42.0717609Z _ AOTInductorTestABICompatibleGpu.test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda _
2025-12-04T10:08:42.0717734Z Traceback (most recent call last):
2025-12-04T10:08:42.0718224Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 6893, in test__weight_int4pack_mm
2025-12-04T10:08:42.0718349Z     self.check_model(model, (a,))
2025-12-04T10:08:42.0718793Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 247, in check_model
2025-12-04T10:08:42.0718920Z     ref_model = copy.deepcopy(model)
2025-12-04T10:08:42.0719191Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 172, in deepcopy
2025-12-04T10:08:42.0719321Z     y = _reconstruct(x, memo, *rv)
2025-12-04T10:08:42.0719610Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 271, in _reconstruct
2025-12-04T10:08:42.0719733Z     state = deepcopy(state, memo)
2025-12-04T10:08:42.0720010Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 146, in deepcopy
2025-12-04T10:08:42.0720116Z     y = copier(x, memo)
2025-12-04T10:08:42.0720424Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 231, in _deepcopy_dict
2025-12-04T10:08:42.0720582Z     y[deepcopy(key, memo)] = deepcopy(value, memo)
2025-12-04T10:08:42.0720846Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 153, in deepcopy
2025-12-04T10:08:42.0720964Z     y = copier(memo)
2025-12-04T10:08:42.0721370Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_tensor.py", line 180, in __deepcopy__
2025-12-04T10:08:42.0721540Z     new_storage = self._typed_storage()._deepcopy(memo)
2025-12-04T10:08:42.0721944Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 1139, in _deepcopy
2025-12-04T10:08:42.0722228Z     return self._new_wrapped_storage(copy.deepcopy(self._untyped_storage, memo))
2025-12-04T10:08:42.0722510Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 153, in deepcopy
2025-12-04T10:08:42.0722615Z     y = copier(memo)
2025-12-04T10:08:42.0723024Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 243, in __deepcopy__
2025-12-04T10:08:42.0723158Z     new_storage = self.clone()
2025-12-04T10:08:42.0723523Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 257, in clone
2025-12-04T10:08:42.0723753Z     return type(self)(self.nbytes(), device=self.device).copy_(self)
2025-12-04T10:08:42.0723987Z torch.AcceleratorError: CUDA error: invalid device function
2025-12-04T10:08:42.0724633Z Search for `cudaErrorInvalidDeviceFunction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
2025-12-04T10:08:42.0725122Z CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
2025-12-04T10:08:42.0725379Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1
2025-12-04T10:08:42.0725617Z Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
2025-12-04T10:08:42.0725623Z 
2025-12-04T10:08:42.0726244Z Exception raised from copy_device_to_device at /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/Copy.cu:337 (most recent call first):
2025-12-04T10:08:42.0726358Z C++ CapturedTraceback:
2025-12-04T10:08:42.0727686Z #4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0
2025-12-04T10:08:42.0728237Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0
2025-12-04T10:08:42.0728895Z #6 c10::AcceleratorError::AcceleratorError(c10::SourceLocation, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) from :0
2025-12-04T10:08:42.0729277Z #7 c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, unsigned int, bool) from ??:0
2025-12-04T10:08:42.0729543Z #8 at::native::copy_device_to_device(at::TensorIterator&, bool, bool) from ??:0
2025-12-04T10:08:42.0729857Z #9 at::native::copy_impl(at::Tensor&, at::Tensor const&, bool) [clone .isra.0] from Copy.cpp:0
2025-12-04T10:08:42.0730076Z #10 at::native::copy_(at::Tensor&, at::Tensor const&, bool) from ??:0
2025-12-04T10:08:42.0732208Z #11 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor& (c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool), &torch::ADInplaceOrView::copy_>, at::Tensor&, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool> >, at::Tensor& (c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool) from VariableTypeManual.cpp:0
2025-12-04T10:08:42.0732827Z #12 torch::autograd::VariableType::(anonymous namespace)::copy_(c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool) from VariableTypeManual.cpp:0
2025-12-04T10:08:42.0733046Z #13 at::_ops::copy_::call(at::Tensor&, at::Tensor const&, bool) from ??:0
2025-12-04T10:08:42.0733288Z #14 at::storage_copy(c10::Storage&, c10::Storage const&, bool) from ??:0
2025-12-04T10:08:42.0733554Z #15 THPStorage_copy_(_object*, _object*, _object*) from StorageMethods.cpp:0
2025-12-04T10:08:42.0733990Z #16 method_vectorcall_VARARGS_KEYWORDS from /usr/local/src/conda/python-3.10.14/Objects/descrobject.c:344
2025-12-04T10:08:42.0734399Z #17 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0734775Z #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0735198Z #19 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0735570Z #20 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0735974Z #21 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0736424Z #22 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0736835Z #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0737221Z #24 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0737627Z #25 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0737996Z #26 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0738507Z #27 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0738878Z #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0739291Z #29 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0739657Z #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0740122Z #31 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0740505Z #32 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0740909Z #33 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0741297Z #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0741566Z #35 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0741937Z #36 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0742356Z #37 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0742725Z #38 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0743145Z #39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0743513Z #40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0743809Z #41 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T10:08:42.0744085Z #42 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0744456Z #43 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0744747Z #44 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T10:08:42.0745019Z #45 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0745391Z #46 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0745693Z #47 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T10:08:42.0745950Z #48 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0746322Z #49 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0746738Z #50 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0747107Z #51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0747381Z #52 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0747749Z #53 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0748154Z #54 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0748530Z #55 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0748938Z #56 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0749306Z #57 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0749722Z #58 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0750092Z #59 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0750571Z #60 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0750947Z #61 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0751238Z #62 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T10:08:42.0751515Z #63 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0751945Z #64 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0752306Z #65 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:08:42.0752611Z #66 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:08:42.0752907Z #67 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:08:42.0753228Z #68 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:08:42.0753640Z #69 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:08:42.0754024Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0754434Z #71 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0754802Z #72 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0755079Z #73 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0755451Z #74 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0755851Z #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0756237Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0756644Z #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0757024Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0757377Z #79 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:08:42.0757684Z #80 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:08:42.0757998Z #81 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:08:42.0758266Z #82 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T10:08:42.0758541Z #83 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0758913Z #84 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0759324Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0759707Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0760115Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0760497Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0760761Z #89 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0761132Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0761549Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0761920Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0762323Z #93 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0762765Z #94 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0763025Z #95 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0763404Z #96 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0763809Z #97 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0764236Z #98 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0764650Z #99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0765034Z #100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0765403Z #101 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:08:42.0765718Z #102 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:08:42.0766018Z #103 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:08:42.0766340Z #104 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:08:42.0766752Z #105 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:08:42.0767143Z #106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0767410Z #107 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0767789Z #108 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0768213Z #109 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0768596Z #110 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0769009Z #111 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0769401Z #112 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0769759Z #113 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:08:42.0770089Z #114 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:08:42.0770389Z #115 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:08:42.0770699Z #116 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:08:42.0771302Z #117 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:08:42.0771684Z #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0772108Z #119 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0772481Z #120 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0772892Z #121 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0773285Z #122 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0773547Z #123 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0773923Z #124 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0774349Z #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0774867Z #126 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0775296Z #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0775673Z #128 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0776029Z #129 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:08:42.0776508Z #130 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:08:42.0776809Z #131 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:08:42.0777129Z #132 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:08:42.0777545Z #133 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:08:42.0777922Z #134 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0778356Z #135 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0778734Z #136 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0779158Z #137 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0779540Z #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0779946Z #139 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0780333Z #140 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0780746Z #141 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0781134Z #142 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0781427Z #143 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134
2025-12-04T10:08:42.0781737Z #144 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291
2025-12-04T10:08:42.0782017Z #145 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312
2025-12-04T10:08:42.0782304Z #146 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208
2025-12-04T10:08:42.0782663Z #147 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456
2025-12-04T10:08:42.0783005Z #148 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90
2025-12-04T10:08:42.0783300Z #149 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357
2025-12-04T10:08:42.0783586Z #150 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090
2025-12-04T10:08:42.0783855Z #151 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58
2025-12-04T10:08:42.0784059Z #152 __libc_start_main_impl from ./csu/../csu/libc-start.c:392
2025-12-04T10:08:42.0784177Z #153 _start from ??:0
2025-12-04T10:08:42.0784302Z #154 <unwind unsupported> from ??:0
2025-12-04T10:08:42.0784308Z 
2025-12-04T10:08:42.0784313Z 
2025-12-04T10:08:42.0784545Z To execute this test, run the following from the base repo dir:
2025-12-04T10:08:42.0785336Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda
2025-12-04T10:08:42.0785346Z 
2025-12-04T10:08:42.0785617Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:08:42.0785860Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:08:42.0787147Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py:257: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:08:42.0787401Z   return type(self)(self.nbytes(), device=self.device).copy_(self)
2025-12-04T10:08:42.0787554Z =================================== FAILURES ===================================
2025-12-04T10:08:42.0787984Z _ AOTInductorTestABICompatibleGpu.test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda _
2025-12-04T10:08:42.0788186Z Traceback (most recent call last):
2025-12-04T10:08:42.0788667Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 6893, in test__weight_int4pack_mm
2025-12-04T10:08:42.0788790Z     self.check_model(model, (a,))
2025-12-04T10:08:42.0789234Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 247, in check_model
2025-12-04T10:08:42.0789363Z     ref_model = copy.deepcopy(model)
2025-12-04T10:08:42.0789653Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 172, in deepcopy
2025-12-04T10:08:42.0789773Z     y = _reconstruct(x, memo, *rv)
2025-12-04T10:08:42.0790069Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 271, in _reconstruct
2025-12-04T10:08:42.0790206Z     state = deepcopy(state, memo)
2025-12-04T10:08:42.0790473Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 146, in deepcopy
2025-12-04T10:08:42.0790579Z     y = copier(x, memo)
2025-12-04T10:08:42.0790890Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 231, in _deepcopy_dict
2025-12-04T10:08:42.0791052Z     y[deepcopy(key, memo)] = deepcopy(value, memo)
2025-12-04T10:08:42.0791329Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 153, in deepcopy
2025-12-04T10:08:42.0791433Z     y = copier(memo)
2025-12-04T10:08:42.0791844Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_tensor.py", line 180, in __deepcopy__
2025-12-04T10:08:42.0792028Z     new_storage = self._typed_storage()._deepcopy(memo)
2025-12-04T10:08:42.0792425Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 1139, in _deepcopy
2025-12-04T10:08:42.0792708Z     return self._new_wrapped_storage(copy.deepcopy(self._untyped_storage, memo))
2025-12-04T10:08:42.0792985Z   File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 153, in deepcopy
2025-12-04T10:08:42.0793085Z     y = copier(memo)
2025-12-04T10:08:42.0793507Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 243, in __deepcopy__
2025-12-04T10:08:42.0793631Z     new_storage = self.clone()
2025-12-04T10:08:42.0793998Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 257, in clone
2025-12-04T10:08:42.0794240Z     return type(self)(self.nbytes(), device=self.device).copy_(self)
2025-12-04T10:08:42.0794464Z torch.AcceleratorError: CUDA error: invalid device function
2025-12-04T10:08:42.0795112Z Search for `cudaErrorInvalidDeviceFunction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
2025-12-04T10:08:42.0795607Z CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
2025-12-04T10:08:42.0795793Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1
2025-12-04T10:08:42.0796044Z Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
2025-12-04T10:08:42.0796049Z 
2025-12-04T10:08:42.0796660Z Exception raised from copy_device_to_device at /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/Copy.cu:337 (most recent call first):
2025-12-04T10:08:42.0796777Z C++ CapturedTraceback:
2025-12-04T10:08:42.0798110Z #4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0
2025-12-04T10:08:42.0798659Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0
2025-12-04T10:08:42.0799329Z #6 c10::AcceleratorError::AcceleratorError(c10::SourceLocation, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) from :0
2025-12-04T10:08:42.0799700Z #7 c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, unsigned int, bool) from ??:0
2025-12-04T10:08:42.0799976Z #8 at::native::copy_device_to_device(at::TensorIterator&, bool, bool) from ??:0
2025-12-04T10:08:42.0800344Z #9 at::native::copy_impl(at::Tensor&, at::Tensor const&, bool) [clone .isra.0] from Copy.cpp:0
2025-12-04T10:08:42.0800562Z #10 at::native::copy_(at::Tensor&, at::Tensor const&, bool) from ??:0
2025-12-04T10:08:42.0802837Z #11 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor& (c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool), &torch::ADInplaceOrView::copy_>, at::Tensor&, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool> >, at::Tensor& (c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool) from VariableTypeManual.cpp:0
2025-12-04T10:08:42.0803450Z #12 torch::autograd::VariableType::(anonymous namespace)::copy_(c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool) from VariableTypeManual.cpp:0
2025-12-04T10:08:42.0803686Z #13 at::_ops::copy_::call(at::Tensor&, at::Tensor const&, bool) from ??:0
2025-12-04T10:08:42.0803921Z #14 at::storage_copy(c10::Storage&, c10::Storage const&, bool) from ??:0
2025-12-04T10:08:42.0804184Z #15 THPStorage_copy_(_object*, _object*, _object*) from StorageMethods.cpp:0
2025-12-04T10:08:42.0804620Z #16 method_vectorcall_VARARGS_KEYWORDS from /usr/local/src/conda/python-3.10.14/Objects/descrobject.c:344
2025-12-04T10:08:42.0805032Z #17 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0805426Z #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0805839Z #19 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0806214Z #20 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0806636Z #21 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0807014Z #22 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0807433Z #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0807804Z #24 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0808211Z #25 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0808605Z #26 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0809007Z #27 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0809392Z #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0809800Z #29 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0810176Z #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0810593Z #31 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0810966Z #32 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0811372Z #33 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0811834Z #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0812102Z #35 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0812482Z #36 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0812891Z #37 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0813319Z #38 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0813735Z #39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0814105Z #40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0814409Z #41 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T10:08:42.0814673Z #42 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0815042Z #43 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0815346Z #44 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T10:08:42.0815605Z #45 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0815982Z #46 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0816379Z #47 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T10:08:42.0816643Z #48 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0817037Z #49 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0817445Z #50 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0817820Z #51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0818095Z #52 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0818466Z #53 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0818887Z #54 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0819262Z #55 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0819671Z #56 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0820057Z #57 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0820464Z #58 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0820853Z #59 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0821260Z #60 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0821633Z #61 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0821941Z #62 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T10:08:42.0822207Z #63 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0822578Z #64 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0822948Z #65 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:08:42.0823256Z #66 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:08:42.0823645Z #67 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:08:42.0823955Z #68 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:08:42.0824364Z #69 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:08:42.0824753Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0825240Z #71 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0825627Z #72 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0825888Z #73 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0826260Z #74 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0826686Z #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0827063Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0827484Z #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0827856Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0828210Z #79 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:08:42.0828529Z #80 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:08:42.0828828Z #81 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:08:42.0829100Z #82 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T10:08:42.0829375Z #83 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0829755Z #84 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0830172Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0830541Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0830949Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0831336Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0831597Z #89 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0831978Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0832382Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0832754Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0833170Z #93 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0833538Z #94 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0833809Z #95 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0834183Z #96 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0834585Z #97 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0834969Z #98 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0835375Z #99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0835832Z #100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0836203Z #101 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:08:42.0836513Z #102 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:08:42.0836827Z #103 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:08:42.0837203Z #104 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:08:42.0837622Z #105 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:08:42.0838021Z #106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0838287Z #107 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0838683Z #108 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0839103Z #109 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0839479Z #110 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0840071Z #111 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0840452Z #112 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0840826Z #113 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:08:42.0841141Z #114 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:08:42.0841440Z #115 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:08:42.0841763Z #116 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:08:42.0842184Z #117 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:08:42.0842567Z #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0842997Z #119 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0843379Z #120 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0843815Z #121 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0844195Z #122 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0844466Z #123 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0844858Z #124 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0845277Z #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0845664Z #126 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0846078Z #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0846454Z #128 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0846827Z #129 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:08:42.0847136Z #130 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:08:42.0847446Z #131 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:08:42.0847758Z #132 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:08:42.0848252Z #133 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:08:42.0848647Z #134 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0849058Z #135 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0849434Z #136 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0849930Z #137 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0850309Z #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0850908Z #139 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0851290Z #140 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0851711Z #141 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0852106Z #142 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0852402Z #143 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134
2025-12-04T10:08:42.0852730Z #144 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291
2025-12-04T10:08:42.0853008Z #145 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312
2025-12-04T10:08:42.0853298Z #146 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208
2025-12-04T10:08:42.0853670Z #147 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456
2025-12-04T10:08:42.0854000Z #148 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90
2025-12-04T10:08:42.0854293Z #149 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357
2025-12-04T10:08:42.0854584Z #150 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090
2025-12-04T10:08:42.0854854Z #151 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58
2025-12-04T10:08:42.0855064Z #152 __libc_start_main_impl from ./csu/../csu/libc-start.c:392
2025-12-04T10:08:42.0855170Z #153 _start from ??:0
2025-12-04T10:08:42.0855295Z #154 <unwind unsupported> from ??:0
2025-12-04T10:08:42.0855306Z 
2025-12-04T10:08:42.0855315Z 
2025-12-04T10:08:42.0855552Z To execute this test, run the following from the base repo dir:
2025-12-04T10:08:42.0856418Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda
2025-12-04T10:08:42.0856425Z 
2025-12-04T10:08:42.0856711Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:08:42.0856942Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:08:42.0858066Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py:257: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:08:42.0858313Z   return type(self)(self.nbytes(), device=self.device).copy_(self)
2025-12-04T10:08:42.0859055Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-3e5313e420476f15.xml -
2025-12-04T10:08:42.0859253Z =========================== short test summary info ============================
2025-12-04T10:08:42.0860166Z FAILED [0.0054s] inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda - torch.AcceleratorError: CUDA error: invalid device function
2025-12-04T10:08:42.0860811Z Search for `cudaErrorInvalidDeviceFunction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
2025-12-04T10:08:42.0861392Z CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
2025-12-04T10:08:42.0861583Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1
2025-12-04T10:08:42.0861835Z Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
2025-12-04T10:08:42.0861841Z 
2025-12-04T10:08:42.0862451Z Exception raised from copy_device_to_device at /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/Copy.cu:337 (most recent call first):
2025-12-04T10:08:42.0862626Z C++ CapturedTraceback:
2025-12-04T10:08:42.0863958Z #4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0
2025-12-04T10:08:42.0864450Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0
2025-12-04T10:08:42.0865116Z #6 c10::AcceleratorError::AcceleratorError(c10::SourceLocation, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) from :0
2025-12-04T10:08:42.0865484Z #7 c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, unsigned int, bool) from ??:0
2025-12-04T10:08:42.0865769Z #8 at::native::copy_device_to_device(at::TensorIterator&, bool, bool) from ??:0
2025-12-04T10:08:42.0866071Z #9 at::native::copy_impl(at::Tensor&, at::Tensor const&, bool) [clone .isra.0] from Copy.cpp:0
2025-12-04T10:08:42.0866288Z #10 at::native::copy_(at::Tensor&, at::Tensor const&, bool) from ??:0
2025-12-04T10:08:42.0868433Z #11 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor& (c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool), &torch::ADInplaceOrView::copy_>, at::Tensor&, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool> >, at::Tensor& (c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool) from VariableTypeManual.cpp:0
2025-12-04T10:08:42.0869039Z #12 torch::autograd::VariableType::(anonymous namespace)::copy_(c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool) from VariableTypeManual.cpp:0
2025-12-04T10:08:42.0869274Z #13 at::_ops::copy_::call(at::Tensor&, at::Tensor const&, bool) from ??:0
2025-12-04T10:08:42.0869510Z #14 at::storage_copy(c10::Storage&, c10::Storage const&, bool) from ??:0
2025-12-04T10:08:42.0869777Z #15 THPStorage_copy_(_object*, _object*, _object*) from StorageMethods.cpp:0
2025-12-04T10:08:42.0870213Z #16 method_vectorcall_VARARGS_KEYWORDS from /usr/local/src/conda/python-3.10.14/Objects/descrobject.c:344
2025-12-04T10:08:42.0870630Z #17 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0871189Z #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0871599Z #19 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0871971Z #20 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0872395Z #21 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0872765Z #22 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0873182Z #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0873551Z #24 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0874089Z #25 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0874478Z #26 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0874880Z #27 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0875264Z #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0875786Z #29 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0876154Z #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0876576Z #31 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0876945Z #32 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0877368Z #33 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0877740Z #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0878009Z #35 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0878395Z #36 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0878805Z #37 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0879172Z #38 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0879590Z #39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0879963Z #40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0880273Z #41 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T10:08:42.0880541Z #42 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0880916Z #43 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0881221Z #44 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T10:08:42.0881482Z #45 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0881870Z #46 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0882162Z #47 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T10:08:42.0882422Z #48 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0882811Z #49 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0883224Z #50 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0883594Z #51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0883867Z #52 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0884237Z #53 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0884660Z #54 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0885030Z #55 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0885436Z #56 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0885820Z #57 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0886291Z #58 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0886674Z #59 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0887080Z #60 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0887451Z #61 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0887818Z #62 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T10:08:42.0888080Z #63 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0888464Z #64 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0888815Z #65 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:08:42.0889124Z #66 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:08:42.0889437Z #67 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:08:42.0889741Z #68 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:08:42.0890153Z #69 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:08:42.0890544Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0890957Z #71 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0891340Z #72 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0891602Z #73 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0891973Z #74 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0892398Z #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0892771Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0893194Z #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0893567Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0893923Z #79 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:08:42.0894248Z #80 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:08:42.0894547Z #81 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:08:42.0894832Z #82 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T10:08:42.0895093Z #83 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0895470Z #84 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0895892Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0896325Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0896737Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0897128Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0897388Z #89 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0897774Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0898184Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0898631Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0899052Z #93 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0899423Z #94 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0899694Z #95 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0900130Z #96 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0900536Z #97 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0900922Z #98 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0901327Z #99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0901729Z #100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0902089Z #101 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:08:42.0902404Z #102 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:08:42.0902720Z #103 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:08:42.0903035Z #104 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:08:42.0903451Z #105 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:08:42.0903845Z #106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0904111Z #107 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0904510Z #108 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0904924Z #109 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0905301Z #110 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0905728Z #111 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0906110Z #112 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0906479Z #113 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:08:42.0906792Z #114 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:08:42.0907096Z #115 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:08:42.0907420Z #116 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:08:42.0907835Z #117 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:08:42.0908214Z #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0908637Z #119 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0909019Z #120 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0909443Z #121 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0909824Z #122 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0910091Z #123 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:08:42.0910485Z #124 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0910968Z #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0911359Z #126 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0911771Z #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0912208Z #128 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0912580Z #129 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:08:42.0912893Z #130 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:08:42.0913210Z #131 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:08:42.0913523Z #132 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:08:42.0913942Z #133 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:08:42.0914339Z #134 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0914754Z #135 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0915131Z #136 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0915566Z #137 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0915944Z #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0916371Z #139 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0916752Z #140 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0917169Z #141 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:08:42.0917563Z #142 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0917858Z #143 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134
2025-12-04T10:08:42.0918179Z #144 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291
2025-12-04T10:08:42.0918458Z #145 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312
2025-12-04T10:08:42.0918746Z #146 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208
2025-12-04T10:08:42.0919119Z #147 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456
2025-12-04T10:08:42.0919446Z #148 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90
2025-12-04T10:08:42.0919752Z #149 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357
2025-12-04T10:08:42.0920029Z #150 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090
2025-12-04T10:08:42.0920298Z #151 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58
2025-12-04T10:08:42.0920510Z #152 __libc_start_main_impl from ./csu/../csu/libc-start.c:392
2025-12-04T10:08:42.0920614Z #153 _start from ??:0
2025-12-04T10:08:42.0920739Z #154 <unwind unsupported> from ??:0
2025-12-04T10:08:42.0920749Z 
2025-12-04T10:08:42.0920754Z 
2025-12-04T10:08:42.0920993Z To execute this test, run the following from the base repo dir:
2025-12-04T10:08:42.0921784Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda
2025-12-04T10:08:42.0921790Z 
2025-12-04T10:08:42.0922070Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:08:42.0922349Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:08:42.0922556Z ================= 1 failed, 157 deselected, 2 rerun in 16.09s ==================
2025-12-04T10:08:42.0922674Z Got exit code 1
2025-12-04T10:08:42.0923382Z FAILED CONSISTENTLY: test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda
2025-12-04T10:08:42.0923807Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:08:42.0924316Z W1204 09:59:32.768000 15876 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:08:42.0924880Z Test results will be stored in test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-b23b654b51890d24.xml
2025-12-04T10:08:42.0925064Z ============================= test session starts ==============================
2025-12-04T10:08:42.0925422Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:08:42.0925551Z cachedir: .pytest_cache
2025-12-04T10:08:42.0926074Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:08:42.0926201Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:08:42.0926325Z configfile: pytest.ini
2025-12-04T10:08:42.0926870Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:08:42.0927103Z collecting ... collected 934 items / 59 deselected / 875 selected
2025-12-04T10:08:42.0927262Z stepcurrent: skipping 59 already run items.
2025-12-04T10:08:42.0927382Z Running 99 items in this shard
2025-12-04T10:08:42.0927388Z 
2025-12-04T10:08:42.0928246Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_with_scales_and_zeros_m_32_n_64_q_group_32_num_groups_1_cuda SKIPPED [0.0040s] (requires Intel GPU) [  1%]
2025-12-04T10:08:42.0929079Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_addmm_multiple_dynamic_cuda SKIPPED [0.0003s] (Skipping triton backend only since not big GPU (not enough SM)) [  2%]
2025-12-04T10:08:42.0929858Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_aoti_constant_tensor_name_collision_cuda <- test/inductor/test_torchinductor.py PASSED [9.9357s] [  3%]
2025-12-04T10:08:42.0930721Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_aoti_debug_printer_cpp_kernel_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0032s] (cpu test case only) [  4%]
2025-12-04T10:08:42.0931409Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_assert_tensor_meta_cuda <- test/inductor/test_torchinductor.py PASSED [5.9687s] [  5%]
2025-12-04T10:08:42.0932125Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_autotuning_args_reuse_cuda <- test/inductor/test_torchinductor.py PASSED [8.1062s] [  6%]
2025-12-04T10:08:42.0932811Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_buffer_mutation_1_cuda <- test/inductor/test_torchinductor.py PASSED [6.1250s] [  7%]
2025-12-04T10:08:42.0933528Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_buffer_mutation_and_force_mmap_weights_cuda SKIPPED [0.0031s] (Test for x86 backend) [  8%]
2025-12-04T10:08:42.0934889Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_with_parameters_cuda <- test/inductor/test_torchinductor.py W1204 10:00:05.791000 15876 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead
2025-12-04T10:08:42.0935354Z W1204 10:00:06.500000 15876 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:08:42.0935475Z PASSED [8.1015s] [  9%]
2025-12-04T10:08:42.0936122Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_conv3d_cuda SKIPPED [0.0035s] (requires modern GPU to run max-autotune) [ 10%]
2025-12-04T10:08:42.0938356Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_copy_non_blocking_is_pinned_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0006s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/164858 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 11%]
2025-12-04T10:08:42.0940106Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_deconv_freezing_cuda ('RERUN', {'yellow': True}) [13.2476s] [ 12%]
2025-12-04T10:08:42.0940701Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_deconv_freezing_cuda ('RERUN', {'yellow': True}) [11.7770s] [ 12%]
2025-12-04T10:08:42.0941191Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_deconv_freezing_cuda FAILED [11.8502s] [ 12%]
2025-12-04T10:08:42.0941197Z 
2025-12-04T10:08:42.0941348Z ==================================== RERUNS ====================================
2025-12-04T10:08:42.0941673Z __________ AOTInductorTestABICompatibleGpu.test_deconv_freezing_cuda ___________
2025-12-04T10:08:42.0941798Z Traceback (most recent call last):
2025-12-04T10:08:42.0942272Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 873, in test_deconv_freezing
2025-12-04T10:08:42.0942462Z     self.check_model(Model(self.device), example_inputs)
2025-12-04T10:08:42.0942897Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model
2025-12-04T10:08:42.0943033Z     actual = AOTIRunnerUtil.run(
2025-12-04T10:08:42.0943420Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run
2025-12-04T10:08:42.0943564Z     package_path = AOTIRunnerUtil.compile(
2025-12-04T10:08:42.0943981Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile
2025-12-04T10:08:42.0944187Z     package_path = torch._inductor.aoti_compile_and_package(
2025-12-04T10:08:42.0944731Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package
2025-12-04T10:08:42.0944864Z     return aot_inductor_minifier_wrapper(
2025-12-04T10:08:42.0945409Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper
2025-12-04T10:08:42.0945521Z     raise e
2025-12-04T10:08:42.0946058Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper
2025-12-04T10:08:42.0946170Z     return func(
2025-12-04T10:08:42.0946717Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner
2025-12-04T10:08:42.0946948Z     aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs)
2025-12-04T10:08:42.0947423Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile
2025-12-04T10:08:42.0947542Z     return compile_fx_aot(
2025-12-04T10:08:42.0948034Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot
2025-12-04T10:08:42.0948174Z     compiled_artifacts = compile_fx(
2025-12-04T10:08:42.0948644Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx
2025-12-04T10:08:42.0948768Z     return compile_fx(
2025-12-04T10:08:42.0949235Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx
2025-12-04T10:08:42.0949372Z     return _maybe_wrap_and_compile_fx_main(
2025-12-04T10:08:42.0949953Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main
2025-12-04T10:08:42.0950069Z     return _compile_fx_main(
2025-12-04T10:08:42.0950643Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main
2025-12-04T10:08:42.0950860Z     return inference_compiler(unlifted_gm, example_inputs_)
2025-12-04T10:08:42.0951384Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2098, in fw_compiler_freezing
2025-12-04T10:08:42.0951585Z     optimized_function = inner_compile(
2025-12-04T10:08:42.0951868Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T10:08:42.0951983Z     return func(*args, **kwds)
2025-12-04T10:08:42.0952492Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner
2025-12-04T10:08:42.0952758Z     return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
2025-12-04T10:08:42.0953271Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper
2025-12-04T10:08:42.0953448Z     inner_compiled_fn = compiler_fn(gm, example_inputs)
2025-12-04T10:08:42.0953949Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T10:08:42.0954155Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:08:42.0954658Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T10:08:42.0954810Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:08:42.0955354Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:08:42.0955676Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:08:42.0956210Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T10:08:42.0956341Z     _check_triton_bf16_support(graph)
2025-12-04T10:08:42.0956892Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T10:08:42.0957027Z     warn_and_skip(node.get_device())
2025-12-04T10:08:42.0957510Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T10:08:42.0957669Z     raise SkipFrame("BF16 is not supported")
2025-12-04T10:08:42.0957926Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T10:08:42.0957933Z 
2025-12-04T10:08:42.0958150Z To execute this test, run the following from the base repo dir:
2025-12-04T10:08:42.0958758Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_deconv_freezing_cuda
2025-12-04T10:08:42.0958764Z 
2025-12-04T10:08:42.0959033Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:08:42.0959272Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:08:42.0959381Z unimplemented []
2025-12-04T10:08:42.0959539Z stats [('calls_captured', 3), ('unique_graphs', 3)]
2025-12-04T10:08:42.0959892Z inductor [('async_compile_cache_miss', 4), ('extern_calls', 2), ('async_compile_cache_hit', 2)]
2025-12-04T10:08:42.0959995Z graph_break []
2025-12-04T10:08:42.0960220Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:08:42.0960969Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:08:42.0961076Z   warnings.warn(
2025-12-04T10:08:42.0961489Z __________ AOTInductorTestABICompatibleGpu.test_deconv_freezing_cuda ___________
2025-12-04T10:08:42.0961677Z Traceback (most recent call last):
2025-12-04T10:08:42.0962215Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 873, in test_deconv_freezing
2025-12-04T10:08:42.0962415Z     self.check_model(Model(self.device), example_inputs)
2025-12-04T10:08:42.0962850Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model
2025-12-04T10:08:42.0962974Z     actual = AOTIRunnerUtil.run(
2025-12-04T10:08:42.0963376Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run
2025-12-04T10:08:42.0963584Z     package_path = AOTIRunnerUtil.compile(
2025-12-04T10:08:42.0964005Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile
2025-12-04T10:08:42.0964211Z     package_path = torch._inductor.aoti_compile_and_package(
2025-12-04T10:08:42.0964738Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package
2025-12-04T10:08:42.0964885Z     return aot_inductor_minifier_wrapper(
2025-12-04T10:08:42.0965433Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper
2025-12-04T10:08:42.0965532Z     raise e
2025-12-04T10:08:42.0966087Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper
2025-12-04T10:08:42.0966188Z     return func(
2025-12-04T10:08:42.0966756Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner
2025-12-04T10:08:42.0966997Z     aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs)
2025-12-04T10:08:42.0967460Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile
2025-12-04T10:08:42.0967592Z     return compile_fx_aot(
2025-12-04T10:08:42.0968087Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot
2025-12-04T10:08:42.0968214Z     compiled_artifacts = compile_fx(
2025-12-04T10:08:42.0968703Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx
2025-12-04T10:08:42.0968812Z     return compile_fx(
2025-12-04T10:08:42.0969296Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx
2025-12-04T10:08:42.0969434Z     return _maybe_wrap_and_compile_fx_main(
2025-12-04T10:08:42.0970008Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main
2025-12-04T10:08:42.0970142Z     return _compile_fx_main(
2025-12-04T10:08:42.0970645Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main
2025-12-04T10:08:42.0970863Z     return inference_compiler(unlifted_gm, example_inputs_)
2025-12-04T10:08:42.0971565Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2098, in fw_compiler_freezing
2025-12-04T10:08:42.0971696Z     optimized_function = inner_compile(
2025-12-04T10:08:42.0971991Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T10:08:42.0972138Z     return func(*args, **kwds)
2025-12-04T10:08:42.0972767Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner
2025-12-04T10:08:42.0973056Z     return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
2025-12-04T10:08:42.0973549Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper
2025-12-04T10:08:42.0973738Z     inner_compiled_fn = compiler_fn(gm, example_inputs)
2025-12-04T10:08:42.0974240Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T10:08:42.0974430Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:08:42.0975129Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T10:08:42.0975279Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:08:42.0975823Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:08:42.0976144Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:08:42.0976927Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T10:08:42.0977071Z     _check_triton_bf16_support(graph)
2025-12-04T10:08:42.0977617Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T10:08:42.0977739Z     warn_and_skip(node.get_device())
2025-12-04T10:08:42.0978243Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T10:08:42.0978385Z     raise SkipFrame("BF16 is not supported")
2025-12-04T10:08:42.0978655Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T10:08:42.0978661Z 
2025-12-04T10:08:42.0978878Z To execute this test, run the following from the base repo dir:
2025-12-04T10:08:42.0979472Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_deconv_freezing_cuda
2025-12-04T10:08:42.0979483Z 
2025-12-04T10:08:42.0979768Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:08:42.0979993Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:08:42.0980169Z unimplemented []
2025-12-04T10:08:42.0980379Z stats [('calls_captured', 3), ('unique_graphs', 3)]
2025-12-04T10:08:42.0980712Z inductor [('async_compile_cache_miss', 4), ('extern_calls', 2), ('async_compile_cache_hit', 2)]
2025-12-04T10:08:42.0980837Z graph_break []
2025-12-04T10:08:42.0981056Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:08:42.0981790Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:08:42.0981910Z   warnings.warn(
2025-12-04T10:08:42.0982131Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:08:42.0982250Z unimplemented []
2025-12-04T10:08:42.0982410Z stats [('calls_captured', 3), ('unique_graphs', 3)]
2025-12-04T10:08:42.0982742Z inductor [('async_compile_cache_miss', 4), ('extern_calls', 2), ('async_compile_cache_hit', 2)]
2025-12-04T10:08:42.0982856Z graph_break []
2025-12-04T10:08:42.0983072Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:08:42.0983806Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:08:42.0983922Z   warnings.warn(
2025-12-04T10:08:42.0984071Z =================================== FAILURES ===================================
2025-12-04T10:08:42.0984395Z __________ AOTInductorTestABICompatibleGpu.test_deconv_freezing_cuda ___________
2025-12-04T10:08:42.0984519Z Traceback (most recent call last):
2025-12-04T10:08:42.0984981Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 873, in test_deconv_freezing
2025-12-04T10:08:42.0985179Z     self.check_model(Model(self.device), example_inputs)
2025-12-04T10:08:42.0985612Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model
2025-12-04T10:08:42.0985733Z     actual = AOTIRunnerUtil.run(
2025-12-04T10:08:42.0986132Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run
2025-12-04T10:08:42.0986275Z     package_path = AOTIRunnerUtil.compile(
2025-12-04T10:08:42.0986773Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile
2025-12-04T10:08:42.0986977Z     package_path = torch._inductor.aoti_compile_and_package(
2025-12-04T10:08:42.0987499Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package
2025-12-04T10:08:42.0987705Z     return aot_inductor_minifier_wrapper(
2025-12-04T10:08:42.0988248Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper
2025-12-04T10:08:42.0988345Z     raise e
2025-12-04T10:08:42.0988899Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper
2025-12-04T10:08:42.0988998Z     return func(
2025-12-04T10:08:42.0989560Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner
2025-12-04T10:08:42.0989800Z     aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs)
2025-12-04T10:08:42.0990257Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile
2025-12-04T10:08:42.0990386Z     return compile_fx_aot(
2025-12-04T10:08:42.0990880Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot
2025-12-04T10:08:42.0991024Z     compiled_artifacts = compile_fx(
2025-12-04T10:08:42.0991494Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx
2025-12-04T10:08:42.0991601Z     return compile_fx(
2025-12-04T10:08:42.0992084Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx
2025-12-04T10:08:42.0992219Z     return _maybe_wrap_and_compile_fx_main(
2025-12-04T10:08:42.0992794Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main
2025-12-04T10:08:42.0992923Z     return _compile_fx_main(
2025-12-04T10:08:42.0993430Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main
2025-12-04T10:08:42.0993644Z     return inference_compiler(unlifted_gm, example_inputs_)
2025-12-04T10:08:42.0994171Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2098, in fw_compiler_freezing
2025-12-04T10:08:42.0994298Z     optimized_function = inner_compile(
2025-12-04T10:08:42.0994590Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T10:08:42.0994705Z     return func(*args, **kwds)
2025-12-04T10:08:42.0995199Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner
2025-12-04T10:08:42.0995481Z     return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
2025-12-04T10:08:42.0995975Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper
2025-12-04T10:08:42.0996161Z     inner_compiled_fn = compiler_fn(gm, example_inputs)
2025-12-04T10:08:42.0996663Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T10:08:42.0996859Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:08:42.0997374Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T10:08:42.0997522Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:08:42.0998068Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:08:42.0998390Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:08:42.0998976Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T10:08:42.0999118Z     _check_triton_bf16_support(graph)
2025-12-04T10:08:42.0999664Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T10:08:42.0999787Z     warn_and_skip(node.get_device())
2025-12-04T10:08:42.1000344Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T10:08:42.1000489Z     raise SkipFrame("BF16 is not supported")
2025-12-04T10:08:42.1000759Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T10:08:42.1000766Z 
2025-12-04T10:08:42.1000985Z To execute this test, run the following from the base repo dir:
2025-12-04T10:08:42.1001577Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_deconv_freezing_cuda
2025-12-04T10:08:42.1001599Z 
2025-12-04T10:08:42.1001869Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:08:42.1002091Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:08:42.1002211Z unimplemented []
2025-12-04T10:08:42.1002368Z stats [('calls_captured', 3), ('unique_graphs', 3)]
2025-12-04T10:08:42.1002702Z inductor [('async_compile_cache_miss', 4), ('extern_calls', 2), ('async_compile_cache_hit', 2)]
2025-12-04T10:08:42.1002817Z graph_break []
2025-12-04T10:08:42.1003035Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:08:42.1003764Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:08:42.1003879Z   warnings.warn(
2025-12-04T10:08:42.1004095Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:08:42.1004215Z unimplemented []
2025-12-04T10:08:42.1004373Z stats [('calls_captured', 3), ('unique_graphs', 3)]
2025-12-04T10:08:42.1004699Z inductor [('async_compile_cache_miss', 4), ('extern_calls', 2), ('async_compile_cache_hit', 2)]
2025-12-04T10:08:42.1004812Z graph_break []
2025-12-04T10:08:42.1005027Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:08:42.1005755Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:08:42.1005870Z   warnings.warn(
2025-12-04T10:08:42.1006083Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:08:42.1006201Z unimplemented []
2025-12-04T10:08:42.1006358Z stats [('calls_captured', 3), ('unique_graphs', 3)]
2025-12-04T10:08:42.1006685Z inductor [('async_compile_cache_miss', 4), ('extern_calls', 2), ('async_compile_cache_hit', 2)]
2025-12-04T10:08:42.1006807Z graph_break []
2025-12-04T10:08:42.1007023Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:08:42.1007751Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:08:42.1007866Z   warnings.warn(
2025-12-04T10:08:42.1008608Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-b23b654b51890d24.xml -
2025-12-04T10:08:42.1008800Z =========================== short test summary info ============================
2025-12-04T10:08:42.1009609Z FAILED [11.8502s] inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_deconv_freezing_cuda - torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T10:08:42.1009615Z 
2025-12-04T10:08:42.1009836Z To execute this test, run the following from the base repo dir:
2025-12-04T10:08:42.1010511Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_deconv_freezing_cuda
2025-12-04T10:08:42.1010518Z 
2025-12-04T10:08:42.1010786Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:08:42.1010980Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:08:42.1011310Z == 1 failed, 5 passed, 6 skipped, 59 deselected, 2 rerun in 75.23s (0:01:15) ===
2025-12-04T10:08:42.1011412Z Got exit code 1
2025-12-04T10:08:42.1011535Z Retrying single test...
2025-12-04T10:08:42.1011981Z W1204 10:01:02.102000 17907 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:08:42.1012559Z Test results will be stored in test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-2e7c8f13f7be0603.xml
2025-12-04T10:08:42.1012727Z ============================= test session starts ==============================
2025-12-04T10:08:42.1013085Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:08:42.1013209Z cachedir: .pytest_cache
2025-12-04T10:08:42.1013734Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:08:42.1013861Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:08:42.1013988Z configfile: pytest.ini
2025-12-04T10:08:42.1014529Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:08:42.1014774Z collecting ... collected 934 items / 157 deselected / 777 selected
2025-12-04T10:08:42.1015446Z stepcurrent: skipping 70 already run items. Running only test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_deconv_freezing_cuda
2025-12-04T10:08:42.1015562Z Running 1 items in this shard
2025-12-04T10:08:42.1015567Z 
2025-12-04T10:08:42.1016667Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_deconv_freezing_cuda [W1204 10:01:06.686889056 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1016675Z 
2025-12-04T10:08:42.1017193Z [W1204 10:01:22.458104453 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1017203Z 
2025-12-04T10:08:42.1017730Z [W1204 10:01:22.461371808 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1017735Z 
2025-12-04T10:08:42.1018199Z W1204 10:01:22.133000 17907 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:08:42.1018730Z [W1204 10:01:29.177153013 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1018735Z 
2025-12-04T10:08:42.1019251Z [W1204 10:01:29.177692163 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1019256Z 
2025-12-04T10:08:42.1019786Z [W1204 10:01:29.177873901 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1019792Z 
2025-12-04T10:08:42.1020303Z [W1204 10:01:29.182719305 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1020313Z 
2025-12-04T10:08:42.1020825Z [W1204 10:01:29.186747597 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1020829Z 
2025-12-04T10:08:42.1021354Z [W1204 10:01:36.205092467 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1021358Z 
2025-12-04T10:08:42.1021933Z [W1204 10:01:36.207015370 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1021938Z 
2025-12-04T10:08:42.1022466Z [W1204 10:01:36.209392177 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1022470Z 
2025-12-04T10:08:42.1022608Z ('RERUN', {'yellow': True}) [32.8516s] [100%]
2025-12-04T10:08:42.1023673Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_deconv_freezing_cuda [W1204 10:01:36.381944046 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1023679Z 
2025-12-04T10:08:42.1024193Z [W1204 10:01:36.383730334 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1024199Z 
2025-12-04T10:08:42.1024729Z [W1204 10:01:36.386050108 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1024734Z 
2025-12-04T10:08:42.1025243Z [W1204 10:01:42.884198898 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1025248Z 
2025-12-04T10:08:42.1025758Z [W1204 10:01:42.884665417 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1025781Z 
2025-12-04T10:08:42.1026293Z [W1204 10:01:42.884838468 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1026297Z 
2025-12-04T10:08:42.1026807Z [W1204 10:01:42.888045367 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1026811Z 
2025-12-04T10:08:42.1027341Z [W1204 10:01:42.891762381 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1027345Z 
2025-12-04T10:08:42.1027859Z [W1204 10:01:48.974715488 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1027864Z 
2025-12-04T10:08:42.1028390Z [W1204 10:01:48.976593061 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1028401Z 
2025-12-04T10:08:42.1028915Z [W1204 10:01:48.978919971 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1028920Z 
2025-12-04T10:08:42.1029067Z ('RERUN', {'yellow': True}) [11.7288s] [100%]
2025-12-04T10:08:42.1030068Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_deconv_freezing_cuda [W1204 10:01:48.109639308 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1030074Z 
2025-12-04T10:08:42.1030594Z [W1204 10:01:48.111433956 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1030613Z 
2025-12-04T10:08:42.1031123Z [W1204 10:01:48.113999686 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1031128Z 
2025-12-04T10:08:42.1031644Z [W1204 10:01:54.509384790 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1031649Z 
2025-12-04T10:08:42.1032179Z [W1204 10:01:54.509814465 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1032185Z 
2025-12-04T10:08:42.1032697Z [W1204 10:01:54.509969178 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1032702Z 
2025-12-04T10:08:42.1033299Z [W1204 10:01:54.513143703 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1033305Z 
2025-12-04T10:08:42.1033818Z [W1204 10:01:54.516781924 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1033823Z 
2025-12-04T10:08:42.1034350Z [W1204 10:02:00.564663468 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1034456Z 
2025-12-04T10:08:42.1034965Z [W1204 10:02:00.566543683 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1034970Z 
2025-12-04T10:08:42.1035494Z [W1204 10:02:00.568903701 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1035499Z 
2025-12-04T10:08:42.1035610Z FAILED [11.5884s] [100%]
2025-12-04T10:08:42.1035620Z 
2025-12-04T10:08:42.1035766Z ==================================== RERUNS ====================================
2025-12-04T10:08:42.1036095Z __________ AOTInductorTestABICompatibleGpu.test_deconv_freezing_cuda ___________
2025-12-04T10:08:42.1036224Z Traceback (most recent call last):
2025-12-04T10:08:42.1036686Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 873, in test_deconv_freezing
2025-12-04T10:08:42.1036894Z     self.check_model(Model(self.device), example_inputs)
2025-12-04T10:08:42.1037328Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model
2025-12-04T10:08:42.1037469Z     actual = AOTIRunnerUtil.run(
2025-12-04T10:08:42.1037856Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run
2025-12-04T10:08:42.1037998Z     package_path = AOTIRunnerUtil.compile(
2025-12-04T10:08:42.1038425Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile
2025-12-04T10:08:42.1038626Z     package_path = torch._inductor.aoti_compile_and_package(
2025-12-04T10:08:42.1039164Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package
2025-12-04T10:08:42.1039295Z     return aot_inductor_minifier_wrapper(
2025-12-04T10:08:42.1039838Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper
2025-12-04T10:08:42.1039953Z     raise e
2025-12-04T10:08:42.1040490Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper
2025-12-04T10:08:42.1040586Z     return func(
2025-12-04T10:08:42.1041144Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner
2025-12-04T10:08:42.1041377Z     aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs)
2025-12-04T10:08:42.1041852Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile
2025-12-04T10:08:42.1041967Z     return compile_fx_aot(
2025-12-04T10:08:42.1042459Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot
2025-12-04T10:08:42.1042596Z     compiled_artifacts = compile_fx(
2025-12-04T10:08:42.1043071Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx
2025-12-04T10:08:42.1043175Z     return compile_fx(
2025-12-04T10:08:42.1043652Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx
2025-12-04T10:08:42.1043791Z     return _maybe_wrap_and_compile_fx_main(
2025-12-04T10:08:42.1044376Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main
2025-12-04T10:08:42.1044554Z     return _compile_fx_main(
2025-12-04T10:08:42.1045058Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main
2025-12-04T10:08:42.1045275Z     return inference_compiler(unlifted_gm, example_inputs_)
2025-12-04T10:08:42.1045801Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2098, in fw_compiler_freezing
2025-12-04T10:08:42.1046005Z     optimized_function = inner_compile(
2025-12-04T10:08:42.1046286Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T10:08:42.1046401Z     return func(*args, **kwds)
2025-12-04T10:08:42.1046910Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner
2025-12-04T10:08:42.1047177Z     return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
2025-12-04T10:08:42.1047673Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper
2025-12-04T10:08:42.1047862Z     inner_compiled_fn = compiler_fn(gm, example_inputs)
2025-12-04T10:08:42.1048363Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T10:08:42.1048572Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:08:42.1049073Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T10:08:42.1049226Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:08:42.1049768Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:08:42.1050090Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:08:42.1050629Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T10:08:42.1050757Z     _check_triton_bf16_support(graph)
2025-12-04T10:08:42.1051306Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T10:08:42.1051445Z     warn_and_skip(node.get_device())
2025-12-04T10:08:42.1051926Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T10:08:42.1052072Z     raise SkipFrame("BF16 is not supported")
2025-12-04T10:08:42.1052337Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T10:08:42.1052343Z 
2025-12-04T10:08:42.1052560Z To execute this test, run the following from the base repo dir:
2025-12-04T10:08:42.1053158Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_deconv_freezing_cuda
2025-12-04T10:08:42.1053163Z 
2025-12-04T10:08:42.1053436Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:08:42.1053658Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:08:42.1053777Z unimplemented []
2025-12-04T10:08:42.1053936Z stats [('calls_captured', 3), ('unique_graphs', 3)]
2025-12-04T10:08:42.1054277Z inductor [('async_compile_cache_miss', 4), ('extern_calls', 2), ('async_compile_cache_hit', 2)]
2025-12-04T10:08:42.1054382Z graph_break []
2025-12-04T10:08:42.1054600Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:08:42.1055821Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:08:42.1055941Z   if out == self.unknown_value:
2025-12-04T10:08:42.1057306Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:08:42.1057430Z   if out == self.unknown_value:
2025-12-04T10:08:42.1058624Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:08:42.1058834Z   if out == self.unknown_value:
2025-12-04T10:08:42.1059564Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:08:42.1059695Z   warnings.warn(
2025-12-04T10:08:42.1060011Z __________ AOTInductorTestABICompatibleGpu.test_deconv_freezing_cuda ___________
2025-12-04T10:08:42.1060140Z Traceback (most recent call last):
2025-12-04T10:08:42.1060619Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 873, in test_deconv_freezing
2025-12-04T10:08:42.1060805Z     self.check_model(Model(self.device), example_inputs)
2025-12-04T10:08:42.1061235Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model
2025-12-04T10:08:42.1061375Z     actual = AOTIRunnerUtil.run(
2025-12-04T10:08:42.1061761Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run
2025-12-04T10:08:42.1061920Z     package_path = AOTIRunnerUtil.compile(
2025-12-04T10:08:42.1062322Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile
2025-12-04T10:08:42.1062523Z     package_path = torch._inductor.aoti_compile_and_package(
2025-12-04T10:08:42.1063060Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package
2025-12-04T10:08:42.1063192Z     return aot_inductor_minifier_wrapper(
2025-12-04T10:08:42.1063735Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper
2025-12-04T10:08:42.1063842Z     raise e
2025-12-04T10:08:42.1064384Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper
2025-12-04T10:08:42.1064494Z     return func(
2025-12-04T10:08:42.1065043Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner
2025-12-04T10:08:42.1065275Z     aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs)
2025-12-04T10:08:42.1065743Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile
2025-12-04T10:08:42.1065858Z     return compile_fx_aot(
2025-12-04T10:08:42.1066350Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot
2025-12-04T10:08:42.1066492Z     compiled_artifacts = compile_fx(
2025-12-04T10:08:42.1066959Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx
2025-12-04T10:08:42.1067079Z     return compile_fx(
2025-12-04T10:08:42.1067542Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx
2025-12-04T10:08:42.1067682Z     return _maybe_wrap_and_compile_fx_main(
2025-12-04T10:08:42.1068267Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main
2025-12-04T10:08:42.1068382Z     return _compile_fx_main(
2025-12-04T10:08:42.1068893Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main
2025-12-04T10:08:42.1069096Z     return inference_compiler(unlifted_gm, example_inputs_)
2025-12-04T10:08:42.1069684Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2098, in fw_compiler_freezing
2025-12-04T10:08:42.1069825Z     optimized_function = inner_compile(
2025-12-04T10:08:42.1070107Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T10:08:42.1070222Z     return func(*args, **kwds)
2025-12-04T10:08:42.1070733Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner
2025-12-04T10:08:42.1071248Z     return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
2025-12-04T10:08:42.1071754Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper
2025-12-04T10:08:42.1071928Z     inner_compiled_fn = compiler_fn(gm, example_inputs)
2025-12-04T10:08:42.1072430Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T10:08:42.1072645Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:08:42.1073145Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T10:08:42.1073306Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:08:42.1073840Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:08:42.1074167Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:08:42.1074702Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T10:08:42.1074829Z     _check_triton_bf16_support(graph)
2025-12-04T10:08:42.1075377Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T10:08:42.1075511Z     warn_and_skip(node.get_device())
2025-12-04T10:08:42.1075998Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T10:08:42.1076152Z     raise SkipFrame("BF16 is not supported")
2025-12-04T10:08:42.1076407Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T10:08:42.1076413Z 
2025-12-04T10:08:42.1076630Z To execute this test, run the following from the base repo dir:
2025-12-04T10:08:42.1077237Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_deconv_freezing_cuda
2025-12-04T10:08:42.1077242Z 
2025-12-04T10:08:42.1077511Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:08:42.1077749Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:08:42.1077860Z unimplemented []
2025-12-04T10:08:42.1078018Z stats [('calls_captured', 3), ('unique_graphs', 3)]
2025-12-04T10:08:42.1078368Z inductor [('async_compile_cache_miss', 4), ('extern_calls', 2), ('async_compile_cache_hit', 2)]
2025-12-04T10:08:42.1078469Z graph_break []
2025-12-04T10:08:42.1078691Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:08:42.1079904Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:08:42.1080028Z   if out == self.unknown_value:
2025-12-04T10:08:42.1081401Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:08:42.1083237Z   if out == self.unknown_value:
2025-12-04T10:08:42.1084805Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:08:42.1086277Z   if out == self.unknown_value:
2025-12-04T10:08:42.1087239Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:08:42.1088304Z   warnings.warn(
2025-12-04T10:08:42.1088688Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:08:42.1089174Z unimplemented []
2025-12-04T10:08:42.1089511Z stats [('calls_captured', 3), ('unique_graphs', 3)]
2025-12-04T10:08:42.1090124Z inductor [('async_compile_cache_miss', 4), ('extern_calls', 2), ('async_compile_cache_hit', 2)]
2025-12-04T10:08:42.1090740Z graph_break []
2025-12-04T10:08:42.1091133Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:08:42.1093254Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:08:42.1095203Z   if out == self.unknown_value:
2025-12-04T10:08:42.1096720Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:08:42.1098205Z   if out == self.unknown_value:
2025-12-04T10:08:42.1099165Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:08:42.1100144Z   warnings.warn(
2025-12-04T10:08:42.1100464Z =================================== FAILURES ===================================
2025-12-04T10:08:42.1101079Z __________ AOTInductorTestABICompatibleGpu.test_deconv_freezing_cuda ___________
2025-12-04T10:08:42.1101992Z Traceback (most recent call last):
2025-12-04T10:08:42.1103011Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 873, in test_deconv_freezing
2025-12-04T10:08:42.1103810Z     self.check_model(Model(self.device), example_inputs)
2025-12-04T10:08:42.1104575Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model
2025-12-04T10:08:42.1105284Z     actual = AOTIRunnerUtil.run(
2025-12-04T10:08:42.1105915Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run
2025-12-04T10:08:42.1106603Z     package_path = AOTIRunnerUtil.compile(
2025-12-04T10:08:42.1107277Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile
2025-12-04T10:08:42.1108040Z     package_path = torch._inductor.aoti_compile_and_package(
2025-12-04T10:08:42.1108922Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package
2025-12-04T10:08:42.1109728Z     return aot_inductor_minifier_wrapper(
2025-12-04T10:08:42.1110527Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper
2025-12-04T10:08:42.1111402Z     raise e
2025-12-04T10:08:42.1112088Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper
2025-12-04T10:08:42.1112879Z     return func(
2025-12-04T10:08:42.1113583Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner
2025-12-04T10:08:42.1114514Z     aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs)
2025-12-04T10:08:42.1115359Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile
2025-12-04T10:08:42.1116191Z     return compile_fx_aot(
2025-12-04T10:08:42.1116907Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot
2025-12-04T10:08:42.1117687Z     compiled_artifacts = compile_fx(
2025-12-04T10:08:42.1118414Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx
2025-12-04T10:08:42.1119213Z     return compile_fx(
2025-12-04T10:08:42.1119889Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx
2025-12-04T10:08:42.1120646Z     return _maybe_wrap_and_compile_fx_main(
2025-12-04T10:08:42.1121482Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main
2025-12-04T10:08:42.1122319Z     return _compile_fx_main(
2025-12-04T10:08:42.1123051Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main
2025-12-04T10:08:42.1123904Z     return inference_compiler(unlifted_gm, example_inputs_)
2025-12-04T10:08:42.1124761Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2098, in fw_compiler_freezing
2025-12-04T10:08:42.1125565Z     optimized_function = inner_compile(
2025-12-04T10:08:42.1126108Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T10:08:42.1126652Z     return func(*args, **kwds)
2025-12-04T10:08:42.1127357Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner
2025-12-04T10:08:42.1128266Z     return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
2025-12-04T10:08:42.1129170Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper
2025-12-04T10:08:42.1129977Z     inner_compiled_fn = compiler_fn(gm, example_inputs)
2025-12-04T10:08:42.1130805Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T10:08:42.1131652Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:08:42.1132494Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T10:08:42.1133280Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:08:42.1134104Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:08:42.1135110Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:08:42.1136099Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T10:08:42.1136951Z     _check_triton_bf16_support(graph)
2025-12-04T10:08:42.1137759Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T10:08:42.1138577Z     warn_and_skip(node.get_device())
2025-12-04T10:08:42.1139292Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T10:08:42.1140067Z     raise SkipFrame("BF16 is not supported")
2025-12-04T10:08:42.1140594Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T10:08:42.1140993Z 
2025-12-04T10:08:42.1141225Z To execute this test, run the following from the base repo dir:
2025-12-04T10:08:42.1142161Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_deconv_freezing_cuda
2025-12-04T10:08:42.1142895Z 
2025-12-04T10:08:42.1143166Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:08:42.1143812Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:08:42.1144413Z unimplemented []
2025-12-04T10:08:42.1144742Z stats [('calls_captured', 3), ('unique_graphs', 3)]
2025-12-04T10:08:42.1145382Z inductor [('async_compile_cache_miss', 4), ('extern_calls', 2), ('async_compile_cache_hit', 2)]
2025-12-04T10:08:42.1145974Z graph_break []
2025-12-04T10:08:42.1146347Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:08:42.1147993Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:08:42.1160362Z   if out == self.unknown_value:
2025-12-04T10:08:42.1161857Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:08:42.1163355Z   if out == self.unknown_value:
2025-12-04T10:08:42.1164776Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:08:42.1166221Z   if out == self.unknown_value:
2025-12-04T10:08:42.1167186Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:08:42.1168163Z   warnings.warn(
2025-12-04T10:08:42.1168552Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:08:42.1169034Z unimplemented []
2025-12-04T10:08:42.1169368Z stats [('calls_captured', 3), ('unique_graphs', 3)]
2025-12-04T10:08:42.1170003Z inductor [('async_compile_cache_miss', 4), ('extern_calls', 2), ('async_compile_cache_hit', 2)]
2025-12-04T10:08:42.1170573Z graph_break []
2025-12-04T10:08:42.1171178Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:08:42.1172752Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:08:42.1174222Z   if out == self.unknown_value:
2025-12-04T10:08:42.1175621Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:08:42.1177137Z   if out == self.unknown_value:
2025-12-04T10:08:42.1178084Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:08:42.1179064Z   warnings.warn(
2025-12-04T10:08:42.1179443Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:08:42.1179918Z unimplemented []
2025-12-04T10:08:42.1180251Z stats [('calls_captured', 3), ('unique_graphs', 3)]
2025-12-04T10:08:42.1180870Z inductor [('async_compile_cache_miss', 4), ('extern_calls', 2), ('async_compile_cache_hit', 2)]
2025-12-04T10:08:42.1181455Z graph_break []
2025-12-04T10:08:42.1181835Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:08:42.1183403Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:08:42.1184840Z   if out == self.unknown_value:
2025-12-04T10:08:42.1186477Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:08:42.1187933Z   if out == self.unknown_value:
2025-12-04T10:08:42.1188892Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:08:42.1189960Z   warnings.warn(
2025-12-04T10:08:42.1190881Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-2e7c8f13f7be0603.xml -
2025-12-04T10:08:42.1191946Z =========================== short test summary info ============================
2025-12-04T10:08:42.1193090Z FAILED [11.5884s] inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_deconv_freezing_cuda - torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T10:08:42.1194019Z 
2025-12-04T10:08:42.1194246Z To execute this test, run the following from the base repo dir:
2025-12-04T10:08:42.1195199Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_deconv_freezing_cuda
2025-12-04T10:08:42.1195923Z 
2025-12-04T10:08:42.1196205Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:08:42.1196800Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:08:42.1197328Z ================= 1 failed, 157 deselected, 2 rerun in 56.26s ==================
2025-12-04T10:08:42.1197787Z Got exit code 1
2025-12-04T10:08:42.1198065Z Retrying single test...
2025-12-04T10:08:42.1198692Z W1204 10:02:12.522000 18895 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:08:42.1199848Z Test results will be stored in test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-8d6cdce6581fa448.xml
2025-12-04T10:08:42.1200737Z ============================= test session starts ==============================
2025-12-04T10:08:42.1201413Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:08:42.1202005Z cachedir: .pytest_cache
2025-12-04T10:08:42.1202720Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:08:42.1203514Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:08:42.1203859Z configfile: pytest.ini
2025-12-04T10:08:42.1204598Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:08:42.1205516Z collecting ... collected 934 items / 157 deselected / 777 selected
2025-12-04T10:08:42.1206561Z stepcurrent: skipping 70 already run items. Running only test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_deconv_freezing_cuda
2025-12-04T10:08:42.1207483Z Running 1 items in this shard
2025-12-04T10:08:42.1207712Z 
2025-12-04T10:08:42.1208710Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_deconv_freezing_cuda [W1204 10:02:16.072468211 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1209845Z 
2025-12-04T10:08:42.1210365Z [W1204 10:02:32.969755695 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1211020Z 
2025-12-04T10:08:42.1211549Z [W1204 10:02:32.973126310 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1212203Z 
2025-12-04T10:08:42.1212677Z W1204 10:02:32.648000 18895 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:08:42.1213861Z [W1204 10:02:40.775847541 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1214531Z 
2025-12-04T10:08:42.1215045Z [W1204 10:02:40.776377923 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1215709Z 
2025-12-04T10:08:42.1216222Z [W1204 10:02:40.776570645 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1217009Z 
2025-12-04T10:08:42.1217535Z [W1204 10:02:40.781470701 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1218185Z 
2025-12-04T10:08:42.1218708Z [W1204 10:02:40.785591636 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1219354Z 
2025-12-04T10:08:42.1219869Z [W1204 10:02:47.826891735 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1220530Z 
2025-12-04T10:08:42.1221044Z [W1204 10:02:47.828779803 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1221703Z 
2025-12-04T10:08:42.1222215Z [W1204 10:02:47.831156108 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1222872Z 
2025-12-04T10:08:42.1223022Z ('RERUN', {'yellow': True}) [33.0632s] [100%]
2025-12-04T10:08:42.1224286Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_deconv_freezing_cuda [W1204 10:02:47.999722910 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1225409Z 
2025-12-04T10:08:42.1225929Z [W1204 10:02:47.001504841 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1226594Z 
2025-12-04T10:08:42.1227116Z [W1204 10:02:47.003768082 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1227777Z 
2025-12-04T10:08:42.1228289Z [W1204 10:02:53.422308098 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1228938Z 
2025-12-04T10:08:42.1229463Z [W1204 10:02:53.422718137 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1230114Z 
2025-12-04T10:08:42.1230642Z [W1204 10:02:53.422868533 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1231292Z 
2025-12-04T10:08:42.1231806Z [W1204 10:02:53.425944101 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1232468Z 
2025-12-04T10:08:42.1232981Z [W1204 10:02:53.429506266 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1233645Z 
2025-12-04T10:08:42.1234156Z [W1204 10:02:59.578267718 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1234802Z 
2025-12-04T10:08:42.1235328Z [W1204 10:02:59.580183005 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1235980Z 
2025-12-04T10:08:42.1236505Z [W1204 10:02:59.582566586 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1237156Z 
2025-12-04T10:08:42.1237290Z ('RERUN', {'yellow': True}) [11.7124s] [100%]
2025-12-04T10:08:42.1238623Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_deconv_freezing_cuda [W1204 10:02:59.713768696 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1239765Z 
2025-12-04T10:08:42.1240281Z [W1204 10:02:59.715501167 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1240928Z 
2025-12-04T10:08:42.1241453Z [W1204 10:02:59.718003599 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1242166Z 
2025-12-04T10:08:42.1242694Z [W1204 10:03:04.129315977 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1243348Z 
2025-12-04T10:08:42.1243864Z [W1204 10:03:04.129749292 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1244529Z 
2025-12-04T10:08:42.1245046Z [W1204 10:03:04.129905310 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1245707Z 
2025-12-04T10:08:42.1246223Z [W1204 10:03:04.133121240 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1246866Z 
2025-12-04T10:08:42.1247393Z [W1204 10:03:04.136769310 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1248046Z 
2025-12-04T10:08:42.1248567Z [W1204 10:03:10.260192085 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1249214Z 
2025-12-04T10:08:42.1249728Z [W1204 10:03:10.262059543 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1250392Z 
2025-12-04T10:08:42.1250903Z [W1204 10:03:10.264361021 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:08:42.1251568Z 
2025-12-04T10:08:42.1251684Z FAILED [11.6797s] [100%]
2025-12-04T10:08:42.1251872Z 
2025-12-04T10:08:42.1252034Z ==================================== RERUNS ====================================
2025-12-04T10:08:42.1252647Z __________ AOTInductorTestABICompatibleGpu.test_deconv_freezing_cuda ___________
2025-12-04T10:08:42.1253216Z Traceback (most recent call last):
2025-12-04T10:08:42.1253924Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 873, in test_deconv_freezing
2025-12-04T10:08:42.1254702Z     self.check_model(Model(self.device), example_inputs)
2025-12-04T10:08:42.1255471Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model
2025-12-04T10:08:42.1256173Z     actual = AOTIRunnerUtil.run(
2025-12-04T10:08:42.1256863Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run
2025-12-04T10:08:42.1257531Z     package_path = AOTIRunnerUtil.compile(
2025-12-04T10:08:42.1258225Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile
2025-12-04T10:08:42.1258988Z     package_path = torch._inductor.aoti_compile_and_package(
2025-12-04T10:08:42.1259850Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package
2025-12-04T10:08:42.1260662Z     return aot_inductor_minifier_wrapper(
2025-12-04T10:08:42.1261479Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper
2025-12-04T10:08:42.1262261Z     raise e
2025-12-04T10:08:42.1262933Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper
2025-12-04T10:08:42.1263708Z     return func(
2025-12-04T10:08:42.1264427Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner
2025-12-04T10:08:42.1265434Z     aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs)
2025-12-04T10:08:42.1266281Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile
2025-12-04T10:08:42.1267007Z     return compile_fx_aot(
2025-12-04T10:08:42.1267709Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot
2025-12-04T10:08:42.1268551Z     compiled_artifacts = compile_fx(
2025-12-04T10:08:42.1269274Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx
2025-12-04T10:08:42.1269999Z     return compile_fx(
2025-12-04T10:08:42.1270646Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx
2025-12-04T10:08:42.1271622Z     return _maybe_wrap_and_compile_fx_main(
2025-12-04T10:08:42.1272471Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main
2025-12-04T10:08:42.1273302Z     return _compile_fx_main(
2025-12-04T10:08:42.1274008Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main
2025-12-04T10:08:42.1274861Z     return inference_compiler(unlifted_gm, example_inputs_)
2025-12-04T10:08:42.1275722Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2098, in fw_compiler_freezing
2025-12-04T10:08:42.1276529Z     optimized_function = inner_compile(
2025-12-04T10:08:42.1277056Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T10:08:42.1277596Z     return func(*args, **kwds)
2025-12-04T10:08:42.1278316Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner
2025-12-04T10:08:42.1279215Z     return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
2025-12-04T10:08:42.1280126Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper
2025-12-04T10:08:42.1280941Z     inner_compiled_fn = compiler_fn(gm, example_inputs)
2025-12-04T10:08:42.1281756Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T10:08:42.1282593Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:08:42.1283424Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T10:08:42.1284225Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:08:42.1285043Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:08:42.1286030Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:08:42.1287027Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T10:08:42.1287817Z     _check_triton_bf16_support(graph)
2025-12-04T10:08:42.1288604Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T10:08:42.1289420Z     warn_and_skip(node.get_device())
2025-12-04T10:08:42.1290153Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T10:08:42.1290915Z     raise SkipFrame("BF16 is not supported")
2025-12-04T10:08:42.1291432Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T10:08:42.1291834Z 
2025-12-04T10:08:42.1292054Z To execute this test, run the following from the base repo dir:
2025-12-04T10:08:42.1292997Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_deconv_freezing_cuda
2025-12-04T10:08:42.1293923Z 
2025-12-04T10:08:42.1294214Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:08:42.1294846Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:08:42.1295328Z unimplemented []
2025-12-04T10:08:42.1295667Z stats [('calls_captured', 3), ('unique_graphs', 3)]
2025-12-04T10:08:42.1296356Z inductor [('async_compile_cache_miss', 4), ('extern_calls', 2), ('async_compile_cache_hit', 2)]
2025-12-04T10:08:42.1297140Z graph_break []
2025-12-04T10:08:42.1297526Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:08:42.1299103Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:08:42.1300559Z   if out == self.unknown_value:
2025-12-04T10:08:42.1301986Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:08:42.1303431Z   if out == self.unknown_value:
2025-12-04T10:08:42.1304857Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:08:42.1306296Z   if out == self.unknown_value:
2025-12-04T10:08:42.1307244Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:08:42.1308214Z   warnings.warn(
2025-12-04T10:08:42.1308691Z __________ AOTInductorTestABICompatibleGpu.test_deconv_freezing_cuda ___________
2025-12-04T10:08:42.1309261Z Traceback (most recent call last):
2025-12-04T10:08:42.1309960Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 873, in test_deconv_freezing
2025-12-04T10:08:42.1310741Z     self.check_model(Model(self.device), example_inputs)
2025-12-04T10:08:42.1311477Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model
2025-12-04T10:08:42.1312174Z     actual = AOTIRunnerUtil.run(
2025-12-04T10:08:42.1312791Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run
2025-12-04T10:08:42.1313461Z     package_path = AOTIRunnerUtil.compile(
2025-12-04T10:08:42.1314126Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile
2025-12-04T10:08:42.1314878Z     package_path = torch._inductor.aoti_compile_and_package(
2025-12-04T10:08:42.1315747Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package
2025-12-04T10:08:42.1316546Z     return aot_inductor_minifier_wrapper(
2025-12-04T10:08:42.1317341Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper
2025-12-04T10:08:42.1318121Z     raise e
2025-12-04T10:08:42.1318806Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper
2025-12-04T10:08:42.1319575Z     return func(
2025-12-04T10:08:42.1320290Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner
2025-12-04T10:08:42.1321214Z     aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs)
2025-12-04T10:08:42.1322048Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile
2025-12-04T10:08:42.1322758Z     return compile_fx_aot(
2025-12-04T10:08:42.1323534Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot
2025-12-04T10:08:42.1324306Z     compiled_artifacts = compile_fx(
2025-12-04T10:08:42.1325008Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx
2025-12-04T10:08:42.1325733Z     return compile_fx(
2025-12-04T10:08:42.1326398Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx
2025-12-04T10:08:42.1327221Z     return _maybe_wrap_and_compile_fx_main(
2025-12-04T10:08:42.1328073Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main
2025-12-04T10:08:42.1328894Z     return _compile_fx_main(
2025-12-04T10:08:42.1329611Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main
2025-12-04T10:08:42.1330461Z     return inference_compiler(unlifted_gm, example_inputs_)
2025-12-04T10:08:42.1331332Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2098, in fw_compiler_freezing
2025-12-04T10:08:42.1332119Z     optimized_function = inner_compile(
2025-12-04T10:08:42.1332653Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T10:08:42.1333196Z     return func(*args, **kwds)
2025-12-04T10:08:42.1333907Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner
2025-12-04T10:08:42.1334812Z     return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
2025-12-04T10:08:42.1335721Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper
2025-12-04T10:08:42.1336604Z     inner_compiled_fn = compiler_fn(gm, example_inputs)
2025-12-04T10:08:42.1337411Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T10:08:42.1338257Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:08:42.1339090Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T10:08:42.1339893Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:08:42.1340696Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:08:42.1341698Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:08:42.1342694Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T10:08:42.1343487Z     _check_triton_bf16_support(graph)
2025-12-04T10:08:42.1344272Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T10:08:42.1345096Z     warn_and_skip(node.get_device())
2025-12-04T10:08:42.1345823Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T10:08:42.1346573Z     raise SkipFrame("BF16 is not supported")
2025-12-04T10:08:42.1347096Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T10:08:42.1347499Z 
2025-12-04T10:08:42.1347718Z To execute this test, run the following from the base repo dir:
2025-12-04T10:08:42.1348663Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_deconv_freezing_cuda
2025-12-04T10:08:42.1349384Z 
2025-12-04T10:08:42.1349655Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:08:42.1350286Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:08:42.1350760Z unimplemented []
2025-12-04T10:08:42.1351188Z stats [('calls_captured', 3), ('unique_graphs', 3)]
2025-12-04T10:08:42.1351806Z inductor [('async_compile_cache_miss', 4), ('extern_calls', 2), ('async_compile_cache_hit', 2)]
2025-12-04T10:08:42.1352375Z graph_break []
2025-12-04T10:08:42.1352754Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:08:42.1354310Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:08:42.1355834Z   if out == self.unknown_value:
2025-12-04T10:08:42.1357247Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:08:42.1358693Z   if out == self.unknown_value:
2025-12-04T10:08:42.1360092Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:08:42.1361545Z   if out == self.unknown_value:
2025-12-04T10:08:42.1362491Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:08:42.1363469Z   warnings.warn(
2025-12-04T10:08:42.1363846Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:08:42.1364314Z unimplemented []
2025-12-04T10:08:42.1364637Z stats [('calls_captured', 3), ('unique_graphs', 3)]
2025-12-04T10:08:42.1365260Z inductor [('async_compile_cache_miss', 4), ('extern_calls', 2), ('async_compile_cache_hit', 2)]
2025-12-04T10:08:42.1365821Z graph_break []
2025-12-04T10:08:42.1366201Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:08:42.1367767Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:08:42.1369201Z   if out == self.unknown_value:
2025-12-04T10:08:42.1370621Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:08:42.1372295Z   if out == self.unknown_value:
2025-12-04T10:08:42.1373247Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:08:42.1374210Z   warnings.warn(
2025-12-04T10:08:42.1374537Z =================================== FAILURES ===================================
2025-12-04T10:08:42.1375148Z __________ AOTInductorTestABICompatibleGpu.test_deconv_freezing_cuda ___________
2025-12-04T10:08:42.1375729Z Traceback (most recent call last):
2025-12-04T10:08:42.1376495Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 873, in test_deconv_freezing
2025-12-04T10:08:42.1377285Z     self.check_model(Model(self.device), example_inputs)
2025-12-04T10:08:42.1378038Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model
2025-12-04T10:08:42.1378723Z     actual = AOTIRunnerUtil.run(
2025-12-04T10:08:42.1379341Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run
2025-12-04T10:08:42.1380014Z     package_path = AOTIRunnerUtil.compile(
2025-12-04T10:08:42.1380690Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile
2025-12-04T10:08:42.1381595Z     package_path = torch._inductor.aoti_compile_and_package(
2025-12-04T10:08:42.1382468Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package
2025-12-04T10:08:42.1383269Z     return aot_inductor_minifier_wrapper(
2025-12-04T10:08:42.1384079Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper
2025-12-04T10:08:42.1384933Z     raise e
2025-12-04T10:08:42.1385618Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper
2025-12-04T10:08:42.1386395Z     return func(
2025-12-04T10:08:42.1387093Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner
2025-12-04T10:08:42.1388014Z     aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs)
2025-12-04T10:08:42.1388854Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile
2025-12-04T10:08:42.1389569Z     return compile_fx_aot(
2025-12-04T10:08:42.1390259Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot
2025-12-04T10:08:42.1391017Z     compiled_artifacts = compile_fx(
2025-12-04T10:08:42.1391737Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx
2025-12-04T10:08:42.1392445Z     return compile_fx(
2025-12-04T10:08:42.1393098Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx
2025-12-04T10:08:42.1393842Z     return _maybe_wrap_and_compile_fx_main(
2025-12-04T10:08:42.1394679Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main
2025-12-04T10:08:42.1395494Z     return _compile_fx_main(
2025-12-04T10:08:42.1396215Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main
2025-12-04T10:08:42.1397063Z     return inference_compiler(unlifted_gm, example_inputs_)
2025-12-04T10:08:42.1397921Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2098, in fw_compiler_freezing
2025-12-04T10:08:42.1398713Z     optimized_function = inner_compile(
2025-12-04T10:08:42.1399256Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T10:08:42.1399791Z     return func(*args, **kwds)
2025-12-04T10:08:42.1400496Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner
2025-12-04T10:08:42.1401408Z     return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
2025-12-04T10:08:42.1402314Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper
2025-12-04T10:08:42.1403133Z     inner_compiled_fn = compiler_fn(gm, example_inputs)
2025-12-04T10:08:42.1403936Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T10:08:42.1404779Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:08:42.1405615Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T10:08:42.1406404Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:08:42.1407227Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:08:42.1408230Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:08:42.1409221Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T10:08:42.1410081Z     _check_triton_bf16_support(graph)
2025-12-04T10:08:42.1410887Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T10:08:42.1411712Z     warn_and_skip(node.get_device())
2025-12-04T10:08:42.1412447Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T10:08:42.1413274Z     raise SkipFrame("BF16 is not supported")
2025-12-04T10:08:42.1413814Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T10:08:42.1414199Z 
2025-12-04T10:08:42.1414435Z To execute this test, run the following from the base repo dir:
2025-12-04T10:08:42.1415369Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_deconv_freezing_cuda
2025-12-04T10:08:42.1416104Z 
2025-12-04T10:08:42.1416449Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:08:42.1417096Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:08:42.1417575Z unimplemented []
2025-12-04T10:08:42.1417895Z stats [('calls_captured', 3), ('unique_graphs', 3)]
2025-12-04T10:08:42.1418528Z inductor [('async_compile_cache_miss', 4), ('extern_calls', 2), ('async_compile_cache_hit', 2)]
2025-12-04T10:08:42.1419108Z graph_break []
2025-12-04T10:08:42.1419483Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:08:42.1421042Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:08:42.1422551Z   if out == self.unknown_value:
2025-12-04T10:08:42.1424063Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:08:42.1425516Z   if out == self.unknown_value:
2025-12-04T10:08:42.1426920Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:08:42.1428385Z   if out == self.unknown_value:
2025-12-04T10:08:42.1429333Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:08:42.1430314Z   warnings.warn(
2025-12-04T10:08:42.1430687Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:08:42.1431166Z unimplemented []
2025-12-04T10:08:42.1431495Z stats [('calls_captured', 3), ('unique_graphs', 3)]
2025-12-04T10:08:42.1432118Z inductor [('async_compile_cache_miss', 4), ('extern_calls', 2), ('async_compile_cache_hit', 2)]
2025-12-04T10:08:42.1432695Z graph_break []
2025-12-04T10:08:42.1433076Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:08:42.1434633Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:08:42.1436086Z   if out == self.unknown_value:
2025-12-04T10:08:42.1437505Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:08:42.1438957Z   if out == self.unknown_value:
2025-12-04T10:08:42.1440017Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:08:42.1440981Z   warnings.warn(
2025-12-04T10:08:42.1441381Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:08:42.1441857Z unimplemented []
2025-12-04T10:08:42.1442178Z stats [('calls_captured', 3), ('unique_graphs', 3)]
2025-12-04T10:08:42.1442813Z inductor [('async_compile_cache_miss', 4), ('extern_calls', 2), ('async_compile_cache_hit', 2)]
2025-12-04T10:08:42.1443470Z graph_break []
2025-12-04T10:08:42.1443852Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:08:42.1445416Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:08:42.1446880Z   if out == self.unknown_value:
2025-12-04T10:08:42.1448309Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:08:42.1449774Z   if out == self.unknown_value:
2025-12-04T10:08:42.1450712Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:08:42.1451687Z   warnings.warn(
2025-12-04T10:08:42.1452603Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-8d6cdce6581fa448.xml -
2025-12-04T10:08:42.1453662Z =========================== short test summary info ============================
2025-12-04T10:08:42.1454798Z FAILED [11.6797s] inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_deconv_freezing_cuda - torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T10:08:42.1455737Z 
2025-12-04T10:08:42.1455954Z To execute this test, run the following from the base repo dir:
2025-12-04T10:08:42.1456971Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_deconv_freezing_cuda
2025-12-04T10:08:42.1457695Z 
2025-12-04T10:08:42.1457980Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:08:42.1458576Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:08:42.1459122Z ================= 1 failed, 157 deselected, 2 rerun in 56.54s ==================
2025-12-04T10:08:42.1459577Z Got exit code 1
2025-12-04T10:08:42.1460240Z FAILED CONSISTENTLY: test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_deconv_freezing_cuda
2025-12-04T10:08:42.1461302Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:08:42.1462302Z W1204 10:03:22.855000 19883 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:08:42.1463451Z Test results will be stored in test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-9dce38c1d023996d.xml
2025-12-04T10:08:42.1464313Z ============================= test session starts ==============================
2025-12-04T10:08:42.1464984Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:08:42.1465588Z cachedir: .pytest_cache
2025-12-04T10:08:42.1466296Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:08:42.1467068Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:08:42.1467430Z configfile: pytest.ini
2025-12-04T10:08:42.1468164Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:08:42.1469168Z collecting ... collected 934 items / 71 deselected / 863 selected
2025-12-04T10:08:42.1469666Z stepcurrent: skipping 71 already run items.
2025-12-04T10:08:42.1470061Z Running 87 items in this shard
2025-12-04T10:08:42.1470273Z 
2025-12-04T10:08:42.1471229Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_duplicate_constant_folding_cuda <- test/inductor/test_torchinductor.py PASSED [9.7597s] [  1%]
2025-12-04T10:08:42.1472891Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_dynamic_cat_cuda <- test/inductor/test_torchinductor.py PASSED [6.4184s] [  2%]
2025-12-04T10:08:42.1474410Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_extract_constants_map_cuda <- test/inductor/test_torchinductor.py PASSED [6.3029s] [  3%]
2025-12-04T10:08:42.1476009Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_fake_tensor_device_validation_cuda <- test/inductor/test_torchinductor.py PASSED [0.0788s] [  4%]
2025-12-04T10:08:42.1477607Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_fp8_cuda SKIPPED [0.0003s] (FP8 is only supported on H100+, SM 8.9 and MI300+ devices) [  5%]
2025-12-04T10:08:42.1479131Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_inf_cuda SKIPPED [0.0002s] (Skip this test, only for local test. SIGABRT is produced.) [  6%]
2025-12-04T10:08:42.1480696Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_input_codegen_with_sympy_expr_cuda <- test/inductor/test_torchinductor.py PASSED [7.0187s] [  8%]
2025-12-04T10:08:42.1482276Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_masked_select_dynamic_cuda <- test/inductor/test_torchinductor.py PASSED [6.4916s] [  9%]
2025-12-04T10:08:42.1484160Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_misaligned_input_1_cuda <- test/inductor/test_torchinductor.py W1204 10:04:07.312000 19883 site-packages/torch/_export/__init__.py:71] +============================+
2025-12-04T10:08:42.1485700Z W1204 10:04:07.312000 19883 site-packages/torch/_export/__init__.py:72] |     !!!   WARNING   !!!    |
2025-12-04T10:08:42.1486538Z W1204 10:04:07.312000 19883 site-packages/torch/_export/__init__.py:73] +============================+
2025-12-04T10:08:42.1488268Z W1204 10:04:07.312000 19883 site-packages/torch/_export/__init__.py:74] torch._export.aot_compile()/torch._export.aot_load() is being deprecated, please switch to directly calling torch._inductor.aoti_compile_and_package(torch.export.export())/torch._inductor.aoti_load_package() instead.
2025-12-04T10:08:42.1490924Z [W1204 10:04:12.269716879 cgy3tashbjqpuzkl7jeiimyyzcze2gud63na6myypd3346d645j2.wrapper.cpp:752] Warning: "Input 0 was compiled as 16-bytes aligned, but it is not aligned at run time. Copying to an aligned tensor to guarantee correctness, but expect a performance hit." (function run_impl)
2025-12-04T10:08:42.1492436Z PASSED [12.0124s] [ 10%]
2025-12-04T10:08:42.1493351Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_misaligned_input_2_cuda <- test/inductor/test_torchinductor.py PASSED [11.9552s] [ 11%]
2025-12-04T10:08:42.1495233Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_multi_device_cuda <- test/inductor/test_torchinductor.py W1204 10:04:24.982000 19883 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:08:42.1496887Z W1204 10:04:24.984000 19883 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:08:42.1497541Z PASSED [6.5199s] [ 12%]
2025-12-04T10:08:42.1498423Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_non_tensor_input_cuda <- test/inductor/test_torchinductor.py PASSED [16.0093s] [ 13%]
2025-12-04T10:08:42.1499939Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_none_args_aot_codegen_cuda <- test/inductor/test_torchinductor.py PASSED [12.8722s] [ 14%]
2025-12-04T10:08:42.1501625Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_normal_functional_cuda <- test/inductor/test_torchinductor.py PASSED [5.3145s] [ 16%]
2025-12-04T10:08:42.1503636Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_pad_non_zero_memory_leak_cuda <- test/inductor/test_torchinductor.py W1204 10:05:05.695000 19883 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:08:42.1505101Z PASSED [6.5301s] [ 17%]
2025-12-04T10:08:42.1506005Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_proxy_executor_squeeze_cuda <- test/inductor/test_torchinductor.py PASSED [5.3510s] [ 18%]
2025-12-04T10:08:42.1509011Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_repeated_calling_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0009s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/146185 for platform(s) inductor, linux, rocm, slow. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 19%]
2025-12-04T10:08:42.1511873Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_replace_unbacked_symbol_with_backed_expr_cuda PASSED [7.6863s] [ 20%]
2025-12-04T10:08:42.1513301Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_reuse_kernel_cuda <- test/inductor/test_torchinductor.py PASSED [11.6272s] [ 21%]
2025-12-04T10:08:42.1514852Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_rocm_triton_autotuning_cuda SKIPPED [0.0032s] (test currently only works on the ROCm stack) [ 22%]
2025-12-04T10:08:42.1516435Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_run_with_grad_enabled_cuda <- test/inductor/test_torchinductor.py PASSED [5.4674s] [ 24%]
2025-12-04T10:08:42.1518083Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_runtime_checks_device_type_failed_cuda Error: input_handles[0]: unmatched device type, expected: 00(cpu), but got: 1
2025-12-04T10:08:42.1519047Z 
2025-12-04T10:08:42.1519155Z PASSED [5.7040s] [ 25%]
2025-12-04T10:08:42.1520029Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_scatter_fallback_cuda <- test/inductor/test_torchinductor.py PASSED [5.8492s] [ 26%]
2025-12-04T10:08:42.1521475Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_seq_cuda <- test/inductor/test_torchinductor.py PASSED [5.9965s] [ 27%]
2025-12-04T10:08:42.1522928Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_expr_transitive_cuda ('RERUN', {'yellow': True}) [1.1269s] [ 28%]
2025-12-04T10:08:42.1524423Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_expr_transitive_cuda ('RERUN', {'yellow': True}) [0.6134s] [ 28%]
2025-12-04T10:08:42.1525841Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_expr_transitive_cuda FAILED [0.8981s] [ 28%]
2025-12-04T10:08:42.1526575Z 
2025-12-04T10:08:42.1526734Z ==================================== RERUNS ====================================
2025-12-04T10:08:42.1527390Z _ AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_expr_transitive_cuda _
2025-12-04T10:08:42.1528006Z Traceback (most recent call last):
2025-12-04T10:08:42.1528818Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 1969, in test_size_with_unbacked_add_expr_transitive
2025-12-04T10:08:42.1529749Z     self.check_model(Repro(), example_inputs, dynamic_shapes=spec)
2025-12-04T10:08:42.1530533Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model
2025-12-04T10:08:42.1531235Z     actual = AOTIRunnerUtil.run(
2025-12-04T10:08:42.1531855Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run
2025-12-04T10:08:42.1532529Z     package_path = AOTIRunnerUtil.compile(
2025-12-04T10:08:42.1533276Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile
2025-12-04T10:08:42.1534034Z     package_path = torch._inductor.aoti_compile_and_package(
2025-12-04T10:08:42.1534910Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package
2025-12-04T10:08:42.1535699Z     return aot_inductor_minifier_wrapper(
2025-12-04T10:08:42.1536580Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper
2025-12-04T10:08:42.1537435Z     raise e
2025-12-04T10:08:42.1538123Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper
2025-12-04T10:08:42.1538893Z     return func(
2025-12-04T10:08:42.1539605Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner
2025-12-04T10:08:42.1540529Z     aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs)
2025-12-04T10:08:42.1541365Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile
2025-12-04T10:08:42.1542075Z     return compile_fx_aot(
2025-12-04T10:08:42.1542776Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot
2025-12-04T10:08:42.1543546Z     compiled_artifacts = compile_fx(
2025-12-04T10:08:42.1544252Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx
2025-12-04T10:08:42.1544972Z     return compile_fx(
2025-12-04T10:08:42.1545632Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx
2025-12-04T10:08:42.1546384Z     return _maybe_wrap_and_compile_fx_main(
2025-12-04T10:08:42.1547222Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main
2025-12-04T10:08:42.1548060Z     return _compile_fx_main(
2025-12-04T10:08:42.1548783Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main
2025-12-04T10:08:42.1549621Z     return inference_compiler(unlifted_gm, example_inputs_)
2025-12-04T10:08:42.1550155Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__
2025-12-04T10:08:42.1550310Z     return self.compiler_fn(gm, example_inputs)
2025-12-04T10:08:42.1550826Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base
2025-12-04T10:08:42.1550945Z     return compile_fx_forward(
2025-12-04T10:08:42.1551457Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward
2025-12-04T10:08:42.1551583Z     return inner_compile(
2025-12-04T10:08:42.1551869Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T10:08:42.1551997Z     return func(*args, **kwds)
2025-12-04T10:08:42.1552490Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner
2025-12-04T10:08:42.1552755Z     return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
2025-12-04T10:08:42.1553260Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper
2025-12-04T10:08:42.1553442Z     inner_compiled_fn = compiler_fn(gm, example_inputs)
2025-12-04T10:08:42.1553943Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T10:08:42.1554148Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:08:42.1554647Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T10:08:42.1554892Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:08:42.1555423Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:08:42.1555744Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:08:42.1556276Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T10:08:42.1556465Z     _check_triton_bf16_support(graph)
2025-12-04T10:08:42.1557025Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T10:08:42.1557148Z     warn_and_skip(node.get_device())
2025-12-04T10:08:42.1557630Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T10:08:42.1557790Z     raise SkipFrame("BF16 is not supported")
2025-12-04T10:08:42.1558052Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T10:08:42.1558058Z 
2025-12-04T10:08:42.1558276Z To execute this test, run the following from the base repo dir:
2025-12-04T10:08:42.1559005Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_expr_transitive_cuda
2025-12-04T10:08:42.1559016Z 
2025-12-04T10:08:42.1559284Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:08:42.1559524Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:08:42.1559631Z unimplemented []
2025-12-04T10:08:42.1559797Z stats [('calls_captured', 22), ('unique_graphs', 1)]
2025-12-04T10:08:42.1560055Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)]
2025-12-04T10:08:42.1560155Z graph_break []
2025-12-04T10:08:42.1560387Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:08:42.1561209Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T10:08:42.1561328Z   return cls.__new__(cls, *args)
2025-12-04T10:08:42.1562069Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:08:42.1562178Z   warnings.warn(
2025-12-04T10:08:42.1562539Z _ AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_expr_transitive_cuda _
2025-12-04T10:08:42.1562675Z Traceback (most recent call last):
2025-12-04T10:08:42.1563238Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 1969, in test_size_with_unbacked_add_expr_transitive
2025-12-04T10:08:42.1563476Z     self.check_model(Repro(), example_inputs, dynamic_shapes=spec)
2025-12-04T10:08:42.1563912Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model
2025-12-04T10:08:42.1564035Z     actual = AOTIRunnerUtil.run(
2025-12-04T10:08:42.1564435Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run
2025-12-04T10:08:42.1564578Z     package_path = AOTIRunnerUtil.compile(
2025-12-04T10:08:42.1564985Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile
2025-12-04T10:08:42.1565201Z     package_path = torch._inductor.aoti_compile_and_package(
2025-12-04T10:08:42.1565727Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package
2025-12-04T10:08:42.1565871Z     return aot_inductor_minifier_wrapper(
2025-12-04T10:08:42.1566410Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper
2025-12-04T10:08:42.1566506Z     raise e
2025-12-04T10:08:42.1567131Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper
2025-12-04T10:08:42.1567230Z     return func(
2025-12-04T10:08:42.1567793Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner
2025-12-04T10:08:42.1568027Z     aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs)
2025-12-04T10:08:42.1568569Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile
2025-12-04T10:08:42.1568698Z     return compile_fx_aot(
2025-12-04T10:08:42.1569191Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot
2025-12-04T10:08:42.1569316Z     compiled_artifacts = compile_fx(
2025-12-04T10:08:42.1569801Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx
2025-12-04T10:08:42.1569908Z     return compile_fx(
2025-12-04T10:08:42.1570390Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx
2025-12-04T10:08:42.1570527Z     return _maybe_wrap_and_compile_fx_main(
2025-12-04T10:08:42.1571276Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main
2025-12-04T10:08:42.1571412Z     return _compile_fx_main(
2025-12-04T10:08:42.1571917Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main
2025-12-04T10:08:42.1572118Z     return inference_compiler(unlifted_gm, example_inputs_)
2025-12-04T10:08:42.1572651Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__
2025-12-04T10:08:42.1572804Z     return self.compiler_fn(gm, example_inputs)
2025-12-04T10:08:42.1573321Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base
2025-12-04T10:08:42.1573440Z     return compile_fx_forward(
2025-12-04T10:08:42.1573957Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward
2025-12-04T10:08:42.1574082Z     return inner_compile(
2025-12-04T10:08:42.1574364Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T10:08:42.1574498Z     return func(*args, **kwds)
2025-12-04T10:08:42.1574994Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner
2025-12-04T10:08:42.1575260Z     return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
2025-12-04T10:08:42.1575771Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper
2025-12-04T10:08:42.1575945Z     inner_compiled_fn = compiler_fn(gm, example_inputs)
2025-12-04T10:08:42.1576551Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T10:08:42.1576763Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:08:42.1577266Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T10:08:42.1577431Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:08:42.1577970Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:08:42.1578294Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:08:42.1578833Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T10:08:42.1578962Z     _check_triton_bf16_support(graph)
2025-12-04T10:08:42.1579669Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T10:08:42.1579795Z     warn_and_skip(node.get_device())
2025-12-04T10:08:42.1580280Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T10:08:42.1580438Z     raise SkipFrame("BF16 is not supported")
2025-12-04T10:08:42.1580699Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T10:08:42.1580783Z 
2025-12-04T10:08:42.1581003Z To execute this test, run the following from the base repo dir:
2025-12-04T10:08:42.1581737Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_expr_transitive_cuda
2025-12-04T10:08:42.1581743Z 
2025-12-04T10:08:42.1582014Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:08:42.1582257Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:08:42.1582370Z unimplemented []
2025-12-04T10:08:42.1582539Z stats [('calls_captured', 22), ('unique_graphs', 1)]
2025-12-04T10:08:42.1582803Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)]
2025-12-04T10:08:42.1582906Z graph_break []
2025-12-04T10:08:42.1583141Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:08:42.1583964Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T10:08:42.1584083Z   return cls.__new__(cls, *args)
2025-12-04T10:08:42.1584826Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:08:42.1584933Z   warnings.warn(
2025-12-04T10:08:42.1585154Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:08:42.1585280Z unimplemented []
2025-12-04T10:08:42.1585452Z stats [('calls_captured', 22), ('unique_graphs', 1)]
2025-12-04T10:08:42.1585712Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)]
2025-12-04T10:08:42.1585815Z graph_break []
2025-12-04T10:08:42.1586033Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:08:42.1586853Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T10:08:42.1586974Z   return cls.__new__(cls, *args)
2025-12-04T10:08:42.1587699Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:08:42.1587816Z   warnings.warn(
2025-12-04T10:08:42.1587964Z =================================== FAILURES ===================================
2025-12-04T10:08:42.1588338Z _ AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_expr_transitive_cuda _
2025-12-04T10:08:42.1588464Z Traceback (most recent call last):
2025-12-04T10:08:42.1589028Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 1969, in test_size_with_unbacked_add_expr_transitive
2025-12-04T10:08:42.1589266Z     self.check_model(Repro(), example_inputs, dynamic_shapes=spec)
2025-12-04T10:08:42.1589705Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model
2025-12-04T10:08:42.1589842Z     actual = AOTIRunnerUtil.run(
2025-12-04T10:08:42.1590226Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run
2025-12-04T10:08:42.1590371Z     package_path = AOTIRunnerUtil.compile(
2025-12-04T10:08:42.1590783Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile
2025-12-04T10:08:42.1590985Z     package_path = torch._inductor.aoti_compile_and_package(
2025-12-04T10:08:42.1591578Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package
2025-12-04T10:08:42.1591723Z     return aot_inductor_minifier_wrapper(
2025-12-04T10:08:42.1592262Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper
2025-12-04T10:08:42.1592433Z     raise e
2025-12-04T10:08:42.1592970Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper
2025-12-04T10:08:42.1593066Z     return func(
2025-12-04T10:08:42.1593626Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner
2025-12-04T10:08:42.1593857Z     aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs)
2025-12-04T10:08:42.1594319Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile
2025-12-04T10:08:42.1594444Z     return compile_fx_aot(
2025-12-04T10:08:42.1594933Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot
2025-12-04T10:08:42.1595071Z     compiled_artifacts = compile_fx(
2025-12-04T10:08:42.1595538Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx
2025-12-04T10:08:42.1595652Z     return compile_fx(
2025-12-04T10:08:42.1596135Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx
2025-12-04T10:08:42.1596270Z     return _maybe_wrap_and_compile_fx_main(
2025-12-04T10:08:42.1596841Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main
2025-12-04T10:08:42.1596971Z     return _compile_fx_main(
2025-12-04T10:08:42.1597475Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main
2025-12-04T10:08:42.1597691Z     return inference_compiler(unlifted_gm, example_inputs_)
2025-12-04T10:08:42.1598213Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__
2025-12-04T10:08:42.1598363Z     return self.compiler_fn(gm, example_inputs)
2025-12-04T10:08:42.1598879Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base
2025-12-04T10:08:42.1598996Z     return compile_fx_forward(
2025-12-04T10:08:42.1599524Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward
2025-12-04T10:08:42.1599635Z     return inner_compile(
2025-12-04T10:08:42.1599921Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T10:08:42.1600050Z     return func(*args, **kwds)
2025-12-04T10:08:42.1600551Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner
2025-12-04T10:08:42.1600816Z     return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
2025-12-04T10:08:42.1601321Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper
2025-12-04T10:08:42.1601497Z     inner_compiled_fn = compiler_fn(gm, example_inputs)
2025-12-04T10:08:42.1602015Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T10:08:42.1602207Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:08:42.1602707Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T10:08:42.1602864Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:08:42.1603463Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:08:42.1603797Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:08:42.1604315Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T10:08:42.1604440Z     _check_triton_bf16_support(graph)
2025-12-04T10:08:42.1605059Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T10:08:42.1605182Z     warn_and_skip(node.get_device())
2025-12-04T10:08:42.1605665Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T10:08:42.1605818Z     raise SkipFrame("BF16 is not supported")
2025-12-04T10:08:42.1606076Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T10:08:42.1606083Z 
2025-12-04T10:08:42.1606320Z To execute this test, run the following from the base repo dir:
2025-12-04T10:08:42.1607036Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_expr_transitive_cuda
2025-12-04T10:08:42.1607042Z 
2025-12-04T10:08:42.1607314Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:08:42.1607552Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:08:42.1607663Z unimplemented []
2025-12-04T10:08:42.1607841Z stats [('calls_captured', 22), ('unique_graphs', 1)]
2025-12-04T10:08:42.1608084Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)]
2025-12-04T10:08:42.1608185Z graph_break []
2025-12-04T10:08:42.1608415Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:08:42.1609235Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T10:08:42.1609352Z   return cls.__new__(cls, *args)
2025-12-04T10:08:42.1610088Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:08:42.1610193Z   warnings.warn(
2025-12-04T10:08:42.1610427Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:08:42.1610538Z unimplemented []
2025-12-04T10:08:42.1610704Z stats [('calls_captured', 22), ('unique_graphs', 1)]
2025-12-04T10:08:42.1610963Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)]
2025-12-04T10:08:42.1611061Z graph_break []
2025-12-04T10:08:42.1611277Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:08:42.1612100Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T10:08:42.1612220Z   return cls.__new__(cls, *args)
2025-12-04T10:08:42.1612953Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:08:42.1613059Z   warnings.warn(
2025-12-04T10:08:42.1613275Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:08:42.1613395Z unimplemented []
2025-12-04T10:08:42.1613561Z stats [('calls_captured', 22), ('unique_graphs', 1)]
2025-12-04T10:08:42.1613803Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)]
2025-12-04T10:08:42.1613918Z graph_break []
2025-12-04T10:08:42.1614131Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:08:42.1615021Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T10:08:42.1615137Z   return cls.__new__(cls, *args)
2025-12-04T10:08:42.1615860Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:08:42.1615975Z   warnings.warn(
2025-12-04T10:08:42.1616899Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-9dce38c1d023996d.xml -
2025-12-04T10:08:42.1617182Z =========================== short test summary info ============================
2025-12-04T10:08:42.1618090Z FAILED [0.8981s] inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_expr_transitive_cuda - torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T10:08:42.1618096Z 
2025-12-04T10:08:42.1618313Z To execute this test, run the following from the base repo dir:
2025-12-04T10:08:42.1619050Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_expr_transitive_cuda
2025-12-04T10:08:42.1619056Z 
2025-12-04T10:08:42.1619325Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:08:42.1619521Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:08:42.1619778Z = 1 failed, 20 passed, 4 skipped, 71 deselected, 2 rerun in 157.74s (0:02:37) ==
2025-12-04T10:08:42.1619879Z Got exit code 1
2025-12-04T10:08:42.1620005Z Retrying single test...
2025-12-04T10:08:42.1620450Z W1204 10:06:14.384000 23467 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:08:42.1621029Z Test results will be stored in test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-b570798f966501a4.xml
2025-12-04T10:08:42.1621195Z ============================= test session starts ==============================
2025-12-04T10:08:42.1621551Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:08:42.1621674Z cachedir: .pytest_cache
2025-12-04T10:08:42.1622194Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:08:42.1622321Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:08:42.1622448Z configfile: pytest.ini
2025-12-04T10:08:42.1622989Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:08:42.1623229Z collecting ... collected 934 items / 157 deselected / 777 selected
2025-12-04T10:08:42.1624027Z stepcurrent: skipping 95 already run items. Running only test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_expr_transitive_cuda
2025-12-04T10:08:42.1624143Z Running 1 items in this shard
2025-12-04T10:08:42.1624149Z 
2025-12-04T10:08:42.1624848Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_expr_transitive_cuda ('RERUN', {'yellow': True}) [4.1738s] [100%]
2025-12-04T10:08:42.1625532Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_expr_transitive_cuda ('RERUN', {'yellow': True}) [0.6041s] [100%]
2025-12-04T10:08:42.1626139Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_expr_transitive_cuda FAILED [0.6119s] [100%]
2025-12-04T10:08:42.1626148Z 
2025-12-04T10:08:42.1626295Z ==================================== RERUNS ====================================
2025-12-04T10:08:42.1626655Z _ AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_expr_transitive_cuda _
2025-12-04T10:08:42.1626792Z Traceback (most recent call last):
2025-12-04T10:08:42.1627356Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 1969, in test_size_with_unbacked_add_expr_transitive
2025-12-04T10:08:42.1627660Z     self.check_model(Repro(), example_inputs, dynamic_shapes=spec)
2025-12-04T10:08:42.1628096Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model
2025-12-04T10:08:42.1628218Z     actual = AOTIRunnerUtil.run(
2025-12-04T10:08:42.1628621Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run
2025-12-04T10:08:42.1628823Z     package_path = AOTIRunnerUtil.compile(
2025-12-04T10:08:42.1629228Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile
2025-12-04T10:08:42.1629444Z     package_path = torch._inductor.aoti_compile_and_package(
2025-12-04T10:08:42.1629973Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package
2025-12-04T10:08:42.1630121Z     return aot_inductor_minifier_wrapper(
2025-12-04T10:08:42.1630669Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper
2025-12-04T10:08:42.1630767Z     raise e
2025-12-04T10:08:42.1631317Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper
2025-12-04T10:08:42.1631417Z     return func(
2025-12-04T10:08:42.1631962Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner
2025-12-04T10:08:42.1632213Z     aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs)
2025-12-04T10:08:42.1632671Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile
2025-12-04T10:08:42.1632797Z     return compile_fx_aot(
2025-12-04T10:08:42.1633288Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot
2025-12-04T10:08:42.1633413Z     compiled_artifacts = compile_fx(
2025-12-04T10:08:42.1633903Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx
2025-12-04T10:08:42.1634010Z     return compile_fx(
2025-12-04T10:08:42.1634488Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx
2025-12-04T10:08:42.1634623Z     return _maybe_wrap_and_compile_fx_main(
2025-12-04T10:08:42.1635199Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main
2025-12-04T10:08:42.1635331Z     return _compile_fx_main(
2025-12-04T10:08:42.1635831Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main
2025-12-04T10:08:42.1636032Z     return inference_compiler(unlifted_gm, example_inputs_)
2025-12-04T10:08:42.1636563Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__
2025-12-04T10:08:42.1636719Z     return self.compiler_fn(gm, example_inputs)
2025-12-04T10:08:42.1637230Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base
2025-12-04T10:08:42.1637347Z     return compile_fx_forward(
2025-12-04T10:08:42.1637873Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward
2025-12-04T10:08:42.1638001Z     return inner_compile(
2025-12-04T10:08:42.1638284Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T10:08:42.1638399Z     return func(*args, **kwds)
2025-12-04T10:08:42.1638914Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner
2025-12-04T10:08:42.1639183Z     return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
2025-12-04T10:08:42.1639842Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper
2025-12-04T10:08:42.1640022Z     inner_compiled_fn = compiler_fn(gm, example_inputs)
2025-12-04T10:08:42.1640525Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T10:08:42.1640734Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:08:42.1641237Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T10:08:42.1641458Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:08:42.1641991Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:08:42.1642317Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:08:42.1642852Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T10:08:42.1642986Z     _check_triton_bf16_support(graph)
2025-12-04T10:08:42.1643551Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T10:08:42.1643676Z     warn_and_skip(node.get_device())
2025-12-04T10:08:42.1644162Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T10:08:42.1644326Z     raise SkipFrame("BF16 is not supported")
2025-12-04T10:08:42.1644585Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T10:08:42.1644592Z 
2025-12-04T10:08:42.1644811Z To execute this test, run the following from the base repo dir:
2025-12-04T10:08:42.1645546Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_expr_transitive_cuda
2025-12-04T10:08:42.1645551Z 
2025-12-04T10:08:42.1645826Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:08:42.1646067Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:08:42.1646177Z unimplemented []
2025-12-04T10:08:42.1646345Z stats [('calls_captured', 22), ('unique_graphs', 1)]
2025-12-04T10:08:42.1646608Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)]
2025-12-04T10:08:42.1646717Z graph_break []
2025-12-04T10:08:42.1646937Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:08:42.1647772Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T10:08:42.1647893Z   return cls.__new__(cls, *args)
2025-12-04T10:08:42.1648640Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:08:42.1648749Z   warnings.warn(
2025-12-04T10:08:42.1649114Z _ AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_expr_transitive_cuda _
2025-12-04T10:08:42.1649255Z Traceback (most recent call last):
2025-12-04T10:08:42.1649816Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 1969, in test_size_with_unbacked_add_expr_transitive
2025-12-04T10:08:42.1650056Z     self.check_model(Repro(), example_inputs, dynamic_shapes=spec)
2025-12-04T10:08:42.1650492Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model
2025-12-04T10:08:42.1650615Z     actual = AOTIRunnerUtil.run(
2025-12-04T10:08:42.1651016Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run
2025-12-04T10:08:42.1651158Z     package_path = AOTIRunnerUtil.compile(
2025-12-04T10:08:42.1651563Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile
2025-12-04T10:08:42.1651847Z     package_path = torch._inductor.aoti_compile_and_package(
2025-12-04T10:08:42.1652376Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package
2025-12-04T10:08:42.1652520Z     return aot_inductor_minifier_wrapper(
2025-12-04T10:08:42.1653065Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper
2025-12-04T10:08:42.1653216Z     raise e
2025-12-04T10:08:42.1653769Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper
2025-12-04T10:08:42.1653868Z     return func(
2025-12-04T10:08:42.1654427Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner
2025-12-04T10:08:42.1654658Z     aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs)
2025-12-04T10:08:42.1655118Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile
2025-12-04T10:08:42.1655246Z     return compile_fx_aot(
2025-12-04T10:08:42.1655737Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot
2025-12-04T10:08:42.1655861Z     compiled_artifacts = compile_fx(
2025-12-04T10:08:42.1656424Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx
2025-12-04T10:08:42.1656543Z     return compile_fx(
2025-12-04T10:08:42.1657028Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx
2025-12-04T10:08:42.1657164Z     return _maybe_wrap_and_compile_fx_main(
2025-12-04T10:08:42.1657738Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main
2025-12-04T10:08:42.1657871Z     return _compile_fx_main(
2025-12-04T10:08:42.1658374Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main
2025-12-04T10:08:42.1658578Z     return inference_compiler(unlifted_gm, example_inputs_)
2025-12-04T10:08:42.1659114Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__
2025-12-04T10:08:42.1659264Z     return self.compiler_fn(gm, example_inputs)
2025-12-04T10:08:42.1659780Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base
2025-12-04T10:08:42.1659900Z     return compile_fx_forward(
2025-12-04T10:08:42.1660413Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward
2025-12-04T10:08:42.1660537Z     return inner_compile(
2025-12-04T10:08:42.1660818Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T10:08:42.1660950Z     return func(*args, **kwds)
2025-12-04T10:08:42.1661444Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner
2025-12-04T10:08:42.1661709Z     return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
2025-12-04T10:08:42.1662217Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper
2025-12-04T10:08:42.1662398Z     inner_compiled_fn = compiler_fn(gm, example_inputs)
2025-12-04T10:08:42.1662900Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T10:08:42.1663106Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:08:42.1663608Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T10:08:42.1663768Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:08:42.1664393Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:08:42.1664718Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:08:42.1665247Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T10:08:42.1665430Z     _check_triton_bf16_support(graph)
2025-12-04T10:08:42.1665984Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T10:08:42.1666107Z     warn_and_skip(node.get_device())
2025-12-04T10:08:42.1666590Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T10:08:42.1666745Z     raise SkipFrame("BF16 is not supported")
2025-12-04T10:08:42.1667004Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T10:08:42.1667014Z 
2025-12-04T10:08:42.1667232Z To execute this test, run the following from the base repo dir:
2025-12-04T10:08:42.1667964Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_expr_transitive_cuda
2025-12-04T10:08:42.1667969Z 
2025-12-04T10:08:42.1668238Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:08:42.1668479Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:08:42.1668585Z unimplemented []
2025-12-04T10:08:42.1668750Z stats [('calls_captured', 22), ('unique_graphs', 1)]
2025-12-04T10:08:42.1669008Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)]
2025-12-04T10:08:42.1669110Z graph_break []
2025-12-04T10:08:42.1669327Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:08:42.1670162Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T10:08:42.1670281Z   return cls.__new__(cls, *args)
2025-12-04T10:08:42.1671192Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:08:42.1671300Z   warnings.warn(
2025-12-04T10:08:42.1671525Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:08:42.1671647Z unimplemented []
2025-12-04T10:08:42.1671816Z stats [('calls_captured', 22), ('unique_graphs', 1)]
2025-12-04T10:08:42.1672077Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)]
2025-12-04T10:08:42.1672179Z graph_break []
2025-12-04T10:08:42.1672397Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:08:42.1673226Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T10:08:42.1673344Z   return cls.__new__(cls, *args)
2025-12-04T10:08:42.1674071Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:08:42.1674186Z   warnings.warn(
2025-12-04T10:08:42.1674337Z =================================== FAILURES ===================================
2025-12-04T10:08:42.1674707Z _ AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_expr_transitive_cuda _
2025-12-04T10:08:42.1674831Z Traceback (most recent call last):
2025-12-04T10:08:42.1675393Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 1969, in test_size_with_unbacked_add_expr_transitive
2025-12-04T10:08:42.1675629Z     self.check_model(Repro(), example_inputs, dynamic_shapes=spec)
2025-12-04T10:08:42.1676190Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model
2025-12-04T10:08:42.1676318Z     actual = AOTIRunnerUtil.run(
2025-12-04T10:08:42.1676717Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run
2025-12-04T10:08:42.1676863Z     package_path = AOTIRunnerUtil.compile(
2025-12-04T10:08:42.1677284Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile
2025-12-04T10:08:42.1677568Z     package_path = torch._inductor.aoti_compile_and_package(
2025-12-04T10:08:42.1678094Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package
2025-12-04T10:08:42.1678237Z     return aot_inductor_minifier_wrapper(
2025-12-04T10:08:42.1678781Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper
2025-12-04T10:08:42.1678889Z     raise e
2025-12-04T10:08:42.1679433Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper
2025-12-04T10:08:42.1679532Z     return func(
2025-12-04T10:08:42.1680094Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner
2025-12-04T10:08:42.1680327Z     aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs)
2025-12-04T10:08:42.1680793Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile
2025-12-04T10:08:42.1680918Z     return compile_fx_aot(
2025-12-04T10:08:42.1681411Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot
2025-12-04T10:08:42.1681548Z     compiled_artifacts = compile_fx(
2025-12-04T10:08:42.1682018Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx
2025-12-04T10:08:42.1682127Z     return compile_fx(
2025-12-04T10:08:42.1682606Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx
2025-12-04T10:08:42.1682744Z     return _maybe_wrap_and_compile_fx_main(
2025-12-04T10:08:42.1683312Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main
2025-12-04T10:08:42.1683442Z     return _compile_fx_main(
2025-12-04T10:08:42.1683941Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main
2025-12-04T10:08:42.1684153Z     return inference_compiler(unlifted_gm, example_inputs_)
2025-12-04T10:08:42.1684672Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__
2025-12-04T10:08:42.1684821Z     return self.compiler_fn(gm, example_inputs)
2025-12-04T10:08:42.1685338Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base
2025-12-04T10:08:42.1685456Z     return compile_fx_forward(
2025-12-04T10:08:42.1685981Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward
2025-12-04T10:08:42.1686091Z     return inner_compile(
2025-12-04T10:08:42.1686371Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T10:08:42.1686505Z     return func(*args, **kwds)
2025-12-04T10:08:42.1687002Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner
2025-12-04T10:08:42.1687266Z     return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
2025-12-04T10:08:42.1687769Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper
2025-12-04T10:08:42.1687943Z     inner_compiled_fn = compiler_fn(gm, example_inputs)
2025-12-04T10:08:42.1688520Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T10:08:42.1688714Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:08:42.1689211Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T10:08:42.1689375Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:08:42.1689969Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:08:42.1690306Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:08:42.1690825Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T10:08:42.1690953Z     _check_triton_bf16_support(graph)
2025-12-04T10:08:42.1691517Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T10:08:42.1691642Z     warn_and_skip(node.get_device())
2025-12-04T10:08:42.1692123Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T10:08:42.1692278Z     raise SkipFrame("BF16 is not supported")
2025-12-04T10:08:42.1692534Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T10:08:42.1692544Z 
2025-12-04T10:08:42.1692775Z To execute this test, run the following from the base repo dir:
2025-12-04T10:08:42.1693493Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_expr_transitive_cuda
2025-12-04T10:08:42.1693499Z 
2025-12-04T10:08:42.1693766Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:08:42.1693999Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:08:42.1694108Z unimplemented []
2025-12-04T10:08:42.1694286Z stats [('calls_captured', 22), ('unique_graphs', 1)]
2025-12-04T10:08:42.1694530Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)]
2025-12-04T10:08:42.1694631Z graph_break []
2025-12-04T10:08:42.1694860Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:08:42.1695681Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T10:08:42.1695800Z   return cls.__new__(cls, *args)
2025-12-04T10:08:42.1696615Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:08:42.1696725Z   warnings.warn(
2025-12-04T10:08:42.1696958Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:08:42.1697069Z unimplemented []
2025-12-04T10:08:42.1697238Z stats [('calls_captured', 22), ('unique_graphs', 1)]
2025-12-04T10:08:42.1697501Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)]
2025-12-04T10:08:42.1697602Z graph_break []
2025-12-04T10:08:42.1697822Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:08:42.1698647Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T10:08:42.1698771Z   return cls.__new__(cls, *args)
2025-12-04T10:08:42.1699509Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:08:42.1699615Z   warnings.warn(
2025-12-04T10:08:42.1699832Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:08:42.1700026Z unimplemented []
2025-12-04T10:08:42.1700193Z stats [('calls_captured', 22), ('unique_graphs', 1)]
2025-12-04T10:08:42.1700438Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)]
2025-12-04T10:08:42.1700553Z graph_break []
2025-12-04T10:08:42.1700770Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:08:42.1701592Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T10:08:42.1701767Z   return cls.__new__(cls, *args)
2025-12-04T10:08:42.1702489Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:08:42.1702612Z   warnings.warn(
2025-12-04T10:08:42.1703363Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-b570798f966501a4.xml -
2025-12-04T10:08:42.1703554Z =========================== short test summary info ============================
2025-12-04T10:08:42.1704471Z FAILED [0.6119s] inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_expr_transitive_cuda - torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T10:08:42.1704483Z 
2025-12-04T10:08:42.1704702Z To execute this test, run the following from the base repo dir:
2025-12-04T10:08:42.1705434Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_expr_transitive_cuda
2025-12-04T10:08:42.1705439Z 
2025-12-04T10:08:42.1705710Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:08:42.1705909Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:08:42.1706117Z ================== 1 failed, 157 deselected, 2 rerun in 5.48s ==================
2025-12-04T10:08:42.1706221Z Got exit code 1
2025-12-04T10:08:42.1706348Z Retrying single test...
2025-12-04T10:08:42.1706794Z W1204 10:06:37.377000 23695 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:08:42.1707375Z Test results will be stored in test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-be9e2a318f1480ff.xml
2025-12-04T10:08:42.1707549Z ============================= test session starts ==============================
2025-12-04T10:08:42.1707901Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:08:42.1708032Z cachedir: .pytest_cache
2025-12-04T10:08:42.1708556Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:08:42.1708684Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:08:42.1708807Z configfile: pytest.ini
2025-12-04T10:08:42.1709352Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:08:42.1709598Z collecting ... collected 934 items / 157 deselected / 777 selected
2025-12-04T10:08:42.1710391Z stepcurrent: skipping 95 already run items. Running only test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_expr_transitive_cuda
2025-12-04T10:08:42.1710517Z Running 1 items in this shard
2025-12-04T10:08:42.1710522Z 
2025-12-04T10:08:42.1711227Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_expr_transitive_cuda ('RERUN', {'yellow': True}) [4.1921s] [100%]
2025-12-04T10:08:42.1711911Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_expr_transitive_cuda ('RERUN', {'yellow': True}) [0.6323s] [100%]
2025-12-04T10:08:42.1712612Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_expr_transitive_cuda FAILED [0.6361s] [100%]
2025-12-04T10:08:42.1712619Z 
2025-12-04T10:08:42.1712762Z ==================================== RERUNS ====================================
2025-12-04T10:08:42.1713123Z _ AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_expr_transitive_cuda _
2025-12-04T10:08:42.1713263Z Traceback (most recent call last):
2025-12-04T10:08:42.1713827Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 1969, in test_size_with_unbacked_add_expr_transitive
2025-12-04T10:08:42.1714131Z     self.check_model(Repro(), example_inputs, dynamic_shapes=spec)
2025-12-04T10:08:42.1714566Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model
2025-12-04T10:08:42.1714690Z     actual = AOTIRunnerUtil.run(
2025-12-04T10:08:42.1715092Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run
2025-12-04T10:08:42.1715234Z     package_path = AOTIRunnerUtil.compile(
2025-12-04T10:08:42.1715641Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile
2025-12-04T10:08:42.1715854Z     package_path = torch._inductor.aoti_compile_and_package(
2025-12-04T10:08:42.1716381Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package
2025-12-04T10:08:42.1716525Z     return aot_inductor_minifier_wrapper(
2025-12-04T10:08:42.1717074Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper
2025-12-04T10:08:42.1717170Z     raise e
2025-12-04T10:08:42.1717721Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper
2025-12-04T10:08:42.1717824Z     return func(
2025-12-04T10:08:42.1718368Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner
2025-12-04T10:08:42.1718617Z     aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs)
2025-12-04T10:08:42.1719074Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile
2025-12-04T10:08:42.1719202Z     return compile_fx_aot(
2025-12-04T10:08:42.1719694Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot
2025-12-04T10:08:42.1719825Z     compiled_artifacts = compile_fx(
2025-12-04T10:08:42.1720307Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx
2025-12-04T10:08:42.1720414Z     return compile_fx(
2025-12-04T10:08:42.1720895Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx
2025-12-04T10:08:42.1721031Z     return _maybe_wrap_and_compile_fx_main(
2025-12-04T10:08:42.1721606Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main
2025-12-04T10:08:42.1721736Z     return _compile_fx_main(
2025-12-04T10:08:42.1722238Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main
2025-12-04T10:08:42.1722440Z     return inference_compiler(unlifted_gm, example_inputs_)
2025-12-04T10:08:42.1722971Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__
2025-12-04T10:08:42.1723126Z     return self.compiler_fn(gm, example_inputs)
2025-12-04T10:08:42.1723640Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base
2025-12-04T10:08:42.1723758Z     return compile_fx_forward(
2025-12-04T10:08:42.1724272Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward
2025-12-04T10:08:42.1724394Z     return inner_compile(
2025-12-04T10:08:42.1724737Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T10:08:42.1724853Z     return func(*args, **kwds)
2025-12-04T10:08:42.1725362Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner
2025-12-04T10:08:42.1725625Z     return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
2025-12-04T10:08:42.1726191Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper
2025-12-04T10:08:42.1726367Z     inner_compiled_fn = compiler_fn(gm, example_inputs)
2025-12-04T10:08:42.1726870Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T10:08:42.1727078Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:08:42.1727584Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T10:08:42.1727746Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:08:42.1728279Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:08:42.1728598Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:08:42.1729133Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T10:08:42.1729259Z     _check_triton_bf16_support(graph)
2025-12-04T10:08:42.1729816Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T10:08:42.1729940Z     warn_and_skip(node.get_device())
2025-12-04T10:08:42.1730422Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T10:08:42.1730583Z     raise SkipFrame("BF16 is not supported")
2025-12-04T10:08:42.1730841Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T10:08:42.1730847Z 
2025-12-04T10:08:42.1731067Z To execute this test, run the following from the base repo dir:
2025-12-04T10:08:42.1731796Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_expr_transitive_cuda
2025-12-04T10:08:42.1731807Z 
2025-12-04T10:08:42.1732075Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:08:42.1732310Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:08:42.1732416Z unimplemented []
2025-12-04T10:08:42.1732581Z stats [('calls_captured', 22), ('unique_graphs', 1)]
2025-12-04T10:08:42.1732842Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)]
2025-12-04T10:08:42.1732944Z graph_break []
2025-12-04T10:08:42.1733166Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:08:42.1733991Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T10:08:42.1734109Z   return cls.__new__(cls, *args)
2025-12-04T10:08:42.1734848Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:08:42.1734957Z   warnings.warn(
2025-12-04T10:08:42.1735315Z _ AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_expr_transitive_cuda _
2025-12-04T10:08:42.1735454Z Traceback (most recent call last):
2025-12-04T10:08:42.1736014Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 1969, in test_size_with_unbacked_add_expr_transitive
2025-12-04T10:08:42.1736250Z     self.check_model(Repro(), example_inputs, dynamic_shapes=spec)
2025-12-04T10:08:42.1736818Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model
2025-12-04T10:08:42.1736945Z     actual = AOTIRunnerUtil.run(
2025-12-04T10:08:42.1737343Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run
2025-12-04T10:08:42.1737486Z     package_path = AOTIRunnerUtil.compile(
2025-12-04T10:08:42.1737889Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile
2025-12-04T10:08:42.1738168Z     package_path = torch._inductor.aoti_compile_and_package(
2025-12-04T10:08:42.1738693Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package
2025-12-04T10:08:42.1738840Z     return aot_inductor_minifier_wrapper(
2025-12-04T10:08:42.1739377Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper
2025-12-04T10:08:42.1739470Z     raise e
2025-12-04T10:08:42.1740023Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper
2025-12-04T10:08:42.1740123Z     return func(
2025-12-04T10:08:42.1740669Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner
2025-12-04T10:08:42.1740913Z     aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs)
2025-12-04T10:08:42.1741373Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile
2025-12-04T10:08:42.1741499Z     return compile_fx_aot(
2025-12-04T10:08:42.1741990Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot
2025-12-04T10:08:42.1742114Z     compiled_artifacts = compile_fx(
2025-12-04T10:08:42.1742597Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx
2025-12-04T10:08:42.1742706Z     return compile_fx(
2025-12-04T10:08:42.1743183Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx
2025-12-04T10:08:42.1743318Z     return _maybe_wrap_and_compile_fx_main(
2025-12-04T10:08:42.1743891Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main
2025-12-04T10:08:42.1744022Z     return _compile_fx_main(
2025-12-04T10:08:42.1744522Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main
2025-12-04T10:08:42.1744724Z     return inference_compiler(unlifted_gm, example_inputs_)
2025-12-04T10:08:42.1745253Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__
2025-12-04T10:08:42.1745402Z     return self.compiler_fn(gm, example_inputs)
2025-12-04T10:08:42.1745920Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base
2025-12-04T10:08:42.1746037Z     return compile_fx_forward(
2025-12-04T10:08:42.1746551Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward
2025-12-04T10:08:42.1746673Z     return inner_compile(
2025-12-04T10:08:42.1746957Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T10:08:42.1747071Z     return func(*args, **kwds)
2025-12-04T10:08:42.1747576Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner
2025-12-04T10:08:42.1747842Z     return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
2025-12-04T10:08:42.1748344Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper
2025-12-04T10:08:42.1748580Z     inner_compiled_fn = compiler_fn(gm, example_inputs)
2025-12-04T10:08:42.1749084Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T10:08:42.1749293Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:08:42.1749794Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T10:08:42.1750030Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:08:42.1750564Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:08:42.1750887Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:08:42.1751420Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T10:08:42.1751548Z     _check_triton_bf16_support(graph)
2025-12-04T10:08:42.1752114Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T10:08:42.1752239Z     warn_and_skip(node.get_device())
2025-12-04T10:08:42.1752722Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T10:08:42.1752876Z     raise SkipFrame("BF16 is not supported")
2025-12-04T10:08:42.1753137Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T10:08:42.1753142Z 
2025-12-04T10:08:42.1753360Z To execute this test, run the following from the base repo dir:
2025-12-04T10:08:42.1754093Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_expr_transitive_cuda
2025-12-04T10:08:42.1754099Z 
2025-12-04T10:08:42.1754366Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:08:42.1754605Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:08:42.1754711Z unimplemented []
2025-12-04T10:08:42.1754875Z stats [('calls_captured', 22), ('unique_graphs', 1)]
2025-12-04T10:08:42.1755129Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)]
2025-12-04T10:08:42.1755230Z graph_break []
2025-12-04T10:08:42.1755449Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:08:42.1756280Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T10:08:42.1756398Z   return cls.__new__(cls, *args)
2025-12-04T10:08:42.1757136Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:08:42.1757241Z   warnings.warn(
2025-12-04T10:08:42.1757465Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:08:42.1757583Z unimplemented []
2025-12-04T10:08:42.1757746Z stats [('calls_captured', 22), ('unique_graphs', 1)]
2025-12-04T10:08:42.1758002Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)]
2025-12-04T10:08:42.1758102Z graph_break []
2025-12-04T10:08:42.1758317Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:08:42.1759154Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T10:08:42.1759273Z   return cls.__new__(cls, *args)
2025-12-04T10:08:42.1759997Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:08:42.1760118Z   warnings.warn(
2025-12-04T10:08:42.1760342Z =================================== FAILURES ===================================
2025-12-04T10:08:42.1760718Z _ AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_expr_transitive_cuda _
2025-12-04T10:08:42.1760843Z Traceback (most recent call last):
2025-12-04T10:08:42.1761410Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 1969, in test_size_with_unbacked_add_expr_transitive
2025-12-04T10:08:42.1761652Z     self.check_model(Repro(), example_inputs, dynamic_shapes=spec)
2025-12-04T10:08:42.1762147Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model
2025-12-04T10:08:42.1762271Z     actual = AOTIRunnerUtil.run(
2025-12-04T10:08:42.1762672Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run
2025-12-04T10:08:42.1762818Z     package_path = AOTIRunnerUtil.compile(
2025-12-04T10:08:42.1763242Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile
2025-12-04T10:08:42.1763451Z     package_path = torch._inductor.aoti_compile_and_package(
2025-12-04T10:08:42.1763972Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package
2025-12-04T10:08:42.1764123Z     return aot_inductor_minifier_wrapper(
2025-12-04T10:08:42.1764666Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper
2025-12-04T10:08:42.1764781Z     raise e
2025-12-04T10:08:42.1765319Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper
2025-12-04T10:08:42.1765419Z     return func(
2025-12-04T10:08:42.1765983Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner
2025-12-04T10:08:42.1766219Z     aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs)
2025-12-04T10:08:42.1766680Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile
2025-12-04T10:08:42.1766811Z     return compile_fx_aot(
2025-12-04T10:08:42.1767303Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot
2025-12-04T10:08:42.1767444Z     compiled_artifacts = compile_fx(
2025-12-04T10:08:42.1768051Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx
2025-12-04T10:08:42.1768168Z     return compile_fx(
2025-12-04T10:08:42.1768650Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx
2025-12-04T10:08:42.1768787Z     return _maybe_wrap_and_compile_fx_main(
2025-12-04T10:08:42.1769364Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main
2025-12-04T10:08:42.1769491Z     return _compile_fx_main(
2025-12-04T10:08:42.1769996Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main
2025-12-04T10:08:42.1770215Z     return inference_compiler(unlifted_gm, example_inputs_)
2025-12-04T10:08:42.1770730Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__
2025-12-04T10:08:42.1770882Z     return self.compiler_fn(gm, example_inputs)
2025-12-04T10:08:42.1771583Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base
2025-12-04T10:08:42.1771701Z     return compile_fx_forward(
2025-12-04T10:08:42.1772236Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward
2025-12-04T10:08:42.1772349Z     return inner_compile(
2025-12-04T10:08:42.1772631Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T10:08:42.1772897Z     return func(*args, **kwds)
2025-12-04T10:08:42.1773393Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner
2025-12-04T10:08:42.1773657Z     return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
2025-12-04T10:08:42.1774160Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper
2025-12-04T10:08:42.1774429Z     inner_compiled_fn = compiler_fn(gm, example_inputs)
2025-12-04T10:08:42.1774943Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T10:08:42.1775137Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:08:42.1775641Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T10:08:42.1775799Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:08:42.1776398Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:08:42.1776735Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:08:42.1777251Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T10:08:42.1777386Z     _check_triton_bf16_support(graph)
2025-12-04T10:08:42.1777950Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T10:08:42.1778073Z     warn_and_skip(node.get_device())
2025-12-04T10:08:42.1778561Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T10:08:42.1778717Z     raise SkipFrame("BF16 is not supported")
2025-12-04T10:08:42.1778974Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T10:08:42.1778985Z 
2025-12-04T10:08:42.1779220Z To execute this test, run the following from the base repo dir:
2025-12-04T10:08:42.1779943Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_expr_transitive_cuda
2025-12-04T10:08:42.1779949Z 
2025-12-04T10:08:42.1780219Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:08:42.1780463Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:08:42.1780570Z unimplemented []
2025-12-04T10:08:42.1780748Z stats [('calls_captured', 22), ('unique_graphs', 1)]
2025-12-04T10:08:42.1780992Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)]
2025-12-04T10:08:42.1781094Z graph_break []
2025-12-04T10:08:42.1781327Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:08:42.1782150Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T10:08:42.1782268Z   return cls.__new__(cls, *args)
2025-12-04T10:08:42.1783009Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:08:42.1783114Z   warnings.warn(
2025-12-04T10:08:42.1783351Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:08:42.1783456Z unimplemented []
2025-12-04T10:08:42.1783622Z stats [('calls_captured', 22), ('unique_graphs', 1)]
2025-12-04T10:08:42.1783878Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)]
2025-12-04T10:08:42.1783979Z graph_break []
2025-12-04T10:08:42.1784192Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:08:42.1785102Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T10:08:42.1785226Z   return cls.__new__(cls, *args)
2025-12-04T10:08:42.1785962Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:08:42.1786068Z   warnings.warn(
2025-12-04T10:08:42.1786354Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:08:42.1786471Z unimplemented []
2025-12-04T10:08:42.1786633Z stats [('calls_captured', 22), ('unique_graphs', 1)]
2025-12-04T10:08:42.1786876Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)]
2025-12-04T10:08:42.1786991Z graph_break []
2025-12-04T10:08:42.1787207Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:08:42.1788030Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T10:08:42.1788146Z   return cls.__new__(cls, *args)
2025-12-04T10:08:42.1788867Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:08:42.1788985Z   warnings.warn(
2025-12-04T10:08:42.1789746Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-be9e2a318f1480ff.xml -
2025-12-04T10:08:42.1789940Z =========================== short test summary info ============================
2025-12-04T10:08:42.1790857Z FAILED [0.6361s] inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_expr_transitive_cuda - torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T10:08:42.1790864Z 
2025-12-04T10:08:42.1791087Z To execute this test, run the following from the base repo dir:
2025-12-04T10:08:42.1791822Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_expr_transitive_cuda
2025-12-04T10:08:42.1791829Z 
2025-12-04T10:08:42.1792098Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:08:42.1792294Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:08:42.1792503Z ================== 1 failed, 157 deselected, 2 rerun in 5.55s ==================
2025-12-04T10:08:42.1792605Z Got exit code 1
2025-12-04T10:08:42.1793257Z FAILED CONSISTENTLY: test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_expr_transitive_cuda
2025-12-04T10:08:42.1793670Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:08:42.1794135Z W1204 10:07:00.169000 23923 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:08:42.1794704Z Test results will be stored in test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-e62e290dfdad5699.xml
2025-12-04T10:08:42.1794869Z ============================= test session starts ==============================
2025-12-04T10:08:42.1795236Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:08:42.1795355Z cachedir: .pytest_cache
2025-12-04T10:08:42.1795878Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:08:42.1796016Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:08:42.1796126Z configfile: pytest.ini
2025-12-04T10:08:42.1796677Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:08:42.1796902Z collecting ... collected 934 items / 96 deselected / 838 selected
2025-12-04T10:08:42.1797136Z stepcurrent: skipping 96 already run items.
2025-12-04T10:08:42.1797267Z Running 62 items in this shard
2025-12-04T10:08:42.1797273Z 
2025-12-04T10:08:42.1797966Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_sym_expr_indexing_cuda <- test/inductor/test_torchinductor.py PASSED [9.5457s] [  1%]
2025-12-04T10:08:42.1798828Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_torchvision_transforms_functional_tensor_resize_cuda <- test/inductor/test_torchinductor.py PASSED [8.3397s] [  3%]
2025-12-04T10:08:42.1799686Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_dynamic_shape_with_div_cuda <- test/inductor/test_torchinductor.py PASSED [5.5832s] [  4%]
2025-12-04T10:08:42.1800329Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_equal_to_1_float_arg_dynamic_True_cuda PASSED [6.2479s] [  6%]
2025-12-04T10:08:42.1801053Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_3_num_dims_1_dynamic_True_autotune_True_cuda PASSED [7.0983s] [  8%]
2025-12-04T10:08:42.1801990Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_on_device_tma_dynamic_False_tma_version_new_cuda SKIPPED [0.0032s] (requires triton.tools.tensor_descriptor TMA support) [  9%]
2025-12-04T10:08:42.1802760Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_reinterpret_view_cuda <- test/inductor/test_torchinductor.py PASSED [6.4212s] [ 11%]
2025-12-04T10:08:42.1803505Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_sympy_fn_like_arg_cuda <- test/inductor/test_torchinductor.py PASSED [6.2953s] [ 12%]
2025-12-04T10:08:42.1804206Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_unbacked_expr_replacements_shift_k_1_use_static_size_False_cuda PASSED [7.8616s] [ 14%]
2025-12-04T10:08:42.1806278Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_upper_bound_i64_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0008s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/159860 for platform(s) inductor, linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 16%]
2025-12-04T10:08:42.1807000Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_weight_on_disk_legacy_cuda <- test/inductor/test_torchinductor.py PASSED [6.4071s] [ 17%]
2025-12-04T10:08:42.1808255Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_mixed_device_dynamic_True_cuda W1204 10:08:06.093000 23923 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead
2025-12-04T10:08:42.1808922Z W1204 10:08:06.094000 23923 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead
2025-12-04T10:08:42.1809332Z W1204 10:08:06.553000 23923 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:08:42.1809439Z PASSED [7.0360s] [ 19%]
2025-12-04T10:08:42.1810637Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_parameters_cuda W1204 10:08:13.146000 23923 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead
2025-12-04T10:08:42.1810749Z PASSED [7.0724s] [ 20%]
2025-12-04T10:08:42.1811996Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_sym_expr_cond_dynamic_True_cuda W1204 10:08:20.203000 23923 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead
2025-12-04T10:08:42.1812736Z W1204 10:08:20.203000 23923 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead
2025-12-04T10:08:42.1812843Z PASSED [7.2707s] [ 22%]
2025-12-04T10:08:42.1813981Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_with_profiler_cuda <- test/inductor/test_torchinductor.py W1204 10:08:27.544000 23923 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:08:42.1814142Z PASSED [6.1129s] [ 24%]
2025-12-04T10:08:42.1814821Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_zero_size_buffer_cuda <- test/inductor/test_torchinductor.py PASSED [5.7960s] [ 25%]
2025-12-04T10:08:42.1815581Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test__int_mm_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0004s] (No MPS backend available) [ 27%]
2025-12-04T10:08:42.1816232Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_addmm_multiple_dynamic_mps SKIPPED [0.0002s] (No MPS backend available) [ 29%]
2025-12-04T10:08:42.1817148Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aoti_constant_tensor_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 30%]
2025-12-04T10:08:42.1818075Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aoti_debug_printing_model_inputs_codegen_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 32%]
2025-12-04T10:08:42.1818917Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aoti_runtime_asserts_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0005s] (No MPS backend available) [ 33%]
2025-12-04T10:08:42.1819726Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_assert_tensor_meta_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 35%]
2025-12-04T10:08:42.1820372Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_backward_no_op_logging_mps SKIPPED [0.0002s] (No MPS backend available) [ 37%]
2025-12-04T10:08:42.1821196Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_buffer_mutation_1_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 38%]
2025-12-04T10:08:42.1821817Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_buffer_mutation_3_mps SKIPPED [0.0002s] (No MPS backend available) [ 40%]
2025-12-04T10:08:42.1822657Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_predicate_on_cpu_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 41%]
2025-12-04T10:08:42.1823397Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_unbacked_symint_closure_dynamic_True_mps SKIPPED [0.0002s] (No MPS backend available) [ 43%]
2025-12-04T10:08:42.1824295Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_use_buffers_from_outer_scope_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 45%]
2025-12-04T10:08:42.1825117Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_with_parameters_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 46%]
2025-12-04T10:08:42.1826044Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_with_reinterpret_view_inputs_outputs_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0004s] (No MPS backend available) [ 48%]
2025-12-04T10:08:42.1826818Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_d2h_copy_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 50%]
2025-12-04T10:08:42.1827604Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_dynamic_cat_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 51%]
2025-12-04T10:08:42.1828311Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fp8_view_of_param_mps SKIPPED [0.0002s] (No MPS backend available) [ 53%]
2025-12-04T10:08:42.1829128Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_index_put_fallback_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 54%]
2025-12-04T10:08:42.1829791Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_index_put_with_none_index_mps SKIPPED [0.0004s] (No MPS backend available) [ 56%]
2025-12-04T10:08:42.1830674Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_large_mmaped_weights_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 58%]
2025-12-04T10:08:42.1831473Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_libtorch_free_so_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 59%]
2025-12-04T10:08:42.1832135Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_misc_1_max_autotune_True_mps SKIPPED [0.0002s] (No MPS backend available) [ 61%]
2025-12-04T10:08:42.1832932Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_output_path_2_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 62%]
2025-12-04T10:08:42.1833775Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_poi_multiple_dynamic_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0004s] (No MPS backend available) [ 64%]
2025-12-04T10:08:42.1834428Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_quanatized_int8_linear_mps SKIPPED [0.0002s] (No MPS backend available) [ 66%]
2025-12-04T10:08:42.1835260Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_repeat_interleave_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (No MPS backend available) [ 67%]
2025-12-04T10:08:42.1836072Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_repeated_calling_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 69%]
2025-12-04T10:08:42.1836771Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_runtime_checks_device_type_failed_mps SKIPPED [0.0002s] (No MPS backend available) [ 70%]
2025-12-04T10:08:42.1837349Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_sdpa_2_mps SKIPPED [0.0002s] (No MPS backend available) [ 72%]
2025-12-04T10:08:42.1838206Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_shifted_constraint_ranges_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 74%]
2025-12-04T10:08:42.1839049Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_size_from_multi_output_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 75%]
2025-12-04T10:08:42.1839833Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_subclasses_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 77%]
2025-12-04T10:08:42.1840632Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_symbool_item_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 79%]
2025-12-04T10:08:42.1841411Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_symint_item_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 80%]
2025-12-04T10:08:42.1842259Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_equal_to_1_arg_mps <- test/inductor/test_torchinductor.py SKIPPED [0.2830s] (No MPS backend available) [ 82%]
2025-12-04T10:08:42.1843105Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_1_num_dims_1_dynamic_False_autotune_True_mps SKIPPED [0.0003s] (No MPS backend available) [ 83%]
2025-12-04T10:08:42.1843938Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_3_num_dims_2_dynamic_False_autotune_False_mps SKIPPED [0.0002s] (No MPS backend available) [ 85%]
2025-12-04T10:08:42.1853744Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_3_num_dims_2_dynamic_True_autotune_False_mps SKIPPED [0.0002s] (No MPS backend available) [ 87%]
2025-12-04T10:08:42.1854737Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_sympy_expr_arg_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 88%]
2025-12-04T10:08:42.1855681Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_unbacked_symint_in_grid_dynamic_True_autotuning_False_mps SKIPPED [0.0002s] (No MPS backend available) [ 90%]
2025-12-04T10:08:42.1856618Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbacked_expr_replacements_shift_k_0_use_static_size_True_mps SKIPPED [0.0002s] (No MPS backend available) [ 91%]
2025-12-04T10:08:42.1857430Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbacked_expr_replacements_shift_k_1_use_static_size_True_mps SKIPPED [0.0002s] (No MPS backend available) [ 93%]
2025-12-04T10:08:42.1858179Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_sym_expr_cond_dynamic_True_mps SKIPPED [0.0004s] (No MPS backend available) [ 95%]
2025-12-04T10:08:42.1858970Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_unbacked_symint_closure_dynamic_False_mps SKIPPED [0.0002s] (No MPS backend available) [ 96%]
2025-12-04T10:08:42.1859769Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_unbacked_symint_closure_dynamic_True_mps SKIPPED [0.0002s] (No MPS backend available) [ 98%]
2025-12-04T10:08:42.1860571Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_zero_size_buffer_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [100%]
2025-12-04T10:08:42.1860578Z 
2025-12-04T10:08:42.1861343Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-e62e290dfdad5699.xml -
2025-12-04T10:08:42.1861583Z =========== 14 passed, 48 skipped, 96 deselected in 97.55s (0:01:37) ===========
2025-12-04T10:08:42.1863463Z The following tests failed consistently: ['test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda', 'test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_deconv_freezing_cuda', 'test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_expr_transitive_cuda']
2025-12-04T10:08:42.1863474Z 
2025-12-04T10:08:42.1864040Z FINISHED PRINTING LOG FILE of inductor/test_aot_inductor 6/6 (test/test-reports/inductor.test_aot_inductor_6.6_462385258b0b1d27_.log)
2025-12-04T10:08:42.1864046Z 
2025-12-04T10:08:42.1864404Z Finished inductor/test_aot_inductor 6/6 ... [2025-12-04 10:08:41.954931][3350.337820076], took 14.14min
2025-12-04T10:08:42.1865224Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-bf15e775351f3d84.xml
2025-12-04T10:08:42.1866050Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-cd1c50b62bb47a1b.xml
2025-12-04T10:08:42.1866863Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-3e5313e420476f15.xml
2025-12-04T10:08:42.1867663Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-b23b654b51890d24.xml
2025-12-04T10:08:42.1868451Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-2e7c8f13f7be0603.xml
2025-12-04T10:08:42.1869347Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-8d6cdce6581fa448.xml
2025-12-04T10:08:42.2068273Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-9dce38c1d023996d.xml
2025-12-04T10:08:42.2562518Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-b570798f966501a4.xml
2025-12-04T10:08:42.2878467Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-be9e2a318f1480ff.xml
2025-12-04T10:08:42.3207326Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-e62e290dfdad5699.xml
2025-12-04T10:08:42.6633827Z Uploading logs for 57119749248 to S3
2025-12-04T10:08:42.7007168Z Uploading artifacts took 0.34 seconds
2025-12-04T10:08:42.7007630Z inductor/test_aot_inductor 6/6 failed!
2025-12-04T10:08:42.7012218Z Running inductor/test_torchinductor_codegen_dynamic_shapes 2/4 ... [2025-12-04 10:08:42.701016][3351.083910326]
2025-12-04T10:08:42.7012929Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T10:08:42.7017306Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_codegen_dynamic_shapes.py', '--shard-id=2', '--num-shards=4', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:08:42.701476]
2025-12-04T10:19:47.8566431Z 
2025-12-04T10:19:47.8567675Z PRINTING LOG FILE of inductor/test_torchinductor_codegen_dynamic_shapes 2/4 (test/test-reports/inductor.test_torchinductor_codegen_dynamic_shapes_2.4_37f84ce4dcc870f4_.log)
2025-12-04T10:19:47.8569310Z W1204 10:08:51.991000 26638 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:19:47.8571615Z Test results will be stored in test-reports/python-pytest/inductor.test_torchinductor_codegen_dynamic_shapes/inductor.test_torchinductor_codegen_dynamic_shapes-0c75da116b2f10f8.xml
2025-12-04T10:19:47.8573239Z ============================= test session starts ==============================
2025-12-04T10:19:47.8574144Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:19:47.8574990Z cachedir: .pytest_cache
2025-12-04T10:19:47.8576190Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:19:47.8577682Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:19:47.8578231Z configfile: pytest.ini
2025-12-04T10:19:47.8579569Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:19:47.8580391Z collecting ... collected 1750 items
2025-12-04T10:19:47.8580851Z stepcurrent: Cannot find last run test, not skipping
2025-12-04T10:19:47.8888534Z Running 441 items in this shard: test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_AllenaiLongformerBase_repro_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test__dyn_quant_matmul_4bit_fp32_input_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test__dyn_quant_pack_4bit_weight_bf16_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test__dyn_quant_pack_4bit_weight_fp32_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test__unsafe_masked_index_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test__unsafe_masked_index_put_accumulate_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_adaptive_avg_pool2d1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_add_complex10_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_add_complex4_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_add_complex8_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_add_complex_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_add_inplace_permuted_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_adding_tensor_offsets_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_alexnet_prefix_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_allow_reuse_disable_if_exceed_peak_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_angle_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_any_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_aoti_eager_cache_hit_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_aoti_eager_with_persistent_cache_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_arange1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_argmax_argmin1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_as_strided_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_avg_pool2d_backward4_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bitwise2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bitwise3_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bmm1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bucketize_int_int32_int8_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bucketize_int_int64_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bucketize_int_uint8_uint8_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_buffer_batch_norm_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_buffer_use_after_remove_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_builtins_round_float_ndigits_neg_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cat_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cat_empty_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cat_extern_kernel_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cat_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cat_unbacked_legacy_empty_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cauchy_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_check_stack_no_cycles_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_clamp_type_promotion_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_complex_from_real_imag_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_concat_add_inplace_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_consecutive_split_cumsum_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_constant_pad_1d_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_constant_pad_float64_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_conv1d_depthwise_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_conv1d_with_permute_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_conv3d_channels_last_use_block_ptr_True_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_conv_backward_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_conv_inference_heuristics_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_conv_with_as_strided_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_convolution1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_convolution2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_convolution3_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_copy_non_blocking_is_pinned_use_cat_True_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cpu_scalar_with_cpu_scalar_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cumprod_zero_dim_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_custom_op_1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_custom_scan_op_compiled_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_deterministic_codegen_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_deterministic_codegen_on_graph_break_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_div2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_div_presicion_accuracy_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dont_constant_fold_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dropout2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dropout3_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dropout_trivial_0_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtype_sympy_expr_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_float16_int64_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_float16_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_float64_int16_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_float64_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_float64_int64_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_float64_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_fusion_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int32_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int64_float16_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int64_float32_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int64_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int8_int16_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int8_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_uint8_float32_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_uint8_int8_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_uint8_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_elu_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_empty2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_erfinv_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_expanded_reduction_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_fallback_mutable_op_with_return_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_fill1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_floordiv_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_fmod_zero_dim_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_forced_buffer_realize_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_fractional_max_pool2d3_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_fractional_max_pool2d4_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_fractional_max_pool2d5_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_generated_code_has_size_stride_assert_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_getitem_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_glu_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_graph_partition_arange1_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_graph_partition_arange2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_graph_partition_both_scalars_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_graph_partition_misaligned_input_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_hardsigmoid_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_hardswish_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_index_float_zero_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_index_propagation_floordiv_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_index_propagation_remainder_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_index_put3_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_index_put_fallback1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_index_put_fallback2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_index_put_reinplace_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_inductor_triton_bucketize_respects_masking_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_inplace_activations_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_inplace_add_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_inplace_mixed_dtype_ops_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_inplace_resize_as_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_inplace_where_pointwise_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_input_mutation1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_int_input_dynamic_shapes_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_issue102546_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_large_broadcast_reduction_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_large_grid_use_block_ptr_False_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_large_offset_pointwise_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_layer_norm_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_leaky_relu_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_linear2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_linear_float64_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_list_clearing_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_lite_mode_fallback_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_lite_mode_not_decompose_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_lite_regional_compile_flex_attention_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_lite_regional_compile_invoke_subgraph_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_log1p_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_log2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_log_fp64_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_logcumsumexp_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_mark_dynamic_with_hint_override_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_max_pool2d1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_max_pool2d2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_max_pool2d6_dilation_2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_max_pool2d_with_indices_backward5_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_min_max_reduction_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_min_max_reduction_nan_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_mixed_mm2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_mixed_mm3_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_mm_mixed_dtype_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_multilayer_sum_low_prec_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_mutable_custom_op_fixed_layout2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_nan_sort_stable_True_descending_True_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_neg_max_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_new_empty_strided_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_nll_loss_forward_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_one_hot_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pad_cast_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pattern_matcher_unbacked_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_bessel_j0_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_erf_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_erfc_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_exp2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_expm1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_gammaincc_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_modified_bessel_i0_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_multigammaln_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_ndtr_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_polygamma_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_scaled_modified_bessel_k1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_shifted_chebyshev_polynomial_v_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_xlogy_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pow3_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pow_int_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_prepare_softmax_with_fast_math_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_rand_like_deterministic_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_reduction1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_reduction2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_reduction_config_limit_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_remainder_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_remove_noop_slice_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_remove_noop_view_default_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_repeat_as_strided_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_repeat_interleave_Tensor_decomp_int32_nd_2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_repeat_interleave_decomposition_has_clamp_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_resize_as_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_round_correctness_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_rsqrt_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_scalar_input_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_scaled_dot_product_efficient_attention_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_scatter2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_scatter6_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_scatter_bf16_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_scheduler_vertical_fusion1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sdpa_prefer_nd_tiling_True_use_block_ptr_False_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sdpa_prefer_nd_tiling_True_use_block_ptr_True_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_setitem_with_int_parameter_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sgn_extremal_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_should_pad_bench_for_bmm_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sigmoid_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sign_dtype_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sin_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_slice_scatter_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_softmax_backward_data_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_softmax_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_softmax_one_kernel_persist_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sort_stable_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_split_cumsum_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_split_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_squeeze_varargs_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sum3_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sum4_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sum_int_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_to_device_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_to_dtype_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_torch_device_split_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_triu_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_uint_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_unbacked_float_item_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_unbacked_floordiv_simplify_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_unbacked_floordiv_simplify_errors_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_unbind_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_unspec_inputs_float32_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_unspec_inputs_int64_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_unsqueeze_inplace_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_upsample_nearest2d_backward_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_var_correction_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_var_mean_tile_reduction_True_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_views5_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_views7_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_weight_norm_conv2d_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_where_broadcast_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_xblock_divides_xnumel_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_AllenaiLongformerBase_repro_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test__dyn_quant_matmul_4bit_fp32_input_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test__dyn_quant_pack_4bit_weight_fp32_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test__unsafe_masked_index_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_adaptive_avg_pool_errors_with_long_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_adaptive_avg_pool_with_output_size_0_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_adaptive_max_pool2d1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_adaptive_max_pool2d3_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_adaptive_pool_errors_with_long_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_add_complex5_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_add_complex_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_add_complex_strided_fallback_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_add_inplace_permuted_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_alexnet_prefix_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_any_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_arange2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_arange6_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_argmax_argmin3_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_avg_pool2d2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_avg_pool3d_backward3_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_default_kwargs_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_int_int32_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_int_int64_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_int_int64_int8_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_int_int8_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_nd_tiling_False_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_buffer_batch_norm_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_builtins_round_float_ndigits_pos_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_builtins_round_float_ndigits_zero_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_cat_negative_dim_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_cat_of_loops_and_extern_kernel_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_cat_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_cauchy_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_chunk_recompiles_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_clamp_type_promotion_non_tensor_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_computed_buffer_inlining_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_config_option_dont_assume_alignment_recompiles_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_constant_pad_1d_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_constant_pad_float64_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_constant_pad_nd_inplace_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_conv_bn_fuse_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_conv_inference_heuristics_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_conv_with_as_strided_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_copy_non_blocking_is_pinned_use_cat_False_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_copy_non_blocking_is_pinned_use_cat_True_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_cpu_scalar_with_cpu_scalar_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_cummin_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_cumprod_zero_dim_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_custom_op_2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_custom_scan_op_compiled_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_custom_scan_op_multi_input_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_custom_scan_would_split_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_deterministic_codegen_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dist_bf16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_div1_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_div3_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtype_mismatch_issue_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_float16_float32_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_float16_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_float16_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_float32_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_float32_int64_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_float64_float32_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_fusion_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int16_float64_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int16_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int32_float16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int32_float32_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int32_float64_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int64_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int64_int8_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int8_int64_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int8_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_uint8_float16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_uint8_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_uint8_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_embedding_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_empty2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_empty_strided_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_emulate_precision_triton_fp_fusion_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_exp2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_fallback_mutable_op_no_mutated_tensors_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_fallback_mutable_op_with_return_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_fft_real_input_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_float_index_expression_type_promotion_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_floordiv_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_fuse_large_params_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_generated_code_has_alignment_assert_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_getitem_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_graph_partition_misaligned_input_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_graph_partition_refcount_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_hardsigmoid_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_horizonal_fusion2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index3_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_propagation_device_assert_masked_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_propagation_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_propagation_flip_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_propagation_floordiv_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_put1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_put_fallback1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_put_fallback2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_select_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_inplace_mixed_dtype_ops_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_int_input_dynamic_shapes_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_invalid_operand_issue1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_isinf2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_issue102546_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_kernel_names_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_large_block_sizes_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_large_grid_use_block_ptr_False_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_large_pointwise_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_large_tensor_reduction_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_lerp_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_like_rands_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_linear1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_linear2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_linear_float64_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_list_clearing_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_lite_regional_compile_flex_attention_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_lite_regional_compile_invoke_subgraph_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_log2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_low_memory_max_pool_dilation_1_dim_2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_low_memory_max_pool_dilation_1_dim_3_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_low_memory_max_pool_dilation_2_dim_2_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_max_min_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_max_pool2d4_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_max_pool2d6_dilation_1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_max_pool2d6_dilation_2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_max_pool2d_with_indices_backward6_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_mean_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_mul_index_expr_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_multi_gpu_device_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_multilayer_var_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_nan_assert_inside_triton_kernel_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_nan_sort_stable_False_descending_True_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_new_empty_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_nll_loss_forward_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pixel_shuffle_channels_last_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_bessel_j1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_chebyshev_polynomial_t_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_chebyshev_polynomial_w_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_expit_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_gammaincc_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_i0_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_log1p_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_modified_bessel_k0_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_psi_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_round_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_shifted_chebyshev_polynomial_v_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pow3_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pow_int_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pow_symfloat_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_randn_generator_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_randn_like_empty_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_reduction1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_reflection_pad2d_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_relu_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_remove_noop_view_dtype_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_repeat_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_repeat_interleave_2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_repeat_interleave_Tensor_decomp_int32_nd_2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_require_stride_expanded_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_resize_as_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_roll_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_rsqrt_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_scalar_output_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_scatter5_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_scatter_reduce1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_scheduler_vertical_fusion1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_sdpa_unaligned_mask_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_searchsorted_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_select_scatter_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_sin_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_size_asserts_for_multi_output_fallback_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_sizehint_issue1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_slice3_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_slice_mutation3_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_slice_scatter2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_slice_scatter_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_slice_scatter_reinplace_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_softmax_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_softmax_one_kernel_persist_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_split_cumprod_low_prec_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_split_cumsum_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_split_cumsum_low_prec_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_sqrt_dynamic_shapes_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_squeeze1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_sum3_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_sum5_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_tanh_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_tmp_not_defined_issue1_use_block_ptr_True_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_to_dtype_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_triton_argmin_argmax_transpose_logical_index_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_triton_kernel_bool_param_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_unfold_zero_dimension_tensor_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_unroll_small_reduction_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_unspec_inputs_float16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_unspec_inputs_float32_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_unspec_inputs_float64_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_unsqueeze_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_unsqueeze_inplace_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_upsample_bilinear2d_a_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_vectorized_ops_masked_var_novec_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_view_as_complex_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_view_as_real_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_view_detach_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_views2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_views3_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_views7_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_zeros_dynamic_shapes_cuda
2025-12-04T10:19:47.9191670Z 
2025-12-04T10:19:47.9193493Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_AllenaiLongformerBase_repro_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py W1204 10:08:54.874000 26638 site-packages/torch/_dynamo/variables/torch.py:1533] [0/0] Calling <built-in method div of type object at 0x7fb939a06bc0> on only torch.SymInt arguments is not yet supported.
2025-12-04T10:19:47.9196111Z W1204 10:08:54.874000 26638 site-packages/torch/_dynamo/variables/torch.py:1533] [0/0] To support this behavior, we need to allow const-propping tensors that store symint data.
2025-12-04T10:19:47.9197681Z W1204 10:08:54.874000 26638 site-packages/torch/_dynamo/variables/torch.py:1533] [0/0] For now, dynamo will explicitly graph break when it encounters user code with this behavior.
2025-12-04T10:19:47.9198836Z W1204 10:08:54.874000 26638 site-packages/torch/_dynamo/variables/torch.py:1533] [0/0] 
2025-12-04T10:19:47.9199395Z XFAIL [23.3568s] [  0%]
2025-12-04T10:19:47.9200529Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test__dyn_quant_matmul_4bit_fp32_input_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [4.6635s] [  0%]
2025-12-04T10:19:47.9202521Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test__dyn_quant_pack_4bit_weight_bf16_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [4.3064s] [  0%]
2025-12-04T10:19:47.9204498Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test__dyn_quant_pack_4bit_weight_fp32_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [2.5684s] [  0%]
2025-12-04T10:19:47.9206417Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test__unsafe_masked_index_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.9125s] [  1%]
2025-12-04T10:19:47.9208394Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test__unsafe_masked_index_put_accumulate_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.0043s] [  1%]
2025-12-04T10:19:47.9210371Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_adaptive_avg_pool2d1_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [8.0236s] [  1%]
2025-12-04T10:19:47.9212235Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_add_complex10_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.9747s] [  1%]
2025-12-04T10:19:47.9214318Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_add_complex4_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [3.3084s] [  2%]
2025-12-04T10:19:47.9216181Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_add_complex8_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0004s] (Skipped!) [  2%]
2025-12-04T10:19:47.9218122Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_add_complex_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.9021s] [  2%]
2025-12-04T10:19:47.9220154Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_add_inplace_permuted_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.2064s] [  2%]
2025-12-04T10:19:47.9222049Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_adding_tensor_offsets_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.2876s] [  2%]
2025-12-04T10:19:47.9223934Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_alexnet_prefix_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [8.2190s] [  3%]
2025-12-04T10:19:47.9225840Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_allow_reuse_disable_if_exceed_peak_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.5679s] [  3%]
2025-12-04T10:19:47.9227744Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_angle_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.0761s] [  3%]
2025-12-04T10:19:47.9229558Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_any_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [2.3983s] [  3%]
2025-12-04T10:19:47.9231697Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_aoti_eager_cache_hit_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py W1204 10:09:59.203000 26638 site-packages/torch/_export/__init__.py:71] +============================+
2025-12-04T10:19:47.9233405Z W1204 10:09:59.204000 26638 site-packages/torch/_export/__init__.py:72] |     !!!   WARNING   !!!    |
2025-12-04T10:19:47.9234248Z W1204 10:09:59.204000 26638 site-packages/torch/_export/__init__.py:73] +============================+
2025-12-04T10:19:47.9235966Z W1204 10:09:59.204000 26638 site-packages/torch/_export/__init__.py:74] torch._export.aot_compile()/torch._export.aot_load() is being deprecated, please switch to directly calling torch._inductor.aoti_compile_and_package(torch.export.export())/torch._inductor.aoti_load_package() instead.
2025-12-04T10:19:47.9237439Z PASSED [5.6791s] [  4%]
2025-12-04T10:19:47.9238565Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_aoti_eager_with_persistent_cache_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [5.1087s] [  4%]
2025-12-04T10:19:47.9240446Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_arange1_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py XFAIL [0.8514s] [  4%]
2025-12-04T10:19:47.9242237Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_argmax_argmin1_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.1264s] [  4%]
2025-12-04T10:19:47.9244052Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_as_strided_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.9365s] [  4%]
2025-12-04T10:19:47.9245893Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_avg_pool2d_backward4_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py XFAIL [0.3442s] [  5%]
2025-12-04T10:19:47.9247715Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bitwise2_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.0401s] [  5%]
2025-12-04T10:19:47.9249570Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bitwise3_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.9872s] [  5%]
2025-12-04T10:19:47.9251329Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bmm1_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.8679s] [  5%]
2025-12-04T10:19:47.9253151Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bucketize_int_int32_int8_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.2882s] [  6%]
2025-12-04T10:19:47.9255151Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bucketize_int_int64_uint8_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.2796s] [  6%]
2025-12-04T10:19:47.9257149Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bucketize_int_uint8_uint8_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py XFAIL [0.1477s] [  6%]
2025-12-04T10:19:47.9259032Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_buffer_batch_norm_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [2.5560s] [  6%]
2025-12-04T10:19:47.9260907Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_buffer_use_after_remove_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [3.5134s] [  7%]
2025-12-04T10:19:47.9262863Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_builtins_round_float_ndigits_neg_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.8061s] [  7%]
2025-12-04T10:19:47.9264731Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cat_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [3.2712s] [  7%]
2025-12-04T10:19:47.9266480Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cat_empty_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.8640s] [  7%]
2025-12-04T10:19:47.9268277Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cat_extern_kernel_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.0362s] [  7%]
2025-12-04T10:19:47.9270085Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cat_uint8_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py XFAIL [0.7988s] [  8%]
2025-12-04T10:19:47.9272093Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cat_unbacked_legacy_empty_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.0314s] [  8%]
2025-12-04T10:19:47.9273937Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cauchy_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.8695s] [  8%]
2025-12-04T10:19:47.9275760Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_check_stack_no_cycles_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.8353s] [  8%]
2025-12-04T10:19:47.9277630Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_clamp_type_promotion_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py XFAIL [0.8605s] [  9%]
2025-12-04T10:19:47.9279512Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_complex_from_real_imag_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py XFAIL [0.1709s] [  9%]
2025-12-04T10:19:47.9281381Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_concat_add_inplace_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.2000s] [  9%]
2025-12-04T10:19:47.9283274Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_consecutive_split_cumsum_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.1870s] [  9%]
2025-12-04T10:19:47.9285292Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_constant_pad_1d_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.2387s] [  9%]
2025-12-04T10:19:47.9287169Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_constant_pad_float64_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.8992s] [ 10%]
2025-12-04T10:19:47.9289107Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_conv1d_depthwise_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.3846s] [ 10%]
2025-12-04T10:19:47.9291043Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_conv1d_with_permute_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0004s] (Skipped!) [ 10%]
2025-12-04T10:19:47.9293251Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_conv3d_channels_last_use_block_ptr_True_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0034s] (triton backend is required for cpu) [ 10%]
2025-12-04T10:19:47.9295367Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_conv_backward_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py XFAIL [1.0560s] [ 11%]
2025-12-04T10:19:47.9297420Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_conv_inference_heuristics_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0033s] (cuda only test) [ 11%]
2025-12-04T10:19:47.9299412Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_conv_with_as_strided_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [4.7183s] [ 11%]
2025-12-04T10:19:47.9301262Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_convolution1_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [4.5091s] [ 11%]
2025-12-04T10:19:47.9303081Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_convolution2_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py XFAIL [0.4406s] [ 12%]
2025-12-04T10:19:47.9304894Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_convolution3_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [2.4916s] [ 12%]
2025-12-04T10:19:47.9307219Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_copy_non_blocking_is_pinned_use_cat_True_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py W1204 10:10:58.614000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9309069Z W1204 10:10:58.616000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9309994Z W1204 10:10:58.617000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9310921Z W1204 10:10:58.618000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9311842Z W1204 10:10:58.618000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9312746Z W1204 10:10:58.619000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9313667Z W1204 10:10:58.620000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9314584Z W1204 10:10:58.621000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9315497Z W1204 10:10:58.622000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9316401Z W1204 10:10:58.622000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9317394Z W1204 10:10:58.623000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9318320Z W1204 10:10:58.624000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9319233Z W1204 10:10:58.625000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9320135Z W1204 10:10:58.625000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9321145Z W1204 10:10:58.626000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9322064Z W1204 10:10:58.627000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9322981Z W1204 10:10:58.628000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9323894Z W1204 10:10:58.628000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9324820Z W1204 10:10:58.629000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9325746Z W1204 10:10:58.630000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9326669Z W1204 10:10:58.631000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9327583Z W1204 10:10:58.631000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9328596Z W1204 10:10:58.632000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9329546Z W1204 10:10:58.633000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9330456Z W1204 10:10:58.634000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9331386Z W1204 10:10:58.635000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9332312Z W1204 10:10:58.635000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9333238Z W1204 10:10:58.636000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9334145Z W1204 10:10:58.637000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9335098Z W1204 10:10:58.638000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9336029Z W1204 10:10:58.638000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9337034Z W1204 10:10:58.639000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9337946Z W1204 10:10:58.640000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9338880Z W1204 10:10:58.641000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9339814Z W1204 10:10:58.641000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9340739Z W1204 10:10:58.642000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9341656Z W1204 10:10:58.643000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9342591Z W1204 10:10:58.644000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9343517Z W1204 10:10:58.645000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9344447Z W1204 10:10:58.646000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9345453Z W1204 10:10:58.646000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9346387Z W1204 10:10:58.647000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9347317Z W1204 10:10:58.648000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9348244Z W1204 10:10:58.649000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9349252Z W1204 10:10:58.650000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9350176Z W1204 10:10:58.650000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9351104Z W1204 10:10:58.651000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9352029Z W1204 10:10:58.652000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9352944Z W1204 10:10:58.653000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9353868Z W1204 10:10:58.653000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9354794Z W1204 10:10:58.654000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9355716Z W1204 10:10:58.655000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9356631Z W1204 10:10:58.656000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9357550Z W1204 10:10:58.656000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9358468Z W1204 10:10:58.657000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9359381Z W1204 10:10:58.658000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9360302Z W1204 10:10:58.659000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9361225Z W1204 10:10:58.660000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9362152Z W1204 10:10:58.660000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9363061Z W1204 10:10:58.661000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9363987Z W1204 10:10:58.662000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9364911Z W1204 10:10:58.663000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9365827Z W1204 10:10:58.663000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9366738Z W1204 10:10:58.664000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9367663Z W1204 10:10:58.665000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9368588Z W1204 10:10:58.666000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9369517Z W1204 10:10:58.666000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9370430Z W1204 10:10:58.667000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9371532Z W1204 10:10:58.668000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9372460Z W1204 10:10:58.669000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9373510Z W1204 10:10:58.670000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9374423Z W1204 10:10:58.670000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9375345Z W1204 10:10:58.671000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9376359Z W1204 10:10:58.672000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9377386Z W1204 10:10:58.673000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9378299Z W1204 10:10:58.673000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9379227Z W1204 10:10:58.674000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9380155Z W1204 10:10:58.675000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9381088Z W1204 10:10:58.676000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9382005Z W1204 10:10:58.676000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9382930Z W1204 10:10:58.677000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9383853Z W1204 10:10:58.678000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9384763Z W1204 10:10:58.679000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9385689Z W1204 10:10:58.679000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9386605Z W1204 10:10:58.680000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9387536Z W1204 10:10:58.681000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9388440Z W1204 10:10:58.682000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9389360Z W1204 10:10:58.683000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9390282Z W1204 10:10:58.683000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9391202Z W1204 10:10:58.684000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9392112Z W1204 10:10:58.685000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9393035Z W1204 10:10:58.686000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9393960Z W1204 10:10:58.686000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9394882Z W1204 10:10:58.687000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9395785Z W1204 10:10:58.688000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9396703Z W1204 10:10:58.689000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9397619Z W1204 10:10:58.689000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9398544Z W1204 10:10:58.690000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9399446Z W1204 10:10:58.691000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9400367Z W1204 10:10:58.692000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:19:47.9401006Z PASSED [12.0757s] [ 12%]
2025-12-04T10:19:47.9402189Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cpu_scalar_with_cpu_scalar_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.7626s] [ 12%]
2025-12-04T10:19:47.9404063Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cumprod_zero_dim_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py XFAIL [0.7309s] [ 12%]
2025-12-04T10:19:47.9405878Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_custom_op_1_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.1294s] [ 13%]
2025-12-04T10:19:47.9408010Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_custom_scan_op_compiled_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0036s] (associative_scan only supported on GPU) [ 13%]
2025-12-04T10:19:47.9410149Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_deterministic_codegen_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [11.3208s] [ 13%]
2025-12-04T10:19:47.9412139Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_deterministic_codegen_on_graph_break_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [2.4426s] [ 13%]
2025-12-04T10:19:47.9414025Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_div2_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.9860s] [ 14%]
2025-12-04T10:19:47.9415853Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_div_presicion_accuracy_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.8129s] [ 14%]
2025-12-04T10:19:47.9417822Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dont_constant_fold_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.7506s] [ 14%]
2025-12-04T10:19:47.9419717Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dropout2_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (Skipped!) [ 14%]
2025-12-04T10:19:47.9421624Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dropout3_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (Skipped!) [ 14%]
2025-12-04T10:19:47.9423479Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dropout_trivial_0_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.7720s] [ 15%]
2025-12-04T10:19:47.9425344Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtype_sympy_expr_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [2.2906s] [ 15%]
2025-12-04T10:19:47.9427231Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_float16_int64_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.9703s] [ 15%]
2025-12-04T10:19:47.9429162Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_float16_uint8_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.9607s] [ 15%]
2025-12-04T10:19:47.9431070Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_float64_int16_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.0044s] [ 16%]
2025-12-04T10:19:47.9432972Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_float64_int32_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.9800s] [ 16%]
2025-12-04T10:19:47.9434894Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_float64_int64_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.9573s] [ 16%]
2025-12-04T10:19:47.9436917Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_float64_uint8_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.0044s] [ 16%]
2025-12-04T10:19:47.9438810Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_fusion_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.7861s] [ 17%]
2025-12-04T10:19:47.9440688Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int32_uint8_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.0043s] [ 17%]
2025-12-04T10:19:47.9442640Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int64_float16_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.0040s] [ 17%]
2025-12-04T10:19:47.9444566Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int64_float32_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [2.0153s] [ 17%]
2025-12-04T10:19:47.9446484Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int64_uint8_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.0042s] [ 17%]
2025-12-04T10:19:47.9448376Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int8_int16_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.9378s] [ 18%]
2025-12-04T10:19:47.9450267Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int8_int32_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.9175s] [ 18%]
2025-12-04T10:19:47.9452156Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_uint8_float32_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.9450s] [ 18%]
2025-12-04T10:19:47.9454066Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_uint8_int8_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.8972s] [ 18%]
2025-12-04T10:19:47.9455962Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_uint8_uint8_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.8797s] [ 19%]
2025-12-04T10:19:47.9457847Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_elu_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.5956s] [ 19%]
2025-12-04T10:19:47.9459596Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_empty2_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py XFAIL [0.1020s] [ 19%]
2025-12-04T10:19:47.9461337Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_erfinv_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.7959s] [ 19%]
2025-12-04T10:19:47.9463174Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_expanded_reduction_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.0350s] [ 19%]
2025-12-04T10:19:47.9465117Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_fallback_mutable_op_with_return_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.0538s] [ 20%]
2025-12-04T10:19:47.9466986Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_fill1_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.7918s] [ 20%]
2025-12-04T10:19:47.9468753Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_floordiv_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.5951s] [ 20%]
2025-12-04T10:19:47.9470537Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_fmod_zero_dim_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [2.0226s] [ 20%]
2025-12-04T10:19:47.9473379Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_forced_buffer_realize_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0004s] (Skipped!) [ 21%]
2025-12-04T10:19:47.9475361Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_fractional_max_pool2d3_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.3441s] [ 21%]
2025-12-04T10:19:47.9477374Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_fractional_max_pool2d4_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [4.3824s] [ 21%]
2025-12-04T10:19:47.9479293Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_fractional_max_pool2d5_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.7791s] [ 21%]
2025-12-04T10:19:47.9481459Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_generated_code_has_size_stride_assert_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0034s] (triton backend is required for cpu) [ 21%]
2025-12-04T10:19:47.9483651Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_getitem_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.0687s] [ 22%]
2025-12-04T10:19:47.9485392Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_glu_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py XFAIL [1.8233s] [ 22%]
2025-12-04T10:19:47.9487213Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_graph_partition_arange1_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.6630s] [ 22%]
2025-12-04T10:19:47.9489144Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_graph_partition_arange2_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.4891s] [ 22%]
2025-12-04T10:19:47.9491097Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_graph_partition_both_scalars_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.8422s] [ 23%]
2025-12-04T10:19:47.9493083Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_graph_partition_misaligned_input_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [2.0153s] [ 23%]
2025-12-04T10:19:47.9495005Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_hardsigmoid_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.8987s] [ 23%]
2025-12-04T10:19:47.9496877Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_hardswish_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.9197s] [ 23%]
2025-12-04T10:19:47.9498714Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_index_float_zero_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.1615s] [ 24%]
2025-12-04T10:19:47.9500616Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_index_propagation_floordiv_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.2158s] [ 24%]
2025-12-04T10:19:47.9502565Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_index_propagation_remainder_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.9749s] [ 24%]
2025-12-04T10:19:47.9504451Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_index_put3_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.2532s] [ 24%]
2025-12-04T10:19:47.9506295Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_index_put_fallback1_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.9517s] [ 24%]
2025-12-04T10:19:47.9508260Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_index_put_fallback2_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.0265s] [ 25%]
2025-12-04T10:19:47.9510141Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_index_put_reinplace_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.0056s] [ 25%]
2025-12-04T10:19:47.9512124Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_inductor_triton_bucketize_respects_masking_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.0734s] [ 25%]
2025-12-04T10:19:47.9514205Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_inplace_activations_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.8721s] [ 25%]
2025-12-04T10:19:47.9516120Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_inplace_add_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (Skipped!) [ 26%]
2025-12-04T10:19:47.9518110Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_inplace_mixed_dtype_ops_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (Skipped!) [ 26%]
2025-12-04T10:19:47.9520486Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_inplace_resize_as_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py E1204 10:12:27.962000 26638 site-packages/torch/_dynamo/utils.py:3241] Accuracy failed: allclose not within tol=0.0001
2025-12-04T10:19:47.9522009Z PASSED [0.0804s] [ 26%]
2025-12-04T10:19:47.9523092Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_inplace_where_pointwise_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.7812s] [ 26%]
2025-12-04T10:19:47.9525051Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_input_mutation1_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (Skipped!) [ 26%]
2025-12-04T10:19:47.9526994Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_int_input_dynamic_shapes_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.7701s] [ 27%]
2025-12-04T10:19:47.9528854Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_issue102546_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py XFAIL [0.7171s] [ 27%]
2025-12-04T10:19:47.9530830Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_large_broadcast_reduction_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0033s] (cpu not supported) [ 27%]
2025-12-04T10:19:47.9532891Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_large_grid_use_block_ptr_False_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.4994s] [ 27%]
2025-12-04T10:19:47.9534833Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_large_offset_pointwise_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.5430s] [ 28%]
2025-12-04T10:19:47.9537805Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_layer_norm_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py W1204 10:12:34.648000 26638 site-packages/torch/_inductor/debug.py:518] [0/0_1] model__151_inference_165 debug trace: /var/lib/jenkins/workspace/test/torch_compile_debug/run_2025_12_04_10_12_33_608767-pid_26638/torchinductor/model__151_inference_165.0
2025-12-04T10:19:47.9539925Z PASSED [1.4086s] [ 28%]
2025-12-04T10:19:47.9540931Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_leaky_relu_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.9105s] [ 28%]
2025-12-04T10:19:47.9542786Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_linear2_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [2.1148s] [ 28%]
2025-12-04T10:19:47.9544581Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_linear_float64_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py XFAIL [0.1705s] [ 29%]
2025-12-04T10:19:47.9546470Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_list_clearing_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (Skipped!) [ 29%]
2025-12-04T10:19:47.9548440Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_lite_mode_fallback_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.1388s] [ 29%]
2025-12-04T10:19:47.9550386Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_lite_mode_not_decompose_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0033s] (requires GPU) [ 29%]
2025-12-04T10:19:47.9552522Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_lite_regional_compile_flex_attention_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0031s] (requires GPU) [ 29%]
2025-12-04T10:19:47.9554644Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_lite_regional_compile_invoke_subgraph_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [2.0693s] [ 30%]
2025-12-04T10:19:47.9556546Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_log1p_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [5.9812s] [ 30%]
2025-12-04T10:19:47.9558289Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_log2_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.9948s] [ 30%]
2025-12-04T10:19:47.9560036Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_log_fp64_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.0211s] [ 30%]
2025-12-04T10:19:47.9561814Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_logcumsumexp_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py XFAIL [0.1599s] [ 31%]
2025-12-04T10:19:47.9564030Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_mark_dynamic_with_hint_override_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (Skipping triton backend only since not big GPU (not enough SM)) [ 31%]
2025-12-04T10:19:47.9566238Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_max_pool2d1_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [2.8082s] [ 31%]
2025-12-04T10:19:47.9568033Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_max_pool2d2_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.6041s] [ 31%]
2025-12-04T10:19:47.9569890Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_max_pool2d6_dilation_2_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [3.1172s] [ 31%]
2025-12-04T10:19:47.9572185Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_max_pool2d_with_indices_backward5_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py XFAIL [0.3953s] [ 32%]
2025-12-04T10:19:47.9574191Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_min_max_reduction_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (Skipped!) [ 32%]
2025-12-04T10:19:47.9576125Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_min_max_reduction_nan_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.8041s] [ 32%]
2025-12-04T10:19:47.9578393Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_mixed_mm2_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.9313s] [ 32%]
2025-12-04T10:19:47.9580168Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_mixed_mm3_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.8411s] [ 33%]
2025-12-04T10:19:47.9581959Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_mm_mixed_dtype_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.1972s] [ 33%]
2025-12-04T10:19:47.9584024Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_multilayer_sum_low_prec_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0034s] (requires cuda) [ 33%]
2025-12-04T10:19:47.9586064Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_mutable_custom_op_fixed_layout2_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.3788s] [ 33%]
2025-12-04T10:19:47.9588077Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_nan_sort_stable_True_descending_True_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.1266s] [ 34%]
2025-12-04T10:19:47.9589987Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_neg_max_uint8_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.8018s] [ 34%]
2025-12-04T10:19:47.9591803Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_new_empty_strided_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py XFAIL [0.1070s] [ 34%]
2025-12-04T10:19:47.9593656Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_nll_loss_forward_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.4678s] [ 34%]
2025-12-04T10:19:47.9595457Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_one_hot_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.9614s] [ 34%]
2025-12-04T10:19:47.9597228Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pad_cast_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [2.1749s] [ 35%]
2025-12-04T10:19:47.9599082Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pattern_matcher_unbacked_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.3124s] [ 35%]
2025-12-04T10:19:47.9600982Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_bessel_j0_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.8190s] [ 35%]
2025-12-04T10:19:47.9602836Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_erf_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.9767s] [ 35%]
2025-12-04T10:19:47.9604667Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_erfc_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.9638s] [ 36%]
2025-12-04T10:19:47.9606496Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_exp2_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.9504s] [ 36%]
2025-12-04T10:19:47.9608343Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_expm1_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.9856s] [ 36%]
2025-12-04T10:19:47.9610194Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_gammaincc_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.8873s] [ 36%]
2025-12-04T10:19:47.9612135Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_modified_bessel_i0_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.7902s] [ 36%]
2025-12-04T10:19:47.9614154Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_multigammaln_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.0758s] [ 37%]
2025-12-04T10:19:47.9616072Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_ndtr_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.0681s] [ 37%]
2025-12-04T10:19:47.9618014Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_polygamma_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.9147s] [ 37%]
2025-12-04T10:19:47.9620033Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_scaled_modified_bessel_k1_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.8020s] [ 37%]
2025-12-04T10:19:47.9622103Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_shifted_chebyshev_polynomial_v_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.8019s] [ 38%]
2025-12-04T10:19:47.9624086Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_xlogy_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.0875s] [ 38%]
2025-12-04T10:19:47.9625952Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pow3_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (Skipped!) [ 38%]
2025-12-04T10:19:47.9627778Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pow_int_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [3.9913s] [ 38%]
2025-12-04T10:19:47.9629651Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_prepare_softmax_with_fast_math_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.2456s] [ 39%]
2025-12-04T10:19:47.9631666Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_rand_like_deterministic_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (Skipped!) [ 39%]
2025-12-04T10:19:47.9633597Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_reduction1_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.0404s] [ 39%]
2025-12-04T10:19:47.9635408Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_reduction2_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.9663s] [ 39%]
2025-12-04T10:19:47.9637446Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_reduction_config_limit_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0034s] (triton backend is required for cpu) [ 39%]
2025-12-04T10:19:47.9639497Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_remainder_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.1096s] [ 40%]
2025-12-04T10:19:47.9641317Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_remove_noop_slice_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.8775s] [ 40%]
2025-12-04T10:19:47.9643211Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_remove_noop_view_default_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.3706s] [ 40%]
2025-12-04T10:19:47.9645101Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_repeat_as_strided_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py XFAIL [0.7332s] [ 40%]
2025-12-04T10:19:47.9647074Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_repeat_interleave_Tensor_decomp_int32_nd_2_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.0373s] [ 41%]
2025-12-04T10:19:47.9649561Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_repeat_interleave_decomposition_has_clamp_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0036s] (repeat_interleave decomp doesn't support dynamic output size) [ 41%]
2025-12-04T10:19:47.9651809Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_resize_as_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [22.6377s] [ 41%]
2025-12-04T10:19:47.9653645Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_round_correctness_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.8167s] [ 41%]
2025-12-04T10:19:47.9655526Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_rsqrt_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.8030s] [ 41%]
2025-12-04T10:19:47.9657348Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_scalar_input_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.8782s] [ 42%]
2025-12-04T10:19:47.9659356Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_scaled_dot_product_efficient_attention_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (Skipped!) [ 42%]
2025-12-04T10:19:47.9661327Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_scatter2_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.8047s] [ 42%]
2025-12-04T10:19:47.9663102Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_scatter6_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.9560s] [ 42%]
2025-12-04T10:19:47.9664896Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_scatter_bf16_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [2.4898s] [ 43%]
2025-12-04T10:19:47.9666774Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_scheduler_vertical_fusion1_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.1551s] [ 43%]
2025-12-04T10:19:47.9669032Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sdpa_prefer_nd_tiling_True_use_block_ptr_False_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (Does not support SDPA or pre-SM80 hardware) [ 43%]
2025-12-04T10:19:47.9671940Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sdpa_prefer_nd_tiling_True_use_block_ptr_True_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (Does not support SDPA or pre-SM80 hardware) [ 43%]
2025-12-04T10:19:47.9674158Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_setitem_with_int_parameter_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.5847s] [ 43%]
2025-12-04T10:19:47.9676049Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sgn_extremal_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.7466s] [ 44%]
2025-12-04T10:19:47.9677905Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_should_pad_bench_for_bmm_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.0249s] [ 44%]
2025-12-04T10:19:47.9679747Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sigmoid_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.0681s] [ 44%]
2025-12-04T10:19:47.9681524Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sign_dtype_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.0645s] [ 44%]
2025-12-04T10:19:47.9683270Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sin_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.0392s] [ 45%]
2025-12-04T10:19:47.9685193Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_slice_scatter_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.6867s] [ 45%]
2025-12-04T10:19:47.9687060Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_softmax_backward_data_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.8922s] [ 45%]
2025-12-04T10:19:47.9688903Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_softmax_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.6137s] [ 45%]
2025-12-04T10:19:47.9690849Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_softmax_one_kernel_persist_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.2356s] [ 46%]
2025-12-04T10:19:47.9692698Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sort_stable_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py XFAIL [0.1440s] [ 46%]
2025-12-04T10:19:47.9694489Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_split_cumsum_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py XFAIL [0.2997s] [ 46%]
2025-12-04T10:19:47.9696253Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_split_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py XFAIL [0.2941s] [ 46%]
2025-12-04T10:19:47.9698111Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_squeeze_varargs_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.7609s] [ 46%]
2025-12-04T10:19:47.9699904Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sum3_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.8842s] [ 47%]
2025-12-04T10:19:47.9701636Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sum4_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.0359s] [ 47%]
2025-12-04T10:19:47.9703399Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sum_int_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [2.5068s] [ 47%]
2025-12-04T10:19:47.9705229Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_to_device_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0004s] (Skipped!) [ 47%]
2025-12-04T10:19:47.9707061Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_to_dtype_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.0199s] [ 48%]
2025-12-04T10:19:47.9708873Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_torch_device_split_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.1093s] [ 48%]
2025-12-04T10:19:47.9710666Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_triu_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.9149s] [ 48%]
2025-12-04T10:19:47.9712392Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_uint_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py XFAIL [0.7460s] [ 48%]
2025-12-04T10:19:47.9714183Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_unbacked_float_item_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.9381s] [ 48%]
2025-12-04T10:19:47.9716117Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_unbacked_floordiv_simplify_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.0190s] [ 49%]
2025-12-04T10:19:47.9718099Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_unbacked_floordiv_simplify_errors_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.0240s] [ 49%]
2025-12-04T10:19:47.9720132Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_unbind_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py XFAIL [0.1831s] [ 49%]
2025-12-04T10:19:47.9722098Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_unspec_inputs_float32_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0034s] (Testing mixed devices) [ 49%]
2025-12-04T10:19:47.9724245Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_unspec_inputs_int64_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0032s] (Testing mixed devices) [ 50%]
2025-12-04T10:19:47.9726309Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_unsqueeze_inplace_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.9899s] [ 50%]
2025-12-04T10:19:47.9728230Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_upsample_nearest2d_backward_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py XFAIL [4.5830s] [ 50%]
2025-12-04T10:19:47.9730122Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_var_correction_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.6481s] [ 50%]
2025-12-04T10:19:47.9732016Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_var_mean_tile_reduction_True_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.6593s] [ 51%]
2025-12-04T10:19:47.9733856Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_views5_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py XFAIL [0.1559s] [ 51%]
2025-12-04T10:19:47.9735605Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_views7_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.9596s] [ 51%]
2025-12-04T10:19:47.9737467Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_weight_norm_conv2d_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [3.7935s] [ 51%]
2025-12-04T10:19:47.9739304Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_where_broadcast_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.6955s] [ 51%]
2025-12-04T10:19:47.9741169Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_xblock_divides_xnumel_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.8973s] [ 52%]
2025-12-04T10:19:47.9743111Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_AllenaiLongformerBase_repro_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py XFAIL [0.3547s] [ 52%]
2025-12-04T10:19:47.9745340Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test__dyn_quant_matmul_4bit_fp32_input_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0033s] (No _dyn_quant_matmul_4bit implementation on CUDA) [ 52%]
2025-12-04T10:19:47.9747811Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test__dyn_quant_pack_4bit_weight_fp32_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0031s] (No _dyn_quant_pack_4bit_weight implementation on CUDA) [ 52%]
2025-12-04T10:19:47.9750005Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test__unsafe_masked_index_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.5730s] [ 53%]
2025-12-04T10:19:47.9751968Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_adaptive_avg_pool_errors_with_long_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.6279s] [ 53%]
2025-12-04T10:19:47.9753974Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_adaptive_avg_pool_with_output_size_0_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py XFAIL [0.1465s] [ 53%]
2025-12-04T10:19:47.9756012Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_adaptive_max_pool2d1_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [4.2365s] [ 53%]
2025-12-04T10:19:47.9757904Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_adaptive_max_pool2d3_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [6.2163s] [ 53%]
2025-12-04T10:19:47.9759905Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_adaptive_pool_errors_with_long_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.2459s] [ 54%]
2025-12-04T10:19:47.9761792Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_add_complex5_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.8504s] [ 54%]
2025-12-04T10:19:47.9763619Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_add_complex_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.6096s] [ 54%]
2025-12-04T10:19:47.9765517Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_add_complex_strided_fallback_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.1862s] [ 54%]
2025-12-04T10:19:47.9767455Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_add_inplace_permuted_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.0588s] [ 55%]
2025-12-04T10:19:47.9769338Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_alexnet_prefix_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [5.5127s] [ 55%]
2025-12-04T10:19:47.9771440Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_any_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.7087s] [ 55%]
2025-12-04T10:19:47.9773235Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_arange2_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.2032s] [ 55%]
2025-12-04T10:19:47.9775011Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_arange6_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.3202s] [ 56%]
2025-12-04T10:19:47.9776891Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_argmax_argmin3_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.4432s] [ 56%]
2025-12-04T10:19:47.9778732Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_avg_pool2d2_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.3588s] [ 56%]
2025-12-04T10:19:47.9780579Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_avg_pool3d_backward3_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [2.7478s] [ 56%]
2025-12-04T10:19:47.9782503Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_default_kwargs_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.2308s] [ 56%]
2025-12-04T10:19:47.9784442Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_int_int32_int32_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.5961s] [ 57%]
2025-12-04T10:19:47.9786381Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_int_int64_int32_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.5921s] [ 57%]
2025-12-04T10:19:47.9788314Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_int_int64_int8_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.5827s] [ 57%]
2025-12-04T10:19:47.9790380Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_int_int8_uint8_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.5751s] [ 57%]
2025-12-04T10:19:47.9792328Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_nd_tiling_False_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.8609s] [ 58%]
2025-12-04T10:19:47.9794239Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_buffer_batch_norm_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [3.1174s] [ 58%]
2025-12-04T10:19:47.9796281Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_builtins_round_float_ndigits_pos_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.2558s] [ 58%]
2025-12-04T10:19:47.9798307Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_builtins_round_float_ndigits_zero_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.2344s] [ 58%]
2025-12-04T10:19:47.9800228Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_cat_negative_dim_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.7886s] [ 58%]
2025-12-04T10:19:47.9802208Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_cat_of_loops_and_extern_kernel_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (Skipped!) [ 59%]
2025-12-04T10:19:47.9804160Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_cat_uint8_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.4613s] [ 59%]
2025-12-04T10:19:47.9805937Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_cauchy_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.6168s] [ 59%]
2025-12-04T10:19:47.9807757Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_chunk_recompiles_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.9788s] [ 59%]
2025-12-04T10:19:47.9809690Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_clamp_type_promotion_non_tensor_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.2444s] [ 60%]
2025-12-04T10:19:47.9811678Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_computed_buffer_inlining_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.2186s] [ 60%]
2025-12-04T10:19:47.9813737Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_config_option_dont_assume_alignment_recompiles_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.5165s] [ 60%]
2025-12-04T10:19:47.9815740Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_constant_pad_1d_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.7983s] [ 60%]
2025-12-04T10:19:47.9817688Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_constant_pad_float64_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.4966s] [ 60%]
2025-12-04T10:19:47.9819585Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_constant_pad_nd_inplace_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.1915s] [ 61%]
2025-12-04T10:19:47.9821747Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_conv_bn_fuse_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0033s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 61%]
2025-12-04T10:19:47.9823920Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_conv_inference_heuristics_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py XFAIL [2.1554s] [ 61%]
2025-12-04T10:19:47.9825958Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_conv_with_as_strided_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [2.6527s] [ 61%]
2025-12-04T10:19:47.9827941Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_copy_non_blocking_is_pinned_use_cat_False_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [2.1997s] [ 62%]
2025-12-04T10:19:47.9830009Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_copy_non_blocking_is_pinned_use_cat_True_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [12.5666s] [ 62%]
2025-12-04T10:19:47.9832083Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_cpu_scalar_with_cpu_scalar_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.7479s] [ 62%]
2025-12-04T10:19:47.9833941Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_cummin_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py XFAIL [1.3612s] [ 62%]
2025-12-04T10:19:47.9835749Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_cumprod_zero_dim_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.1595s] [ 63%]
2025-12-04T10:19:47.9837582Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_custom_op_2_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.4625s] [ 63%]
2025-12-04T10:19:47.9839447Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_custom_scan_op_compiled_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.6357s] [ 63%]
2025-12-04T10:19:47.9841364Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_custom_scan_op_multi_input_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.1719s] [ 63%]
2025-12-04T10:19:47.9843291Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_custom_scan_would_split_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.4763s] [ 63%]
2025-12-04T10:19:47.9845220Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_deterministic_codegen_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [11.5569s] [ 64%]
2025-12-04T10:19:47.9847168Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dist_bf16_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0035s] (Requires sm80) [ 64%]
2025-12-04T10:19:47.9849024Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_div1_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.5704s] [ 64%]
2025-12-04T10:19:47.9850754Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_div3_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.4017s] [ 64%]
2025-12-04T10:19:47.9852634Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtype_mismatch_issue_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (Skipped!) [ 65%]
2025-12-04T10:19:47.9854823Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_float16_float32_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0034s] (uses bfloat16 which requires SM >= 80) [ 65%]
2025-12-04T10:19:47.9857219Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_float16_int16_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0031s] (uses bfloat16 which requires SM >= 80) [ 65%]
2025-12-04T10:19:47.9859527Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_float16_uint8_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0031s] (uses bfloat16 which requires SM >= 80) [ 65%]
2025-12-04T10:19:47.9861944Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_float32_int16_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0030s] (uses bfloat16 which requires SM >= 80) [ 65%]
2025-12-04T10:19:47.9864252Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_float32_int64_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0034s] (uses bfloat16 which requires SM >= 80) [ 66%]
2025-12-04T10:19:47.9866637Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_float64_float32_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0030s] (uses bfloat16 which requires SM >= 80) [ 66%]
2025-12-04T10:19:47.9868752Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_fusion_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.2796s] [ 66%]
2025-12-04T10:19:47.9870857Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int16_float64_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0035s] (uses bfloat16 which requires SM >= 80) [ 66%]
2025-12-04T10:19:47.9873500Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int16_uint8_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0036s] (uses bfloat16 which requires SM >= 80) [ 67%]
2025-12-04T10:19:47.9875816Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int32_float16_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0031s] (uses bfloat16 which requires SM >= 80) [ 67%]
2025-12-04T10:19:47.9878107Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int32_float32_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0030s] (uses bfloat16 which requires SM >= 80) [ 67%]
2025-12-04T10:19:47.9880425Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int32_float64_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0030s] (uses bfloat16 which requires SM >= 80) [ 67%]
2025-12-04T10:19:47.9882730Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int64_int16_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0035s] (uses bfloat16 which requires SM >= 80) [ 68%]
2025-12-04T10:19:47.9885010Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int64_int8_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0030s] (uses bfloat16 which requires SM >= 80) [ 68%]
2025-12-04T10:19:47.9887294Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int8_int64_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0030s] (uses bfloat16 which requires SM >= 80) [ 68%]
2025-12-04T10:19:47.9889562Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int8_uint8_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0030s] (uses bfloat16 which requires SM >= 80) [ 68%]
2025-12-04T10:19:47.9891848Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_uint8_float16_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0029s] (uses bfloat16 which requires SM >= 80) [ 68%]
2025-12-04T10:19:47.9894146Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_uint8_int16_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0029s] (uses bfloat16 which requires SM >= 80) [ 69%]
2025-12-04T10:19:47.9896679Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_uint8_int32_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0032s] (uses bfloat16 which requires SM >= 80) [ 69%]
2025-12-04T10:19:47.9898742Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_embedding_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.0159s] [ 69%]
2025-12-04T10:19:47.9900545Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_empty2_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py XFAIL [0.1063s] [ 69%]
2025-12-04T10:19:47.9902420Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_empty_strided_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py XFAIL [0.1055s] [ 70%]
2025-12-04T10:19:47.9904345Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_emulate_precision_triton_fp_fusion_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.2887s] [ 70%]
2025-12-04T10:19:47.9906247Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_exp2_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.3630s] [ 70%]
2025-12-04T10:19:47.9908154Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_fallback_mutable_op_no_mutated_tensors_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.0368s] [ 70%]
2025-12-04T10:19:47.9910191Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_fallback_mutable_op_with_return_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.0540s] [ 70%]
2025-12-04T10:19:47.9912169Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_fft_real_input_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (Skipped!) [ 71%]
2025-12-04T10:19:47.9914175Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_float_index_expression_type_promotion_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.2111s] [ 71%]
2025-12-04T10:19:47.9916104Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_floordiv_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.4573s] [ 71%]
2025-12-04T10:19:47.9918210Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_fuse_large_params_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0005s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 71%]
2025-12-04T10:19:47.9920442Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_generated_code_has_alignment_assert_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.3170s] [ 72%]
2025-12-04T10:19:47.9922356Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_getitem_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.0343s] [ 72%]
2025-12-04T10:19:47.9924252Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_graph_partition_misaligned_input_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.9638s] [ 72%]
2025-12-04T10:19:47.9926522Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_graph_partition_refcount_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0005s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 72%]
2025-12-04T10:19:47.9928697Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_hardsigmoid_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.3179s] [ 73%]
2025-12-04T10:19:47.9930557Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_horizonal_fusion2_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.4485s] [ 73%]
2025-12-04T10:19:47.9932434Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index2_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.7328s] [ 73%]
2025-12-04T10:19:47.9934193Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index3_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.8863s] [ 73%]
2025-12-04T10:19:47.9936113Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_propagation_device_assert_masked_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.4714s] [ 73%]
2025-12-04T10:19:47.9938228Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_propagation_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.1975s] [ 74%]
2025-12-04T10:19:47.9940140Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_propagation_flip_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.2016s] [ 74%]
2025-12-04T10:19:47.9942084Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_propagation_floordiv_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.7441s] [ 74%]
2025-12-04T10:19:47.9943955Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_put1_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [3.0350s] [ 74%]
2025-12-04T10:19:47.9945809Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_put_fallback1_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.3750s] [ 75%]
2025-12-04T10:19:47.9947687Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_put_fallback2_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.4434s] [ 75%]
2025-12-04T10:19:47.9949543Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_select_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.4645s] [ 75%]
2025-12-04T10:19:47.9951472Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_inplace_mixed_dtype_ops_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (Skipped!) [ 75%]
2025-12-04T10:19:47.9953454Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_int_input_dynamic_shapes_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.1922s] [ 75%]
2025-12-04T10:19:47.9955394Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_invalid_operand_issue1_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.2120s] [ 76%]
2025-12-04T10:19:47.9957246Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_isinf2_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.2126s] [ 76%]
2025-12-04T10:19:47.9959051Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_issue102546_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.1550s] [ 76%]
2025-12-04T10:19:47.9960935Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_kernel_names_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (Skipped!) [ 76%]
2025-12-04T10:19:47.9962831Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_large_block_sizes_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [6.8157s] [ 77%]
2025-12-04T10:19:47.9964732Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_large_grid_use_block_ptr_False_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.5035s] [ 77%]
2025-12-04T10:19:47.9966732Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_large_pointwise_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.2373s] [ 77%]
2025-12-04T10:19:47.9968632Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_large_tensor_reduction_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.0067s] [ 77%]
2025-12-04T10:19:47.9970524Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_lerp_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (Skipped!) [ 78%]
2025-12-04T10:19:47.9972804Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_like_rands_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py XFAIL [0.1981s] [ 78%]
2025-12-04T10:19:47.9974570Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_linear1_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.1485s] [ 78%]
2025-12-04T10:19:47.9976438Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_linear2_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.2373s] [ 78%]
2025-12-04T10:19:47.9978430Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_linear_float64_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0035s] (cuda failed for float64 linear) [ 78%]
2025-12-04T10:19:47.9980492Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_list_clearing_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (Skipped!) [ 79%]
2025-12-04T10:19:47.9982609Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_lite_regional_compile_flex_attention_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py ('RERUN', {'yellow': True}) [2.0736s] [ 79%]
2025-12-04T10:19:47.9984861Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_lite_regional_compile_flex_attention_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py ('RERUN', {'yellow': True}) [1.8701s] [ 79%]
2025-12-04T10:19:47.9987026Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_lite_regional_compile_flex_attention_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py FAILED [1.8548s] [ 79%]
2025-12-04T10:19:47.9988142Z 
2025-12-04T10:19:47.9988292Z ==================================== RERUNS ====================================
2025-12-04T10:19:47.9989003Z _ DynamicShapesCodegenGPUTests.test_lite_regional_compile_flex_attention_dynamic_shapes_cuda _
2025-12-04T10:19:47.9989675Z Traceback (most recent call last):
2025-12-04T10:19:47.9990510Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 13781, in test_lite_regional_compile_flex_attention
2025-12-04T10:19:47.9991401Z     _, codes = run_fw_bw_and_get_code(lambda: opt_fn(x))
2025-12-04T10:19:47.9992231Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2430, in run_fw_bw_and_get_code
2025-12-04T10:19:47.9993034Z     return run_and_get_code(run_with_backward)
2025-12-04T10:19:47.9993796Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code
2025-12-04T10:19:47.9994539Z     result = fn(*args, **kwargs)
2025-12-04T10:19:47.9995245Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2426, in run_with_backward
2025-12-04T10:19:47.9995964Z     result = fn()
2025-12-04T10:19:47.9996535Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 13781, in <lambda>
2025-12-04T10:19:47.9997243Z     _, codes = run_fw_bw_and_get_code(lambda: opt_fn(x))
2025-12-04T10:19:47.9998016Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 936, in compile_wrapper
2025-12-04T10:19:47.9998888Z     raise e.with_traceback(None) from e.__cause__  # User compiler error
2025-12-04T10:19:47.9999474Z torch._dynamo.exc.Unsupported: Attempt to trace generator
2025-12-04T10:19:48.0000246Z   Explanation: Generators cannot be compiled directly with `torch.compile`.
2025-12-04T10:19:48.0001063Z   Hint: Call a generator from inside of a non-generator Python function and compile that function instead.
2025-12-04T10:19:48.0002165Z   Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.
2025-12-04T10:19:48.0003041Z 
2025-12-04T10:19:48.0003177Z   Developer debug context: 
2025-12-04T10:19:48.0003383Z 
2025-12-04T10:19:48.0003936Z  For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html
2025-12-04T10:19:48.0004592Z 
2025-12-04T10:19:48.0004815Z To execute this test, run the following from the base repo dir:
2025-12-04T10:19:48.0006061Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_torchinductor_codegen_dynamic_shapes.py DynamicShapesCodegenGPUTests.test_lite_regional_compile_flex_attention_dynamic_shapes_cuda
2025-12-04T10:19:48.0007093Z 
2025-12-04T10:19:48.0007367Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:19:48.0008003Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:19:48.0010771Z unimplemented [('Attempt to trace generator\n  Explanation: Generators cannot be compiled directly with `torch.compile`.\n  Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n  Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 1)]
2025-12-04T10:19:48.0013472Z stats [('calls_captured', 12), ('unique_graphs', 1)]
2025-12-04T10:19:48.0014017Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 2), ('ok', 1)]
2025-12-04T10:19:48.0014549Z inductor [('fxgraph_cache_miss', 2)]
2025-12-04T10:19:48.0014900Z graph_break []
2025-12-04T10:19:48.0015266Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:19:48.0016430Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:19:48.0017418Z   warnings.warn(
2025-12-04T10:19:48.0018313Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:19:48.0019267Z   warnings.warn(
2025-12-04T10:19:48.0020702Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/attention/flex_attention.py:1624: UserWarning: flex_attention called without torch.compile() - this will use an unfused implementation that materializes the full scores matrix instead of generating a fused kernel.
2025-12-04T10:19:48.0022114Z 
2025-12-04T10:19:48.0022278Z SOLUTION: Use torch.compile(flex_attention)(...)
2025-12-04T10:19:48.0022571Z 
2025-12-04T10:19:48.0022768Z If you want to debug your score_mod/mask_mod, you can set:
2025-12-04T10:19:48.0023373Z torch.nn.attention.flex_attention._FLEX_ATTENTION_DISABLE_COMPILE_DEBUG = True
2025-12-04T10:19:48.0023823Z 
2025-12-04T10:19:48.0024384Z This will allow you to use print statements or breakpoints. Note: This doesn't work with the backwards pass and may produce incorrect results.
2025-12-04T10:19:48.0025190Z   _warn_once(
2025-12-04T10:19:48.0025754Z _ DynamicShapesCodegenGPUTests.test_lite_regional_compile_flex_attention_dynamic_shapes_cuda _
2025-12-04T10:19:48.0026417Z Traceback (most recent call last):
2025-12-04T10:19:48.0027236Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 13781, in test_lite_regional_compile_flex_attention
2025-12-04T10:19:48.0028119Z     _, codes = run_fw_bw_and_get_code(lambda: opt_fn(x))
2025-12-04T10:19:48.0029014Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2430, in run_fw_bw_and_get_code
2025-12-04T10:19:48.0029800Z     return run_and_get_code(run_with_backward)
2025-12-04T10:19:48.0030557Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code
2025-12-04T10:19:48.0031287Z     result = fn(*args, **kwargs)
2025-12-04T10:19:48.0032039Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2426, in run_with_backward
2025-12-04T10:19:48.0032762Z     result = fn()
2025-12-04T10:19:48.0033327Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 13781, in <lambda>
2025-12-04T10:19:48.0034033Z     _, codes = run_fw_bw_and_get_code(lambda: opt_fn(x))
2025-12-04T10:19:48.0034801Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 936, in compile_wrapper
2025-12-04T10:19:48.0035670Z     raise e.with_traceback(None) from e.__cause__  # User compiler error
2025-12-04T10:19:48.0036262Z torch._dynamo.exc.Unsupported: Attempt to trace generator
2025-12-04T10:19:48.0036878Z   Explanation: Generators cannot be compiled directly with `torch.compile`.
2025-12-04T10:19:48.0037703Z   Hint: Call a generator from inside of a non-generator Python function and compile that function instead.
2025-12-04T10:19:48.0038798Z   Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.
2025-12-04T10:19:48.0039484Z 
2025-12-04T10:19:48.0039616Z   Developer debug context: 
2025-12-04T10:19:48.0039821Z 
2025-12-04T10:19:48.0040348Z  For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html
2025-12-04T10:19:48.0041017Z 
2025-12-04T10:19:48.0041235Z To execute this test, run the following from the base repo dir:
2025-12-04T10:19:48.0042481Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_torchinductor_codegen_dynamic_shapes.py DynamicShapesCodegenGPUTests.test_lite_regional_compile_flex_attention_dynamic_shapes_cuda
2025-12-04T10:19:48.0043498Z 
2025-12-04T10:19:48.0043775Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:19:48.0044405Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:19:48.0047154Z unimplemented [('Attempt to trace generator\n  Explanation: Generators cannot be compiled directly with `torch.compile`.\n  Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n  Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 1)]
2025-12-04T10:19:48.0049850Z stats [('calls_captured', 12), ('unique_graphs', 1)]
2025-12-04T10:19:48.0050404Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 2), ('ok', 1)]
2025-12-04T10:19:48.0050926Z inductor [('fxgraph_cache_miss', 2)]
2025-12-04T10:19:48.0051266Z graph_break []
2025-12-04T10:19:48.0051643Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:19:48.0052745Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:19:48.0053728Z   warnings.warn(
2025-12-04T10:19:48.0054610Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:19:48.0055580Z   warnings.warn(
2025-12-04T10:19:48.0057168Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/attention/flex_attention.py:1624: UserWarning: flex_attention called without torch.compile() - this will use an unfused implementation that materializes the full scores matrix instead of generating a fused kernel.
2025-12-04T10:19:48.0058561Z 
2025-12-04T10:19:48.0058741Z SOLUTION: Use torch.compile(flex_attention)(...)
2025-12-04T10:19:48.0059033Z 
2025-12-04T10:19:48.0059216Z If you want to debug your score_mod/mask_mod, you can set:
2025-12-04T10:19:48.0059837Z torch.nn.attention.flex_attention._FLEX_ATTENTION_DISABLE_COMPILE_DEBUG = True
2025-12-04T10:19:48.0060335Z 
2025-12-04T10:19:48.0060909Z This will allow you to use print statements or breakpoints. Note: This doesn't work with the backwards pass and may produce incorrect results.
2025-12-04T10:19:48.0061702Z   _warn_once(
2025-12-04T10:19:48.0062063Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:19:48.0064822Z unimplemented [('Attempt to trace generator\n  Explanation: Generators cannot be compiled directly with `torch.compile`.\n  Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n  Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 1)]
2025-12-04T10:19:48.0067505Z stats [('calls_captured', 12), ('unique_graphs', 1)]
2025-12-04T10:19:48.0082558Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 2), ('ok', 1)]
2025-12-04T10:19:48.0083288Z inductor [('fxgraph_cache_miss', 2)]
2025-12-04T10:19:48.0083648Z graph_break []
2025-12-04T10:19:48.0084050Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:19:48.0085161Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:19:48.0086142Z   warnings.warn(
2025-12-04T10:19:48.0087065Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:19:48.0088037Z   warnings.warn(
2025-12-04T10:19:48.0088343Z =================================== FAILURES ===================================
2025-12-04T10:19:48.0089053Z _ DynamicShapesCodegenGPUTests.test_lite_regional_compile_flex_attention_dynamic_shapes_cuda _
2025-12-04T10:19:48.0089743Z Traceback (most recent call last):
2025-12-04T10:19:48.0090547Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 13781, in test_lite_regional_compile_flex_attention
2025-12-04T10:19:48.0091428Z     _, codes = run_fw_bw_and_get_code(lambda: opt_fn(x))
2025-12-04T10:19:48.0092250Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2430, in run_fw_bw_and_get_code
2025-12-04T10:19:48.0093050Z     return run_and_get_code(run_with_backward)
2025-12-04T10:19:48.0093791Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code
2025-12-04T10:19:48.0094521Z     result = fn(*args, **kwargs)
2025-12-04T10:19:48.0095228Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2426, in run_with_backward
2025-12-04T10:19:48.0095938Z     result = fn()
2025-12-04T10:19:48.0096597Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 13781, in <lambda>
2025-12-04T10:19:48.0097310Z     _, codes = run_fw_bw_and_get_code(lambda: opt_fn(x))
2025-12-04T10:19:48.0098102Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 936, in compile_wrapper
2025-12-04T10:19:48.0098960Z     raise e.with_traceback(None) from e.__cause__  # User compiler error
2025-12-04T10:19:48.0099559Z torch._dynamo.exc.Unsupported: Attempt to trace generator
2025-12-04T10:19:48.0100395Z   Explanation: Generators cannot be compiled directly with `torch.compile`.
2025-12-04T10:19:48.0101235Z   Hint: Call a generator from inside of a non-generator Python function and compile that function instead.
2025-12-04T10:19:48.0102323Z   Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.
2025-12-04T10:19:48.0103027Z 
2025-12-04T10:19:48.0103147Z   Developer debug context: 
2025-12-04T10:19:48.0103471Z 
2025-12-04T10:19:48.0104023Z  For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html
2025-12-04T10:19:48.0104684Z 
2025-12-04T10:19:48.0104920Z To execute this test, run the following from the base repo dir:
2025-12-04T10:19:48.0106159Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_torchinductor_codegen_dynamic_shapes.py DynamicShapesCodegenGPUTests.test_lite_regional_compile_flex_attention_dynamic_shapes_cuda
2025-12-04T10:19:48.0107191Z 
2025-12-04T10:19:48.0107465Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:19:48.0108106Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:19:48.0110885Z unimplemented [('Attempt to trace generator\n  Explanation: Generators cannot be compiled directly with `torch.compile`.\n  Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n  Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 1)]
2025-12-04T10:19:48.0113593Z stats [('calls_captured', 12), ('unique_graphs', 1)]
2025-12-04T10:19:48.0114136Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 2), ('ok', 1)]
2025-12-04T10:19:48.0114666Z inductor [('fxgraph_cache_miss', 2)]
2025-12-04T10:19:48.0115022Z graph_break []
2025-12-04T10:19:48.0115392Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:19:48.0116505Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:19:48.0117479Z   warnings.warn(
2025-12-04T10:19:48.0118375Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:19:48.0119337Z   warnings.warn(
2025-12-04T10:19:48.0120774Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/attention/flex_attention.py:1624: UserWarning: flex_attention called without torch.compile() - this will use an unfused implementation that materializes the full scores matrix instead of generating a fused kernel.
2025-12-04T10:19:48.0122191Z 
2025-12-04T10:19:48.0122358Z SOLUTION: Use torch.compile(flex_attention)(...)
2025-12-04T10:19:48.0122654Z 
2025-12-04T10:19:48.0122855Z If you want to debug your score_mod/mask_mod, you can set:
2025-12-04T10:19:48.0123466Z torch.nn.attention.flex_attention._FLEX_ATTENTION_DISABLE_COMPILE_DEBUG = True
2025-12-04T10:19:48.0123918Z 
2025-12-04T10:19:48.0124480Z This will allow you to use print statements or breakpoints. Note: This doesn't work with the backwards pass and may produce incorrect results.
2025-12-04T10:19:48.0125290Z   _warn_once(
2025-12-04T10:19:48.0125673Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:19:48.0128494Z unimplemented [('Attempt to trace generator\n  Explanation: Generators cannot be compiled directly with `torch.compile`.\n  Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n  Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 1)]
2025-12-04T10:19:48.0131198Z stats [('calls_captured', 12), ('unique_graphs', 1)]
2025-12-04T10:19:48.0131760Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 2), ('ok', 1)]
2025-12-04T10:19:48.0132288Z inductor [('fxgraph_cache_miss', 2)]
2025-12-04T10:19:48.0132694Z graph_break []
2025-12-04T10:19:48.0133078Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:19:48.0134184Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:19:48.0135174Z   warnings.warn(
2025-12-04T10:19:48.0136056Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:19:48.0137117Z   warnings.warn(
2025-12-04T10:19:48.0137508Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:19:48.0140271Z unimplemented [('Attempt to trace generator\n  Explanation: Generators cannot be compiled directly with `torch.compile`.\n  Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n  Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 1)]
2025-12-04T10:19:48.0142985Z stats [('calls_captured', 12), ('unique_graphs', 1)]
2025-12-04T10:19:48.0143528Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 2), ('ok', 1)]
2025-12-04T10:19:48.0144053Z inductor [('fxgraph_cache_miss', 2)]
2025-12-04T10:19:48.0144411Z graph_break []
2025-12-04T10:19:48.0144783Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:19:48.0145881Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:19:48.0146856Z   warnings.warn(
2025-12-04T10:19:48.0147748Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:19:48.0148704Z   warnings.warn(
2025-12-04T10:19:48.0149872Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_torchinductor_codegen_dynamic_shapes/inductor.test_torchinductor_codegen_dynamic_shapes-0c75da116b2f10f8.xml -
2025-12-04T10:19:48.0151188Z =========================== short test summary info ============================
2025-12-04T10:19:48.0152554Z FAILED [1.8548s] inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_lite_regional_compile_flex_attention_dynamic_shapes_cuda - torch._dynamo.exc.Unsupported: Attempt to trace generator
2025-12-04T10:19:48.0154000Z   Explanation: Generators cannot be compiled directly with `torch.compile`.
2025-12-04T10:19:48.0154843Z   Hint: Call a generator from inside of a non-generator Python function and compile that function instead.
2025-12-04T10:19:48.0155941Z   Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.
2025-12-04T10:19:48.0156641Z 
2025-12-04T10:19:48.0156760Z   Developer debug context: 
2025-12-04T10:19:48.0156967Z 
2025-12-04T10:19:48.0157507Z  For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html
2025-12-04T10:19:48.0158165Z 
2025-12-04T10:19:48.0158404Z To execute this test, run the following from the base repo dir:
2025-12-04T10:19:48.0159742Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_torchinductor_codegen_dynamic_shapes.py DynamicShapesCodegenGPUTests.test_lite_regional_compile_flex_attention_dynamic_shapes_cuda
2025-12-04T10:19:48.0160780Z 
2025-12-04T10:19:48.0161052Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:19:48.0161650Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:19:48.0162304Z == 1 failed, 256 passed, 61 skipped, 32 xfailed, 2 rerun in 462.93s (0:07:42) ==
2025-12-04T10:19:48.0162786Z Got exit code 1
2025-12-04T10:19:48.0163063Z Retrying single test...
2025-12-04T10:19:48.0163708Z W1204 10:16:50.891000 36017 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:19:48.0165124Z Test results will be stored in test-reports/python-pytest/inductor.test_torchinductor_codegen_dynamic_shapes/inductor.test_torchinductor_codegen_dynamic_shapes-fd0863b8a222871a.xml
2025-12-04T10:19:48.0166244Z ============================= test session starts ==============================
2025-12-04T10:19:48.0166913Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:19:48.0167521Z cachedir: .pytest_cache
2025-12-04T10:19:48.0168225Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:19:48.0169021Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:19:48.0169382Z configfile: pytest.ini
2025-12-04T10:19:48.0170110Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:19:48.0171255Z collecting ... collected 1750 items / 440 deselected / 1310 selected
2025-12-04T10:19:48.0172688Z stepcurrent: skipping 349 already run items. Running only test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_lite_regional_compile_flex_attention_dynamic_shapes_cuda
2025-12-04T10:19:48.0173925Z Running 1 items in this shard
2025-12-04T10:19:48.0174140Z 
2025-12-04T10:19:48.0175208Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_lite_regional_compile_flex_attention_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py ('RERUN', {'yellow': True}) [4.7316s] [100%]
2025-12-04T10:19:48.0177518Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_lite_regional_compile_flex_attention_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py ('RERUN', {'yellow': True}) [2.0298s] [100%]
2025-12-04T10:19:48.0179675Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_lite_regional_compile_flex_attention_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py FAILED [1.7950s] [100%]
2025-12-04T10:19:48.0180789Z 
2025-12-04T10:19:48.0180933Z ==================================== RERUNS ====================================
2025-12-04T10:19:48.0181642Z _ DynamicShapesCodegenGPUTests.test_lite_regional_compile_flex_attention_dynamic_shapes_cuda _
2025-12-04T10:19:48.0182321Z Traceback (most recent call last):
2025-12-04T10:19:48.0183123Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 13781, in test_lite_regional_compile_flex_attention
2025-12-04T10:19:48.0184000Z     _, codes = run_fw_bw_and_get_code(lambda: opt_fn(x))
2025-12-04T10:19:48.0184829Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2430, in run_fw_bw_and_get_code
2025-12-04T10:19:48.0185620Z     return run_and_get_code(run_with_backward)
2025-12-04T10:19:48.0186370Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code
2025-12-04T10:19:48.0187107Z     result = fn(*args, **kwargs)
2025-12-04T10:19:48.0187812Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2426, in run_with_backward
2025-12-04T10:19:48.0188521Z     result = fn()
2025-12-04T10:19:48.0189258Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 13781, in <lambda>
2025-12-04T10:19:48.0189981Z     _, codes = run_fw_bw_and_get_code(lambda: opt_fn(x))
2025-12-04T10:19:48.0190756Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 936, in compile_wrapper
2025-12-04T10:19:48.0191632Z     raise e.with_traceback(None) from e.__cause__  # User compiler error
2025-12-04T10:19:48.0192343Z torch._dynamo.exc.Unsupported: Attempt to trace generator
2025-12-04T10:19:48.0192972Z   Explanation: Generators cannot be compiled directly with `torch.compile`.
2025-12-04T10:19:48.0193787Z   Hint: Call a generator from inside of a non-generator Python function and compile that function instead.
2025-12-04T10:19:48.0194884Z   Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.
2025-12-04T10:19:48.0195587Z 
2025-12-04T10:19:48.0195712Z   Developer debug context: 
2025-12-04T10:19:48.0195918Z 
2025-12-04T10:19:48.0196463Z  For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html
2025-12-04T10:19:48.0197118Z 
2025-12-04T10:19:48.0197339Z To execute this test, run the following from the base repo dir:
2025-12-04T10:19:48.0198586Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_torchinductor_codegen_dynamic_shapes.py DynamicShapesCodegenGPUTests.test_lite_regional_compile_flex_attention_dynamic_shapes_cuda
2025-12-04T10:19:48.0199626Z 
2025-12-04T10:19:48.0199895Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:19:48.0200537Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:19:48.0203310Z unimplemented [('Attempt to trace generator\n  Explanation: Generators cannot be compiled directly with `torch.compile`.\n  Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n  Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 1)]
2025-12-04T10:19:48.0205981Z stats [('calls_captured', 12), ('unique_graphs', 1)]
2025-12-04T10:19:48.0206543Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 2), ('ok', 1)]
2025-12-04T10:19:48.0207069Z inductor [('fxgraph_cache_miss', 2)]
2025-12-04T10:19:48.0207421Z graph_break []
2025-12-04T10:19:48.0207790Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:19:48.0208883Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:19:48.0209857Z   warnings.warn(
2025-12-04T10:19:48.0210734Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:19:48.0211698Z   warnings.warn(
2025-12-04T10:19:48.0213128Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/attention/flex_attention.py:1624: UserWarning: flex_attention called without torch.compile() - this will use an unfused implementation that materializes the full scores matrix instead of generating a fused kernel.
2025-12-04T10:19:48.0214525Z 
2025-12-04T10:19:48.0214702Z SOLUTION: Use torch.compile(flex_attention)(...)
2025-12-04T10:19:48.0214996Z 
2025-12-04T10:19:48.0215196Z If you want to debug your score_mod/mask_mod, you can set:
2025-12-04T10:19:48.0215798Z torch.nn.attention.flex_attention._FLEX_ATTENTION_DISABLE_COMPILE_DEBUG = True
2025-12-04T10:19:48.0216241Z 
2025-12-04T10:19:48.0216966Z This will allow you to use print statements or breakpoints. Note: This doesn't work with the backwards pass and may produce incorrect results.
2025-12-04T10:19:48.0217769Z   _warn_once(
2025-12-04T10:19:48.0218314Z _ DynamicShapesCodegenGPUTests.test_lite_regional_compile_flex_attention_dynamic_shapes_cuda _
2025-12-04T10:19:48.0218993Z Traceback (most recent call last):
2025-12-04T10:19:48.0219816Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 13781, in test_lite_regional_compile_flex_attention
2025-12-04T10:19:48.0220756Z     _, codes = run_fw_bw_and_get_code(lambda: opt_fn(x))
2025-12-04T10:19:48.0221560Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2430, in run_fw_bw_and_get_code
2025-12-04T10:19:48.0222361Z     return run_and_get_code(run_with_backward)
2025-12-04T10:19:48.0223112Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code
2025-12-04T10:19:48.0223836Z     result = fn(*args, **kwargs)
2025-12-04T10:19:48.0224537Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2426, in run_with_backward
2025-12-04T10:19:48.0225258Z     result = fn()
2025-12-04T10:19:48.0225825Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 13781, in <lambda>
2025-12-04T10:19:48.0226524Z     _, codes = run_fw_bw_and_get_code(lambda: opt_fn(x))
2025-12-04T10:19:48.0227309Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 936, in compile_wrapper
2025-12-04T10:19:48.0228179Z     raise e.with_traceback(None) from e.__cause__  # User compiler error
2025-12-04T10:19:48.0228771Z torch._dynamo.exc.Unsupported: Attempt to trace generator
2025-12-04T10:19:48.0229386Z   Explanation: Generators cannot be compiled directly with `torch.compile`.
2025-12-04T10:19:48.0230213Z   Hint: Call a generator from inside of a non-generator Python function and compile that function instead.
2025-12-04T10:19:48.0231314Z   Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.
2025-12-04T10:19:48.0232002Z 
2025-12-04T10:19:48.0232121Z   Developer debug context: 
2025-12-04T10:19:48.0232342Z 
2025-12-04T10:19:48.0232871Z  For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html
2025-12-04T10:19:48.0233544Z 
2025-12-04T10:19:48.0233769Z To execute this test, run the following from the base repo dir:
2025-12-04T10:19:48.0235016Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_torchinductor_codegen_dynamic_shapes.py DynamicShapesCodegenGPUTests.test_lite_regional_compile_flex_attention_dynamic_shapes_cuda
2025-12-04T10:19:48.0236060Z 
2025-12-04T10:19:48.0236332Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:19:48.0236973Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:19:48.0239756Z unimplemented [('Attempt to trace generator\n  Explanation: Generators cannot be compiled directly with `torch.compile`.\n  Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n  Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 1)]
2025-12-04T10:19:48.0242466Z stats [('calls_captured', 12), ('unique_graphs', 1)]
2025-12-04T10:19:48.0243009Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 2), ('ok', 1)]
2025-12-04T10:19:48.0243542Z inductor [('fxgraph_cache_miss', 2)]
2025-12-04T10:19:48.0243898Z graph_break []
2025-12-04T10:19:48.0244269Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:19:48.0245449Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:19:48.0246424Z   warnings.warn(
2025-12-04T10:19:48.0247318Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:19:48.0248271Z   warnings.warn(
2025-12-04T10:19:48.0249719Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/attention/flex_attention.py:1624: UserWarning: flex_attention called without torch.compile() - this will use an unfused implementation that materializes the full scores matrix instead of generating a fused kernel.
2025-12-04T10:19:48.0251191Z 
2025-12-04T10:19:48.0251354Z SOLUTION: Use torch.compile(flex_attention)(...)
2025-12-04T10:19:48.0251648Z 
2025-12-04T10:19:48.0251848Z If you want to debug your score_mod/mask_mod, you can set:
2025-12-04T10:19:48.0252453Z torch.nn.attention.flex_attention._FLEX_ATTENTION_DISABLE_COMPILE_DEBUG = True
2025-12-04T10:19:48.0252914Z 
2025-12-04T10:19:48.0253477Z This will allow you to use print statements or breakpoints. Note: This doesn't work with the backwards pass and may produce incorrect results.
2025-12-04T10:19:48.0254282Z   _warn_once(
2025-12-04T10:19:48.0254655Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:19:48.0257482Z unimplemented [('Attempt to trace generator\n  Explanation: Generators cannot be compiled directly with `torch.compile`.\n  Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n  Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 1)]
2025-12-04T10:19:48.0260191Z stats [('calls_captured', 12), ('unique_graphs', 1)]
2025-12-04T10:19:48.0260749Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 2), ('ok', 1)]
2025-12-04T10:19:48.0261272Z inductor [('fxgraph_cache_miss', 2)]
2025-12-04T10:19:48.0261605Z graph_break []
2025-12-04T10:19:48.0261986Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:19:48.0263091Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:19:48.0264062Z   warnings.warn(
2025-12-04T10:19:48.0264943Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:19:48.0265908Z   warnings.warn(
2025-12-04T10:19:48.0266225Z =================================== FAILURES ===================================
2025-12-04T10:19:48.0266912Z _ DynamicShapesCodegenGPUTests.test_lite_regional_compile_flex_attention_dynamic_shapes_cuda _
2025-12-04T10:19:48.0267590Z Traceback (most recent call last):
2025-12-04T10:19:48.0268411Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 13781, in test_lite_regional_compile_flex_attention
2025-12-04T10:19:48.0269291Z     _, codes = run_fw_bw_and_get_code(lambda: opt_fn(x))
2025-12-04T10:19:48.0270097Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2430, in run_fw_bw_and_get_code
2025-12-04T10:19:48.0270901Z     return run_and_get_code(run_with_backward)
2025-12-04T10:19:48.0271982Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code
2025-12-04T10:19:48.0272712Z     result = fn(*args, **kwargs)
2025-12-04T10:19:48.0273409Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2426, in run_with_backward
2025-12-04T10:19:48.0274133Z     result = fn()
2025-12-04T10:19:48.0274869Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 13781, in <lambda>
2025-12-04T10:19:48.0275569Z     _, codes = run_fw_bw_and_get_code(lambda: opt_fn(x))
2025-12-04T10:19:48.0276353Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 936, in compile_wrapper
2025-12-04T10:19:48.0277221Z     raise e.with_traceback(None) from e.__cause__  # User compiler error
2025-12-04T10:19:48.0277813Z torch._dynamo.exc.Unsupported: Attempt to trace generator
2025-12-04T10:19:48.0278517Z   Explanation: Generators cannot be compiled directly with `torch.compile`.
2025-12-04T10:19:48.0279343Z   Hint: Call a generator from inside of a non-generator Python function and compile that function instead.
2025-12-04T10:19:48.0280442Z   Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.
2025-12-04T10:19:48.0281128Z 
2025-12-04T10:19:48.0281262Z   Developer debug context: 
2025-12-04T10:19:48.0281468Z 
2025-12-04T10:19:48.0282002Z  For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html
2025-12-04T10:19:48.0282677Z 
2025-12-04T10:19:48.0282897Z To execute this test, run the following from the base repo dir:
2025-12-04T10:19:48.0284145Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_torchinductor_codegen_dynamic_shapes.py DynamicShapesCodegenGPUTests.test_lite_regional_compile_flex_attention_dynamic_shapes_cuda
2025-12-04T10:19:48.0285167Z 
2025-12-04T10:19:48.0285449Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:19:48.0286066Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:19:48.0288830Z unimplemented [('Attempt to trace generator\n  Explanation: Generators cannot be compiled directly with `torch.compile`.\n  Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n  Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 1)]
2025-12-04T10:19:48.0291528Z stats [('calls_captured', 12), ('unique_graphs', 1)]
2025-12-04T10:19:48.0292082Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 2), ('ok', 1)]
2025-12-04T10:19:48.0292612Z inductor [('fxgraph_cache_miss', 2)]
2025-12-04T10:19:48.0292946Z graph_break []
2025-12-04T10:19:48.0293325Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:19:48.0294416Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:19:48.0295382Z   warnings.warn(
2025-12-04T10:19:48.0296337Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:19:48.0297310Z   warnings.warn(
2025-12-04T10:19:48.0298751Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/attention/flex_attention.py:1624: UserWarning: flex_attention called without torch.compile() - this will use an unfused implementation that materializes the full scores matrix instead of generating a fused kernel.
2025-12-04T10:19:48.0300151Z 
2025-12-04T10:19:48.0300315Z SOLUTION: Use torch.compile(flex_attention)(...)
2025-12-04T10:19:48.0300623Z 
2025-12-04T10:19:48.0300805Z If you want to debug your score_mod/mask_mod, you can set:
2025-12-04T10:19:48.0301426Z torch.nn.attention.flex_attention._FLEX_ATTENTION_DISABLE_COMPILE_DEBUG = True
2025-12-04T10:19:48.0301861Z 
2025-12-04T10:19:48.0302432Z This will allow you to use print statements or breakpoints. Note: This doesn't work with the backwards pass and may produce incorrect results.
2025-12-04T10:19:48.0303215Z   _warn_once(
2025-12-04T10:19:48.0303724Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:19:48.0306501Z unimplemented [('Attempt to trace generator\n  Explanation: Generators cannot be compiled directly with `torch.compile`.\n  Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n  Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 1)]
2025-12-04T10:19:48.0309266Z stats [('calls_captured', 12), ('unique_graphs', 1)]
2025-12-04T10:19:48.0309817Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 2), ('ok', 1)]
2025-12-04T10:19:48.0310327Z inductor [('fxgraph_cache_miss', 2)]
2025-12-04T10:19:48.0310683Z graph_break []
2025-12-04T10:19:48.0311068Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:19:48.0312153Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:19:48.0313127Z   warnings.warn(
2025-12-04T10:19:48.0314012Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:19:48.0314979Z   warnings.warn(
2025-12-04T10:19:48.0315350Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:19:48.0318130Z unimplemented [('Attempt to trace generator\n  Explanation: Generators cannot be compiled directly with `torch.compile`.\n  Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n  Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 1)]
2025-12-04T10:19:48.0320795Z stats [('calls_captured', 12), ('unique_graphs', 1)]
2025-12-04T10:19:48.0321352Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 2), ('ok', 1)]
2025-12-04T10:19:48.0321868Z inductor [('fxgraph_cache_miss', 2)]
2025-12-04T10:19:48.0322216Z graph_break []
2025-12-04T10:19:48.0322588Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:19:48.0323678Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:19:48.0324631Z   warnings.warn(
2025-12-04T10:19:48.0325525Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:19:48.0326493Z   warnings.warn(
2025-12-04T10:19:48.0327654Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_torchinductor_codegen_dynamic_shapes/inductor.test_torchinductor_codegen_dynamic_shapes-fd0863b8a222871a.xml -
2025-12-04T10:19:48.0328958Z =========================== short test summary info ============================
2025-12-04T10:19:48.0330329Z FAILED [1.7950s] inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_lite_regional_compile_flex_attention_dynamic_shapes_cuda - torch._dynamo.exc.Unsupported: Attempt to trace generator
2025-12-04T10:19:48.0331772Z   Explanation: Generators cannot be compiled directly with `torch.compile`.
2025-12-04T10:19:48.0332603Z   Hint: Call a generator from inside of a non-generator Python function and compile that function instead.
2025-12-04T10:19:48.0333762Z   Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.
2025-12-04T10:19:48.0334464Z 
2025-12-04T10:19:48.0334583Z   Developer debug context: 
2025-12-04T10:19:48.0334791Z 
2025-12-04T10:19:48.0335332Z  For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html
2025-12-04T10:19:48.0335996Z 
2025-12-04T10:19:48.0336363Z To execute this test, run the following from the base repo dir:
2025-12-04T10:19:48.0337598Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_torchinductor_codegen_dynamic_shapes.py DynamicShapesCodegenGPUTests.test_lite_regional_compile_flex_attention_dynamic_shapes_cuda
2025-12-04T10:19:48.0338630Z 
2025-12-04T10:19:48.0338900Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:19:48.0339501Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:19:48.0340041Z ================== 1 failed, 440 deselected, 2 rerun in 8.68s ==================
2025-12-04T10:19:48.0340483Z Got exit code 1
2025-12-04T10:19:48.0340758Z Retrying single test...
2025-12-04T10:19:48.0341400Z W1204 10:17:13.656000 36216 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:19:48.0342805Z Test results will be stored in test-reports/python-pytest/inductor.test_torchinductor_codegen_dynamic_shapes/inductor.test_torchinductor_codegen_dynamic_shapes-6fcb35b3fc35a71c.xml
2025-12-04T10:19:48.0343937Z ============================= test session starts ==============================
2025-12-04T10:19:48.0344604Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:19:48.0345213Z cachedir: .pytest_cache
2025-12-04T10:19:48.0345913Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:19:48.0346704Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:19:48.0347068Z configfile: pytest.ini
2025-12-04T10:19:48.0347798Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:19:48.0348694Z collecting ... collected 1750 items / 440 deselected / 1310 selected
2025-12-04T10:19:48.0350031Z stepcurrent: skipping 349 already run items. Running only test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_lite_regional_compile_flex_attention_dynamic_shapes_cuda
2025-12-04T10:19:48.0351251Z Running 1 items in this shard
2025-12-04T10:19:48.0351463Z 
2025-12-04T10:19:48.0352547Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_lite_regional_compile_flex_attention_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py ('RERUN', {'yellow': True}) [4.6647s] [100%]
2025-12-04T10:19:48.0354792Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_lite_regional_compile_flex_attention_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py ('RERUN', {'yellow': True}) [2.0683s] [100%]
2025-12-04T10:19:48.0356939Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_lite_regional_compile_flex_attention_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py FAILED [1.8384s] [100%]
2025-12-04T10:19:48.0358044Z 
2025-12-04T10:19:48.0358193Z ==================================== RERUNS ====================================
2025-12-04T10:19:48.0358896Z _ DynamicShapesCodegenGPUTests.test_lite_regional_compile_flex_attention_dynamic_shapes_cuda _
2025-12-04T10:19:48.0359567Z Traceback (most recent call last):
2025-12-04T10:19:48.0360372Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 13781, in test_lite_regional_compile_flex_attention
2025-12-04T10:19:48.0361249Z     _, codes = run_fw_bw_and_get_code(lambda: opt_fn(x))
2025-12-04T10:19:48.0362151Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2430, in run_fw_bw_and_get_code
2025-12-04T10:19:48.0362946Z     return run_and_get_code(run_with_backward)
2025-12-04T10:19:48.0363701Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code
2025-12-04T10:19:48.0364432Z     result = fn(*args, **kwargs)
2025-12-04T10:19:48.0365136Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2426, in run_with_backward
2025-12-04T10:19:48.0365912Z     result = fn()
2025-12-04T10:19:48.0366477Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 13781, in <lambda>
2025-12-04T10:19:48.0367182Z     _, codes = run_fw_bw_and_get_code(lambda: opt_fn(x))
2025-12-04T10:19:48.0367955Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 936, in compile_wrapper
2025-12-04T10:19:48.0368822Z     raise e.with_traceback(None) from e.__cause__  # User compiler error
2025-12-04T10:19:48.0369418Z torch._dynamo.exc.Unsupported: Attempt to trace generator
2025-12-04T10:19:48.0370047Z   Explanation: Generators cannot be compiled directly with `torch.compile`.
2025-12-04T10:19:48.0370858Z   Hint: Call a generator from inside of a non-generator Python function and compile that function instead.
2025-12-04T10:19:48.0372309Z   Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.
2025-12-04T10:19:48.0373021Z 
2025-12-04T10:19:48.0373143Z   Developer debug context: 
2025-12-04T10:19:48.0373351Z 
2025-12-04T10:19:48.0373897Z  For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html
2025-12-04T10:19:48.0374556Z 
2025-12-04T10:19:48.0374777Z To execute this test, run the following from the base repo dir:
2025-12-04T10:19:48.0376031Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_torchinductor_codegen_dynamic_shapes.py DynamicShapesCodegenGPUTests.test_lite_regional_compile_flex_attention_dynamic_shapes_cuda
2025-12-04T10:19:48.0377138Z 
2025-12-04T10:19:48.0377408Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:19:48.0378051Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:19:48.0380801Z unimplemented [('Attempt to trace generator\n  Explanation: Generators cannot be compiled directly with `torch.compile`.\n  Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n  Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 1)]
2025-12-04T10:19:48.0383527Z stats [('calls_captured', 12), ('unique_graphs', 1)]
2025-12-04T10:19:48.0384088Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 2), ('ok', 1)]
2025-12-04T10:19:48.0384610Z inductor [('fxgraph_cache_miss', 2)]
2025-12-04T10:19:48.0384959Z graph_break []
2025-12-04T10:19:48.0385327Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:19:48.0386424Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:19:48.0387406Z   warnings.warn(
2025-12-04T10:19:48.0388396Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:19:48.0389367Z   warnings.warn(
2025-12-04T10:19:48.0390964Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/attention/flex_attention.py:1624: UserWarning: flex_attention called without torch.compile() - this will use an unfused implementation that materializes the full scores matrix instead of generating a fused kernel.
2025-12-04T10:19:48.0392354Z 
2025-12-04T10:19:48.0392532Z SOLUTION: Use torch.compile(flex_attention)(...)
2025-12-04T10:19:48.0392824Z 
2025-12-04T10:19:48.0393024Z If you want to debug your score_mod/mask_mod, you can set:
2025-12-04T10:19:48.0393630Z torch.nn.attention.flex_attention._FLEX_ATTENTION_DISABLE_COMPILE_DEBUG = True
2025-12-04T10:19:48.0394164Z 
2025-12-04T10:19:48.0394722Z This will allow you to use print statements or breakpoints. Note: This doesn't work with the backwards pass and may produce incorrect results.
2025-12-04T10:19:48.0395516Z   _warn_once(
2025-12-04T10:19:48.0396062Z _ DynamicShapesCodegenGPUTests.test_lite_regional_compile_flex_attention_dynamic_shapes_cuda _
2025-12-04T10:19:48.0396735Z Traceback (most recent call last):
2025-12-04T10:19:48.0397549Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 13781, in test_lite_regional_compile_flex_attention
2025-12-04T10:19:48.0398435Z     _, codes = run_fw_bw_and_get_code(lambda: opt_fn(x))
2025-12-04T10:19:48.0399249Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2430, in run_fw_bw_and_get_code
2025-12-04T10:19:48.0400048Z     return run_and_get_code(run_with_backward)
2025-12-04T10:19:48.0400802Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code
2025-12-04T10:19:48.0401534Z     result = fn(*args, **kwargs)
2025-12-04T10:19:48.0402230Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2426, in run_with_backward
2025-12-04T10:19:48.0402952Z     result = fn()
2025-12-04T10:19:48.0403530Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 13781, in <lambda>
2025-12-04T10:19:48.0404225Z     _, codes = run_fw_bw_and_get_code(lambda: opt_fn(x))
2025-12-04T10:19:48.0405017Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 936, in compile_wrapper
2025-12-04T10:19:48.0405887Z     raise e.with_traceback(None) from e.__cause__  # User compiler error
2025-12-04T10:19:48.0406479Z torch._dynamo.exc.Unsupported: Attempt to trace generator
2025-12-04T10:19:48.0407094Z   Explanation: Generators cannot be compiled directly with `torch.compile`.
2025-12-04T10:19:48.0407919Z   Hint: Call a generator from inside of a non-generator Python function and compile that function instead.
2025-12-04T10:19:48.0409016Z   Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.
2025-12-04T10:19:48.0409701Z 
2025-12-04T10:19:48.0409832Z   Developer debug context: 
2025-12-04T10:19:48.0410038Z 
2025-12-04T10:19:48.0410560Z  For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html
2025-12-04T10:19:48.0411235Z 
2025-12-04T10:19:48.0411453Z To execute this test, run the following from the base repo dir:
2025-12-04T10:19:48.0412700Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_torchinductor_codegen_dynamic_shapes.py DynamicShapesCodegenGPUTests.test_lite_regional_compile_flex_attention_dynamic_shapes_cuda
2025-12-04T10:19:48.0413721Z 
2025-12-04T10:19:48.0414016Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:19:48.0414643Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:19:48.0417609Z unimplemented [('Attempt to trace generator\n  Explanation: Generators cannot be compiled directly with `torch.compile`.\n  Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n  Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 1)]
2025-12-04T10:19:48.0420316Z stats [('calls_captured', 12), ('unique_graphs', 1)]
2025-12-04T10:19:48.0420876Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 2), ('ok', 1)]
2025-12-04T10:19:48.0421405Z inductor [('fxgraph_cache_miss', 2)]
2025-12-04T10:19:48.0421745Z graph_break []
2025-12-04T10:19:48.0422131Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:19:48.0423303Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:19:48.0424262Z   warnings.warn(
2025-12-04T10:19:48.0425162Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:19:48.0426134Z   warnings.warn(
2025-12-04T10:19:48.0427579Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/attention/flex_attention.py:1624: UserWarning: flex_attention called without torch.compile() - this will use an unfused implementation that materializes the full scores matrix instead of generating a fused kernel.
2025-12-04T10:19:48.0428973Z 
2025-12-04T10:19:48.0429138Z SOLUTION: Use torch.compile(flex_attention)(...)
2025-12-04T10:19:48.0429450Z 
2025-12-04T10:19:48.0429636Z If you want to debug your score_mod/mask_mod, you can set:
2025-12-04T10:19:48.0430261Z torch.nn.attention.flex_attention._FLEX_ATTENTION_DISABLE_COMPILE_DEBUG = True
2025-12-04T10:19:48.0430697Z 
2025-12-04T10:19:48.0431274Z This will allow you to use print statements or breakpoints. Note: This doesn't work with the backwards pass and may produce incorrect results.
2025-12-04T10:19:48.0432056Z   _warn_once(
2025-12-04T10:19:48.0432433Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:19:48.0435226Z unimplemented [('Attempt to trace generator\n  Explanation: Generators cannot be compiled directly with `torch.compile`.\n  Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n  Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 1)]
2025-12-04T10:19:48.0437905Z stats [('calls_captured', 12), ('unique_graphs', 1)]
2025-12-04T10:19:48.0438456Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 2), ('ok', 1)]
2025-12-04T10:19:48.0438963Z inductor [('fxgraph_cache_miss', 2)]
2025-12-04T10:19:48.0439308Z graph_break []
2025-12-04T10:19:48.0439684Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:19:48.0440767Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:19:48.0441734Z   warnings.warn(
2025-12-04T10:19:48.0442625Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:19:48.0443592Z   warnings.warn(
2025-12-04T10:19:48.0443897Z =================================== FAILURES ===================================
2025-12-04T10:19:48.0444611Z _ DynamicShapesCodegenGPUTests.test_lite_regional_compile_flex_attention_dynamic_shapes_cuda _
2025-12-04T10:19:48.0445286Z Traceback (most recent call last):
2025-12-04T10:19:48.0446090Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 13781, in test_lite_regional_compile_flex_attention
2025-12-04T10:19:48.0446971Z     _, codes = run_fw_bw_and_get_code(lambda: opt_fn(x))
2025-12-04T10:19:48.0447797Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2430, in run_fw_bw_and_get_code
2025-12-04T10:19:48.0448769Z     return run_and_get_code(run_with_backward)
2025-12-04T10:19:48.0449515Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code
2025-12-04T10:19:48.0450249Z     result = fn(*args, **kwargs)
2025-12-04T10:19:48.0450958Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2426, in run_with_backward
2025-12-04T10:19:48.0451765Z     result = fn()
2025-12-04T10:19:48.0452315Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 13781, in <lambda>
2025-12-04T10:19:48.0453016Z     _, codes = run_fw_bw_and_get_code(lambda: opt_fn(x))
2025-12-04T10:19:48.0453804Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 936, in compile_wrapper
2025-12-04T10:19:48.0454657Z     raise e.with_traceback(None) from e.__cause__  # User compiler error
2025-12-04T10:19:48.0455247Z torch._dynamo.exc.Unsupported: Attempt to trace generator
2025-12-04T10:19:48.0455885Z   Explanation: Generators cannot be compiled directly with `torch.compile`.
2025-12-04T10:19:48.0456784Z   Hint: Call a generator from inside of a non-generator Python function and compile that function instead.
2025-12-04T10:19:48.0457873Z   Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.
2025-12-04T10:19:48.0458581Z 
2025-12-04T10:19:48.0458701Z   Developer debug context: 
2025-12-04T10:19:48.0458908Z 
2025-12-04T10:19:48.0459448Z  For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html
2025-12-04T10:19:48.0460110Z 
2025-12-04T10:19:48.0460347Z To execute this test, run the following from the base repo dir:
2025-12-04T10:19:48.0461578Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_torchinductor_codegen_dynamic_shapes.py DynamicShapesCodegenGPUTests.test_lite_regional_compile_flex_attention_dynamic_shapes_cuda
2025-12-04T10:19:48.0462615Z 
2025-12-04T10:19:48.0462884Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:19:48.0463515Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:19:48.0466287Z unimplemented [('Attempt to trace generator\n  Explanation: Generators cannot be compiled directly with `torch.compile`.\n  Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n  Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 1)]
2025-12-04T10:19:48.0468991Z stats [('calls_captured', 12), ('unique_graphs', 1)]
2025-12-04T10:19:48.0469533Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 2), ('ok', 1)]
2025-12-04T10:19:48.0470059Z inductor [('fxgraph_cache_miss', 2)]
2025-12-04T10:19:48.0470411Z graph_break []
2025-12-04T10:19:48.0470778Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:19:48.0472224Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:19:48.0473201Z   warnings.warn(
2025-12-04T10:19:48.0474094Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:19:48.0475050Z   warnings.warn(
2025-12-04T10:19:48.0476485Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/attention/flex_attention.py:1624: UserWarning: flex_attention called without torch.compile() - this will use an unfused implementation that materializes the full scores matrix instead of generating a fused kernel.
2025-12-04T10:19:48.0477894Z 
2025-12-04T10:19:48.0478230Z SOLUTION: Use torch.compile(flex_attention)(...)
2025-12-04T10:19:48.0478526Z 
2025-12-04T10:19:48.0478729Z If you want to debug your score_mod/mask_mod, you can set:
2025-12-04T10:19:48.0479336Z torch.nn.attention.flex_attention._FLEX_ATTENTION_DISABLE_COMPILE_DEBUG = True
2025-12-04T10:19:48.0479790Z 
2025-12-04T10:19:48.0480350Z This will allow you to use print statements or breakpoints. Note: This doesn't work with the backwards pass and may produce incorrect results.
2025-12-04T10:19:48.0481239Z   _warn_once(
2025-12-04T10:19:48.0481614Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:19:48.0484381Z unimplemented [('Attempt to trace generator\n  Explanation: Generators cannot be compiled directly with `torch.compile`.\n  Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n  Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 1)]
2025-12-04T10:19:48.0487061Z stats [('calls_captured', 12), ('unique_graphs', 1)]
2025-12-04T10:19:48.0487620Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 2), ('ok', 1)]
2025-12-04T10:19:48.0488147Z inductor [('fxgraph_cache_miss', 2)]
2025-12-04T10:19:48.0488497Z graph_break []
2025-12-04T10:19:48.0488862Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:19:48.0489957Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:19:48.0490930Z   warnings.warn(
2025-12-04T10:19:48.0491810Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:19:48.0492775Z   warnings.warn(
2025-12-04T10:19:48.0493157Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:19:48.0495927Z unimplemented [('Attempt to trace generator\n  Explanation: Generators cannot be compiled directly with `torch.compile`.\n  Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n  Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 1)]
2025-12-04T10:19:48.0498736Z stats [('calls_captured', 12), ('unique_graphs', 1)]
2025-12-04T10:19:48.0499278Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 2), ('ok', 1)]
2025-12-04T10:19:48.0499806Z inductor [('fxgraph_cache_miss', 2)]
2025-12-04T10:19:48.0500162Z graph_break []
2025-12-04T10:19:48.0500524Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:19:48.0501622Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:19:48.0502589Z   warnings.warn(
2025-12-04T10:19:48.0503481Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:19:48.0504429Z   warnings.warn(
2025-12-04T10:19:48.0505597Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_torchinductor_codegen_dynamic_shapes/inductor.test_torchinductor_codegen_dynamic_shapes-6fcb35b3fc35a71c.xml -
2025-12-04T10:19:48.0506902Z =========================== short test summary info ============================
2025-12-04T10:19:48.0508348Z FAILED [1.8384s] inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_lite_regional_compile_flex_attention_dynamic_shapes_cuda - torch._dynamo.exc.Unsupported: Attempt to trace generator
2025-12-04T10:19:48.0509791Z   Explanation: Generators cannot be compiled directly with `torch.compile`.
2025-12-04T10:19:48.0510624Z   Hint: Call a generator from inside of a non-generator Python function and compile that function instead.
2025-12-04T10:19:48.0511807Z   Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.
2025-12-04T10:19:48.0512495Z 
2025-12-04T10:19:48.0512631Z   Developer debug context: 
2025-12-04T10:19:48.0512839Z 
2025-12-04T10:19:48.0513364Z  For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html
2025-12-04T10:19:48.0514038Z 
2025-12-04T10:19:48.0514258Z To execute this test, run the following from the base repo dir:
2025-12-04T10:19:48.0515510Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_torchinductor_codegen_dynamic_shapes.py DynamicShapesCodegenGPUTests.test_lite_regional_compile_flex_attention_dynamic_shapes_cuda
2025-12-04T10:19:48.0516531Z 
2025-12-04T10:19:48.0516820Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:19:48.0517416Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:19:48.0517944Z ================== 1 failed, 440 deselected, 2 rerun in 8.70s ==================
2025-12-04T10:19:48.0518398Z Got exit code 1
2025-12-04T10:19:48.0519367Z FAILED CONSISTENTLY: test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_lite_regional_compile_flex_attention_dynamic_shapes_cuda
2025-12-04T10:19:48.0520709Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:19:48.0521718Z W1204 10:17:36.320000 36415 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:19:48.0523125Z Test results will be stored in test-reports/python-pytest/inductor.test_torchinductor_codegen_dynamic_shapes/inductor.test_torchinductor_codegen_dynamic_shapes-f8b2416e9d43ac69.xml
2025-12-04T10:19:48.0524256Z ============================= test session starts ==============================
2025-12-04T10:19:48.0524914Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:19:48.0525521Z cachedir: .pytest_cache
2025-12-04T10:19:48.0526233Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:19:48.0527022Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:19:48.0527365Z configfile: pytest.ini
2025-12-04T10:19:48.0528094Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:19:48.0528337Z collecting ... collected 1750 items / 350 deselected / 1400 selected
2025-12-04T10:19:48.0528501Z stepcurrent: skipping 350 already run items.
2025-12-04T10:19:48.0528619Z Running 91 items in this shard
2025-12-04T10:19:48.0528624Z 
2025-12-04T10:19:48.0529611Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_lite_regional_compile_invoke_subgraph_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [4.9039s] [  1%]
2025-12-04T10:19:48.0530423Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_log2_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.8361s] [  2%]
2025-12-04T10:19:48.0531366Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_low_memory_max_pool_dilation_1_dim_2_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [2.1538s] [  3%]
2025-12-04T10:19:48.0532385Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_low_memory_max_pool_dilation_1_dim_3_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [6.2089s] [  4%]
2025-12-04T10:19:48.0533321Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_low_memory_max_pool_dilation_2_dim_2_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [2.5604s] [  5%]
2025-12-04T10:19:48.0534210Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_max_min_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.7639s] [  6%]
2025-12-04T10:19:48.0535041Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_max_pool2d4_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [3.0247s] [  7%]
2025-12-04T10:19:48.0535946Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_max_pool2d6_dilation_1_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [2.7706s] [  8%]
2025-12-04T10:19:48.0536900Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_max_pool2d6_dilation_2_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [2.9282s] [  9%]
2025-12-04T10:19:48.0537831Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_max_pool2d_with_indices_backward6_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py XFAIL [0.3683s] [ 10%]
2025-12-04T10:19:48.0538654Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_mean_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.5212s] [ 12%]
2025-12-04T10:19:48.0539494Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_mul_index_expr_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.5776s] [ 13%]
2025-12-04T10:19:48.0540530Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_multi_gpu_device_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (requires multiple cuda devices) [ 14%]
2025-12-04T10:19:48.0541389Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_multilayer_var_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.9535s] [ 15%]
2025-12-04T10:19:48.0542331Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_nan_assert_inside_triton_kernel_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.4113s] [ 16%]
2025-12-04T10:19:48.0543288Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_nan_sort_stable_False_descending_True_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.1395s] [ 17%]
2025-12-04T10:19:48.0544125Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_new_empty_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py XFAIL [0.1191s] [ 18%]
2025-12-04T10:19:48.0544978Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_nll_loss_forward_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.8117s] [ 19%]
2025-12-04T10:19:48.0545885Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pixel_shuffle_channels_last_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [2.3575s] [ 20%]
2025-12-04T10:19:48.0546777Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_bessel_j1_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.7731s] [ 21%]
2025-12-04T10:19:48.0547716Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_chebyshev_polynomial_t_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py XFAIL [0.5134s] [ 23%]
2025-12-04T10:19:48.0548730Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_chebyshev_polynomial_w_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py XFAIL [1.0405s] [ 24%]
2025-12-04T10:19:48.0549597Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_expit_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.1915s] [ 25%]
2025-12-04T10:19:48.0550483Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_gammaincc_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py XFAIL [0.1575s] [ 26%]
2025-12-04T10:19:48.0551384Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_i0_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.2490s] [ 27%]
2025-12-04T10:19:48.0552242Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_log1p_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.2497s] [ 28%]
2025-12-04T10:19:48.0553170Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_modified_bessel_k0_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py XFAIL [0.6418s] [ 29%]
2025-12-04T10:19:48.0554008Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_psi_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py XFAIL [0.5504s] [ 30%]
2025-12-04T10:19:48.0554882Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_round_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.2620s] [ 31%]
2025-12-04T10:19:48.0555863Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_shifted_chebyshev_polynomial_v_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py XFAIL [1.0660s] [ 32%]
2025-12-04T10:19:48.0556741Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pow3_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (Skipped!) [ 34%]
2025-12-04T10:19:48.0557556Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pow_int_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.4988s] [ 35%]
2025-12-04T10:19:48.0558410Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pow_symfloat_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.4773s] [ 36%]
2025-12-04T10:19:48.0559272Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_randn_generator_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.5635s] [ 37%]
2025-12-04T10:19:48.0560119Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_randn_like_empty_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py XFAIL [0.1598s] [ 38%]
2025-12-04T10:19:48.0560974Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_reduction1_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.3739s] [ 39%]
2025-12-04T10:19:48.0562430Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_reflection_pad2d_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py W1204 10:18:23.024000 36415 site-packages/torch/utils/_sympy/interp.py:179] [0/0] failed while executing pow_by_natural([VR[4, int_oo], VR[-1, -1]])
2025-12-04T10:19:48.0563057Z W1204 10:18:23.560000 36415 site-packages/torch/utils/_sympy/interp.py:179] [0/0] failed while executing pow_by_natural([VR[-int_oo, int_oo], VR[-1, -1]])
2025-12-04T10:19:48.0563164Z PASSED [2.3943s] [ 40%]
2025-12-04T10:19:48.0563989Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_relu_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.3397s] [ 41%]
2025-12-04T10:19:48.0564929Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_remove_noop_view_dtype_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.2395s] [ 42%]
2025-12-04T10:19:48.0565746Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_repeat_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.9065s] [ 43%]
2025-12-04T10:19:48.0566641Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_repeat_interleave_2_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.3294s] [ 45%]
2025-12-04T10:19:48.0567685Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_repeat_interleave_Tensor_decomp_int32_nd_2_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.5351s] [ 46%]
2025-12-04T10:19:48.0568665Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_require_stride_expanded_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0004s] (Skipped!) [ 47%]
2025-12-04T10:19:48.0569494Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_resize_as_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [19.8656s] [ 48%]
2025-12-04T10:19:48.0570307Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_roll_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.5791s] [ 49%]
2025-12-04T10:19:48.0571398Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_rsqrt_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.2768s] [ 50%]
2025-12-04T10:19:48.0572248Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_scalar_output_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.1325s] [ 51%]
2025-12-04T10:19:48.0573090Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_scatter5_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.6721s] [ 52%]
2025-12-04T10:19:48.0573942Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_scatter_reduce1_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.4243s] [ 53%]
2025-12-04T10:19:48.0574870Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_scheduler_vertical_fusion1_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.7659s] [ 54%]
2025-12-04T10:19:48.0575749Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_sdpa_unaligned_mask_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.5627s] [ 56%]
2025-12-04T10:19:48.0576683Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_searchsorted_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [13.7973s] [ 57%]
2025-12-04T10:19:48.0577547Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_select_scatter_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.5948s] [ 58%]
2025-12-04T10:19:48.0578340Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_sin_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.5009s] [ 59%]
2025-12-04T10:19:48.0579324Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_size_asserts_for_multi_output_fallback_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.1578s] [ 60%]
2025-12-04T10:19:48.0580185Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_sizehint_issue1_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.8376s] [ 61%]
2025-12-04T10:19:48.0581007Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_slice3_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py XFAIL [0.6224s] [ 62%]
2025-12-04T10:19:48.0581976Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_slice_mutation3_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.3567s] [ 63%]
2025-12-04T10:19:48.0582846Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_slice_scatter2_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.3713s] [ 64%]
2025-12-04T10:19:48.0583784Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_slice_scatter_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.0612s] [ 65%]
2025-12-04T10:19:48.0584697Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_slice_scatter_reinplace_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [2.3586s] [ 67%]
2025-12-04T10:19:48.0585519Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_softmax_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.1851s] [ 68%]
2025-12-04T10:19:48.0586427Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_softmax_one_kernel_persist_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.5542s] [ 69%]
2025-12-04T10:19:48.0587408Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_split_cumprod_low_prec_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0035s] (Requires sm80) [ 70%]
2025-12-04T10:19:48.0588250Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_split_cumsum_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [6.6486s] [ 71%]
2025-12-04T10:19:48.0589214Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_split_cumsum_low_prec_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0035s] (Requires sm80) [ 72%]
2025-12-04T10:19:48.0590295Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_sqrt_dynamic_shapes_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0032s] (sqrt dynamic shapes only supports cpu) [ 73%]
2025-12-04T10:19:48.0591138Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_squeeze1_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.6889s] [ 74%]
2025-12-04T10:19:48.0591936Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_sum3_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.3984s] [ 75%]
2025-12-04T10:19:48.0592729Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_sum5_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.4989s] [ 76%]
2025-12-04T10:19:48.0593543Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_tanh_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.4368s] [ 78%]
2025-12-04T10:19:48.0594511Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_tmp_not_defined_issue1_use_block_ptr_True_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.7638s] [ 79%]
2025-12-04T10:19:48.0595346Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_to_dtype_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.4644s] [ 80%]
2025-12-04T10:19:48.0596344Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_triton_argmin_argmax_transpose_logical_index_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [3.2268s] [ 81%]
2025-12-04T10:19:48.0597249Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_triton_kernel_bool_param_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.4998s] [ 82%]
2025-12-04T10:19:48.0598224Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_unfold_zero_dimension_tensor_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.1141s] [ 83%]
2025-12-04T10:19:48.0599128Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_unroll_small_reduction_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.1768s] [ 84%]
2025-12-04T10:19:48.0600011Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_unspec_inputs_float16_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.5993s] [ 85%]
2025-12-04T10:19:48.0600964Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_unspec_inputs_float32_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.5079s] [ 86%]
2025-12-04T10:19:48.0601865Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_unspec_inputs_float64_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.6759s] [ 87%]
2025-12-04T10:19:48.0602696Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_unsqueeze_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.0717s] [ 89%]
2025-12-04T10:19:48.0603583Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_unsqueeze_inplace_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.4361s] [ 90%]
2025-12-04T10:19:48.0604476Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_upsample_bilinear2d_a_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [3.2228s] [ 91%]
2025-12-04T10:19:48.0605414Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_vectorized_ops_masked_var_novec_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.3384s] [ 92%]
2025-12-04T10:19:48.0606265Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_view_as_complex_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.2636s] [ 93%]
2025-12-04T10:19:48.0607096Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_view_as_real_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.2346s] [ 94%]
2025-12-04T10:19:48.0607934Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_view_detach_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py XFAIL [0.1546s] [ 95%]
2025-12-04T10:19:48.0608745Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_views2_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [3.9210s] [ 96%]
2025-12-04T10:19:48.0609573Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_views3_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.4985s] [ 97%]
2025-12-04T10:19:48.0610378Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_views7_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.3701s] [ 98%]
2025-12-04T10:19:48.0611197Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_zeros_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.8997s] [100%]
2025-12-04T10:19:48.0611203Z 
2025-12-04T10:19:48.0612204Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_torchinductor_codegen_dynamic_shapes/inductor.test_torchinductor_codegen_dynamic_shapes-f8b2416e9d43ac69.xml -
2025-12-04T10:19:48.0612463Z ==== 74 passed, 6 skipped, 350 deselected, 11 xfailed in 126.01s (0:02:06) =====
2025-12-04T10:19:48.0613390Z The following tests failed consistently: ['test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_lite_regional_compile_flex_attention_dynamic_shapes_cuda']
2025-12-04T10:19:48.0613395Z 
2025-12-04T10:19:48.0614257Z FINISHED PRINTING LOG FILE of inductor/test_torchinductor_codegen_dynamic_shapes 2/4 (test/test-reports/inductor.test_torchinductor_codegen_dynamic_shapes_2.4_37f84ce4dcc870f4_.log)
2025-12-04T10:19:48.0614263Z 
2025-12-04T10:19:48.0614758Z Finished inductor/test_torchinductor_codegen_dynamic_shapes 2/4 ... [2025-12-04 10:19:47.858455][4016.241347114], took 11.09min
2025-12-04T10:19:48.0615806Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_torchinductor_codegen_dynamic_shapes/inductor.test_torchinductor_codegen_dynamic_shapes-0c75da116b2f10f8.xml
2025-12-04T10:19:48.0616982Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_torchinductor_codegen_dynamic_shapes/inductor.test_torchinductor_codegen_dynamic_shapes-fd0863b8a222871a.xml
2025-12-04T10:19:48.0618033Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_torchinductor_codegen_dynamic_shapes/inductor.test_torchinductor_codegen_dynamic_shapes-6fcb35b3fc35a71c.xml
2025-12-04T10:19:48.0773006Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_torchinductor_codegen_dynamic_shapes/inductor.test_torchinductor_codegen_dynamic_shapes-f8b2416e9d43ac69.xml
2025-12-04T10:19:48.4069903Z Uploading logs for 57119749248 to S3
2025-12-04T10:19:48.4519181Z Uploading artifacts took 0.34 seconds
2025-12-04T10:19:48.4519749Z inductor/test_torchinductor_codegen_dynamic_shapes 2/4 failed!
2025-12-04T10:19:48.4524034Z Running inductor/test_torchinductor_opinfo 2/17 ... [2025-12-04 10:19:48.452221][4016.835117137]
2025-12-04T10:19:48.4524666Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T10:19:48.4529099Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_opinfo.py', '--shard-id=2', '--num-shards=17', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:19:48.452667]
2025-12-04T10:30:08.2074160Z 
2025-12-04T10:30:08.2075475Z inductor/test_torchinductor_opinfo 2/17 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_opinfo_2.17_595df7515ef47f8b_.log
2025-12-04T10:30:08.2196235Z Running 196 items in this shard: test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_H_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_H_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___ror___cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__unsafe_masked_index_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_acos_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_angle_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_arange_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argwhere_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_asinh_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atan2_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atan2_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_block_diag_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bmm_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_broadcast_tensors_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cartesian_prod_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cdist_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ceil_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_char_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_chunk_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clamp_min_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clone_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_column_stack_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_conj_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_contiguous_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_corrcoef_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cos_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_count_nonzero_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cov_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cov_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cummax_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cumprod_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diag_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diag_embed_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagflat_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_div_floor_rounding_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_div_trunc_rounding_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_div_trunc_rounding_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dstack_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_like_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_strided_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_eq_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_erf_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_erf_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_exp_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fft2_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fftn_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfft2_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfft_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifft2_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifftn_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfft2_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfft_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_rfft_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_flip_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmax_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_full_like_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_gather_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_gradient_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_gradient_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_histc_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hsplit_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hsplit_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hstack_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hstack_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_i0_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_add_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_prod_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_inner_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isfinite_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isin_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isnan_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ldexp_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lgamma_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_diagonal_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_eigvalsh_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_lstsq_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_lu_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_lu_factor_ex_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_matrix_power_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_norm_subgradients_at_zero_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_slogdet_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_vander_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log10_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log_normal_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_not_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_or_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logit_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logspace_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logsumexp_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lt_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lu_solve_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_amax_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_cumprod_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_cumprod_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_cumprod_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_scatter_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_sum_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_matmul_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_matrix_exp_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mode_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_movedim_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nanmean_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_ones_cuda_int64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_zeros_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_zeros_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_zeros_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nextafter_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_batch_norm_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_celu_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_cosine_similarity_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_ctc_loss_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_without_train_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_without_train_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_grid_sample_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_nearest_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_l1_loss_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_leaky_relu_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_margin_ranking_loss_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_margin_ranking_loss_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_pool3d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool3d_grad_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pixel_unshuffle_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_relu6_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_relu_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_relu_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_rms_norm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_scaled_dot_product_attention_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_upsample_nearest_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_upsample_nearest_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ones_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_2_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_4_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_qr_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randint_like_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randn_like_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_remainder_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_renorm_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reshape_as_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resolve_neg_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_round_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_round_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_round_decimals_neg_3_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_amin_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_sum_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_searchsorted_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_select_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_select_scatter_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_blackman_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_hamming_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sinc_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sinh_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_slice_scatter_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sort_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_y1_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_y1_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_y1_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_t_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_v_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_v_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_v_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_v_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_erfcx_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_i0_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_i1_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_k0_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_k0_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_k0_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_k1_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_scaled_modified_bessel_k0_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_w_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_xlog1py_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_with_sizes_copy_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_std_unbiased_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sum_to_size_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_svd_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tanh_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tensordot_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_torch_ops_aten__safe_softmax_default_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trace_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_triu_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_true_divide_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unbind_copy_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unbind_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unflatten_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unflatten_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unfold_copy_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unique_consecutive_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsafe_chunk_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsqueeze_copy_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_copy_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_copy_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zero__cuda_bool
2025-12-04T10:30:08.2310415Z 
2025-12-04T10:30:08.2310837Z Finished inductor/test_torchinductor_opinfo 2/17 ... [2025-12-04 10:30:08.207758][4636.5906515], took 10.33min
2025-12-04T10:30:08.2312305Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-8ad43f769763d7e0.xml
2025-12-04T10:30:08.2895100Z Running inductor/test_torchinductor_opinfo 7/17 ... [2025-12-04 10:30:08.289203][4636.672096974]
2025-12-04T10:30:08.2895725Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T10:30:08.2899405Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_opinfo.py', '--shard-id=7', '--num-shards=17', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:30:08.289677]
2025-12-04T10:41:46.2950398Z 
2025-12-04T10:41:46.2951589Z inductor/test_torchinductor_opinfo 7/17 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_opinfo_7.17_bf87dc9c512027f2_.log
2025-12-04T10:41:46.3076911Z Running 209 items in this shard: test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___radd___cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rand___cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmod___cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmul___cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rxor___cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__batch_norm_with_update_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__batch_norm_with_update_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__segment_reduce_lengths_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addcmul_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addr_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_all_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_amax_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_amin_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_angle_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argsort_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argwhere_cuda_bool, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_copy_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atan2_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_1d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_3d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_3d_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bernoulli_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bitwise_and_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bitwise_left_shift_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bitwise_right_shift_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bitwise_xor_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_block_diag_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bucketize_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bucketize_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cfloat_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clamp_max_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clamp_min_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_column_stack_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cummax_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cummin_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cumulative_trapezoid_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_deg2rad_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diag_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_copy_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_digamma_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dot_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dsplit_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dsplit_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_like_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_permuted_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_strided_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_strided_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_erf_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_as_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_eye_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fft2_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fftshift_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfft2_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfftn_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfftn_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfft_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfft_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfftn_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fill_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_float_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmin_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmod_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ge_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_add_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_fill_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_amax_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_mean_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_select_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_select_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isclose_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isfinite_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isinf_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isposinf_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isreal_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isreal_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_4inputs_with_extra_args_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_binary_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_kthvalue_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ldexp_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_diagonal_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_matrix_norm_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_matrix_rank_hermitian_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_norm_subgradients_at_zero_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linspace_tensor_overload_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log10_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logdet_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_not_cuda_bool, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_not_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_or_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logspace_tensor_overload_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_long_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_amin_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_cumsum_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_cumsum_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_cumsum_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_normalize_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_prod_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_sum_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_var_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_binary_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_pool2d_with_indices_backward_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_reduction_no_dim_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_maximum_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_maximum_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_meshgrid_list_of_tensors_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_movedim_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mvlgamma_mvlgamma_p_5_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nanmedian_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nanmedian_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_narrow_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_empty_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_adaptive_max_pool1d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_avg_pool1d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_avg_pool1d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_avg_pool3d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_binary_cross_entropy_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_channel_shuffle_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_channel_shuffle_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv1d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv2d_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv_transpose3d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_gaussian_nll_loss_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_hardsigmoid_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_huber_loss_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_bilinear_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_nearest-exact_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_trilinear_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_l1_loss_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_linear_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_local_response_norm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_pool2d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool1d_grad_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool1d_grad_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_one_hot_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_constant_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_reflect_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pdist_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_poisson_nll_loss_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_poisson_nll_loss_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_poisson_nll_loss_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_prelu_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_smooth_l1_loss_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_smooth_l1_loss_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softplus_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_triplet_margin_loss_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nonzero_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ones_like_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_permute_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_3_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randint_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randint_like_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reciprocal_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reciprocal_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_repeat_interleave_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resize_as__cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resolve_conj_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_roll_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rot90_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rsqrt_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scalar_tensor_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_add_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_amin_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_mean_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_searchsorted_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_select_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_select_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sgn_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sgn_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_short_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_blackman_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_general_hamming_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_hann_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sort_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_j0_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_u_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_erfcx_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_hermite_polynomial_h_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_i1_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_i1_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_i1_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_scaled_modified_bessel_k0_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_zeta_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_list_args_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_square_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_square_cuda_int64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_multiple_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_stack_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_stack_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sub_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_t_copy_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_take_along_dim_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_take_along_dim_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_take_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tanh_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tensor_split_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tile_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_to_sparse_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_torch_ops_aten__safe_softmax_default_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trace_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trapz_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_true_divide_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unfold_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unique_consecutive_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unique_cuda_uint16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsqueeze_copy_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_var_mean_unbiased_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_xlogy_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_xlogy_cuda_float32
2025-12-04T10:41:46.3201051Z 
2025-12-04T10:41:46.3201489Z Finished inductor/test_torchinductor_opinfo 7/17 ... [2025-12-04 10:41:46.294728][5334.677622919], took 11.63min
2025-12-04T10:41:46.3203009Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-6495f5d67df68869.xml
2025-12-04T10:41:46.6663552Z Uploading artifacts took 0.28 seconds
2025-12-04T10:41:46.6668555Z Running inductor/test_torchinductor_opinfo 12/17 ... [2025-12-04 10:41:46.666615][5335.049510926]
2025-12-04T10:41:46.6669215Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T10:41:46.6673471Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_opinfo.py', '--shard-id=12', '--num-shards=17', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:41:46.667054]
2025-12-04T10:49:42.5548764Z 
2025-12-04T10:49:42.5552041Z inductor/test_torchinductor_opinfo 12/17 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_opinfo_12.17_a032934f54d29036_.log
2025-12-04T10:49:42.5666643Z Running 195 items in this shard: test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_H_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_T_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___getitem___cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rdiv___cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmul___cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___ror___cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__native_batch_norm_legit_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__unsafe_masked_index_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_abs_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_acosh_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_add_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addcdiv_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addmm_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argmax_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argmax_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_scatter_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_asin_cuda_int64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_asinh_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atan_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atan_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_2d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_baddbmm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bitwise_or_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bitwise_or_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cartesian_prod_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cat_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_chunk_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clamp_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clone_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_constant_pad_nd_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cumsum_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_copy_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_copy_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diff_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_div_no_rounding_mode_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dstack_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dstack_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dstack_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_like_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_exp_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfft2_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfftn_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_rfftn_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fill_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fill_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_flipud_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_flipud_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_float_power_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_floor_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmax_cuda_bool, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_full_like_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ge_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_grid_sampler_3d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_half_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_heaviside_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hstack_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_amax_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_mean_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_prod_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_prod_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isclose_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isneginf_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isneginf_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isneginf_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isreal_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_unary_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_kron_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lcm_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_inv_ex_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_multi_dot_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_pinv_hermitian_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_pinv_singular_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_vander_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log_softmax_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_xor_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logit_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logsumexp_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lt_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mT_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_amax_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_amax_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_fill_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_logsumexp_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_prod_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_prod_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_binary_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mean_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_median_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_min_binary_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_msort_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_msort_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mul_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nansum_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_narrow_copy_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_narrow_copy_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_native_batch_norm_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_native_batch_norm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_native_layer_norm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_neg_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_zeros_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_adaptive_avg_pool2d_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_batch_norm_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_binary_cross_entropy_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_channel_shuffle_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv1d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv2d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_cosine_embedding_loss_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_dropout3d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_embedding_bag_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_gelu_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_group_norm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_hardswish_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_hardtanh_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_hinge_embedding_loss_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_instance_norm_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_nearest_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_linear_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool1d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_mish_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_multi_margin_loss_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_circular_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_constant_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_silu_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softsign_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_upsample_bilinear_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_normal_in_place_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_normal_in_place_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ones_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ones_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ones_like_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_outer_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_outer_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_outer_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_pca_lowrank_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_permute_copy_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_permute_copy_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_2_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_3_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_pow_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_pow_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randn_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_repeat_interleave_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resolve_neg_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rsqrt_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_mean_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_prod_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_sum_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_short_cuda_bool, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_short_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sigmoid_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sparse_sampled_addmm_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_airy_ai_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_y0_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_entr_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_laguerre_polynomial_l_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_i1_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_ndtri_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_t_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_w_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_spherical_bessel_j0_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_zeta_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_with_sizes_cuda_int32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sqrt_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sqrt_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sum_to_size_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_svd_lowrank_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_svd_lowrank_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_t_copy_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_t_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tan_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tanh_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tensordot_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tile_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_topk_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_torch_ops_aten__efficient_attention_forward_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trace_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trace_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tril_indices_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_triu_indices_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unflatten_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unique_consecutive_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unravel_index_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsafe_split_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_as_complex_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zeros_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zeros_like_cuda_float32
2025-12-04T10:49:42.5779492Z 
2025-12-04T10:49:42.5779918Z Finished inductor/test_torchinductor_opinfo 12/17 ... [2025-12-04 10:49:42.555027][5810.937919587], took 7.93min
2025-12-04T10:49:42.5781399Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-f9f6352517dfd8be.xml
2025-12-04T10:49:42.6605562Z Running inductor/test_torchinductor_opinfo 17/17 ... [2025-12-04 10:49:42.660147][5811.043041087]
2025-12-04T10:49:42.6606615Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T10:49:42.6610231Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_opinfo.py', '--shard-id=17', '--num-shards=17', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:49:42.660648]
2025-12-04T10:59:27.7335252Z 
2025-12-04T10:59:27.7336662Z inductor/test_torchinductor_opinfo 17/17 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_opinfo_17.17_0b4f962be1a8215a_.log
2025-12-04T10:59:27.7460591Z Running 206 items in this shard: test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_T_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___getitem___cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___getitem___cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rdiv___cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmatmul___cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmod___cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rpow___cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addmm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addr_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_alias_copy_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_all_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_amin_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_angle_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_angle_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_any_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argwhere_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argwhere_cuda_int64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_partial_views_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_asin_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_1d_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bernoulli_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bfloat16_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bincount_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bitwise_or_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bitwise_or_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_broadcast_to_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_broadcast_to_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cat_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cdist_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cdouble_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cdouble_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cfloat_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cholesky_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clone_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_conj_physical_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_constant_pad_nd_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_contiguous_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cos_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cos_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cumulative_trapezoid_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cumulative_trapezoid_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diag_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diag_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diag_embed_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_scatter_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_digamma_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_permuted_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_exp2_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_exp_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expm1_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_eye_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fftshift_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfftn_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfftn_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfftn_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_full_like_cuda_uint32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_gather_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_geometric_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_gt_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hash_tensor_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hash_tensor_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_copy_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_int_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_int_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isinf_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isreal_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_unary_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_eigvalsh_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_inv_ex_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_ldl_factor_ex_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_matrix_power_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_pinv_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_or_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_xor_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logspace_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logsumexp_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mT_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_amin_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_argmin_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_cumprod_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_logaddexp_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_logsumexp_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_mean_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_binary_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_reduction_no_dim_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_median_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_min_reduction_no_dim_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_min_reduction_with_dim_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mode_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_movedim_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_msort_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mvlgamma_mvlgamma_p_1_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mvlgamma_mvlgamma_p_1_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nan_to_num_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_narrow_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_neg_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_adaptive_avg_pool3d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_adaptive_max_pool3d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_avg_pool1d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_batch_norm_without_cudnn_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv3d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv3d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv_transpose1d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_cosine_embedding_loss_cuda_int64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_cross_entropy_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_dropout2d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_embedding_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_with_train_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_gelu_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_group_norm_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_area_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_nearest-exact_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_layer_norm_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool1d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool2d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_mish_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_mse_loss_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_multilabel_margin_loss_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_constant_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_reflect_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pixel_unshuffle_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_relu6_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_rrelu_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_scaled_dot_product_attention_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softsign_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_threshold_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_upsample_nearest_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nonzero_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ones_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ones_like_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_permute_copy_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_2_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_4_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_positive_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_positive_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randn_like_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reciprocal_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_repeat_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reshape_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reshape_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resize__cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resize__cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rot90_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_round_decimals_0_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_amin_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_sum_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_sum_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_nuttall_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sin_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_slice_scatter_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_softmax_with_dtype_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_j0_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_j1_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_y0_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_y1_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_u_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_i0e_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_i1_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_i1e_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_legendre_polynomial_p_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_k0_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_ndtri_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_ndtri_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_t_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_v_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_v_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_w_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_xlog1py_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_xlog1py_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_list_args_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_square_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_square_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_copy_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_copy_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_multiple_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_multiple_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_std_mean_unbiased_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sub_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_t_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_t_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tanh_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_torch_ops_aten__safe_softmax_default_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_transpose_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tril_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_true_divide_cuda_int64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unbind_copy_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unbind_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unfold_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unfold_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unique_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unravel_index_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsqueeze_copy_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsqueeze_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_var_unbiased_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_as_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_copy_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_vsplit_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_vsplit_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_where_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_where_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_xlogy_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zero__cuda_int32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zeros_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zeros_like_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zeros_like_cuda_uint8
2025-12-04T10:59:27.7581114Z 
2025-12-04T10:59:27.7581549Z Finished inductor/test_torchinductor_opinfo 17/17 ... [2025-12-04 10:59:27.734029][6396.116922776], took 9.75min
2025-12-04T10:59:27.7583151Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-25ab0fa1230b07b5.xml
2025-12-04T10:59:27.8264746Z Running inductor/test_cuda_select_algorithm 3/5 ... [2025-12-04 10:59:27.826112][6396.209005792]
2025-12-04T10:59:27.8265391Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T10:59:27.8268625Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_cuda_select_algorithm.py', '--shard-id=3', '--num-shards=5', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:59:27.826567]
2025-12-04T11:20:45.1673555Z 
2025-12-04T11:20:45.1674843Z PRINTING LOG FILE of inductor/test_cuda_select_algorithm 3/5 (test/test-reports/inductor.test_cuda_select_algorithm_3.5_e3565bc7025c1889_.log)
2025-12-04T11:20:45.1676095Z W1204 10:59:37.235000 86349 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:20:45.1678089Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-74cab4bdcde89184.xml
2025-12-04T11:20:45.1679711Z ============================= test session starts ==============================
2025-12-04T11:20:45.1680906Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:20:45.1681698Z cachedir: .pytest_cache
2025-12-04T11:20:45.1682912Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:20:45.1684090Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:20:45.1684473Z configfile: pytest.ini
2025-12-04T11:20:45.1685755Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:20:45.1686769Z collecting ... collected 58 items
2025-12-04T11:20:45.1687377Z stepcurrent: Cannot find last run test, not skipping
2025-12-04T11:20:45.1705380Z Running 14 items in this shard: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16, 
test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.1724015Z 
2025-12-04T11:20:45.1725525Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [4.2856s] [  7%]
2025-12-04T11:20:45.1728915Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.8332s] [  7%]
2025-12-04T11:20:45.1731781Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 FAILED [0.8264s] [  7%]
2025-12-04T11:20:45.1738167Z 
2025-12-04T11:20:45.1738400Z ==================================== RERUNS ====================================
2025-12-04T11:20:45.1739677Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T11:20:45.1740922Z Traceback (most recent call last):
2025-12-04T11:20:45.1742113Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 204, in test_int8_woq_mm_concat_cuda
2025-12-04T11:20:45.1743477Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 3)
2025-12-04T11:20:45.1744671Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.1745804Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.1747054Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.1748270Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.1749048Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.1749479Z 
2025-12-04T11:20:45.1749662Z Expected 3 but got 6.
2025-12-04T11:20:45.1750050Z Absolute difference: 3
2025-12-04T11:20:45.1750552Z Relative difference: 1.0
2025-12-04T11:20:45.1750776Z 
2025-12-04T11:20:45.1751143Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.1753110Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:20:45.1754613Z 
2025-12-04T11:20:45.1754982Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.1756035Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.1756753Z stats [('calls_captured', 36)]
2025-12-04T11:20:45.1758175Z inductor [('pattern_matcher_nodes', 36), ('woq_matcher_nodes', 24), ('pattern_matcher_count', 18), ('woq_matcher_count', 6), ('fxgraph_cache_miss', 2)]
2025-12-04T11:20:45.1759821Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.1760680Z graph_break []
2025-12-04T11:20:45.1761356Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.1763463Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.1765182Z   warnings.warn(
2025-12-04T11:20:45.1766725Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.1768502Z   warnings.warn(
2025-12-04T11:20:45.1769993Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T11:20:45.1771575Z Traceback (most recent call last):
2025-12-04T11:20:45.1772889Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 204, in test_int8_woq_mm_concat_cuda
2025-12-04T11:20:45.1774300Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 3)
2025-12-04T11:20:45.1775785Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.1777247Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.1778711Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.1780270Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.1781120Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.1781598Z 
2025-12-04T11:20:45.1781769Z Expected 3 but got 6.
2025-12-04T11:20:45.1782318Z Absolute difference: 3
2025-12-04T11:20:45.1782826Z Relative difference: 1.0
2025-12-04T11:20:45.1783373Z 
2025-12-04T11:20:45.1783811Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.1785458Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:20:45.1786543Z 
2025-12-04T11:20:45.1786814Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.1787489Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.1787990Z stats [('calls_captured', 36)]
2025-12-04T11:20:45.1788734Z inductor [('pattern_matcher_nodes', 36), ('woq_matcher_nodes', 24), ('pattern_matcher_count', 18), ('woq_matcher_count', 6), ('fxgraph_cache_miss', 2)]
2025-12-04T11:20:45.1789649Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.1790127Z graph_break []
2025-12-04T11:20:45.1790506Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.1791608Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.1792591Z   warnings.warn(
2025-12-04T11:20:45.1793505Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.1794470Z   warnings.warn(
2025-12-04T11:20:45.1796949Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.1797437Z stats [('calls_captured', 36)]
2025-12-04T11:20:45.1797883Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.1798804Z inductor [('pattern_matcher_nodes', 36), ('woq_matcher_nodes', 24), ('pattern_matcher_count', 18), ('woq_matcher_count', 6), ('fxgraph_cache_miss', 2)]
2025-12-04T11:20:45.1799592Z graph_break []
2025-12-04T11:20:45.1799977Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.1801078Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.1802063Z   warnings.warn(
2025-12-04T11:20:45.1802952Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.1803930Z   warnings.warn(
2025-12-04T11:20:45.1804236Z =================================== FAILURES ===================================
2025-12-04T11:20:45.1805086Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T11:20:45.1806072Z Traceback (most recent call last):
2025-12-04T11:20:45.1806859Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 204, in test_int8_woq_mm_concat_cuda
2025-12-04T11:20:45.1807781Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 3)
2025-12-04T11:20:45.1808629Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.1809442Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.1810267Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.1811163Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.1811650Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.1811901Z 
2025-12-04T11:20:45.1812024Z Expected 3 but got 6.
2025-12-04T11:20:45.1812307Z Absolute difference: 3
2025-12-04T11:20:45.1812609Z Relative difference: 1.0
2025-12-04T11:20:45.1812806Z 
2025-12-04T11:20:45.1813035Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.1814380Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:20:45.1815462Z 
2025-12-04T11:20:45.1815734Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.1816500Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.1816994Z stats [('calls_captured', 36)]
2025-12-04T11:20:45.1817754Z inductor [('pattern_matcher_nodes', 36), ('woq_matcher_nodes', 24), ('pattern_matcher_count', 18), ('woq_matcher_count', 6), ('fxgraph_cache_miss', 2)]
2025-12-04T11:20:45.1818642Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.1819124Z graph_break []
2025-12-04T11:20:45.1819512Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.1820623Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.1821592Z   warnings.warn(
2025-12-04T11:20:45.1822496Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.1823472Z   warnings.warn(
2025-12-04T11:20:45.1823851Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.1824346Z stats [('calls_captured', 36)]
2025-12-04T11:20:45.1824800Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.1825705Z inductor [('pattern_matcher_nodes', 36), ('woq_matcher_nodes', 24), ('pattern_matcher_count', 18), ('woq_matcher_count', 6), ('fxgraph_cache_miss', 2)]
2025-12-04T11:20:45.1826479Z graph_break []
2025-12-04T11:20:45.1826865Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.1827966Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.1828937Z   warnings.warn(
2025-12-04T11:20:45.1829814Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.1830794Z   warnings.warn(
2025-12-04T11:20:45.1831188Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.1831656Z stats [('calls_captured', 36)]
2025-12-04T11:20:45.1832114Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.1833136Z inductor [('pattern_matcher_nodes', 36), ('woq_matcher_nodes', 24), ('pattern_matcher_count', 18), ('woq_matcher_count', 6), ('fxgraph_cache_miss', 2)]
2025-12-04T11:20:45.1833919Z graph_break []
2025-12-04T11:20:45.1834290Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.1835385Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.1836386Z   warnings.warn(
2025-12-04T11:20:45.1837265Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.1838240Z   warnings.warn(
2025-12-04T11:20:45.1839248Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-74cab4bdcde89184.xml -
2025-12-04T11:20:45.1840401Z =========================== short test summary info ============================
2025-12-04T11:20:45.1841694Z FAILED [0.8264s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:20:45.1842834Z 
2025-12-04T11:20:45.1842943Z Expected 3 but got 6.
2025-12-04T11:20:45.1843242Z Absolute difference: 3
2025-12-04T11:20:45.1843542Z Relative difference: 1.0
2025-12-04T11:20:45.1843740Z 
2025-12-04T11:20:45.1843965Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.1845262Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:20:45.1846340Z 
2025-12-04T11:20:45.1846611Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.1847213Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:20:45.1847695Z ========================== 1 failed, 2 rerun in 5.98s ==========================
2025-12-04T11:20:45.1848111Z Got exit code 1
2025-12-04T11:20:45.1848386Z Retrying single test...
2025-12-04T11:20:45.1849009Z W1204 10:59:57.781000 86519 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:20:45.1850253Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-77e37a2f8b75b3d9.xml
2025-12-04T11:20:45.1851219Z ============================= test session starts ==============================
2025-12-04T11:20:45.1851886Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:20:45.1852477Z cachedir: .pytest_cache
2025-12-04T11:20:45.1853188Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:20:45.1853981Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:20:45.1854345Z configfile: pytest.ini
2025-12-04T11:20:45.1855066Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:20:45.1855967Z collecting ... collected 58 items / 13 deselected / 45 selected
2025-12-04T11:20:45.1857431Z stepcurrent: skipping 0 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:20:45.1858711Z Running 1 items in this shard
2025-12-04T11:20:45.1858927Z 
2025-12-04T11:20:45.1860313Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 11:00:01.840475111 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1861760Z 
2025-12-04T11:20:45.1862286Z [W1204 11:00:17.953892386 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1862953Z 
2025-12-04T11:20:45.1863469Z [W1204 11:00:17.954164824 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1864177Z 
2025-12-04T11:20:45.1864688Z [W1204 11:00:17.954810229 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1865333Z 
2025-12-04T11:20:45.1865855Z [W1204 11:00:17.955015966 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1866503Z 
2025-12-04T11:20:45.1867026Z [W1204 11:00:17.956811964 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1867679Z 
2025-12-04T11:20:45.1868189Z [W1204 11:00:17.956991029 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1868886Z 
2025-12-04T11:20:45.1869400Z [W1204 11:00:17.957308954 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1870063Z 
2025-12-04T11:20:45.1870575Z [W1204 11:00:17.957480472 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1871442Z 
2025-12-04T11:20:45.1871970Z [W1204 11:00:17.968243169 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1872619Z 
2025-12-04T11:20:45.1873148Z [W1204 11:00:17.968480778 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1873798Z 
2025-12-04T11:20:45.1874318Z [W1204 11:00:17.968686775 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1874983Z 
2025-12-04T11:20:45.1875497Z [W1204 11:00:17.968979381 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1876155Z 
2025-12-04T11:20:45.1876671Z [W1204 11:00:17.969152214 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1877321Z 
2025-12-04T11:20:45.1877844Z [W1204 11:00:17.969445114 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1878489Z 
2025-12-04T11:20:45.1879015Z [W1204 11:00:17.969619167 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1879663Z 
2025-12-04T11:20:45.1880177Z [W1204 11:00:17.969903509 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1880842Z 
2025-12-04T11:20:45.1881352Z [W1204 11:00:17.970104807 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1882011Z 
2025-12-04T11:20:45.1882520Z [W1204 11:00:17.095645911 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1883173Z 
2025-12-04T11:20:45.1883700Z [W1204 11:00:17.095970543 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1884346Z 
2025-12-04T11:20:45.1884864Z [W1204 11:00:17.096161586 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1885511Z 
2025-12-04T11:20:45.1886148Z [W1204 11:00:17.096460349 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1886812Z 
2025-12-04T11:20:45.1887324Z [W1204 11:00:17.096648820 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1887986Z 
2025-12-04T11:20:45.1888495Z [W1204 11:00:17.096954946 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1889196Z 
2025-12-04T11:20:45.1889724Z [W1204 11:00:17.097124097 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1890372Z 
2025-12-04T11:20:45.1890901Z [W1204 11:00:17.097403096 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1891551Z 
2025-12-04T11:20:45.1892068Z [W1204 11:00:17.097570407 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1892786Z 
2025-12-04T11:20:45.1893298Z [W1204 11:00:19.182966350 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1893962Z 
2025-12-04T11:20:45.1894473Z [W1204 11:00:19.184220379 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1895141Z 
2025-12-04T11:20:45.1895660Z [W1204 11:00:19.184419949 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1896389Z 
2025-12-04T11:20:45.1896914Z [W1204 11:00:19.184733274 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1897563Z 
2025-12-04T11:20:45.1898091Z [W1204 11:00:19.184929591 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1898744Z 
2025-12-04T11:20:45.1899258Z [W1204 11:00:19.185236684 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1899924Z 
2025-12-04T11:20:45.1900433Z [W1204 11:00:19.185421198 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1901097Z 
2025-12-04T11:20:45.1901611Z [W1204 11:00:19.185705553 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1902262Z 
2025-12-04T11:20:45.1902786Z [W1204 11:00:19.185879999 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1903428Z 
2025-12-04T11:20:45.1903958Z [W1204 11:00:19.194064884 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1904608Z 
2025-12-04T11:20:45.1905121Z [W1204 11:00:19.194311849 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1905784Z 
2025-12-04T11:20:45.1906296Z [W1204 11:00:19.194504964 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1906960Z 
2025-12-04T11:20:45.1907476Z [W1204 11:00:19.194781605 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1908125Z 
2025-12-04T11:20:45.1908653Z [W1204 11:00:19.194960093 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1909300Z 
2025-12-04T11:20:45.1909830Z [W1204 11:00:19.195261953 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1910544Z 
2025-12-04T11:20:45.1911057Z [W1204 11:00:19.195438828 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1911725Z 
2025-12-04T11:20:45.1912237Z [W1204 11:00:19.195722906 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1912934Z 
2025-12-04T11:20:45.1913447Z [W1204 11:00:19.195897647 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1914095Z 
2025-12-04T11:20:45.1914623Z [W1204 11:00:19.314678505 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1915270Z 
2025-12-04T11:20:45.1915797Z [W1204 11:00:19.314970990 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1916444Z 
2025-12-04T11:20:45.1916961Z [W1204 11:00:19.315163864 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1917661Z 
2025-12-04T11:20:45.1918173Z [W1204 11:00:19.315469472 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1918832Z 
2025-12-04T11:20:45.1919349Z [W1204 11:00:19.315646782 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1919998Z 
2025-12-04T11:20:45.1920526Z [W1204 11:00:19.315952755 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1921174Z 
2025-12-04T11:20:45.1921702Z [W1204 11:00:19.316129178 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1922351Z 
2025-12-04T11:20:45.1922869Z [W1204 11:00:19.316420043 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1923537Z 
2025-12-04T11:20:45.1924050Z [W1204 11:00:19.316611607 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1924714Z 
2025-12-04T11:20:45.1924850Z ('RERUN', {'yellow': True}) [20.3918s] [100%]
2025-12-04T11:20:45.1926427Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 11:00:20.770626612 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1927867Z 
2025-12-04T11:20:45.1928397Z [W1204 11:00:20.770918494 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1929051Z 
2025-12-04T11:20:45.1929580Z [W1204 11:00:20.771108700 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1930228Z 
2025-12-04T11:20:45.1930738Z [W1204 11:00:20.771396373 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1931400Z 
2025-12-04T11:20:45.1931913Z [W1204 11:00:20.771577779 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1932575Z 
2025-12-04T11:20:45.1933091Z [W1204 11:00:20.771882639 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1933736Z 
2025-12-04T11:20:45.1934260Z [W1204 11:00:20.772057543 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1934909Z 
2025-12-04T11:20:45.1935520Z [W1204 11:00:20.772342053 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1936173Z 
2025-12-04T11:20:45.1936781Z [W1204 11:00:20.772515249 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1937447Z 
2025-12-04T11:20:45.1937959Z [W1204 11:00:20.781012889 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1938655Z 
2025-12-04T11:20:45.1939167Z [W1204 11:00:20.781269766 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1939817Z 
2025-12-04T11:20:45.1940341Z [W1204 11:00:20.781459534 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1940991Z 
2025-12-04T11:20:45.1941523Z [W1204 11:00:20.781744592 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1942205Z 
2025-12-04T11:20:45.1942714Z [W1204 11:00:20.781921396 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1943369Z 
2025-12-04T11:20:45.1943883Z [W1204 11:00:20.782226225 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1944550Z 
2025-12-04T11:20:45.1945061Z [W1204 11:00:20.782401036 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1945702Z 
2025-12-04T11:20:45.1946226Z [W1204 11:00:20.782680355 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1946874Z 
2025-12-04T11:20:45.1947405Z [W1204 11:00:20.782851371 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1948057Z 
2025-12-04T11:20:45.1948571Z [W1204 11:00:20.901517992 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1949236Z 
2025-12-04T11:20:45.1949745Z [W1204 11:00:20.901811414 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1950402Z 
2025-12-04T11:20:45.1950912Z [W1204 11:00:20.902002693 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1951563Z 
2025-12-04T11:20:45.1952088Z [W1204 11:00:20.902301778 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1952734Z 
2025-12-04T11:20:45.1953262Z [W1204 11:00:20.902481222 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1953909Z 
2025-12-04T11:20:45.1954426Z [W1204 11:00:20.902785157 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1955088Z 
2025-12-04T11:20:45.1955605Z [W1204 11:00:20.902959692 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1956270Z 
2025-12-04T11:20:45.1956781Z [W1204 11:00:20.903241195 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1957426Z 
2025-12-04T11:20:45.1957950Z [W1204 11:00:20.903413893 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1958596Z 
2025-12-04T11:20:45.1959119Z [W1204 11:00:20.070163567 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1959857Z 
2025-12-04T11:20:45.1960373Z [W1204 11:00:20.070466279 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1961043Z 
2025-12-04T11:20:45.1961555Z [W1204 11:00:20.070663357 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1962255Z 
2025-12-04T11:20:45.1962768Z [W1204 11:00:20.070958604 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1963436Z 
2025-12-04T11:20:45.1963949Z [W1204 11:00:20.071145989 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1964598Z 
2025-12-04T11:20:45.1965121Z [W1204 11:00:20.071448261 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1965776Z 
2025-12-04T11:20:45.1966304Z [W1204 11:00:20.071628375 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1966987Z 
2025-12-04T11:20:45.1967497Z [W1204 11:00:20.071914899 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1968157Z 
2025-12-04T11:20:45.1968670Z [W1204 11:00:20.072089877 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1969327Z 
2025-12-04T11:20:45.1969839Z [W1204 11:00:20.080524134 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1970483Z 
2025-12-04T11:20:45.1971213Z [W1204 11:00:20.080832059 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1971869Z 
2025-12-04T11:20:45.1972401Z [W1204 11:00:20.081030061 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1973054Z 
2025-12-04T11:20:45.1973565Z [W1204 11:00:20.081314117 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1974229Z 
2025-12-04T11:20:45.1974743Z [W1204 11:00:20.081492458 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1975405Z 
2025-12-04T11:20:45.1975921Z [W1204 11:00:20.081788949 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1976636Z 
2025-12-04T11:20:45.1977160Z [W1204 11:00:20.081964710 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1977806Z 
2025-12-04T11:20:45.1978338Z [W1204 11:00:20.082262923 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1978989Z 
2025-12-04T11:20:45.1979502Z [W1204 11:00:20.082439281 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1980161Z 
2025-12-04T11:20:45.1980675Z [W1204 11:00:20.202496916 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1981338Z 
2025-12-04T11:20:45.1981855Z [W1204 11:00:20.202789151 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1982508Z 
2025-12-04T11:20:45.1983039Z [W1204 11:00:20.202982229 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1983686Z 
2025-12-04T11:20:45.1984327Z [W1204 11:00:20.203287309 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1984985Z 
2025-12-04T11:20:45.1985498Z [W1204 11:00:20.203471164 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1986162Z 
2025-12-04T11:20:45.1986676Z [W1204 11:00:20.203778557 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1987382Z 
2025-12-04T11:20:45.1987898Z [W1204 11:00:20.203956237 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1988545Z 
2025-12-04T11:20:45.1989069Z [W1204 11:00:20.204243554 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1989716Z 
2025-12-04T11:20:45.1990243Z [W1204 11:00:20.204415572 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1990946Z 
2025-12-04T11:20:45.1991080Z ('RERUN', {'yellow': True}) [0.8457s] [100%]
2025-12-04T11:20:45.1992650Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 11:00:21.591484339 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1994104Z 
2025-12-04T11:20:45.1994618Z [W1204 11:00:21.591787280 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1995284Z 
2025-12-04T11:20:45.1995799Z [W1204 11:00:21.591979661 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1996446Z 
2025-12-04T11:20:45.1996976Z [W1204 11:00:21.592269567 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1997632Z 
2025-12-04T11:20:45.1998159Z [W1204 11:00:21.592447616 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1998809Z 
2025-12-04T11:20:45.1999323Z [W1204 11:00:21.592767296 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.1999989Z 
2025-12-04T11:20:45.2000508Z [W1204 11:00:21.592943942 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2001169Z 
2025-12-04T11:20:45.2001686Z [W1204 11:00:21.593229118 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2002339Z 
2025-12-04T11:20:45.2002872Z [W1204 11:00:21.593400011 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2003523Z 
2025-12-04T11:20:45.2004056Z [W1204 11:00:21.601801537 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2004709Z 
2025-12-04T11:20:45.2005221Z [W1204 11:00:21.602057175 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2005883Z 
2025-12-04T11:20:45.2006395Z [W1204 11:00:21.602245364 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2007054Z 
2025-12-04T11:20:45.2007566Z [W1204 11:00:21.602523826 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2008218Z 
2025-12-04T11:20:45.2008738Z [W1204 11:00:21.602696622 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2009451Z 
2025-12-04T11:20:45.2009977Z [W1204 11:00:21.602989909 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2010630Z 
2025-12-04T11:20:45.2011139Z [W1204 11:00:21.603163757 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2011830Z 
2025-12-04T11:20:45.2012343Z [W1204 11:00:21.603443185 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2013004Z 
2025-12-04T11:20:45.2013518Z [W1204 11:00:21.603624371 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2014167Z 
2025-12-04T11:20:45.2014695Z [W1204 11:00:21.721047897 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2015345Z 
2025-12-04T11:20:45.2015876Z [W1204 11:00:21.721335371 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2016660Z 
2025-12-04T11:20:45.2017182Z [W1204 11:00:21.721525245 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2017847Z 
2025-12-04T11:20:45.2018368Z [W1204 11:00:21.721820804 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2019031Z 
2025-12-04T11:20:45.2019543Z [W1204 11:00:21.721999497 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2020196Z 
2025-12-04T11:20:45.2020725Z [W1204 11:00:21.722295721 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2021380Z 
2025-12-04T11:20:45.2021911Z [W1204 11:00:21.722472531 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2022567Z 
2025-12-04T11:20:45.2023079Z [W1204 11:00:21.722757128 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2023748Z 
2025-12-04T11:20:45.2024260Z [W1204 11:00:21.722929754 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2024925Z 
2025-12-04T11:20:45.2025440Z [W1204 11:00:21.889770494 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2026090Z 
2025-12-04T11:20:45.2026616Z [W1204 11:00:21.890093563 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2027264Z 
2025-12-04T11:20:45.2027795Z [W1204 11:00:21.890300142 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2028443Z 
2025-12-04T11:20:45.2028952Z [W1204 11:00:21.890587401 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2029617Z 
2025-12-04T11:20:45.2030128Z [W1204 11:00:21.890769509 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2030797Z 
2025-12-04T11:20:45.2031313Z [W1204 11:00:21.891068843 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2031977Z 
2025-12-04T11:20:45.2032492Z [W1204 11:00:21.891249625 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2033141Z 
2025-12-04T11:20:45.2033745Z [W1204 11:00:21.891534095 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2034401Z 
2025-12-04T11:20:45.2034929Z [W1204 11:00:21.891709417 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2035581Z 
2025-12-04T11:20:45.2036092Z [W1204 11:00:21.899735938 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2036788Z 
2025-12-04T11:20:45.2037302Z [W1204 11:00:21.899972111 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2037966Z 
2025-12-04T11:20:45.2038474Z [W1204 11:00:21.900196171 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2039125Z 
2025-12-04T11:20:45.2039653Z [W1204 11:00:21.900483343 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2040338Z 
2025-12-04T11:20:45.2040862Z [W1204 11:00:21.900673369 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2041509Z 
2025-12-04T11:20:45.2042020Z [W1204 11:00:21.900974792 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2042689Z 
2025-12-04T11:20:45.2043200Z [W1204 11:00:21.901149948 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2043865Z 
2025-12-04T11:20:45.2044376Z [W1204 11:00:21.901432333 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2045030Z 
2025-12-04T11:20:45.2045561Z [W1204 11:00:21.901605830 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2046210Z 
2025-12-04T11:20:45.2046741Z [W1204 11:00:21.019100577 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2047389Z 
2025-12-04T11:20:45.2047903Z [W1204 11:00:21.019410159 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2048563Z 
2025-12-04T11:20:45.2049078Z [W1204 11:00:21.019603022 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2049740Z 
2025-12-04T11:20:45.2050251Z [W1204 11:00:21.019902421 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2050906Z 
2025-12-04T11:20:45.2051434Z [W1204 11:00:21.020108298 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2052090Z 
2025-12-04T11:20:45.2052616Z [W1204 11:00:21.020432567 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2053264Z 
2025-12-04T11:20:45.2053777Z [W1204 11:00:21.020649246 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2054437Z 
2025-12-04T11:20:45.2054950Z [W1204 11:00:21.020961166 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2055610Z 
2025-12-04T11:20:45.2056124Z [W1204 11:00:21.021139146 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2056855Z 
2025-12-04T11:20:45.2056974Z FAILED [0.8145s] [100%]
2025-12-04T11:20:45.2057155Z 
2025-12-04T11:20:45.2057316Z ==================================== RERUNS ====================================
2025-12-04T11:20:45.2058216Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T11:20:45.2059023Z Traceback (most recent call last):
2025-12-04T11:20:45.2059822Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 204, in test_int8_woq_mm_concat_cuda
2025-12-04T11:20:45.2060723Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 3)
2025-12-04T11:20:45.2061603Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.2062375Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.2063219Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.2064092Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.2064574Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.2064831Z 
2025-12-04T11:20:45.2064954Z Expected 3 but got 6.
2025-12-04T11:20:45.2065272Z Absolute difference: 3
2025-12-04T11:20:45.2065576Z Relative difference: 1.0
2025-12-04T11:20:45.2065783Z 
2025-12-04T11:20:45.2065998Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.2067293Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:20:45.2068362Z 
2025-12-04T11:20:45.2068633Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.2069271Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.2069759Z stats [('calls_captured', 36)]
2025-12-04T11:20:45.2070514Z inductor [('pattern_matcher_nodes', 36), ('woq_matcher_nodes', 24), ('pattern_matcher_count', 18), ('woq_matcher_count', 6), ('fxgraph_cache_miss', 2)]
2025-12-04T11:20:45.2071592Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.2072077Z graph_break []
2025-12-04T11:20:45.2072467Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.2074046Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.2075503Z   if out == self.unknown_value:
2025-12-04T11:20:45.2076454Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2077422Z   warnings.warn(
2025-12-04T11:20:45.2078317Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2079270Z   warnings.warn(
2025-12-04T11:20:45.2079979Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T11:20:45.2080777Z Traceback (most recent call last):
2025-12-04T11:20:45.2081557Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 204, in test_int8_woq_mm_concat_cuda
2025-12-04T11:20:45.2082479Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 3)
2025-12-04T11:20:45.2083313Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.2084076Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.2084896Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.2085779Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.2086390Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.2086649Z 
2025-12-04T11:20:45.2086757Z Expected 3 but got 6.
2025-12-04T11:20:45.2087053Z Absolute difference: 3
2025-12-04T11:20:45.2087354Z Relative difference: 1.0
2025-12-04T11:20:45.2087549Z 
2025-12-04T11:20:45.2087781Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.2089061Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:20:45.2090183Z 
2025-12-04T11:20:45.2090455Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.2091090Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.2091581Z stats [('calls_captured', 36)]
2025-12-04T11:20:45.2092333Z inductor [('pattern_matcher_nodes', 36), ('woq_matcher_nodes', 24), ('pattern_matcher_count', 18), ('woq_matcher_count', 6), ('fxgraph_cache_miss', 2)]
2025-12-04T11:20:45.2093284Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.2093758Z graph_break []
2025-12-04T11:20:45.2094126Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.2095699Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.2097257Z   if out == self.unknown_value:
2025-12-04T11:20:45.2098206Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2099170Z   warnings.warn(
2025-12-04T11:20:45.2100064Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2101042Z   warnings.warn(
2025-12-04T11:20:45.2101428Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.2101902Z stats [('calls_captured', 36)]
2025-12-04T11:20:45.2102353Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.2103261Z inductor [('pattern_matcher_nodes', 36), ('woq_matcher_nodes', 24), ('pattern_matcher_count', 18), ('woq_matcher_count', 6), ('fxgraph_cache_miss', 2)]
2025-12-04T11:20:45.2104033Z graph_break []
2025-12-04T11:20:45.2104404Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.2105502Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2106469Z   warnings.warn(
2025-12-04T11:20:45.2107350Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2108323Z   warnings.warn(
2025-12-04T11:20:45.2108644Z =================================== FAILURES ===================================
2025-12-04T11:20:45.2109483Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T11:20:45.2110279Z Traceback (most recent call last):
2025-12-04T11:20:45.2111073Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 204, in test_int8_woq_mm_concat_cuda
2025-12-04T11:20:45.2111992Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 3)
2025-12-04T11:20:45.2112816Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.2113589Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.2114509Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.2115402Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.2115868Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.2116135Z 
2025-12-04T11:20:45.2116245Z Expected 3 but got 6.
2025-12-04T11:20:45.2116604Z Absolute difference: 3
2025-12-04T11:20:45.2116892Z Relative difference: 1.0
2025-12-04T11:20:45.2117097Z 
2025-12-04T11:20:45.2117311Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.2118603Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:20:45.2119672Z 
2025-12-04T11:20:45.2119952Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.2120586Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.2121112Z stats [('calls_captured', 36)]
2025-12-04T11:20:45.2121857Z inductor [('pattern_matcher_nodes', 36), ('woq_matcher_nodes', 24), ('pattern_matcher_count', 18), ('woq_matcher_count', 6), ('fxgraph_cache_miss', 2)]
2025-12-04T11:20:45.2122758Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.2123223Z graph_break []
2025-12-04T11:20:45.2123600Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.2125174Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.2126635Z   if out == self.unknown_value:
2025-12-04T11:20:45.2127571Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2128549Z   warnings.warn(
2025-12-04T11:20:45.2129442Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2130406Z   warnings.warn(
2025-12-04T11:20:45.2130785Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.2131270Z stats [('calls_captured', 36)]
2025-12-04T11:20:45.2131718Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.2132607Z inductor [('pattern_matcher_nodes', 36), ('woq_matcher_nodes', 24), ('pattern_matcher_count', 18), ('woq_matcher_count', 6), ('fxgraph_cache_miss', 2)]
2025-12-04T11:20:45.2133375Z graph_break []
2025-12-04T11:20:45.2133753Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.2134844Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2135801Z   warnings.warn(
2025-12-04T11:20:45.2136777Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2137746Z   warnings.warn(
2025-12-04T11:20:45.2138121Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.2138611Z stats [('calls_captured', 36)]
2025-12-04T11:20:45.2139065Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.2139963Z inductor [('pattern_matcher_nodes', 36), ('woq_matcher_nodes', 24), ('pattern_matcher_count', 18), ('woq_matcher_count', 6), ('fxgraph_cache_miss', 2)]
2025-12-04T11:20:45.2140723Z graph_break []
2025-12-04T11:20:45.2141187Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.2142280Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2143250Z   warnings.warn(
2025-12-04T11:20:45.2144128Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2145133Z   warnings.warn(
2025-12-04T11:20:45.2146144Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-77e37a2f8b75b3d9.xml -
2025-12-04T11:20:45.2147283Z =========================== short test summary info ============================
2025-12-04T11:20:45.2148576Z FAILED [0.8145s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:20:45.2149712Z 
2025-12-04T11:20:45.2149822Z Expected 3 but got 6.
2025-12-04T11:20:45.2150120Z Absolute difference: 3
2025-12-04T11:20:45.2150413Z Relative difference: 1.0
2025-12-04T11:20:45.2150621Z 
2025-12-04T11:20:45.2150838Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.2152142Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:20:45.2153206Z 
2025-12-04T11:20:45.2153491Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.2154074Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:20:45.2154605Z ================== 1 failed, 13 deselected, 2 rerun in 22.09s ==================
2025-12-04T11:20:45.2155049Z Got exit code 1
2025-12-04T11:20:45.2155330Z Retrying single test...
2025-12-04T11:20:45.2155958Z W1204 11:00:33.561000 86694 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:20:45.2157201Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3ba19b390afd5854.xml
2025-12-04T11:20:45.2158170Z ============================= test session starts ==============================
2025-12-04T11:20:45.2158823Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:20:45.2159427Z cachedir: .pytest_cache
2025-12-04T11:20:45.2160142Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:20:45.2160930Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:20:45.2161274Z configfile: pytest.ini
2025-12-04T11:20:45.2162010Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:20:45.2162911Z collecting ... collected 58 items / 13 deselected / 45 selected
2025-12-04T11:20:45.2164267Z stepcurrent: skipping 0 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:20:45.2165524Z Running 1 items in this shard
2025-12-04T11:20:45.2165754Z 
2025-12-04T11:20:45.2167051Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 11:00:37.646486466 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2168486Z 
2025-12-04T11:20:45.2169075Z [W1204 11:00:53.486280731 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2169738Z 
2025-12-04T11:20:45.2170264Z [W1204 11:00:53.486541899 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2170915Z 
2025-12-04T11:20:45.2171675Z [W1204 11:00:53.487159035 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2172395Z 
2025-12-04T11:20:45.2172911Z [W1204 11:00:53.487368813 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2173575Z 
2025-12-04T11:20:45.2174090Z [W1204 11:00:53.489227826 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2174751Z 
2025-12-04T11:20:45.2175273Z [W1204 11:00:53.489404642 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2175922Z 
2025-12-04T11:20:45.2176594Z [W1204 11:00:53.489726629 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2177244Z 
2025-12-04T11:20:45.2177764Z [W1204 11:00:53.489895277 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2178413Z 
2025-12-04T11:20:45.2178920Z [W1204 11:00:53.500684314 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2179581Z 
2025-12-04T11:20:45.2180093Z [W1204 11:00:53.500930971 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2180749Z 
2025-12-04T11:20:45.2181266Z [W1204 11:00:53.501120512 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2181916Z 
2025-12-04T11:20:45.2182449Z [W1204 11:00:53.501400885 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2183113Z 
2025-12-04T11:20:45.2183643Z [W1204 11:00:53.501572741 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2184293Z 
2025-12-04T11:20:45.2184804Z [W1204 11:00:53.501865671 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2185466Z 
2025-12-04T11:20:45.2185981Z [W1204 11:00:53.502035585 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2186646Z 
2025-12-04T11:20:45.2187162Z [W1204 11:00:53.502317094 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2187814Z 
2025-12-04T11:20:45.2188349Z [W1204 11:00:53.502485434 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2189003Z 
2025-12-04T11:20:45.2189532Z [W1204 11:00:53.624582823 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2190182Z 
2025-12-04T11:20:45.2190702Z [W1204 11:00:53.624905435 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2191370Z 
2025-12-04T11:20:45.2191881Z [W1204 11:00:53.625092982 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2192545Z 
2025-12-04T11:20:45.2193061Z [W1204 11:00:53.625387441 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2193728Z 
2025-12-04T11:20:45.2194351Z [W1204 11:00:53.625560299 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2195004Z 
2025-12-04T11:20:45.2195533Z [W1204 11:00:53.625849571 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2196179Z 
2025-12-04T11:20:45.2196706Z [W1204 11:00:53.626025173 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2197393Z 
2025-12-04T11:20:45.2197905Z [W1204 11:00:53.626302703 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2198570Z 
2025-12-04T11:20:45.2199088Z [W1204 11:00:53.626469155 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2199748Z 
2025-12-04T11:20:45.2200263Z [W1204 11:00:55.710732486 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2200975Z 
2025-12-04T11:20:45.2201499Z [W1204 11:00:55.711955233 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2202150Z 
2025-12-04T11:20:45.2202672Z [W1204 11:00:55.712150608 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2203323Z 
2025-12-04T11:20:45.2203835Z [W1204 11:00:55.712437201 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2204495Z 
2025-12-04T11:20:45.2205008Z [W1204 11:00:55.712630266 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2205669Z 
2025-12-04T11:20:45.2206184Z [W1204 11:00:55.712930295 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2206833Z 
2025-12-04T11:20:45.2207361Z [W1204 11:00:55.713106713 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2208009Z 
2025-12-04T11:20:45.2208528Z [W1204 11:00:55.713385614 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2209177Z 
2025-12-04T11:20:45.2209692Z [W1204 11:00:55.713560130 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2210349Z 
2025-12-04T11:20:45.2210860Z [W1204 11:00:55.721874542 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2211518Z 
2025-12-04T11:20:45.2212035Z [W1204 11:00:55.722126391 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2212682Z 
2025-12-04T11:20:45.2213209Z [W1204 11:00:55.722316080 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2213853Z 
2025-12-04T11:20:45.2214382Z [W1204 11:00:55.722591300 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2215030Z 
2025-12-04T11:20:45.2215540Z [W1204 11:00:55.722767172 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2216202Z 
2025-12-04T11:20:45.2216806Z [W1204 11:00:55.723057582 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2217469Z 
2025-12-04T11:20:45.2217985Z [W1204 11:00:55.723232472 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2218735Z 
2025-12-04T11:20:45.2219269Z [W1204 11:00:55.723512102 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2219919Z 
2025-12-04T11:20:45.2220446Z [W1204 11:00:55.723682292 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2221129Z 
2025-12-04T11:20:45.2221647Z [W1204 11:00:55.841636637 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2222309Z 
2025-12-04T11:20:45.2222818Z [W1204 11:00:55.841925179 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2223479Z 
2025-12-04T11:20:45.2223991Z [W1204 11:00:55.842113664 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2224656Z 
2025-12-04T11:20:45.2225166Z [W1204 11:00:55.842411593 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2225843Z 
2025-12-04T11:20:45.2226369Z [W1204 11:00:55.842592035 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2227019Z 
2025-12-04T11:20:45.2227531Z [W1204 11:00:55.842891664 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2228195Z 
2025-12-04T11:20:45.2228703Z [W1204 11:00:55.843062327 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2229363Z 
2025-12-04T11:20:45.2229871Z [W1204 11:00:55.843340021 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2230536Z 
2025-12-04T11:20:45.2231050Z [W1204 11:00:55.843519682 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2231704Z 
2025-12-04T11:20:45.2231851Z ('RERUN', {'yellow': True}) [20.1353s] [100%]
2025-12-04T11:20:45.2233409Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 11:00:55.294580613 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2234851Z 
2025-12-04T11:20:45.2235366Z [W1204 11:00:55.294860416 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2236033Z 
2025-12-04T11:20:45.2236544Z [W1204 11:00:55.295045956 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2237202Z 
2025-12-04T11:20:45.2237719Z [W1204 11:00:55.295321906 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2238365Z 
2025-12-04T11:20:45.2238894Z [W1204 11:00:55.295498608 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2239540Z 
2025-12-04T11:20:45.2240072Z [W1204 11:00:55.295792143 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2240716Z 
2025-12-04T11:20:45.2241226Z [W1204 11:00:55.295965329 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2241886Z 
2025-12-04T11:20:45.2242398Z [W1204 11:00:55.296242076 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2243059Z 
2025-12-04T11:20:45.2243641Z [W1204 11:00:55.296409163 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2244292Z 
2025-12-04T11:20:45.2244816Z [W1204 11:00:55.304752908 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2245463Z 
2025-12-04T11:20:45.2245983Z [W1204 11:00:55.304992280 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2246661Z 
2025-12-04T11:20:45.2247172Z [W1204 11:00:55.305178289 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2247829Z 
2025-12-04T11:20:45.2248340Z [W1204 11:00:55.305447963 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2249003Z 
2025-12-04T11:20:45.2249515Z [W1204 11:00:55.305619612 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2250197Z 
2025-12-04T11:20:45.2250721Z [W1204 11:00:55.305904088 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2251367Z 
2025-12-04T11:20:45.2251886Z [W1204 11:00:55.306075103 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2252533Z 
2025-12-04T11:20:45.2253046Z [W1204 11:00:55.306357035 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2253708Z 
2025-12-04T11:20:45.2254219Z [W1204 11:00:55.306526971 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2254877Z 
2025-12-04T11:20:45.2255392Z [W1204 11:00:56.425691044 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2256041Z 
2025-12-04T11:20:45.2256647Z [W1204 11:00:56.425970995 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2257293Z 
2025-12-04T11:20:45.2257821Z [W1204 11:00:56.426155476 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2258473Z 
2025-12-04T11:20:45.2258984Z [W1204 11:00:56.426445978 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2259649Z 
2025-12-04T11:20:45.2260160Z [W1204 11:00:56.426620050 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2260819Z 
2025-12-04T11:20:45.2261339Z [W1204 11:00:56.426911983 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2262000Z 
2025-12-04T11:20:45.2262513Z [W1204 11:00:56.427081021 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2263160Z 
2025-12-04T11:20:45.2263682Z [W1204 11:00:56.427355773 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2264331Z 
2025-12-04T11:20:45.2264853Z [W1204 11:00:56.427520914 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2265501Z 
2025-12-04T11:20:45.2266010Z [W1204 11:00:56.592502746 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2266666Z 
2025-12-04T11:20:45.2267258Z [W1204 11:00:56.592809066 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2267922Z 
2025-12-04T11:20:45.2268436Z [W1204 11:00:56.593003371 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2269079Z 
2025-12-04T11:20:45.2269600Z [W1204 11:00:56.593293002 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2270287Z 
2025-12-04T11:20:45.2270808Z [W1204 11:00:56.593473274 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2271652Z 
2025-12-04T11:20:45.2272162Z [W1204 11:00:56.593768048 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2272823Z 
2025-12-04T11:20:45.2273332Z [W1204 11:00:56.593943222 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2273996Z 
2025-12-04T11:20:45.2274509Z [W1204 11:00:56.594223642 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2275233Z 
2025-12-04T11:20:45.2275759Z [W1204 11:00:56.594396788 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2276409Z 
2025-12-04T11:20:45.2276934Z [W1204 11:00:56.602623396 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2277579Z 
2025-12-04T11:20:45.2278091Z [W1204 11:00:56.602876355 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2278758Z 
2025-12-04T11:20:45.2279273Z [W1204 11:00:56.603066602 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2279940Z 
2025-12-04T11:20:45.2280455Z [W1204 11:00:56.603344763 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2281106Z 
2025-12-04T11:20:45.2281629Z [W1204 11:00:56.603521164 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2282275Z 
2025-12-04T11:20:45.2282800Z [W1204 11:00:56.603812686 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2283445Z 
2025-12-04T11:20:45.2283956Z [W1204 11:00:56.603986235 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2284614Z 
2025-12-04T11:20:45.2285124Z [W1204 11:00:56.604267437 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2285783Z 
2025-12-04T11:20:45.2286299Z [W1204 11:00:56.604441727 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2286949Z 
2025-12-04T11:20:45.2287470Z [W1204 11:00:56.722288395 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2288117Z 
2025-12-04T11:20:45.2288639Z [W1204 11:00:56.722578767 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2289291Z 
2025-12-04T11:20:45.2289802Z [W1204 11:00:56.722769312 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2290461Z 
2025-12-04T11:20:45.2290971Z [W1204 11:00:56.723066328 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2328642Z 
2025-12-04T11:20:45.2329511Z [W1204 11:00:56.723244892 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2330210Z 
2025-12-04T11:20:45.2330724Z [W1204 11:00:56.723545329 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2331389Z 
2025-12-04T11:20:45.2331896Z [W1204 11:00:56.723721865 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2332600Z 
2025-12-04T11:20:45.2333127Z [W1204 11:00:56.724001763 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2333775Z 
2025-12-04T11:20:45.2334290Z [W1204 11:00:56.724171507 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2334941Z 
2025-12-04T11:20:45.2335079Z ('RERUN', {'yellow': True}) [0.8417s] [100%]
2025-12-04T11:20:45.2336769Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 11:00:56.117593935 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2338279Z 
2025-12-04T11:20:45.2338795Z [W1204 11:00:56.117887041 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2339455Z 
2025-12-04T11:20:45.2339985Z [W1204 11:00:56.118077247 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2340632Z 
2025-12-04T11:20:45.2341156Z [W1204 11:00:56.118361239 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2341801Z 
2025-12-04T11:20:45.2342316Z [W1204 11:00:56.118543227 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2342984Z 
2025-12-04T11:20:45.2343501Z [W1204 11:00:56.118838898 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2344165Z 
2025-12-04T11:20:45.2344680Z [W1204 11:00:56.119017502 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2345335Z 
2025-12-04T11:20:45.2345863Z [W1204 11:00:56.119299200 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2346511Z 
2025-12-04T11:20:45.2347038Z [W1204 11:00:56.119467582 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2347688Z 
2025-12-04T11:20:45.2348205Z [W1204 11:00:56.127893405 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2348873Z 
2025-12-04T11:20:45.2349384Z [W1204 11:00:56.128140874 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2350049Z 
2025-12-04T11:20:45.2350561Z [W1204 11:00:56.128327445 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2351225Z 
2025-12-04T11:20:45.2351736Z [W1204 11:00:56.128610853 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2352386Z 
2025-12-04T11:20:45.2352910Z [W1204 11:00:56.128782445 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2353555Z 
2025-12-04T11:20:45.2354131Z [W1204 11:00:56.129070081 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2354797Z 
2025-12-04T11:20:45.2355311Z [W1204 11:00:56.129239352 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2355974Z 
2025-12-04T11:20:45.2356483Z [W1204 11:00:56.129517002 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2357180Z 
2025-12-04T11:20:45.2357692Z [W1204 11:00:56.129684787 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2358341Z 
2025-12-04T11:20:45.2358864Z [W1204 11:00:56.248126998 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2359516Z 
2025-12-04T11:20:45.2360041Z [W1204 11:00:56.248413205 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2360694Z 
2025-12-04T11:20:45.2361209Z [W1204 11:00:56.248616552 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2361904Z 
2025-12-04T11:20:45.2362418Z [W1204 11:00:56.248913040 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2363081Z 
2025-12-04T11:20:45.2363595Z [W1204 11:00:56.249087532 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2364243Z 
2025-12-04T11:20:45.2364770Z [W1204 11:00:56.249382004 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2365417Z 
2025-12-04T11:20:45.2365944Z [W1204 11:00:56.249554124 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2366592Z 
2025-12-04T11:20:45.2367106Z [W1204 11:00:56.249833089 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2367776Z 
2025-12-04T11:20:45.2368289Z [W1204 11:00:56.250025099 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2368951Z 
2025-12-04T11:20:45.2369466Z [W1204 11:00:57.418156039 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2370117Z 
2025-12-04T11:20:45.2370641Z [W1204 11:00:57.418454328 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2371555Z 
2025-12-04T11:20:45.2372083Z [W1204 11:00:57.418654489 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2372731Z 
2025-12-04T11:20:45.2373250Z [W1204 11:00:57.418943951 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2373917Z 
2025-12-04T11:20:45.2374428Z [W1204 11:00:57.419127951 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2375088Z 
2025-12-04T11:20:45.2375599Z [W1204 11:00:57.419420935 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2376252Z 
2025-12-04T11:20:45.2376850Z [W1204 11:00:57.419595387 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2377501Z 
2025-12-04T11:20:45.2378026Z [W1204 11:00:57.419875396 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2378673Z 
2025-12-04T11:20:45.2379306Z [W1204 11:00:57.420072956 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2379976Z 
2025-12-04T11:20:45.2380491Z [W1204 11:00:57.428306360 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2381149Z 
2025-12-04T11:20:45.2381660Z [W1204 11:00:57.428578954 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2382356Z 
2025-12-04T11:20:45.2382884Z [W1204 11:00:57.428774462 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2383540Z 
2025-12-04T11:20:45.2384062Z [W1204 11:00:57.429055240 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2384708Z 
2025-12-04T11:20:45.2385226Z [W1204 11:00:57.429231933 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2385937Z 
2025-12-04T11:20:45.2386451Z [W1204 11:00:57.429523725 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2387112Z 
2025-12-04T11:20:45.2387621Z [W1204 11:00:57.429699797 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2388290Z 
2025-12-04T11:20:45.2388801Z [W1204 11:00:57.429981137 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2389450Z 
2025-12-04T11:20:45.2389977Z [W1204 11:00:57.430188557 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2390625Z 
2025-12-04T11:20:45.2391158Z [W1204 11:00:57.549908308 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2391813Z 
2025-12-04T11:20:45.2392328Z [W1204 11:00:57.550221782 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2392988Z 
2025-12-04T11:20:45.2393498Z [W1204 11:00:57.550418745 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2394162Z 
2025-12-04T11:20:45.2394674Z [W1204 11:00:57.550715201 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2395321Z 
2025-12-04T11:20:45.2395847Z [W1204 11:00:57.550891038 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2396496Z 
2025-12-04T11:20:45.2397025Z [W1204 11:00:57.551187353 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2397675Z 
2025-12-04T11:20:45.2398187Z [W1204 11:00:57.551362311 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2398849Z 
2025-12-04T11:20:45.2399359Z [W1204 11:00:57.551642194 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2400024Z 
2025-12-04T11:20:45.2400535Z [W1204 11:00:57.551812909 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2401188Z 
2025-12-04T11:20:45.2401311Z FAILED [0.8255s] [100%]
2025-12-04T11:20:45.2401492Z 
2025-12-04T11:20:45.2401641Z ==================================== RERUNS ====================================
2025-12-04T11:20:45.2402474Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T11:20:45.2403347Z Traceback (most recent call last):
2025-12-04T11:20:45.2404143Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 204, in test_int8_woq_mm_concat_cuda
2025-12-04T11:20:45.2405044Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 3)
2025-12-04T11:20:45.2405892Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.2406721Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.2407561Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.2408441Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.2408924Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.2409176Z 
2025-12-04T11:20:45.2409301Z Expected 3 but got 6.
2025-12-04T11:20:45.2409582Z Absolute difference: 3
2025-12-04T11:20:45.2409889Z Relative difference: 1.0
2025-12-04T11:20:45.2410086Z 
2025-12-04T11:20:45.2410316Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.2411661Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:20:45.2412732Z 
2025-12-04T11:20:45.2413006Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.2413646Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.2414135Z stats [('calls_captured', 36)]
2025-12-04T11:20:45.2414892Z inductor [('pattern_matcher_nodes', 36), ('woq_matcher_nodes', 24), ('pattern_matcher_count', 18), ('woq_matcher_count', 6), ('fxgraph_cache_miss', 2)]
2025-12-04T11:20:45.2415781Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.2416261Z graph_break []
2025-12-04T11:20:45.2416744Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.2418309Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.2419776Z   if out == self.unknown_value:
2025-12-04T11:20:45.2420733Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2421713Z   warnings.warn(
2025-12-04T11:20:45.2422592Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2423565Z   warnings.warn(
2025-12-04T11:20:45.2424274Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T11:20:45.2425079Z Traceback (most recent call last):
2025-12-04T11:20:45.2425857Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 204, in test_int8_woq_mm_concat_cuda
2025-12-04T11:20:45.2426773Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 3)
2025-12-04T11:20:45.2427619Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.2428397Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.2429230Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.2430122Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.2430593Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.2430859Z 
2025-12-04T11:20:45.2430970Z Expected 3 but got 6.
2025-12-04T11:20:45.2431349Z Absolute difference: 3
2025-12-04T11:20:45.2431641Z Relative difference: 1.0
2025-12-04T11:20:45.2431855Z 
2025-12-04T11:20:45.2432072Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.2433369Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:20:45.2434464Z 
2025-12-04T11:20:45.2434749Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.2435374Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.2435865Z stats [('calls_captured', 36)]
2025-12-04T11:20:45.2436619Z inductor [('pattern_matcher_nodes', 36), ('woq_matcher_nodes', 24), ('pattern_matcher_count', 18), ('woq_matcher_count', 6), ('fxgraph_cache_miss', 2)]
2025-12-04T11:20:45.2437524Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.2437992Z graph_break []
2025-12-04T11:20:45.2438416Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.2439998Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.2441464Z   if out == self.unknown_value:
2025-12-04T11:20:45.2442407Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2443387Z   warnings.warn(
2025-12-04T11:20:45.2444284Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2445252Z   warnings.warn(
2025-12-04T11:20:45.2445628Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.2446116Z stats [('calls_captured', 36)]
2025-12-04T11:20:45.2446564Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.2447445Z inductor [('pattern_matcher_nodes', 36), ('woq_matcher_nodes', 24), ('pattern_matcher_count', 18), ('woq_matcher_count', 6), ('fxgraph_cache_miss', 2)]
2025-12-04T11:20:45.2448215Z graph_break []
2025-12-04T11:20:45.2448591Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.2449681Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2450637Z   warnings.warn(
2025-12-04T11:20:45.2451527Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2452500Z   warnings.warn(
2025-12-04T11:20:45.2452803Z =================================== FAILURES ===================================
2025-12-04T11:20:45.2453644Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T11:20:45.2454437Z Traceback (most recent call last):
2025-12-04T11:20:45.2455226Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 204, in test_int8_woq_mm_concat_cuda
2025-12-04T11:20:45.2456129Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 3)
2025-12-04T11:20:45.2457059Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.2457834Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.2458672Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.2459654Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.2460139Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.2460391Z 
2025-12-04T11:20:45.2460515Z Expected 3 but got 6.
2025-12-04T11:20:45.2460800Z Absolute difference: 3
2025-12-04T11:20:45.2461099Z Relative difference: 1.0
2025-12-04T11:20:45.2461290Z 
2025-12-04T11:20:45.2461520Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.2462858Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:20:45.2463930Z 
2025-12-04T11:20:45.2464199Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.2464830Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.2465313Z stats [('calls_captured', 36)]
2025-12-04T11:20:45.2466052Z inductor [('pattern_matcher_nodes', 36), ('woq_matcher_nodes', 24), ('pattern_matcher_count', 18), ('woq_matcher_count', 6), ('fxgraph_cache_miss', 2)]
2025-12-04T11:20:45.2466995Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.2467471Z graph_break []
2025-12-04T11:20:45.2467847Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.2469408Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.2470869Z   if out == self.unknown_value:
2025-12-04T11:20:45.2472041Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2473013Z   warnings.warn(
2025-12-04T11:20:45.2473894Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2474856Z   warnings.warn(
2025-12-04T11:20:45.2475246Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.2475726Z stats [('calls_captured', 36)]
2025-12-04T11:20:45.2476163Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.2477059Z inductor [('pattern_matcher_nodes', 36), ('woq_matcher_nodes', 24), ('pattern_matcher_count', 18), ('woq_matcher_count', 6), ('fxgraph_cache_miss', 2)]
2025-12-04T11:20:45.2477824Z graph_break []
2025-12-04T11:20:45.2478189Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.2479275Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2480244Z   warnings.warn(
2025-12-04T11:20:45.2481125Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2482073Z   warnings.warn(
2025-12-04T11:20:45.2482456Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.2482938Z stats [('calls_captured', 36)]
2025-12-04T11:20:45.2483375Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.2484265Z inductor [('pattern_matcher_nodes', 36), ('woq_matcher_nodes', 24), ('pattern_matcher_count', 18), ('woq_matcher_count', 6), ('fxgraph_cache_miss', 2)]
2025-12-04T11:20:45.2485030Z graph_break []
2025-12-04T11:20:45.2485403Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.2486644Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2487621Z   warnings.warn(
2025-12-04T11:20:45.2488505Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2489467Z   warnings.warn(
2025-12-04T11:20:45.2490454Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3ba19b390afd5854.xml -
2025-12-04T11:20:45.2491647Z =========================== short test summary info ============================
2025-12-04T11:20:45.2492936Z FAILED [0.8255s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:20:45.2494025Z 
2025-12-04T11:20:45.2494150Z Expected 3 but got 6.
2025-12-04T11:20:45.2494433Z Absolute difference: 3
2025-12-04T11:20:45.2494734Z Relative difference: 1.0
2025-12-04T11:20:45.2494990Z 
2025-12-04T11:20:45.2495215Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.2496572Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:20:45.2497649Z 
2025-12-04T11:20:45.2497919Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.2498508Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:20:45.2499043Z ================== 1 failed, 13 deselected, 2 rerun in 21.84s ==================
2025-12-04T11:20:45.2499491Z Got exit code 1
2025-12-04T11:20:45.2500514Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:20:45.2501906Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T11:20:45.2502906Z W1204 11:01:08.865000 86869 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:20:45.2504155Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4ad317a243ecdd30.xml
2025-12-04T11:20:45.2505135Z ============================= test session starts ==============================
2025-12-04T11:20:45.2505792Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:20:45.2506402Z cachedir: .pytest_cache
2025-12-04T11:20:45.2507125Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:20:45.2507909Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:20:45.2508274Z configfile: pytest.ini
2025-12-04T11:20:45.2509010Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:20:45.2509910Z collecting ... collected 58 items / 1 deselected / 57 selected
2025-12-04T11:20:45.2510395Z stepcurrent: skipping 1 already run items.
2025-12-04T11:20:45.2510795Z Running 13 items in this shard
2025-12-04T11:20:45.2511003Z 
2025-12-04T11:20:45.2511889Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [3.9275s] [  7%]
2025-12-04T11:20:45.2513750Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4960s] [  7%]
2025-12-04T11:20:45.2515634Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 FAILED [0.4879s] [  7%]
2025-12-04T11:20:45.2516567Z 
2025-12-04T11:20:45.2516712Z ==================================== RERUNS ====================================
2025-12-04T11:20:45.2517517Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T11:20:45.2518341Z Traceback (most recent call last):
2025-12-04T11:20:45.2519081Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.2519963Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.2520793Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.2521570Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.2522401Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.2523326Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.2523804Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.2524055Z 
2025-12-04T11:20:45.2524168Z Expected 1 but got 2.
2025-12-04T11:20:45.2524461Z Absolute difference: 1
2025-12-04T11:20:45.2524769Z Relative difference: 1.0
2025-12-04T11:20:45.2524959Z 
2025-12-04T11:20:45.2525184Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.2526435Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T11:20:45.2527487Z 
2025-12-04T11:20:45.2527758Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.2528393Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.2528876Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.2529975Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.2531247Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.2531723Z graph_break []
2025-12-04T11:20:45.2532087Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.2533176Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2534146Z   warnings.warn(
2025-12-04T11:20:45.2535036Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2535997Z   warnings.warn(
2025-12-04T11:20:45.2536773Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T11:20:45.2537552Z Traceback (most recent call last):
2025-12-04T11:20:45.2538315Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.2539190Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.2540024Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.2540795Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.2541616Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.2542504Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.2543072Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.2543327Z 
2025-12-04T11:20:45.2543450Z Expected 1 but got 2.
2025-12-04T11:20:45.2543733Z Absolute difference: 1
2025-12-04T11:20:45.2544039Z Relative difference: 1.0
2025-12-04T11:20:45.2544233Z 
2025-12-04T11:20:45.2544467Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.2545733Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T11:20:45.2546807Z 
2025-12-04T11:20:45.2547075Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.2547708Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.2548189Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.2549283Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.2550585Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.2551058Z graph_break []
2025-12-04T11:20:45.2551436Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.2552525Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2553499Z   warnings.warn(
2025-12-04T11:20:45.2554384Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2555350Z   warnings.warn(
2025-12-04T11:20:45.2555720Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.2556205Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.2556650Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.2557891Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.2559015Z graph_break []
2025-12-04T11:20:45.2559400Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.2560491Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2561446Z   warnings.warn(
2025-12-04T11:20:45.2562333Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2563297Z   warnings.warn(
2025-12-04T11:20:45.2563616Z =================================== FAILURES ===================================
2025-12-04T11:20:45.2564411Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T11:20:45.2565186Z Traceback (most recent call last):
2025-12-04T11:20:45.2565942Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.2566814Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.2567642Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.2568404Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.2569239Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.2570180Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.2570659Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.2570914Z 
2025-12-04T11:20:45.2571244Z Expected 1 but got 2.
2025-12-04T11:20:45.2571547Z Absolute difference: 1
2025-12-04T11:20:45.2571835Z Relative difference: 1.0
2025-12-04T11:20:45.2572042Z 
2025-12-04T11:20:45.2572261Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.2573610Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T11:20:45.2574651Z 
2025-12-04T11:20:45.2574920Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.2575561Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.2576052Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.2577252Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.2578566Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.2579045Z graph_break []
2025-12-04T11:20:45.2579430Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.2580530Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2581494Z   warnings.warn(
2025-12-04T11:20:45.2582390Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2583363Z   warnings.warn(
2025-12-04T11:20:45.2583736Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.2584230Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.2584685Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.2585955Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.2587075Z graph_break []
2025-12-04T11:20:45.2587453Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.2588539Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2589512Z   warnings.warn(
2025-12-04T11:20:45.2590385Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2591355Z   warnings.warn(
2025-12-04T11:20:45.2591735Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.2592207Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.2592654Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.2593908Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.2595037Z graph_break []
2025-12-04T11:20:45.2595398Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.2596486Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2597460Z   warnings.warn(
2025-12-04T11:20:45.2598458Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2599412Z   warnings.warn(
2025-12-04T11:20:45.2600430Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4ad317a243ecdd30.xml -
2025-12-04T11:20:45.2601580Z =========================== short test summary info ============================
2025-12-04T11:20:45.2602887Z FAILED [0.4879s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:20:45.2603949Z 
2025-12-04T11:20:45.2604057Z Expected 1 but got 2.
2025-12-04T11:20:45.2604356Z Absolute difference: 1
2025-12-04T11:20:45.2604658Z Relative difference: 1.0
2025-12-04T11:20:45.2604851Z 
2025-12-04T11:20:45.2605074Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.2606332Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T11:20:45.2607419Z 
2025-12-04T11:20:45.2607685Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.2608281Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:20:45.2608797Z =================== 1 failed, 1 deselected, 2 rerun in 4.94s ===================
2025-12-04T11:20:45.2609240Z Got exit code 1
2025-12-04T11:20:45.2609515Z Retrying single test...
2025-12-04T11:20:45.2610153Z W1204 11:01:29.485000 87046 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:20:45.2611387Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f482798b2b39d897.xml
2025-12-04T11:20:45.2612357Z ============================= test session starts ==============================
2025-12-04T11:20:45.2613025Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:20:45.2613632Z cachedir: .pytest_cache
2025-12-04T11:20:45.2614334Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:20:45.2615123Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:20:45.2615481Z configfile: pytest.ini
2025-12-04T11:20:45.2616201Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:20:45.2617202Z collecting ... collected 58 items / 13 deselected / 45 selected
2025-12-04T11:20:45.2618556Z stepcurrent: skipping 1 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T11:20:45.2619793Z Running 1 items in this shard
2025-12-04T11:20:45.2620005Z 
2025-12-04T11:20:45.2621282Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 11:01:35.407570069 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2622714Z 
2025-12-04T11:20:45.2623231Z [W1204 11:01:51.707034883 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2623901Z 
2025-12-04T11:20:45.2624421Z [W1204 11:01:51.707301033 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2625082Z 
2025-12-04T11:20:45.2625711Z [W1204 11:01:51.714839285 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2626364Z 
2025-12-04T11:20:45.2626889Z [W1204 11:01:51.715605529 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2627535Z 
2025-12-04T11:20:45.2628060Z [W1204 11:01:51.715807172 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2628739Z 
2025-12-04T11:20:45.2629253Z [W1204 11:01:51.722963564 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2629921Z 
2025-12-04T11:20:45.2630432Z [W1204 11:01:51.723807891 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2631094Z 
2025-12-04T11:20:45.2631611Z [W1204 11:01:51.723999328 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2632292Z 
2025-12-04T11:20:45.2632814Z [W1204 11:01:51.863930710 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2633463Z 
2025-12-04T11:20:45.2633986Z [W1204 11:01:51.865728842 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2634641Z 
2025-12-04T11:20:45.2635149Z [W1204 11:01:51.865954245 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2635808Z 
2025-12-04T11:20:45.2636322Z [W1204 11:01:51.869979635 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2636981Z 
2025-12-04T11:20:45.2637497Z [W1204 11:01:51.870677380 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2638149Z 
2025-12-04T11:20:45.2638674Z [W1204 11:01:51.870889043 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2639325Z 
2025-12-04T11:20:45.2639852Z [W1204 11:01:51.877011000 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2640502Z 
2025-12-04T11:20:45.2641015Z [W1204 11:01:51.877669983 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2641680Z 
2025-12-04T11:20:45.2642191Z [W1204 11:01:51.877868518 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2642848Z 
2025-12-04T11:20:45.2642982Z ('RERUN', {'yellow': True}) [20.2481s] [100%]
2025-12-04T11:20:45.2644531Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 11:01:51.315864040 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2645942Z 
2025-12-04T11:20:45.2646467Z [W1204 11:01:51.316660570 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2647126Z 
2025-12-04T11:20:45.2647637Z [W1204 11:01:51.316876389 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2648296Z 
2025-12-04T11:20:45.2648812Z [W1204 11:01:51.321075483 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2649481Z 
2025-12-04T11:20:45.2649995Z [W1204 11:01:51.321745384 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2650720Z 
2025-12-04T11:20:45.2651249Z [W1204 11:01:51.321946566 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2651896Z 
2025-12-04T11:20:45.2652423Z [W1204 11:01:51.328122107 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2653108Z 
2025-12-04T11:20:45.2653617Z [W1204 11:01:51.328788702 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2654276Z 
2025-12-04T11:20:45.2654787Z [W1204 11:01:51.328983861 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2655446Z 
2025-12-04T11:20:45.2655955Z [W1204 11:01:52.420327224 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2656702Z 
2025-12-04T11:20:45.2657218Z [W1204 11:01:52.421121224 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2657928Z 
2025-12-04T11:20:45.2658454Z [W1204 11:01:52.421334786 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2659104Z 
2025-12-04T11:20:45.2659631Z [W1204 11:01:52.425326827 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2660282Z 
2025-12-04T11:20:45.2660791Z [W1204 11:01:52.425981144 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2661449Z 
2025-12-04T11:20:45.2661964Z [W1204 11:01:52.426178311 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2662624Z 
2025-12-04T11:20:45.2663141Z [W1204 11:01:52.432313637 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2663801Z 
2025-12-04T11:20:45.2664325Z [W1204 11:01:52.433164404 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2664974Z 
2025-12-04T11:20:45.2665500Z [W1204 11:01:52.433363808 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2666153Z 
2025-12-04T11:20:45.2666286Z ('RERUN', {'yellow': True}) [0.5140s] [100%]
2025-12-04T11:20:45.2667830Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 11:01:52.805435355 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2669243Z 
2025-12-04T11:20:45.2669763Z [W1204 11:01:52.806207540 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2670420Z 
2025-12-04T11:20:45.2671132Z [W1204 11:01:52.806413604 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2671801Z 
2025-12-04T11:20:45.2672335Z [W1204 11:01:52.810626479 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2672989Z 
2025-12-04T11:20:45.2673504Z [W1204 11:01:52.811296141 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2674174Z 
2025-12-04T11:20:45.2674687Z [W1204 11:01:52.811495451 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2675351Z 
2025-12-04T11:20:45.2676012Z [W1204 11:01:52.817699856 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2676668Z 
2025-12-04T11:20:45.2677199Z [W1204 11:01:52.818343233 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2677847Z 
2025-12-04T11:20:45.2678374Z [W1204 11:01:52.818538282 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2679070Z 
2025-12-04T11:20:45.2679583Z [W1204 11:01:52.908621467 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2680253Z 
2025-12-04T11:20:45.2680762Z [W1204 11:01:52.909405090 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2681424Z 
2025-12-04T11:20:45.2681940Z [W1204 11:01:52.909621243 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2682637Z 
2025-12-04T11:20:45.2683166Z [W1204 11:01:52.913643458 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2683817Z 
2025-12-04T11:20:45.2684343Z [W1204 11:01:52.914315629 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2684994Z 
2025-12-04T11:20:45.2685506Z [W1204 11:01:52.914523982 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2686164Z 
2025-12-04T11:20:45.2686673Z [W1204 11:01:52.920627462 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2687334Z 
2025-12-04T11:20:45.2687854Z [W1204 11:01:52.921487344 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2688520Z 
2025-12-04T11:20:45.2689028Z [W1204 11:01:52.921690413 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2689678Z 
2025-12-04T11:20:45.2689802Z FAILED [0.4896s] [100%]
2025-12-04T11:20:45.2689983Z 
2025-12-04T11:20:45.2690124Z ==================================== RERUNS ====================================
2025-12-04T11:20:45.2690931Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T11:20:45.2691712Z Traceback (most recent call last):
2025-12-04T11:20:45.2692465Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.2693333Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.2694178Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.2694955Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.2695778Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.2696775Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.2697261Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.2697513Z 
2025-12-04T11:20:45.2697636Z Expected 1 but got 2.
2025-12-04T11:20:45.2697919Z Absolute difference: 1
2025-12-04T11:20:45.2698226Z Relative difference: 1.0
2025-12-04T11:20:45.2698417Z 
2025-12-04T11:20:45.2698645Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.2699914Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T11:20:45.2700958Z 
2025-12-04T11:20:45.2701314Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.2701951Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.2702433Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.2703531Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.2704835Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.2705307Z graph_break []
2025-12-04T11:20:45.2705687Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.2707240Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.2708766Z   if out == self.unknown_value:
2025-12-04T11:20:45.2709718Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2710699Z   warnings.warn(
2025-12-04T11:20:45.2711580Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2712555Z   warnings.warn(
2025-12-04T11:20:45.2713234Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T11:20:45.2714009Z Traceback (most recent call last):
2025-12-04T11:20:45.2714745Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.2715630Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.2716459Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.2717218Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.2718059Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.2718954Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.2719431Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.2719683Z 
2025-12-04T11:20:45.2719791Z Expected 1 but got 2.
2025-12-04T11:20:45.2720087Z Absolute difference: 1
2025-12-04T11:20:45.2720385Z Relative difference: 1.0
2025-12-04T11:20:45.2720577Z 
2025-12-04T11:20:45.2720791Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.2722061Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T11:20:45.2723113Z 
2025-12-04T11:20:45.2723381Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.2724014Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.2724483Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.2725591Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.2726852Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.2727326Z graph_break []
2025-12-04T11:20:45.2727690Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.2729337Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.2730817Z   if out == self.unknown_value:
2025-12-04T11:20:45.2731763Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2732750Z   warnings.warn(
2025-12-04T11:20:45.2733643Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2734610Z   warnings.warn(
2025-12-04T11:20:45.2734994Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.2735465Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.2735922Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.2737272Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.2738452Z graph_break []
2025-12-04T11:20:45.2738821Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.2739919Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2740891Z   warnings.warn(
2025-12-04T11:20:45.2741767Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2742737Z   warnings.warn(
2025-12-04T11:20:45.2743054Z =================================== FAILURES ===================================
2025-12-04T11:20:45.2743869Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T11:20:45.2744635Z Traceback (most recent call last):
2025-12-04T11:20:45.2745389Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.2746266Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.2747086Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.2747850Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.2748690Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.2749577Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.2750048Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.2750315Z 
2025-12-04T11:20:45.2750430Z Expected 1 but got 2.
2025-12-04T11:20:45.2750728Z Absolute difference: 1
2025-12-04T11:20:45.2751022Z Relative difference: 1.0
2025-12-04T11:20:45.2751231Z 
2025-12-04T11:20:45.2751448Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.2752715Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T11:20:45.2753758Z 
2025-12-04T11:20:45.2754043Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.2754668Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.2755157Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.2756347Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.2757606Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.2758069Z graph_break []
2025-12-04T11:20:45.2758455Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.2760025Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.2761530Z   if out == self.unknown_value:
2025-12-04T11:20:45.2762465Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2763434Z   warnings.warn(
2025-12-04T11:20:45.2764329Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2765332Z   warnings.warn(
2025-12-04T11:20:45.2765711Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.2766190Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.2766635Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.2767880Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.2769019Z graph_break []
2025-12-04T11:20:45.2769393Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.2770492Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2771685Z   warnings.warn(
2025-12-04T11:20:45.2772586Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2773552Z   warnings.warn(
2025-12-04T11:20:45.2773937Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.2774406Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.2774861Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.2776112Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.2777319Z graph_break []
2025-12-04T11:20:45.2777700Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.2778797Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2779780Z   warnings.warn(
2025-12-04T11:20:45.2780656Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2781619Z   warnings.warn(
2025-12-04T11:20:45.2782628Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f482798b2b39d897.xml -
2025-12-04T11:20:45.2783779Z =========================== short test summary info ============================
2025-12-04T11:20:45.2785026Z FAILED [0.4896s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:20:45.2786097Z 
2025-12-04T11:20:45.2786206Z Expected 1 but got 2.
2025-12-04T11:20:45.2786661Z Absolute difference: 1
2025-12-04T11:20:45.2786976Z Relative difference: 1.0
2025-12-04T11:20:45.2787172Z 
2025-12-04T11:20:45.2787390Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.2788659Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T11:20:45.2789740Z 
2025-12-04T11:20:45.2790023Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.2790619Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:20:45.2791136Z ================== 1 failed, 13 deselected, 2 rerun in 21.29s ==================
2025-12-04T11:20:45.2791582Z Got exit code 1
2025-12-04T11:20:45.2791853Z Retrying single test...
2025-12-04T11:20:45.2792483Z W1204 11:02:04.500000 87228 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:20:45.2793777Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cbe2514f89eef609.xml
2025-12-04T11:20:45.2794746Z ============================= test session starts ==============================
2025-12-04T11:20:45.2795409Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:20:45.2795999Z cachedir: .pytest_cache
2025-12-04T11:20:45.2796711Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:20:45.2796838Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:20:45.2796966Z configfile: pytest.ini
2025-12-04T11:20:45.2797509Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:20:45.2797748Z collecting ... collected 58 items / 13 deselected / 45 selected
2025-12-04T11:20:45.2798735Z stepcurrent: skipping 1 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T11:20:45.2798855Z Running 1 items in this shard
2025-12-04T11:20:45.2798860Z 
2025-12-04T11:20:45.2800150Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 11:02:10.475432435 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2800159Z 
2025-12-04T11:20:45.2800679Z [W1204 11:02:25.913062298 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2800685Z 
2025-12-04T11:20:45.2801218Z [W1204 11:02:25.913332120 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2801226Z 
2025-12-04T11:20:45.2801734Z [W1204 11:02:25.921073822 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2801739Z 
2025-12-04T11:20:45.2802264Z [W1204 11:02:25.921834099 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2802271Z 
2025-12-04T11:20:45.2802778Z [W1204 11:02:25.922032885 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2802784Z 
2025-12-04T11:20:45.2803302Z [W1204 11:02:25.929145907 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2803307Z 
2025-12-04T11:20:45.2803889Z [W1204 11:02:25.929846701 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2803897Z 
2025-12-04T11:20:45.2804408Z [W1204 11:02:25.930061895 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2804425Z 
2025-12-04T11:20:45.2804934Z [W1204 11:02:25.069481247 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2804990Z 
2025-12-04T11:20:45.2805496Z [W1204 11:02:25.071301433 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2805501Z 
2025-12-04T11:20:45.2806020Z [W1204 11:02:25.071524197 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2806025Z 
2025-12-04T11:20:45.2806534Z [W1204 11:02:25.075580976 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2806539Z 
2025-12-04T11:20:45.2807095Z [W1204 11:02:25.076271302 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2807100Z 
2025-12-04T11:20:45.2807609Z [W1204 11:02:25.076472418 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2807616Z 
2025-12-04T11:20:45.2808141Z [W1204 11:02:25.082714100 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2808146Z 
2025-12-04T11:20:45.2808651Z [W1204 11:02:25.083438610 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2808656Z 
2025-12-04T11:20:45.2809177Z [W1204 11:02:25.083637913 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2809187Z 
2025-12-04T11:20:45.2809321Z ('RERUN', {'yellow': True}) [19.4091s] [100%]
2025-12-04T11:20:45.2810595Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 11:02:26.524022184 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2810603Z 
2025-12-04T11:20:45.2811128Z [W1204 11:02:26.524800070 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2811133Z 
2025-12-04T11:20:45.2811644Z [W1204 11:02:26.525001316 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2811651Z 
2025-12-04T11:20:45.2812172Z [W1204 11:02:26.529088579 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2812181Z 
2025-12-04T11:20:45.2812687Z [W1204 11:02:26.529729386 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2812694Z 
2025-12-04T11:20:45.2813210Z [W1204 11:02:26.529923118 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2813218Z 
2025-12-04T11:20:45.2813728Z [W1204 11:02:26.536209937 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2813733Z 
2025-12-04T11:20:45.2814257Z [W1204 11:02:26.536875498 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2814262Z 
2025-12-04T11:20:45.2814770Z [W1204 11:02:26.537066228 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2814775Z 
2025-12-04T11:20:45.2815343Z [W1204 11:02:26.627409384 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2815364Z 
2025-12-04T11:20:45.2815876Z [W1204 11:02:26.628201914 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2815881Z 
2025-12-04T11:20:45.2816500Z [W1204 11:02:26.628411871 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2816506Z 
2025-12-04T11:20:45.2817030Z [W1204 11:02:26.632494374 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2817035Z 
2025-12-04T11:20:45.2817540Z [W1204 11:02:26.633189785 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2817545Z 
2025-12-04T11:20:45.2818075Z [W1204 11:02:26.633390987 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2818114Z 
2025-12-04T11:20:45.2818625Z [W1204 11:02:26.639513848 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2818630Z 
2025-12-04T11:20:45.2819151Z [W1204 11:02:26.640389148 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2819158Z 
2025-12-04T11:20:45.2819668Z [W1204 11:02:26.640609341 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2819673Z 
2025-12-04T11:20:45.2819819Z ('RERUN', {'yellow': True}) [0.5167s] [100%]
2025-12-04T11:20:45.2821103Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 11:02:26.016255890 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2821112Z 
2025-12-04T11:20:45.2821622Z [W1204 11:02:26.017039891 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2821640Z 
2025-12-04T11:20:45.2822149Z [W1204 11:02:26.017245077 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2822156Z 
2025-12-04T11:20:45.2822669Z [W1204 11:02:26.021500090 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2822674Z 
2025-12-04T11:20:45.2823191Z [W1204 11:02:26.022167493 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2823197Z 
2025-12-04T11:20:45.2823707Z [W1204 11:02:26.022362808 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2823715Z 
2025-12-04T11:20:45.2824241Z [W1204 11:02:26.028572834 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2824246Z 
2025-12-04T11:20:45.2824755Z [W1204 11:02:26.029229997 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2824762Z 
2025-12-04T11:20:45.2825283Z [W1204 11:02:26.029419944 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2825288Z 
2025-12-04T11:20:45.2825794Z [W1204 11:02:26.120061872 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2825799Z 
2025-12-04T11:20:45.2826377Z [W1204 11:02:26.120868006 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2826385Z 
2025-12-04T11:20:45.2826892Z [W1204 11:02:26.121083580 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2826897Z 
2025-12-04T11:20:45.2827404Z [W1204 11:02:26.125088773 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2827440Z 
2025-12-04T11:20:45.2827959Z [W1204 11:02:26.125762105 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2827965Z 
2025-12-04T11:20:45.2828473Z [W1204 11:02:26.125967572 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2828478Z 
2025-12-04T11:20:45.2829005Z [W1204 11:02:26.132112530 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2829039Z 
2025-12-04T11:20:45.2829550Z [W1204 11:02:26.132988808 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2829555Z 
2025-12-04T11:20:45.2830085Z [W1204 11:02:26.133186719 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2830092Z 
2025-12-04T11:20:45.2830196Z FAILED [0.4932s] [100%]
2025-12-04T11:20:45.2830201Z 
2025-12-04T11:20:45.2830361Z ==================================== RERUNS ====================================
2025-12-04T11:20:45.2830870Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T11:20:45.2830998Z Traceback (most recent call last):
2025-12-04T11:20:45.2831543Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.2831778Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.2832245Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.2832429Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.2832971Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.2833197Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.2833334Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.2833339Z 
2025-12-04T11:20:45.2833448Z Expected 1 but got 2.
2025-12-04T11:20:45.2833574Z Absolute difference: 1
2025-12-04T11:20:45.2833690Z Relative difference: 1.0
2025-12-04T11:20:45.2833695Z 
2025-12-04T11:20:45.2833911Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.2834839Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T11:20:45.2834847Z 
2025-12-04T11:20:45.2835118Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.2835356Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.2835481Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.2836375Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.2836620Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.2836724Z graph_break []
2025-12-04T11:20:45.2836959Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.2838233Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.2838359Z   if out == self.unknown_value:
2025-12-04T11:20:45.2839102Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2839237Z   warnings.warn(
2025-12-04T11:20:45.2839969Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2840071Z   warnings.warn(
2025-12-04T11:20:45.2840582Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T11:20:45.2840727Z Traceback (most recent call last):
2025-12-04T11:20:45.2841240Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.2841523Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.2841979Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.2842149Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.2842697Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.2842905Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.2843039Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.2843045Z 
2025-12-04T11:20:45.2843165Z Expected 1 but got 2.
2025-12-04T11:20:45.2843275Z Absolute difference: 1
2025-12-04T11:20:45.2843398Z Relative difference: 1.0
2025-12-04T11:20:45.2843403Z 
2025-12-04T11:20:45.2843624Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.2844532Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T11:20:45.2844539Z 
2025-12-04T11:20:45.2844821Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.2845048Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.2845180Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.2846068Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.2846298Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.2846411Z graph_break []
2025-12-04T11:20:45.2846632Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.2847858Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.2847980Z   if out == self.unknown_value:
2025-12-04T11:20:45.2848705Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2848822Z   warnings.warn(
2025-12-04T11:20:45.2849543Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2849646Z   warnings.warn(
2025-12-04T11:20:45.2849964Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.2850085Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.2850331Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.2851222Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.2851353Z graph_break []
2025-12-04T11:20:45.2851582Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.2852309Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2852426Z   warnings.warn(
2025-12-04T11:20:45.2853145Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2853252Z   warnings.warn(
2025-12-04T11:20:45.2853449Z =================================== FAILURES ===================================
2025-12-04T11:20:45.2853960Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T11:20:45.2854087Z Traceback (most recent call last):
2025-12-04T11:20:45.2854613Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.2854845Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.2855316Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.2855482Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.2856017Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.2856244Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.2856463Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.2856471Z 
2025-12-04T11:20:45.2856592Z Expected 1 but got 2.
2025-12-04T11:20:45.2856705Z Absolute difference: 1
2025-12-04T11:20:45.2856815Z Relative difference: 1.0
2025-12-04T11:20:45.2856821Z 
2025-12-04T11:20:45.2857048Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.2857959Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T11:20:45.2857964Z 
2025-12-04T11:20:45.2858246Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.2858466Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.2858583Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.2859491Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.2859721Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.2859823Z graph_break []
2025-12-04T11:20:45.2860057Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.2861262Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.2861393Z   if out == self.unknown_value:
2025-12-04T11:20:45.2862117Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2862296Z   warnings.warn(
2025-12-04T11:20:45.2863036Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2863138Z   warnings.warn(
2025-12-04T11:20:45.2863369Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.2863520Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.2863748Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.2864646Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.2864748Z graph_break []
2025-12-04T11:20:45.2864965Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.2865713Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2865849Z   warnings.warn(
2025-12-04T11:20:45.2866579Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2866682Z   warnings.warn(
2025-12-04T11:20:45.2866898Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.2867027Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.2867255Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.2868155Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.2868256Z graph_break []
2025-12-04T11:20:45.2868479Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.2869215Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2869317Z   warnings.warn(
2025-12-04T11:20:45.2870031Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2870149Z   warnings.warn(
2025-12-04T11:20:45.2871174Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cbe2514f89eef609.xml -
2025-12-04T11:20:45.2871372Z =========================== short test summary info ============================
2025-12-04T11:20:45.2872325Z FAILED [0.4932s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:20:45.2872333Z 
2025-12-04T11:20:45.2872459Z Expected 1 but got 2.
2025-12-04T11:20:45.2872569Z Absolute difference: 1
2025-12-04T11:20:45.2872682Z Relative difference: 1.0
2025-12-04T11:20:45.2872687Z 
2025-12-04T11:20:45.2872921Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.2873834Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T11:20:45.2873840Z 
2025-12-04T11:20:45.2874124Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.2874304Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:20:45.2874504Z ================== 1 failed, 13 deselected, 2 rerun in 20.45s ==================
2025-12-04T11:20:45.2874741Z Got exit code 1
2025-12-04T11:20:45.2875569Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T11:20:45.2875982Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T11:20:45.2876494Z W1204 11:02:38.585000 87410 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:20:45.2877151Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3707d31910126ebf.xml
2025-12-04T11:20:45.2877333Z ============================= test session starts ==============================
2025-12-04T11:20:45.2877685Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:20:45.2877798Z cachedir: .pytest_cache
2025-12-04T11:20:45.2878341Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:20:45.2878514Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:20:45.2878641Z configfile: pytest.ini
2025-12-04T11:20:45.2879180Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:20:45.2879399Z collecting ... collected 58 items / 2 deselected / 56 selected
2025-12-04T11:20:45.2879556Z stepcurrent: skipping 2 already run items.
2025-12-04T11:20:45.2879674Z Running 12 items in this shard
2025-12-04T11:20:45.2879680Z 
2025-12-04T11:20:45.2880903Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 W1204 11:02:44.250000 87410 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:20:45.2881054Z ('RERUN', {'yellow': True}) [4.0054s] [  8%]
2025-12-04T11:20:45.2881918Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5437s] [  8%]
2025-12-04T11:20:45.2882708Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 FAILED [0.5503s] [  8%]
2025-12-04T11:20:45.2882716Z 
2025-12-04T11:20:45.2882859Z ==================================== RERUNS ====================================
2025-12-04T11:20:45.2883382Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T11:20:45.2883508Z Traceback (most recent call last):
2025-12-04T11:20:45.2884018Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.2884269Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.2884732Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.2884898Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.2885451Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.2885665Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.2885814Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.2885820Z 
2025-12-04T11:20:45.2885929Z Expected 1 but got 0.
2025-12-04T11:20:45.2886042Z Absolute difference: 1
2025-12-04T11:20:45.2886172Z Relative difference: 1.0
2025-12-04T11:20:45.2886177Z 
2025-12-04T11:20:45.2886395Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.2887381Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.2887390Z 
2025-12-04T11:20:45.2887665Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.2887886Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.2888048Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.2888748Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.2888989Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.2889090Z graph_break []
2025-12-04T11:20:45.2889215Z aten_mm_info [('aten.mm_24_72_1024', 2)]
2025-12-04T11:20:45.2889452Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.2890189Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2890332Z   warnings.warn(
2025-12-04T11:20:45.2891064Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2891171Z   warnings.warn(
2025-12-04T11:20:45.2891694Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T11:20:45.2891819Z Traceback (most recent call last):
2025-12-04T11:20:45.2892331Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.2892579Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.2893044Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.2893210Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.2893761Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.2893970Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.2894115Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.2894123Z 
2025-12-04T11:20:45.2894230Z Expected 1 but got 0.
2025-12-04T11:20:45.2894338Z Absolute difference: 1
2025-12-04T11:20:45.2894464Z Relative difference: 1.0
2025-12-04T11:20:45.2894469Z 
2025-12-04T11:20:45.2894686Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.2895608Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.2895614Z 
2025-12-04T11:20:45.2895888Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.2896110Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.2896245Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.2897020Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.2897269Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.2897371Z graph_break []
2025-12-04T11:20:45.2897494Z aten_mm_info [('aten.mm_24_72_1024', 2)]
2025-12-04T11:20:45.2897731Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.2898462Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2898567Z   warnings.warn(
2025-12-04T11:20:45.2900261Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2900381Z   warnings.warn(
2025-12-04T11:20:45.2900619Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.2900741Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.2901008Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.2901717Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.2901821Z graph_break []
2025-12-04T11:20:45.2901946Z aten_mm_info [('aten.mm_24_72_1024', 2)]
2025-12-04T11:20:45.2902178Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.2902907Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2903069Z   warnings.warn(
2025-12-04T11:20:45.2903784Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2903889Z   warnings.warn(
2025-12-04T11:20:45.2904058Z =================================== FAILURES ===================================
2025-12-04T11:20:45.2904571Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T11:20:45.2904711Z Traceback (most recent call last):
2025-12-04T11:20:45.2905222Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.2905458Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.2905939Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.2906107Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.2906642Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.2906869Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.2907005Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.2907011Z 
2025-12-04T11:20:45.2907131Z Expected 1 but got 0.
2025-12-04T11:20:45.2907242Z Absolute difference: 1
2025-12-04T11:20:45.2907354Z Relative difference: 1.0
2025-12-04T11:20:45.2907361Z 
2025-12-04T11:20:45.2907590Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.2908505Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.2908511Z 
2025-12-04T11:20:45.2908799Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.2909020Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.2909141Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.2909849Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.2910081Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.2910181Z graph_break []
2025-12-04T11:20:45.2910316Z aten_mm_info [('aten.mm_24_72_1024', 2)]
2025-12-04T11:20:45.2910532Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.2911340Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2911444Z   warnings.warn(
2025-12-04T11:20:45.2912171Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2912291Z   warnings.warn(
2025-12-04T11:20:45.2912508Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.2912658Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.2912904Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.2913596Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.2913713Z graph_break []
2025-12-04T11:20:45.2913835Z aten_mm_info [('aten.mm_24_72_1024', 2)]
2025-12-04T11:20:45.2914052Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.2914793Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2914928Z   warnings.warn(
2025-12-04T11:20:45.2915658Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2915763Z   warnings.warn(
2025-12-04T11:20:45.2915980Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.2916113Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.2916341Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.2917042Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.2917159Z graph_break []
2025-12-04T11:20:45.2917286Z aten_mm_info [('aten.mm_24_72_1024', 2)]
2025-12-04T11:20:45.2917515Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.2918241Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2918344Z   warnings.warn(
2025-12-04T11:20:45.2919080Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2919181Z   warnings.warn(
2025-12-04T11:20:45.2920016Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3707d31910126ebf.xml -
2025-12-04T11:20:45.2920207Z =========================== short test summary info ============================
2025-12-04T11:20:45.2921158Z FAILED [0.5503s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:20:45.2921166Z 
2025-12-04T11:20:45.2921286Z Expected 1 but got 0.
2025-12-04T11:20:45.2921395Z Absolute difference: 1
2025-12-04T11:20:45.2921506Z Relative difference: 1.0
2025-12-04T11:20:45.2921524Z 
2025-12-04T11:20:45.2921746Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.2922649Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.2922655Z 
2025-12-04T11:20:45.2922936Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.2923116Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:20:45.2923377Z =================== 1 failed, 2 deselected, 2 rerun in 5.13s ===================
2025-12-04T11:20:45.2923496Z Got exit code 1
2025-12-04T11:20:45.2923606Z Retrying single test...
2025-12-04T11:20:45.2924065Z W1204 11:02:58.886000 87587 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:20:45.2924733Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-dedaec5daecec784.xml
2025-12-04T11:20:45.2924934Z ============================= test session starts ==============================
2025-12-04T11:20:45.2925299Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:20:45.2925410Z cachedir: .pytest_cache
2025-12-04T11:20:45.2925943Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:20:45.2926072Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:20:45.2926187Z configfile: pytest.ini
2025-12-04T11:20:45.2926742Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:20:45.2927013Z collecting ... collected 58 items / 13 deselected / 45 selected
2025-12-04T11:20:45.2927999Z stepcurrent: skipping 2 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.2928133Z Running 1 items in this shard
2025-12-04T11:20:45.2928139Z 
2025-12-04T11:20:45.2929415Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 11:03:04.867016663 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2929422Z 
2025-12-04T11:20:45.2929956Z [W1204 11:03:20.684596446 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2929964Z 
2025-12-04T11:20:45.2930478Z [W1204 11:03:20.684856249 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2930484Z 
2025-12-04T11:20:45.2931007Z [W1204 11:03:20.693826002 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2931015Z 
2025-12-04T11:20:45.2931522Z [W1204 11:03:20.694540265 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2931527Z 
2025-12-04T11:20:45.2932045Z [W1204 11:03:20.694731773 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2932050Z 
2025-12-04T11:20:45.2932561Z [W1204 11:03:20.703014190 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2932568Z 
2025-12-04T11:20:45.2933087Z [W1204 11:03:20.703663002 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2933092Z 
2025-12-04T11:20:45.2933602Z [W1204 11:03:20.703852153 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2933610Z 
2025-12-04T11:20:45.2934075Z W1204 11:03:20.425000 87587 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:20:45.2934598Z [W1204 11:03:20.901339401 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2934603Z 
2025-12-04T11:20:45.2935183Z [W1204 11:03:20.903131457 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2935189Z 
2025-12-04T11:20:45.2935714Z [W1204 11:03:20.903352137 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2935719Z 
2025-12-04T11:20:45.2936229Z [W1204 11:03:20.908778546 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2936264Z 
2025-12-04T11:20:45.2936872Z [W1204 11:03:20.909471360 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2936879Z 
2025-12-04T11:20:45.2937387Z [W1204 11:03:20.909683967 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2937392Z 
2025-12-04T11:20:45.2937915Z [W1204 11:03:20.917269454 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2937926Z 
2025-12-04T11:20:45.2938432Z [W1204 11:03:20.917986983 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2938497Z 
2025-12-04T11:20:45.2939006Z [W1204 11:03:20.918199072 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2939029Z 
2025-12-04T11:20:45.2939165Z ('RERUN', {'yellow': True}) [19.8774s] [100%]
2025-12-04T11:20:45.2940437Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 11:03:20.376636447 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2940443Z 
2025-12-04T11:20:45.2940971Z [W1204 11:03:20.377396283 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2940981Z 
2025-12-04T11:20:45.2941490Z [W1204 11:03:20.377607821 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2941498Z 
2025-12-04T11:20:45.2942016Z [W1204 11:03:20.383054833 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2942023Z 
2025-12-04T11:20:45.2942535Z [W1204 11:03:20.383716494 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2942539Z 
2025-12-04T11:20:45.2943062Z [W1204 11:03:20.383913795 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2943067Z 
2025-12-04T11:20:45.2943573Z [W1204 11:03:21.391417901 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2943577Z 
2025-12-04T11:20:45.2944092Z [W1204 11:03:21.392070147 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2944113Z 
2025-12-04T11:20:45.2944623Z [W1204 11:03:21.392262362 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2944628Z 
2025-12-04T11:20:45.2945139Z [W1204 11:03:21.504280689 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2945144Z 
2025-12-04T11:20:45.2945664Z [W1204 11:03:21.505075056 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2945670Z 
2025-12-04T11:20:45.2946178Z [W1204 11:03:21.505287256 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2946184Z 
2025-12-04T11:20:45.2946769Z [W1204 11:03:21.510671171 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2946777Z 
2025-12-04T11:20:45.2947288Z [W1204 11:03:21.511336982 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2947293Z 
2025-12-04T11:20:45.2947814Z [W1204 11:03:21.511535560 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2947853Z 
2025-12-04T11:20:45.2948362Z [W1204 11:03:21.519093013 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2948367Z 
2025-12-04T11:20:45.2948889Z [W1204 11:03:21.519734683 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2948894Z 
2025-12-04T11:20:45.2949407Z [W1204 11:03:21.519931880 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2949446Z 
2025-12-04T11:20:45.2949579Z ('RERUN', {'yellow': True}) [0.5614s] [100%]
2025-12-04T11:20:45.2950858Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 11:03:21.915511863 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2950867Z 
2025-12-04T11:20:45.2951380Z [W1204 11:03:21.916272016 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2951385Z 
2025-12-04T11:20:45.2951907Z [W1204 11:03:21.916472848 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2951912Z 
2025-12-04T11:20:45.2952426Z [W1204 11:03:21.921954581 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2952432Z 
2025-12-04T11:20:45.2952950Z [W1204 11:03:21.922587647 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2952955Z 
2025-12-04T11:20:45.2953466Z [W1204 11:03:21.922779947 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2953474Z 
2025-12-04T11:20:45.2953994Z [W1204 11:03:21.930226060 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2953999Z 
2025-12-04T11:20:45.2954509Z [W1204 11:03:21.930868017 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2954514Z 
2025-12-04T11:20:45.2955040Z [W1204 11:03:21.931057118 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2955047Z 
2025-12-04T11:20:45.2955554Z [W1204 11:03:21.047529530 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2955559Z 
2025-12-04T11:20:45.2956066Z [W1204 11:03:21.048299181 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2956073Z 
2025-12-04T11:20:45.2956591Z [W1204 11:03:21.048502969 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2956596Z 
2025-12-04T11:20:45.2957106Z [W1204 11:03:21.054095047 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2957111Z 
2025-12-04T11:20:45.2957693Z [W1204 11:03:21.054753017 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2957702Z 
2025-12-04T11:20:45.2958212Z [W1204 11:03:21.054950545 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2958217Z 
2025-12-04T11:20:45.2958736Z [W1204 11:03:21.062502445 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2958770Z 
2025-12-04T11:20:45.2959278Z [W1204 11:03:21.063162059 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2959283Z 
2025-12-04T11:20:45.2959801Z [W1204 11:03:21.063361165 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.2959806Z 
2025-12-04T11:20:45.2959910Z FAILED [0.5428s] [100%]
2025-12-04T11:20:45.2959916Z 
2025-12-04T11:20:45.2960070Z ==================================== RERUNS ====================================
2025-12-04T11:20:45.2960627Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T11:20:45.2960753Z Traceback (most recent call last):
2025-12-04T11:20:45.2961284Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.2961520Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.2961987Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.2962168Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.2962707Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.2962916Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.2963068Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.2963075Z 
2025-12-04T11:20:45.2963181Z Expected 1 but got 0.
2025-12-04T11:20:45.2963300Z Absolute difference: 1
2025-12-04T11:20:45.2963412Z Relative difference: 1.0
2025-12-04T11:20:45.2963417Z 
2025-12-04T11:20:45.2963634Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.2964563Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.2964571Z 
2025-12-04T11:20:45.2964843Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.2965077Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.2965194Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.2965901Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.2966148Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.2966249Z graph_break []
2025-12-04T11:20:45.2966372Z aten_mm_info [('aten.mm_24_72_1024', 2)]
2025-12-04T11:20:45.2966607Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.2967822Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.2967966Z   if out == self.unknown_value:
2025-12-04T11:20:45.2968693Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2968798Z   warnings.warn(
2025-12-04T11:20:45.2969594Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2969704Z   warnings.warn(
2025-12-04T11:20:45.2970229Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T11:20:45.2970356Z Traceback (most recent call last):
2025-12-04T11:20:45.2970900Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.2971418Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.2971882Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.2972066Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.2972609Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.2972818Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.2973052Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.2973058Z 
2025-12-04T11:20:45.2973166Z Expected 1 but got 0.
2025-12-04T11:20:45.2973277Z Absolute difference: 1
2025-12-04T11:20:45.2973406Z Relative difference: 1.0
2025-12-04T11:20:45.2973413Z 
2025-12-04T11:20:45.2973634Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.2974558Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.2974563Z 
2025-12-04T11:20:45.2974835Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.2975058Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.2975198Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.2975899Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.2976149Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.2976253Z graph_break []
2025-12-04T11:20:45.2976444Z aten_mm_info [('aten.mm_24_72_1024', 2)]
2025-12-04T11:20:45.2976683Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.2977889Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.2978024Z   if out == self.unknown_value:
2025-12-04T11:20:45.2978754Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2978859Z   warnings.warn(
2025-12-04T11:20:45.2979597Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2979703Z   warnings.warn(
2025-12-04T11:20:45.2979925Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.2980065Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.2980296Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.2981004Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.2981107Z graph_break []
2025-12-04T11:20:45.2981229Z aten_mm_info [('aten.mm_24_72_1024', 2)]
2025-12-04T11:20:45.2981570Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.2982292Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2982397Z   warnings.warn(
2025-12-04T11:20:45.2983125Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2983287Z   warnings.warn(
2025-12-04T11:20:45.2983448Z =================================== FAILURES ===================================
2025-12-04T11:20:45.2983959Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T11:20:45.2984082Z Traceback (most recent call last):
2025-12-04T11:20:45.2984606Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.2984845Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.2985346Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.2985512Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.2986050Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.2986276Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.2986409Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.2986415Z 
2025-12-04T11:20:45.2986522Z Expected 1 but got 0.
2025-12-04T11:20:45.2986646Z Absolute difference: 1
2025-12-04T11:20:45.2986757Z Relative difference: 1.0
2025-12-04T11:20:45.2986763Z 
2025-12-04T11:20:45.2986995Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.2987910Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.2987918Z 
2025-12-04T11:20:45.2988188Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.2988422Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.2988539Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.2989250Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.2989481Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.2989582Z graph_break []
2025-12-04T11:20:45.2989720Z aten_mm_info [('aten.mm_24_72_1024', 2)]
2025-12-04T11:20:45.2989939Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.2991148Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.2991282Z   if out == self.unknown_value:
2025-12-04T11:20:45.2992002Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2992120Z   warnings.warn(
2025-12-04T11:20:45.2992839Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2992942Z   warnings.warn(
2025-12-04T11:20:45.2993174Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.2993290Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.2993536Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.2994304Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.2994411Z graph_break []
2025-12-04T11:20:45.2994547Z aten_mm_info [('aten.mm_24_72_1024', 2)]
2025-12-04T11:20:45.2994762Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.2995515Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2995629Z   warnings.warn(
2025-12-04T11:20:45.2996347Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2996462Z   warnings.warn(
2025-12-04T11:20:45.2996677Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.2996798Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.2997075Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.2997770Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.2997883Z graph_break []
2025-12-04T11:20:45.2998008Z aten_mm_info [('aten.mm_24_72_1024', 2)]
2025-12-04T11:20:45.2998223Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.2998960Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2999064Z   warnings.warn(
2025-12-04T11:20:45.2999777Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.2999899Z   warnings.warn(
2025-12-04T11:20:45.3000746Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-dedaec5daecec784.xml -
2025-12-04T11:20:45.3000935Z =========================== short test summary info ============================
2025-12-04T11:20:45.3001882Z FAILED [0.5428s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3001890Z 
2025-12-04T11:20:45.3001997Z Expected 1 but got 0.
2025-12-04T11:20:45.3002120Z Absolute difference: 1
2025-12-04T11:20:45.3002234Z Relative difference: 1.0
2025-12-04T11:20:45.3002239Z 
2025-12-04T11:20:45.3002466Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3003375Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.3003383Z 
2025-12-04T11:20:45.3003651Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3003847Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:20:45.3004049Z ================== 1 failed, 13 deselected, 2 rerun in 21.01s ==================
2025-12-04T11:20:45.3004166Z Got exit code 1
2025-12-04T11:20:45.3004272Z Retrying single test...
2025-12-04T11:20:45.3004718Z W1204 11:03:33.460000 87769 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:20:45.3005390Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2f4f0e9c4ac682e4.xml
2025-12-04T11:20:45.3005623Z ============================= test session starts ==============================
2025-12-04T11:20:45.3005977Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:20:45.3006105Z cachedir: .pytest_cache
2025-12-04T11:20:45.3006623Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:20:45.3006761Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:20:45.3006904Z configfile: pytest.ini
2025-12-04T11:20:45.3007445Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:20:45.3007675Z collecting ... collected 58 items / 13 deselected / 45 selected
2025-12-04T11:20:45.3008664Z stepcurrent: skipping 2 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.3008799Z Running 1 items in this shard
2025-12-04T11:20:45.3008805Z 
2025-12-04T11:20:45.3010112Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 11:03:39.443560793 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3010118Z 
2025-12-04T11:20:45.3010637Z [W1204 11:03:55.685063911 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3010654Z 
2025-12-04T11:20:45.3011166Z [W1204 11:03:55.685333822 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3011171Z 
2025-12-04T11:20:45.3011682Z [W1204 11:03:55.694546155 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3011687Z 
2025-12-04T11:20:45.3012215Z [W1204 11:03:55.695305986 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3012223Z 
2025-12-04T11:20:45.3012732Z [W1204 11:03:55.695499539 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3012737Z 
2025-12-04T11:20:45.3013264Z [W1204 11:03:55.704021790 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3013270Z 
2025-12-04T11:20:45.3013780Z [W1204 11:03:55.704725026 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3013785Z 
2025-12-04T11:20:45.3014302Z [W1204 11:03:55.704911622 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3014307Z 
2025-12-04T11:20:45.3014773Z W1204 11:03:55.428000 87769 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:20:45.3015297Z [W1204 11:03:55.907822037 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3015302Z 
2025-12-04T11:20:45.3015814Z [W1204 11:03:55.909628467 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3015821Z 
2025-12-04T11:20:45.3016403Z [W1204 11:03:55.909846753 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3016410Z 
2025-12-04T11:20:45.3016929Z [W1204 11:03:55.915407062 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3016934Z 
2025-12-04T11:20:45.3017509Z [W1204 11:03:55.916141120 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3017518Z 
2025-12-04T11:20:45.3018043Z [W1204 11:03:55.916349107 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3018049Z 
2025-12-04T11:20:45.3018554Z [W1204 11:03:55.924069211 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3018588Z 
2025-12-04T11:20:45.3019107Z [W1204 11:03:55.924775219 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3019112Z 
2025-12-04T11:20:45.3019619Z [W1204 11:03:55.924972639 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3019625Z 
2025-12-04T11:20:45.3019774Z ('RERUN', {'yellow': True}) [20.3057s] [100%]
2025-12-04T11:20:45.3021053Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 11:03:56.392670584 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3021088Z 
2025-12-04T11:20:45.3021598Z [W1204 11:03:56.393484189 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3021620Z 
2025-12-04T11:20:45.3022130Z [W1204 11:03:56.393717535 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3022135Z 
2025-12-04T11:20:45.3022643Z [W1204 11:03:56.399406441 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3022649Z 
2025-12-04T11:20:45.3023176Z [W1204 11:03:56.400144655 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3023181Z 
2025-12-04T11:20:45.3023694Z [W1204 11:03:56.400357246 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3023699Z 
2025-12-04T11:20:45.3024219Z [W1204 11:03:56.408172706 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3024226Z 
2025-12-04T11:20:45.3024733Z [W1204 11:03:56.408878116 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3024738Z 
2025-12-04T11:20:45.3025262Z [W1204 11:03:56.409075174 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3025267Z 
2025-12-04T11:20:45.3025772Z [W1204 11:03:56.523846091 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3025781Z 
2025-12-04T11:20:45.3026304Z [W1204 11:03:56.524667495 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3026311Z 
2025-12-04T11:20:45.3026821Z [W1204 11:03:56.524883891 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3026828Z 
2025-12-04T11:20:45.3027336Z [W1204 11:03:56.530437729 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3027353Z 
2025-12-04T11:20:45.3027863Z [W1204 11:03:56.531165776 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3027868Z 
2025-12-04T11:20:45.3028376Z [W1204 11:03:56.531373305 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3028381Z 
2025-12-04T11:20:45.3028988Z [W1204 11:03:56.539267769 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3028996Z 
2025-12-04T11:20:45.3029511Z [W1204 11:03:56.539964477 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3029516Z 
2025-12-04T11:20:45.3030067Z [W1204 11:03:56.540189222 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3030073Z 
2025-12-04T11:20:45.3030205Z ('RERUN', {'yellow': True}) [0.5748s] [100%]
2025-12-04T11:20:45.3031493Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 11:03:56.937947528 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3031498Z 
2025-12-04T11:20:45.3032013Z [W1204 11:03:56.938761270 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3032048Z 
2025-12-04T11:20:45.3032557Z [W1204 11:03:56.938977802 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3032576Z 
2025-12-04T11:20:45.3033090Z [W1204 11:03:56.944681370 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3033095Z 
2025-12-04T11:20:45.3033605Z [W1204 11:03:56.945403725 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3033610Z 
2025-12-04T11:20:45.3034130Z [W1204 11:03:56.945610005 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3034135Z 
2025-12-04T11:20:45.3034648Z [W1204 11:03:56.953419964 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3034655Z 
2025-12-04T11:20:45.3035177Z [W1204 11:03:56.954122726 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3035182Z 
2025-12-04T11:20:45.3035693Z [W1204 11:03:56.954326995 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3035701Z 
2025-12-04T11:20:45.3036220Z [W1204 11:03:56.073975858 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3036225Z 
2025-12-04T11:20:45.3036730Z [W1204 11:03:56.074795984 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3036735Z 
2025-12-04T11:20:45.3037252Z [W1204 11:03:56.075011810 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3037261Z 
2025-12-04T11:20:45.3060517Z [W1204 11:03:56.080807175 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3060526Z 
2025-12-04T11:20:45.3061042Z [W1204 11:03:56.081536538 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3061061Z 
2025-12-04T11:20:45.3061577Z [W1204 11:03:56.081743133 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3061587Z 
2025-12-04T11:20:45.3062096Z [W1204 11:03:56.089449771 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3062101Z 
2025-12-04T11:20:45.3062807Z [W1204 11:03:56.090178144 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3062817Z 
2025-12-04T11:20:45.3063324Z [W1204 11:03:56.090390316 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3063329Z 
2025-12-04T11:20:45.3063445Z FAILED [0.5487s] [100%]
2025-12-04T11:20:45.3063451Z 
2025-12-04T11:20:45.3063638Z ==================================== RERUNS ====================================
2025-12-04T11:20:45.3064150Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T11:20:45.3064290Z Traceback (most recent call last):
2025-12-04T11:20:45.3064804Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.3065031Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.3065510Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.3065718Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.3066266Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.3066471Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.3066607Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3066613Z 
2025-12-04T11:20:45.3066729Z Expected 1 but got 0.
2025-12-04T11:20:45.3066835Z Absolute difference: 1
2025-12-04T11:20:45.3066956Z Relative difference: 1.0
2025-12-04T11:20:45.3066962Z 
2025-12-04T11:20:45.3067176Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3068087Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.3068094Z 
2025-12-04T11:20:45.3068378Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3068601Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3068729Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3069429Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3069659Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3069774Z graph_break []
2025-12-04T11:20:45.3069892Z aten_mm_info [('aten.mm_24_72_1024', 2)]
2025-12-04T11:20:45.3070110Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3071693Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.3071813Z   if out == self.unknown_value:
2025-12-04T11:20:45.3072549Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3072652Z   warnings.warn(
2025-12-04T11:20:45.3073373Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3073492Z   warnings.warn(
2025-12-04T11:20:45.3074002Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T11:20:45.3074138Z Traceback (most recent call last):
2025-12-04T11:20:45.3074640Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.3074993Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.3075470Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.3075631Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.3076160Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.3076422Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.3076552Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3076558Z 
2025-12-04T11:20:45.3076679Z Expected 1 but got 0.
2025-12-04T11:20:45.3076784Z Absolute difference: 1
2025-12-04T11:20:45.3076891Z Relative difference: 1.0
2025-12-04T11:20:45.3076897Z 
2025-12-04T11:20:45.3077121Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3078028Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.3078090Z 
2025-12-04T11:20:45.3078368Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3078588Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3078709Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3079420Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3079646Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3079741Z graph_break []
2025-12-04T11:20:45.3079879Z aten_mm_info [('aten.mm_24_72_1024', 2)]
2025-12-04T11:20:45.3080092Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3081325Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.3081443Z   if out == self.unknown_value:
2025-12-04T11:20:45.3082161Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3082278Z   warnings.warn(
2025-12-04T11:20:45.3082989Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3083096Z   warnings.warn(
2025-12-04T11:20:45.3083311Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3083425Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3083669Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3084358Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3084454Z graph_break []
2025-12-04T11:20:45.3084585Z aten_mm_info [('aten.mm_24_72_1024', 2)]
2025-12-04T11:20:45.3084797Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3085531Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3085628Z   warnings.warn(
2025-12-04T11:20:45.3086334Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3086443Z   warnings.warn(
2025-12-04T11:20:45.3086669Z =================================== FAILURES ===================================
2025-12-04T11:20:45.3087188Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T11:20:45.3087309Z Traceback (most recent call last):
2025-12-04T11:20:45.3087813Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.3088103Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.3088563Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.3088727Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.3089269Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.3089472Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.3089618Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3089624Z 
2025-12-04T11:20:45.3089760Z Expected 1 but got 0.
2025-12-04T11:20:45.3089865Z Absolute difference: 1
2025-12-04T11:20:45.3089988Z Relative difference: 1.0
2025-12-04T11:20:45.3089993Z 
2025-12-04T11:20:45.3090206Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3091124Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.3091133Z 
2025-12-04T11:20:45.3091400Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3091617Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3091743Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3092440Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3092679Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3092776Z graph_break []
2025-12-04T11:20:45.3092896Z aten_mm_info [('aten.mm_24_72_1024', 2)]
2025-12-04T11:20:45.3093123Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3094332Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.3094448Z   if out == self.unknown_value:
2025-12-04T11:20:45.3095188Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3095287Z   warnings.warn(
2025-12-04T11:20:45.3096019Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3096121Z   warnings.warn(
2025-12-04T11:20:45.3096413Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3096547Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3096771Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3097463Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3097578Z graph_break []
2025-12-04T11:20:45.3097698Z aten_mm_info [('aten.mm_24_72_1024', 2)]
2025-12-04T11:20:45.3097926Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3098715Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3098818Z   warnings.warn(
2025-12-04T11:20:45.3099546Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3099645Z   warnings.warn(
2025-12-04T11:20:45.3099870Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3100018Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3100242Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3100946Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3101041Z graph_break []
2025-12-04T11:20:45.3101165Z aten_mm_info [('aten.mm_24_72_1024', 2)]
2025-12-04T11:20:45.3101394Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3102111Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3102250Z   warnings.warn(
2025-12-04T11:20:45.3102965Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3103066Z   warnings.warn(
2025-12-04T11:20:45.3103915Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2f4f0e9c4ac682e4.xml -
2025-12-04T11:20:45.3104089Z =========================== short test summary info ============================
2025-12-04T11:20:45.3105051Z FAILED [0.5487s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3105059Z 
2025-12-04T11:20:45.3105167Z Expected 1 but got 0.
2025-12-04T11:20:45.3105274Z Absolute difference: 1
2025-12-04T11:20:45.3105392Z Relative difference: 1.0
2025-12-04T11:20:45.3105397Z 
2025-12-04T11:20:45.3105608Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3106517Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.3106525Z 
2025-12-04T11:20:45.3106793Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3106969Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:20:45.3107178Z ================== 1 failed, 13 deselected, 2 rerun in 21.46s ==================
2025-12-04T11:20:45.3107277Z Got exit code 1
2025-12-04T11:20:45.3108098Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.3108521Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T11:20:45.3108959Z W1204 11:04:08.727000 87951 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:20:45.3109627Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-580d25229e34cb07.xml
2025-12-04T11:20:45.3109788Z ============================= test session starts ==============================
2025-12-04T11:20:45.3110137Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:20:45.3110256Z cachedir: .pytest_cache
2025-12-04T11:20:45.3110838Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:20:45.3110981Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:20:45.3111090Z configfile: pytest.ini
2025-12-04T11:20:45.3111631Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:20:45.3111856Z collecting ... collected 58 items / 3 deselected / 55 selected
2025-12-04T11:20:45.3112033Z stepcurrent: skipping 3 already run items.
2025-12-04T11:20:45.3112149Z Running 11 items in this shard
2025-12-04T11:20:45.3112164Z 
2025-12-04T11:20:45.3113037Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [4.2622s] [  9%]
2025-12-04T11:20:45.3113906Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5567s] [  9%]
2025-12-04T11:20:45.3114726Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 FAILED [0.5499s] [  9%]
2025-12-04T11:20:45.3114732Z 
2025-12-04T11:20:45.3114874Z ==================================== RERUNS ====================================
2025-12-04T11:20:45.3115392Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.3115515Z Traceback (most recent call last):
2025-12-04T11:20:45.3116026Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.3116264Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.3116727Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.3116909Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.3117452Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.3117655Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.3117798Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3117806Z 
2025-12-04T11:20:45.3117910Z Expected 1 but got 2.
2025-12-04T11:20:45.3118018Z Absolute difference: 1
2025-12-04T11:20:45.3118141Z Relative difference: 1.0
2025-12-04T11:20:45.3118147Z 
2025-12-04T11:20:45.3118358Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3119276Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.3119282Z 
2025-12-04T11:20:45.3119552Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3119772Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3119903Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3120787Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3121031Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3121132Z graph_break []
2025-12-04T11:20:45.3121350Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3122095Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3122199Z   warnings.warn(
2025-12-04T11:20:45.3122995Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3123101Z   warnings.warn(
2025-12-04T11:20:45.3123613Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.3123749Z Traceback (most recent call last):
2025-12-04T11:20:45.3124292Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.3124526Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.3125000Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.3125163Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.3125719Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.3125928Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.3126111Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3126116Z 
2025-12-04T11:20:45.3126239Z Expected 1 but got 2.
2025-12-04T11:20:45.3126348Z Absolute difference: 1
2025-12-04T11:20:45.3126461Z Relative difference: 1.0
2025-12-04T11:20:45.3126480Z 
2025-12-04T11:20:45.3126701Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3127613Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.3127619Z 
2025-12-04T11:20:45.3127905Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3128127Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3128250Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3129158Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3129394Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3129511Z graph_break []
2025-12-04T11:20:45.3129731Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3130462Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3130582Z   warnings.warn(
2025-12-04T11:20:45.3131304Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3131423Z   warnings.warn(
2025-12-04T11:20:45.3131648Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3131768Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3132013Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3132904Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3133009Z graph_break []
2025-12-04T11:20:45.3133242Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3133965Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3134082Z   warnings.warn(
2025-12-04T11:20:45.3134858Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3134962Z   warnings.warn(
2025-12-04T11:20:45.3135127Z =================================== FAILURES ===================================
2025-12-04T11:20:45.3135645Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.3135819Z Traceback (most recent call last):
2025-12-04T11:20:45.3136418Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.3136655Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.3137129Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.3137293Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.3137833Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.3138132Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.3138266Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3138272Z 
2025-12-04T11:20:45.3138398Z Expected 1 but got 2.
2025-12-04T11:20:45.3138507Z Absolute difference: 1
2025-12-04T11:20:45.3138619Z Relative difference: 1.0
2025-12-04T11:20:45.3138627Z 
2025-12-04T11:20:45.3138858Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3139770Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.3139775Z 
2025-12-04T11:20:45.3140056Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3140275Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3140396Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3141298Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3141525Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3141641Z graph_break []
2025-12-04T11:20:45.3141858Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3142588Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3142706Z   warnings.warn(
2025-12-04T11:20:45.3143426Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3143538Z   warnings.warn(
2025-12-04T11:20:45.3143773Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3143892Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3144139Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3145028Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3145131Z graph_break []
2025-12-04T11:20:45.3145363Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3146087Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3146206Z   warnings.warn(
2025-12-04T11:20:45.3146988Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3147095Z   warnings.warn(
2025-12-04T11:20:45.3147324Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3147440Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3147668Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3148601Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3148700Z graph_break []
2025-12-04T11:20:45.3148930Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3149656Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3149759Z   warnings.warn(
2025-12-04T11:20:45.3150524Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3150625Z   warnings.warn(
2025-12-04T11:20:45.3151481Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-580d25229e34cb07.xml -
2025-12-04T11:20:45.3151657Z =========================== short test summary info ============================
2025-12-04T11:20:45.3152598Z FAILED [0.5499s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3152604Z 
2025-12-04T11:20:45.3152725Z Expected 1 but got 2.
2025-12-04T11:20:45.3152833Z Absolute difference: 1
2025-12-04T11:20:45.3152961Z Relative difference: 1.0
2025-12-04T11:20:45.3152969Z 
2025-12-04T11:20:45.3153189Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3154104Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.3154112Z 
2025-12-04T11:20:45.3154394Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3154575Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:20:45.3154785Z =================== 1 failed, 3 deselected, 2 rerun in 5.40s ===================
2025-12-04T11:20:45.3154886Z Got exit code 1
2025-12-04T11:20:45.3154996Z Retrying single test...
2025-12-04T11:20:45.3155453Z W1204 11:04:29.537000 88155 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:20:45.3156119Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9d15e1ab064c4537.xml
2025-12-04T11:20:45.3156288Z ============================= test session starts ==============================
2025-12-04T11:20:45.3156653Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:20:45.3156767Z cachedir: .pytest_cache
2025-12-04T11:20:45.3157303Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:20:45.3157430Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:20:45.3157541Z configfile: pytest.ini
2025-12-04T11:20:45.3158098Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:20:45.3158317Z collecting ... collected 58 items / 13 deselected / 45 selected
2025-12-04T11:20:45.3159377Z stepcurrent: skipping 3 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.3159509Z Running 1 items in this shard
2025-12-04T11:20:45.3159513Z 
2025-12-04T11:20:45.3160790Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 11:04:35.858736407 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3160826Z 
2025-12-04T11:20:45.3161361Z [W1204 11:04:51.683103562 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3161366Z 
2025-12-04T11:20:45.3161882Z [W1204 11:04:51.683367710 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3161888Z 
2025-12-04T11:20:45.3162442Z [W1204 11:04:51.691012787 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3162447Z 
2025-12-04T11:20:45.3162956Z [W1204 11:04:51.691822743 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3162963Z 
2025-12-04T11:20:45.3163482Z [W1204 11:04:51.692030482 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3163487Z 
2025-12-04T11:20:45.3163996Z [W1204 11:04:51.699425928 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3164001Z 
2025-12-04T11:20:45.3164526Z [W1204 11:04:51.700193773 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3164535Z 
2025-12-04T11:20:45.3165044Z [W1204 11:04:51.700394860 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3165051Z 
2025-12-04T11:20:45.3165560Z [W1204 11:04:51.843491251 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3165582Z 
2025-12-04T11:20:45.3166089Z [W1204 11:04:51.845281897 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3166094Z 
2025-12-04T11:20:45.3166599Z [W1204 11:04:51.845498916 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3166605Z 
2025-12-04T11:20:45.3167129Z [W1204 11:04:51.849609588 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3167134Z 
2025-12-04T11:20:45.3167645Z [W1204 11:04:51.850352583 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3167652Z 
2025-12-04T11:20:45.3168172Z [W1204 11:04:51.850566443 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3168177Z 
2025-12-04T11:20:45.3168688Z [W1204 11:04:51.856782331 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3168693Z 
2025-12-04T11:20:45.3169215Z [W1204 11:04:51.857470901 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3169219Z 
2025-12-04T11:20:45.3169724Z [W1204 11:04:51.857668962 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3169729Z 
2025-12-04T11:20:45.3169940Z ('RERUN', {'yellow': True}) [20.1321s] [100%]
2025-12-04T11:20:45.3171575Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 11:04:51.350105120 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3171584Z 
2025-12-04T11:20:45.3172195Z [W1204 11:04:51.350865873 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3172215Z 
2025-12-04T11:20:45.3172725Z [W1204 11:04:51.351076777 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3172730Z 
2025-12-04T11:20:45.3173239Z [W1204 11:04:51.355233539 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3173244Z 
2025-12-04T11:20:45.3173777Z [W1204 11:04:51.355920646 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3173832Z 
2025-12-04T11:20:45.3174342Z [W1204 11:04:51.356121170 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3174347Z 
2025-12-04T11:20:45.3174865Z [W1204 11:04:51.362545131 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3174873Z 
2025-12-04T11:20:45.3175381Z [W1204 11:04:51.363227009 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3175386Z 
2025-12-04T11:20:45.3175907Z [W1204 11:04:51.363418628 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3175911Z 
2025-12-04T11:20:45.3176487Z [W1204 11:04:52.455401128 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3176496Z 
2025-12-04T11:20:45.3177004Z [W1204 11:04:52.456215113 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3177028Z 
2025-12-04T11:20:45.3177533Z [W1204 11:04:52.456432566 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3177540Z 
2025-12-04T11:20:45.3178048Z [W1204 11:04:52.460499790 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3178053Z 
2025-12-04T11:20:45.3178576Z [W1204 11:04:52.461203277 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3178581Z 
2025-12-04T11:20:45.3179091Z [W1204 11:04:52.461406752 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3179098Z 
2025-12-04T11:20:45.3179615Z [W1204 11:04:52.467579445 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3179620Z 
2025-12-04T11:20:45.3180125Z [W1204 11:04:52.468438257 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3180132Z 
2025-12-04T11:20:45.3180652Z [W1204 11:04:52.468649896 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3180657Z 
2025-12-04T11:20:45.3180788Z ('RERUN', {'yellow': True}) [0.5698s] [100%]
2025-12-04T11:20:45.3182173Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 11:04:52.894208619 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3182182Z 
2025-12-04T11:20:45.3182694Z [W1204 11:04:52.895141600 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3182699Z 
2025-12-04T11:20:45.3183208Z [W1204 11:04:52.895373057 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3183262Z 
2025-12-04T11:20:45.3183770Z [W1204 11:04:52.900263332 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3183775Z 
2025-12-04T11:20:45.3184279Z [W1204 11:04:52.901198844 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3184284Z 
2025-12-04T11:20:45.3184815Z [W1204 11:04:52.901428089 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3184862Z 
2025-12-04T11:20:45.3185369Z [W1204 11:04:52.908479111 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3185374Z 
2025-12-04T11:20:45.3185894Z [W1204 11:04:52.909382756 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3185902Z 
2025-12-04T11:20:45.3186411Z [W1204 11:04:52.909589672 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3186417Z 
2025-12-04T11:20:45.3186936Z [W1204 11:04:52.006060038 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3186941Z 
2025-12-04T11:20:45.3187457Z [W1204 11:04:52.006898471 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3187463Z 
2025-12-04T11:20:45.3187990Z [W1204 11:04:52.007124222 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3187995Z 
2025-12-04T11:20:45.3188505Z [W1204 11:04:52.011364865 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3188513Z 
2025-12-04T11:20:45.3189024Z [W1204 11:04:52.012104331 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3189029Z 
2025-12-04T11:20:45.3189548Z [W1204 11:04:52.012325294 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3189553Z 
2025-12-04T11:20:45.3190066Z [W1204 11:04:52.018618183 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3190075Z 
2025-12-04T11:20:45.3190600Z [W1204 11:04:52.019574716 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3190607Z 
2025-12-04T11:20:45.3191114Z [W1204 11:04:52.019780781 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3191122Z 
2025-12-04T11:20:45.3191243Z FAILED [0.5502s] [100%]
2025-12-04T11:20:45.3191248Z 
2025-12-04T11:20:45.3191395Z ==================================== RERUNS ====================================
2025-12-04T11:20:45.3191913Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.3192054Z Traceback (most recent call last):
2025-12-04T11:20:45.3192567Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.3192884Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.3193354Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.3193521Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.3194073Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.3194312Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.3194462Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3194468Z 
2025-12-04T11:20:45.3194577Z Expected 1 but got 2.
2025-12-04T11:20:45.3194685Z Absolute difference: 1
2025-12-04T11:20:45.3194809Z Relative difference: 1.0
2025-12-04T11:20:45.3194814Z 
2025-12-04T11:20:45.3195029Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3195948Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.3195998Z 
2025-12-04T11:20:45.3196269Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3196492Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3196625Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3197514Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3197744Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3197873Z graph_break []
2025-12-04T11:20:45.3198092Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3199321Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.3199445Z   if out == self.unknown_value:
2025-12-04T11:20:45.3200171Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3200294Z   warnings.warn(
2025-12-04T11:20:45.3201010Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3201127Z   warnings.warn(
2025-12-04T11:20:45.3201638Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.3201762Z Traceback (most recent call last):
2025-12-04T11:20:45.3202295Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.3202534Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.3202994Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.3203177Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.3203718Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.3203941Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.3204076Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3204081Z 
2025-12-04T11:20:45.3204189Z Expected 1 but got 2.
2025-12-04T11:20:45.3204316Z Absolute difference: 1
2025-12-04T11:20:45.3204429Z Relative difference: 1.0
2025-12-04T11:20:45.3204434Z 
2025-12-04T11:20:45.3204736Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3205650Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.3205658Z 
2025-12-04T11:20:45.3205931Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3206204Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3206325Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3207224Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3207453Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3207554Z graph_break []
2025-12-04T11:20:45.3207794Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3209043Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.3209174Z   if out == self.unknown_value:
2025-12-04T11:20:45.3209907Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3210010Z   warnings.warn(
2025-12-04T11:20:45.3210740Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3210843Z   warnings.warn(
2025-12-04T11:20:45.3211063Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3211198Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3211425Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3212326Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3212426Z graph_break []
2025-12-04T11:20:45.3212640Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3213379Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3213483Z   warnings.warn(
2025-12-04T11:20:45.3214210Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3214311Z   warnings.warn(
2025-12-04T11:20:45.3214462Z =================================== FAILURES ===================================
2025-12-04T11:20:45.3214990Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.3215114Z Traceback (most recent call last):
2025-12-04T11:20:45.3215622Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.3215869Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.3216400Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.3216583Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.3217116Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.3217447Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.3217599Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3217607Z 
2025-12-04T11:20:45.3217713Z Expected 1 but got 2.
2025-12-04T11:20:45.3217821Z Absolute difference: 1
2025-12-04T11:20:45.3217946Z Relative difference: 1.0
2025-12-04T11:20:45.3217951Z 
2025-12-04T11:20:45.3218166Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3219127Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.3219133Z 
2025-12-04T11:20:45.3219404Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3219622Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3219755Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3220646Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3220919Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3221019Z graph_break []
2025-12-04T11:20:45.3221238Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3222461Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.3222580Z   if out == self.unknown_value:
2025-12-04T11:20:45.3223313Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3223416Z   warnings.warn(
2025-12-04T11:20:45.3224138Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3224257Z   warnings.warn(
2025-12-04T11:20:45.3224474Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3224592Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3224836Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3225726Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3225839Z graph_break []
2025-12-04T11:20:45.3226054Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3226782Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3226901Z   warnings.warn(
2025-12-04T11:20:45.3227613Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3227727Z   warnings.warn(
2025-12-04T11:20:45.3227944Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3228060Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3228302Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3229187Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3229297Z graph_break []
2025-12-04T11:20:45.3229578Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3230307Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3230421Z   warnings.warn(
2025-12-04T11:20:45.3231139Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3231272Z   warnings.warn(
2025-12-04T11:20:45.3232125Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9d15e1ab064c4537.xml -
2025-12-04T11:20:45.3232299Z =========================== short test summary info ============================
2025-12-04T11:20:45.3233268Z FAILED [0.5502s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3233306Z 
2025-12-04T11:20:45.3233415Z Expected 1 but got 2.
2025-12-04T11:20:45.3233525Z Absolute difference: 1
2025-12-04T11:20:45.3233653Z Relative difference: 1.0
2025-12-04T11:20:45.3233658Z 
2025-12-04T11:20:45.3233876Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3234796Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.3234802Z 
2025-12-04T11:20:45.3235070Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3235250Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:20:45.3235464Z ================== 1 failed, 13 deselected, 2 rerun in 21.29s ==================
2025-12-04T11:20:45.3235566Z Got exit code 1
2025-12-04T11:20:45.3235691Z Retrying single test...
2025-12-04T11:20:45.3236138Z W1204 11:05:04.505000 88364 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:20:45.3236798Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e6d909bcc6975bf8.xml
2025-12-04T11:20:45.3236979Z ============================= test session starts ==============================
2025-12-04T11:20:45.3237330Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:20:45.3237439Z cachedir: .pytest_cache
2025-12-04T11:20:45.3237971Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:20:45.3238097Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:20:45.3238217Z configfile: pytest.ini
2025-12-04T11:20:45.3238765Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:20:45.3238988Z collecting ... collected 58 items / 13 deselected / 45 selected
2025-12-04T11:20:45.3239994Z stepcurrent: skipping 3 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.3240111Z Running 1 items in this shard
2025-12-04T11:20:45.3240116Z 
2025-12-04T11:20:45.3241408Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 11:05:10.780546421 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3241415Z 
2025-12-04T11:20:45.3241996Z [W1204 11:05:26.919146891 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3242003Z 
2025-12-04T11:20:45.3242528Z [W1204 11:05:26.919416751 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3242534Z 
2025-12-04T11:20:45.3243043Z [W1204 11:05:26.926934322 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3243078Z 
2025-12-04T11:20:45.3243604Z [W1204 11:05:26.927758136 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3243609Z 
2025-12-04T11:20:45.3244120Z [W1204 11:05:26.927957863 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3244125Z 
2025-12-04T11:20:45.3244632Z [W1204 11:05:26.935334671 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3244642Z 
2025-12-04T11:20:45.3245162Z [W1204 11:05:26.936054982 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3245199Z 
2025-12-04T11:20:45.3245711Z [W1204 11:05:26.936244522 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3245718Z 
2025-12-04T11:20:45.3246239Z [W1204 11:05:26.076376665 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3246244Z 
2025-12-04T11:20:45.3246752Z [W1204 11:05:26.078188203 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3246757Z 
2025-12-04T11:20:45.3247279Z [W1204 11:05:26.078405879 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3247284Z 
2025-12-04T11:20:45.3247797Z [W1204 11:05:26.082496328 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3247805Z 
2025-12-04T11:20:45.3248325Z [W1204 11:05:26.083195765 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3248330Z 
2025-12-04T11:20:45.3248843Z [W1204 11:05:26.083399460 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3248848Z 
2025-12-04T11:20:45.3249353Z [W1204 11:05:26.089525159 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3249370Z 
2025-12-04T11:20:45.3249881Z [W1204 11:05:26.090218626 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3249886Z 
2025-12-04T11:20:45.3250396Z [W1204 11:05:26.090423240 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3250402Z 
2025-12-04T11:20:45.3250547Z ('RERUN', {'yellow': True}) [20.4168s] [100%]
2025-12-04T11:20:45.3251827Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 11:05:27.580710909 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3251836Z 
2025-12-04T11:20:45.3252359Z [W1204 11:05:27.581482248 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3252364Z 
2025-12-04T11:20:45.3252872Z [W1204 11:05:27.581691283 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3252877Z 
2025-12-04T11:20:45.3253461Z [W1204 11:05:27.585824300 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3253469Z 
2025-12-04T11:20:45.3253978Z [W1204 11:05:27.586490821 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3253982Z 
2025-12-04T11:20:45.3254530Z [W1204 11:05:27.586688165 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3254535Z 
2025-12-04T11:20:45.3255041Z [W1204 11:05:27.592993223 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3255045Z 
2025-12-04T11:20:45.3255550Z [W1204 11:05:27.593641707 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3255566Z 
2025-12-04T11:20:45.3256078Z [W1204 11:05:27.593831802 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3256114Z 
2025-12-04T11:20:45.3256704Z [W1204 11:05:27.685560711 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3256710Z 
2025-12-04T11:20:45.3257231Z [W1204 11:05:27.686377967 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3257239Z 
2025-12-04T11:20:45.3257748Z [W1204 11:05:27.686599367 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3257753Z 
2025-12-04T11:20:45.3258277Z [W1204 11:05:27.690708846 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3258282Z 
2025-12-04T11:20:45.3258795Z [W1204 11:05:27.691421343 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3258801Z 
2025-12-04T11:20:45.3259319Z [W1204 11:05:27.691628754 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3259324Z 
2025-12-04T11:20:45.3259828Z [W1204 11:05:27.697848049 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3259835Z 
2025-12-04T11:20:45.3260357Z [W1204 11:05:27.698742869 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3260362Z 
2025-12-04T11:20:45.3260870Z [W1204 11:05:27.698943962 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3260875Z 
2025-12-04T11:20:45.3261009Z ('RERUN', {'yellow': True}) [0.5696s] [100%]
2025-12-04T11:20:45.3262314Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 11:05:27.125813059 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3262323Z 
2025-12-04T11:20:45.3262832Z [W1204 11:05:27.126623300 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3262839Z 
2025-12-04T11:20:45.3263363Z [W1204 11:05:27.126837387 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3263368Z 
2025-12-04T11:20:45.3263880Z [W1204 11:05:27.131139909 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3263886Z 
2025-12-04T11:20:45.3264489Z [W1204 11:05:27.131860597 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3264497Z 
2025-12-04T11:20:45.3265005Z [W1204 11:05:27.132062746 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3265009Z 
2025-12-04T11:20:45.3265531Z [W1204 11:05:27.138366932 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3265574Z 
2025-12-04T11:20:45.3266082Z [W1204 11:05:27.139072284 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3266087Z 
2025-12-04T11:20:45.3266593Z [W1204 11:05:27.139269389 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3266611Z 
2025-12-04T11:20:45.3267122Z [W1204 11:05:27.235306040 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3267159Z 
2025-12-04T11:20:45.3267671Z [W1204 11:05:27.236087841 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3267676Z 
2025-12-04T11:20:45.3268197Z [W1204 11:05:27.236295211 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3268205Z 
2025-12-04T11:20:45.3268713Z [W1204 11:05:27.240288160 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3268718Z 
2025-12-04T11:20:45.3269237Z [W1204 11:05:27.240971303 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3269242Z 
2025-12-04T11:20:45.3269757Z [W1204 11:05:27.241169477 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3269762Z 
2025-12-04T11:20:45.3270283Z [W1204 11:05:27.247311814 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3270287Z 
2025-12-04T11:20:45.3270795Z [W1204 11:05:27.248238418 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3270801Z 
2025-12-04T11:20:45.3271684Z [W1204 11:05:27.248442563 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3271691Z 
2025-12-04T11:20:45.3271798Z FAILED [0.5469s] [100%]
2025-12-04T11:20:45.3271802Z 
2025-12-04T11:20:45.3271948Z ==================================== RERUNS ====================================
2025-12-04T11:20:45.3272475Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.3272609Z Traceback (most recent call last):
2025-12-04T11:20:45.3273139Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.3273377Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.3273841Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.3274021Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.3274559Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.3274765Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.3274913Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3274918Z 
2025-12-04T11:20:45.3275028Z Expected 1 but got 2.
2025-12-04T11:20:45.3275150Z Absolute difference: 1
2025-12-04T11:20:45.3275263Z Relative difference: 1.0
2025-12-04T11:20:45.3275390Z 
2025-12-04T11:20:45.3275610Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3276542Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.3276548Z 
2025-12-04T11:20:45.3276864Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3277103Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3277224Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3278125Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3278370Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3278478Z graph_break []
2025-12-04T11:20:45.3278698Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3279967Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.3280089Z   if out == self.unknown_value:
2025-12-04T11:20:45.3280829Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3280937Z   warnings.warn(
2025-12-04T11:20:45.3281658Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3281779Z   warnings.warn(
2025-12-04T11:20:45.3282299Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.3282441Z Traceback (most recent call last):
2025-12-04T11:20:45.3282959Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.3283193Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.3283671Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.3283839Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.3284389Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.3284597Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.3284734Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3284739Z 
2025-12-04T11:20:45.3284865Z Expected 1 but got 2.
2025-12-04T11:20:45.3284980Z Absolute difference: 1
2025-12-04T11:20:45.3285100Z Relative difference: 1.0
2025-12-04T11:20:45.3285105Z 
2025-12-04T11:20:45.3285334Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3286252Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.3286260Z 
2025-12-04T11:20:45.3286546Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3286768Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3286887Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3287851Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3288081Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3288199Z graph_break []
2025-12-04T11:20:45.3288420Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3289627Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.3289791Z   if out == self.unknown_value:
2025-12-04T11:20:45.3290514Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3290632Z   warnings.warn(
2025-12-04T11:20:45.3291354Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3291455Z   warnings.warn(
2025-12-04T11:20:45.3291715Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3291832Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3292059Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3292952Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3293054Z graph_break []
2025-12-04T11:20:45.3293281Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3294008Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3294111Z   warnings.warn(
2025-12-04T11:20:45.3294850Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3294955Z   warnings.warn(
2025-12-04T11:20:45.3295120Z =================================== FAILURES ===================================
2025-12-04T11:20:45.3295632Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.3295761Z Traceback (most recent call last):
2025-12-04T11:20:45.3296361Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.3296596Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.3297056Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.3297239Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.3297779Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.3298002Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.3298136Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3298142Z 
2025-12-04T11:20:45.3298248Z Expected 1 but got 2.
2025-12-04T11:20:45.3298376Z Absolute difference: 1
2025-12-04T11:20:45.3298488Z Relative difference: 1.0
2025-12-04T11:20:45.3298493Z 
2025-12-04T11:20:45.3298725Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3299635Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.3299641Z 
2025-12-04T11:20:45.3299908Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3300219Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3300341Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3301240Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3301498Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3301598Z graph_break []
2025-12-04T11:20:45.3301826Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3303029Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.3303162Z   if out == self.unknown_value:
2025-12-04T11:20:45.3303892Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3304048Z   warnings.warn(
2025-12-04T11:20:45.3304779Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3304883Z   warnings.warn(
2025-12-04T11:20:45.3305101Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3305230Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3305458Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3306357Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3306456Z graph_break []
2025-12-04T11:20:45.3306677Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3307414Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3307519Z   warnings.warn(
2025-12-04T11:20:45.3308255Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3308361Z   warnings.warn(
2025-12-04T11:20:45.3308579Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3308710Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3308940Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3309831Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3309946Z graph_break []
2025-12-04T11:20:45.3310163Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3310901Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3311004Z   warnings.warn(
2025-12-04T11:20:45.3311720Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3311835Z   warnings.warn(
2025-12-04T11:20:45.3312669Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e6d909bcc6975bf8.xml -
2025-12-04T11:20:45.3312914Z =========================== short test summary info ============================
2025-12-04T11:20:45.3313871Z FAILED [0.5469s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3313877Z 
2025-12-04T11:20:45.3313984Z Expected 1 but got 2.
2025-12-04T11:20:45.3314110Z Absolute difference: 1
2025-12-04T11:20:45.3314252Z Relative difference: 1.0
2025-12-04T11:20:45.3314257Z 
2025-12-04T11:20:45.3314486Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3315391Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.3315397Z 
2025-12-04T11:20:45.3315663Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3315865Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:20:45.3316096Z ================== 1 failed, 13 deselected, 2 rerun in 21.57s ==================
2025-12-04T11:20:45.3316196Z Got exit code 1
2025-12-04T11:20:45.3317034Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.3317445Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T11:20:45.3317904Z W1204 11:05:40.038000 88573 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:20:45.3318559Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0a612698d44183a1.xml
2025-12-04T11:20:45.3318728Z ============================= test session starts ==============================
2025-12-04T11:20:45.3319092Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:20:45.3319208Z cachedir: .pytest_cache
2025-12-04T11:20:45.3319741Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:20:45.3319868Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:20:45.3319979Z configfile: pytest.ini
2025-12-04T11:20:45.3320532Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:20:45.3320747Z collecting ... collected 58 items / 4 deselected / 54 selected
2025-12-04T11:20:45.3320906Z stepcurrent: skipping 4 already run items.
2025-12-04T11:20:45.3321024Z Running 10 items in this shard
2025-12-04T11:20:45.3321029Z 
2025-12-04T11:20:45.3321910Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [4.2482s] [ 10%]
2025-12-04T11:20:45.3322787Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5534s] [ 10%]
2025-12-04T11:20:45.3323563Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 FAILED [0.5550s] [ 10%]
2025-12-04T11:20:45.3323572Z 
2025-12-04T11:20:45.3323726Z ==================================== RERUNS ====================================
2025-12-04T11:20:45.3324237Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.3324361Z Traceback (most recent call last):
2025-12-04T11:20:45.3324885Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.3325180Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.3325658Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.3325825Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.3326364Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.3326616Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.3326749Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3326754Z 
2025-12-04T11:20:45.3326862Z Expected 1 but got 2.
2025-12-04T11:20:45.3326983Z Absolute difference: 1
2025-12-04T11:20:45.3327093Z Relative difference: 1.0
2025-12-04T11:20:45.3327098Z 
2025-12-04T11:20:45.3327325Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3328236Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.3328293Z 
2025-12-04T11:20:45.3328563Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3328798Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3328919Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3329819Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3330046Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3330147Z graph_break []
2025-12-04T11:20:45.3330379Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3331115Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3331239Z   warnings.warn(
2025-12-04T11:20:45.3331963Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3332066Z   warnings.warn(
2025-12-04T11:20:45.3332588Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.3332714Z Traceback (most recent call last):
2025-12-04T11:20:45.3333221Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.3333464Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.3333926Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.3334108Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.3334642Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.3334851Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.3334998Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3335007Z 
2025-12-04T11:20:45.3335113Z Expected 1 but got 2.
2025-12-04T11:20:45.3335234Z Absolute difference: 1
2025-12-04T11:20:45.3335345Z Relative difference: 1.0
2025-12-04T11:20:45.3335350Z 
2025-12-04T11:20:45.3335565Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3336561Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.3336567Z 
2025-12-04T11:20:45.3336910Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3337146Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3337265Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3338149Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3338426Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3338527Z graph_break []
2025-12-04T11:20:45.3338745Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3339487Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3339605Z   warnings.warn(
2025-12-04T11:20:45.3340340Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3340476Z   warnings.warn(
2025-12-04T11:20:45.3340694Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3340828Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3341060Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3341960Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3342062Z graph_break []
2025-12-04T11:20:45.3342278Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3343027Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3343133Z   warnings.warn(
2025-12-04T11:20:45.3343849Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3343966Z   warnings.warn(
2025-12-04T11:20:45.3344119Z =================================== FAILURES ===================================
2025-12-04T11:20:45.3344648Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.3344776Z Traceback (most recent call last):
2025-12-04T11:20:45.3345287Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.3345535Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.3345999Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.3346181Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.3346719Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.3346928Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.3347083Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3347088Z 
2025-12-04T11:20:45.3347196Z Expected 1 but got 2.
2025-12-04T11:20:45.3347307Z Absolute difference: 1
2025-12-04T11:20:45.3347440Z Relative difference: 1.0
2025-12-04T11:20:45.3347445Z 
2025-12-04T11:20:45.3347662Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3348600Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.3348673Z 
2025-12-04T11:20:45.3348945Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3349167Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3349301Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3350185Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3350480Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3350583Z graph_break []
2025-12-04T11:20:45.3350802Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3351545Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3351654Z   warnings.warn(
2025-12-04T11:20:45.3352371Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3352522Z   warnings.warn(
2025-12-04T11:20:45.3352741Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3352874Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3353104Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3353989Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3354104Z graph_break []
2025-12-04T11:20:45.3354322Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3355063Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3355168Z   warnings.warn(
2025-12-04T11:20:45.3355890Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3356002Z   warnings.warn(
2025-12-04T11:20:45.3356220Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3356334Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3356574Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3357458Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3357570Z graph_break []
2025-12-04T11:20:45.3357789Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3358514Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3358630Z   warnings.warn(
2025-12-04T11:20:45.3359351Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3359470Z   warnings.warn(
2025-12-04T11:20:45.3360307Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0a612698d44183a1.xml -
2025-12-04T11:20:45.3360482Z =========================== short test summary info ============================
2025-12-04T11:20:45.3361508Z FAILED [0.5550s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3361517Z 
2025-12-04T11:20:45.3361625Z Expected 1 but got 2.
2025-12-04T11:20:45.3361747Z Absolute difference: 1
2025-12-04T11:20:45.3361859Z Relative difference: 1.0
2025-12-04T11:20:45.3361864Z 
2025-12-04T11:20:45.3362082Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3363042Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.3363048Z 
2025-12-04T11:20:45.3363316Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3363510Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:20:45.3363709Z =================== 1 failed, 4 deselected, 2 rerun in 5.39s ===================
2025-12-04T11:20:45.3363815Z Got exit code 1
2025-12-04T11:20:45.3363940Z Retrying single test...
2025-12-04T11:20:45.3364414Z W1204 11:06:00.979000 88777 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:20:45.3365074Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-90f2ceb88314c75a.xml
2025-12-04T11:20:45.3365258Z ============================= test session starts ==============================
2025-12-04T11:20:45.3365608Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:20:45.3365732Z cachedir: .pytest_cache
2025-12-04T11:20:45.3366251Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:20:45.3366378Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:20:45.3366503Z configfile: pytest.ini
2025-12-04T11:20:45.3367048Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:20:45.3367269Z collecting ... collected 58 items / 13 deselected / 45 selected
2025-12-04T11:20:45.3368273Z stepcurrent: skipping 4 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.3368392Z Running 1 items in this shard
2025-12-04T11:20:45.3368398Z 
2025-12-04T11:20:45.3369700Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 11:06:06.376765421 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3369707Z 
2025-12-04T11:20:45.3370231Z [W1204 11:06:22.087818799 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3370239Z 
2025-12-04T11:20:45.3370763Z [W1204 11:06:22.088083682 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3370769Z 
2025-12-04T11:20:45.3371638Z [W1204 11:06:22.095489169 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3371650Z 
2025-12-04T11:20:45.3372177Z [W1204 11:06:22.096224079 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3372182Z 
2025-12-04T11:20:45.3372693Z [W1204 11:06:22.096417919 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3372698Z 
2025-12-04T11:20:45.3373352Z [W1204 11:06:22.103545531 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3373358Z 
2025-12-04T11:20:45.3373872Z [W1204 11:06:22.104204659 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3373877Z 
2025-12-04T11:20:45.3374384Z [W1204 11:06:22.104392332 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3374445Z 
2025-12-04T11:20:45.3374959Z [W1204 11:06:22.240459661 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3374964Z 
2025-12-04T11:20:45.3375473Z [W1204 11:06:22.242177774 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3375478Z 
2025-12-04T11:20:45.3376004Z [W1204 11:06:22.242391340 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3376014Z 
2025-12-04T11:20:45.3376591Z [W1204 11:06:22.246359559 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3376658Z 
2025-12-04T11:20:45.3377183Z [W1204 11:06:22.247009057 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3377190Z 
2025-12-04T11:20:45.3377696Z [W1204 11:06:22.247210558 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3377701Z 
2025-12-04T11:20:45.3378223Z [W1204 11:06:22.253320201 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3378228Z 
2025-12-04T11:20:45.3378740Z [W1204 11:06:22.253961494 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3378745Z 
2025-12-04T11:20:45.3379256Z [W1204 11:06:22.254158790 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3379279Z 
2025-12-04T11:20:45.3379416Z ('RERUN', {'yellow': True}) [20.0260s] [100%]
2025-12-04T11:20:45.3380703Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 11:06:23.735296598 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3380711Z 
2025-12-04T11:20:45.3381246Z [W1204 11:06:23.736032306 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3381252Z 
2025-12-04T11:20:45.3381760Z [W1204 11:06:23.736235670 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3381769Z 
2025-12-04T11:20:45.3382292Z [W1204 11:06:23.740357723 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3382299Z 
2025-12-04T11:20:45.3382810Z [W1204 11:06:23.741001988 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3382817Z 
2025-12-04T11:20:45.3383341Z [W1204 11:06:23.741195318 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3383346Z 
2025-12-04T11:20:45.3383853Z [W1204 11:06:23.747260128 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3383858Z 
2025-12-04T11:20:45.3384379Z [W1204 11:06:23.747875854 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3384384Z 
2025-12-04T11:20:45.3384960Z [W1204 11:06:23.748066506 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3384968Z 
2025-12-04T11:20:45.3385481Z [W1204 11:06:23.837438193 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3385499Z 
2025-12-04T11:20:45.3386033Z [W1204 11:06:23.838228476 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3386039Z 
2025-12-04T11:20:45.3386543Z [W1204 11:06:23.838451428 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3386548Z 
2025-12-04T11:20:45.3387067Z [W1204 11:06:23.842526009 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3387072Z 
2025-12-04T11:20:45.3387582Z [W1204 11:06:23.843200175 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3387620Z 
2025-12-04T11:20:45.3388142Z [W1204 11:06:23.843401431 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3388147Z 
2025-12-04T11:20:45.3388658Z [W1204 11:06:23.849503892 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3388666Z 
2025-12-04T11:20:45.3389186Z [W1204 11:06:23.850367333 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3389191Z 
2025-12-04T11:20:45.3389700Z [W1204 11:06:23.850572535 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3389704Z 
2025-12-04T11:20:45.3389839Z ('RERUN', {'yellow': True}) [0.5571s] [100%]
2025-12-04T11:20:45.3391129Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 11:06:23.272493034 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3391137Z 
2025-12-04T11:20:45.3391644Z [W1204 11:06:23.273237062 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3391651Z 
2025-12-04T11:20:45.3392172Z [W1204 11:06:23.273436227 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3392177Z 
2025-12-04T11:20:45.3392681Z [W1204 11:06:23.277489708 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3392687Z 
2025-12-04T11:20:45.3393213Z [W1204 11:06:23.278108572 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3393220Z 
2025-12-04T11:20:45.3393725Z [W1204 11:06:23.278302370 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3393730Z 
2025-12-04T11:20:45.3394248Z [W1204 11:06:23.284477487 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3394254Z 
2025-12-04T11:20:45.3394763Z [W1204 11:06:23.285109703 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3394768Z 
2025-12-04T11:20:45.3395285Z [W1204 11:06:23.285299795 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3395290Z 
2025-12-04T11:20:45.3395878Z [W1204 11:06:23.378206063 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3395886Z 
2025-12-04T11:20:45.3396395Z [W1204 11:06:23.379033322 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3396414Z 
2025-12-04T11:20:45.3396924Z [W1204 11:06:23.379251880 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3396959Z 
2025-12-04T11:20:45.3397468Z [W1204 11:06:23.383683921 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3397474Z 
2025-12-04T11:20:45.3397997Z [W1204 11:06:23.384395564 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3398002Z 
2025-12-04T11:20:45.3398514Z [W1204 11:06:23.384609607 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3398551Z 
2025-12-04T11:20:45.3399070Z [W1204 11:06:24.390931891 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3399075Z 
2025-12-04T11:20:45.3399586Z [W1204 11:06:24.391651945 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3399594Z 
2025-12-04T11:20:45.3400115Z [W1204 11:06:24.391855758 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3400120Z 
2025-12-04T11:20:45.3400224Z FAILED [0.5398s] [100%]
2025-12-04T11:20:45.3400229Z 
2025-12-04T11:20:45.3400372Z ==================================== RERUNS ====================================
2025-12-04T11:20:45.3400899Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.3401029Z Traceback (most recent call last):
2025-12-04T11:20:45.3401560Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.3401793Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.3402260Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.3402441Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.3402980Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.3403201Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.3403338Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3403343Z 
2025-12-04T11:20:45.3403450Z Expected 1 but got 2.
2025-12-04T11:20:45.3403575Z Absolute difference: 1
2025-12-04T11:20:45.3403694Z Relative difference: 1.0
2025-12-04T11:20:45.3403698Z 
2025-12-04T11:20:45.3403918Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3404847Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.3404855Z 
2025-12-04T11:20:45.3405127Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3405362Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3405479Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3406371Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3406704Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3406807Z graph_break []
2025-12-04T11:20:45.3407043Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3408262Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.3408413Z   if out == self.unknown_value:
2025-12-04T11:20:45.3409150Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3409255Z   warnings.warn(
2025-12-04T11:20:45.3409987Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3410091Z   warnings.warn(
2025-12-04T11:20:45.3410609Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.3410788Z Traceback (most recent call last):
2025-12-04T11:20:45.3411303Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.3411536Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.3412010Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.3412175Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.3412725Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.3412932Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.3413065Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3413070Z 
2025-12-04T11:20:45.3413195Z Expected 1 but got 2.
2025-12-04T11:20:45.3413306Z Absolute difference: 1
2025-12-04T11:20:45.3413431Z Relative difference: 1.0
2025-12-04T11:20:45.3413436Z 
2025-12-04T11:20:45.3413652Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3414567Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.3414575Z 
2025-12-04T11:20:45.3414861Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3415085Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3415217Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3416113Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3416419Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3416543Z graph_break []
2025-12-04T11:20:45.3416763Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3417988Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.3418112Z   if out == self.unknown_value:
2025-12-04T11:20:45.3418837Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3418957Z   warnings.warn(
2025-12-04T11:20:45.3419774Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3419882Z   warnings.warn(
2025-12-04T11:20:45.3420116Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3420235Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3420479Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3421404Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3421505Z graph_break []
2025-12-04T11:20:45.3421742Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3422468Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3422586Z   warnings.warn(
2025-12-04T11:20:45.3423308Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3423446Z   warnings.warn(
2025-12-04T11:20:45.3423607Z =================================== FAILURES ===================================
2025-12-04T11:20:45.3424123Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.3424253Z Traceback (most recent call last):
2025-12-04T11:20:45.3424780Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.3425010Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.3425487Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.3425657Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.3426195Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.3426420Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.3426558Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3426563Z 
2025-12-04T11:20:45.3426682Z Expected 1 but got 2.
2025-12-04T11:20:45.3426800Z Absolute difference: 1
2025-12-04T11:20:45.3426911Z Relative difference: 1.0
2025-12-04T11:20:45.3426916Z 
2025-12-04T11:20:45.3427144Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3428056Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.3428062Z 
2025-12-04T11:20:45.3428334Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3428570Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3428690Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3429584Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3429814Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3429912Z graph_break []
2025-12-04T11:20:45.3430142Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3431345Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.3431476Z   if out == self.unknown_value:
2025-12-04T11:20:45.3432273Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3432378Z   warnings.warn(
2025-12-04T11:20:45.3433109Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3433240Z   warnings.warn(
2025-12-04T11:20:45.3433472Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3433592Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3433821Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3434724Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3434830Z graph_break []
2025-12-04T11:20:45.3435081Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3435821Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3435924Z   warnings.warn(
2025-12-04T11:20:45.3436658Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3436760Z   warnings.warn(
2025-12-04T11:20:45.3436978Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3437110Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3437339Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3438243Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3438345Z graph_break []
2025-12-04T11:20:45.3438560Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3439297Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3439402Z   warnings.warn(
2025-12-04T11:20:45.3440117Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3440233Z   warnings.warn(
2025-12-04T11:20:45.3441067Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-90f2ceb88314c75a.xml -
2025-12-04T11:20:45.3441261Z =========================== short test summary info ============================
2025-12-04T11:20:45.3442215Z FAILED [0.5398s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3442220Z 
2025-12-04T11:20:45.3442341Z Expected 1 but got 2.
2025-12-04T11:20:45.3442454Z Absolute difference: 1
2025-12-04T11:20:45.3442568Z Relative difference: 1.0
2025-12-04T11:20:45.3442573Z 
2025-12-04T11:20:45.3442807Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3443715Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.3443720Z 
2025-12-04T11:20:45.3443989Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3444266Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:20:45.3444470Z ================== 1 failed, 13 deselected, 2 rerun in 21.16s ==================
2025-12-04T11:20:45.3444584Z Got exit code 1
2025-12-04T11:20:45.3444694Z Retrying single test...
2025-12-04T11:20:45.3445141Z W1204 11:06:35.878000 88986 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:20:45.3445841Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9644b19a5203c0ee.xml
2025-12-04T11:20:45.3446011Z ============================= test session starts ==============================
2025-12-04T11:20:45.3446377Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:20:45.3446488Z cachedir: .pytest_cache
2025-12-04T11:20:45.3447015Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:20:45.3447195Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:20:45.3447306Z configfile: pytest.ini
2025-12-04T11:20:45.3447851Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:20:45.3448084Z collecting ... collected 58 items / 13 deselected / 45 selected
2025-12-04T11:20:45.3449078Z stepcurrent: skipping 4 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.3449209Z Running 1 items in this shard
2025-12-04T11:20:45.3449214Z 
2025-12-04T11:20:45.3450500Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 11:06:41.143577932 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3450508Z 
2025-12-04T11:20:45.3451041Z [W1204 11:06:57.491570727 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3451046Z 
2025-12-04T11:20:45.3451557Z [W1204 11:06:57.491841112 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3451565Z 
2025-12-04T11:20:45.3452074Z [W1204 11:06:57.499345615 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3452092Z 
2025-12-04T11:20:45.3452601Z [W1204 11:06:57.500152176 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3452607Z 
2025-12-04T11:20:45.3453114Z [W1204 11:06:57.500360440 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3453120Z 
2025-12-04T11:20:45.3453640Z [W1204 11:06:57.507655334 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3453646Z 
2025-12-04T11:20:45.3454150Z [W1204 11:06:57.508356087 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3454158Z 
2025-12-04T11:20:45.3454680Z [W1204 11:06:57.508562966 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3454685Z 
2025-12-04T11:20:45.3455189Z [W1204 11:06:57.646212449 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3455194Z 
2025-12-04T11:20:45.3455773Z [W1204 11:06:57.647973657 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3455779Z 
2025-12-04T11:20:45.3456361Z [W1204 11:06:57.648188518 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3456367Z 
2025-12-04T11:20:45.3456892Z [W1204 11:06:57.652239608 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3456936Z 
2025-12-04T11:20:45.3457443Z [W1204 11:06:57.652965452 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3457448Z 
2025-12-04T11:20:45.3457954Z [W1204 11:06:57.653168708 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3457974Z 
2025-12-04T11:20:45.3458479Z [W1204 11:06:57.659298996 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3458489Z 
2025-12-04T11:20:45.3458996Z [W1204 11:06:57.659964432 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3459032Z 
2025-12-04T11:20:45.3459552Z [W1204 11:06:57.660186746 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3459559Z 
2025-12-04T11:20:45.3459693Z ('RERUN', {'yellow': True}) [19.6324s] [100%]
2025-12-04T11:20:45.3460983Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 11:06:57.156649865 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3460989Z 
2025-12-04T11:20:45.3461498Z [W1204 11:06:57.157441711 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3461507Z 
2025-12-04T11:20:45.3462033Z [W1204 11:06:57.157653505 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3462040Z 
2025-12-04T11:20:45.3462545Z [W1204 11:06:57.161925863 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3462552Z 
2025-12-04T11:20:45.3463059Z [W1204 11:06:57.162606988 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3463076Z 
2025-12-04T11:20:45.3463586Z [W1204 11:06:57.162806247 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3463591Z 
2025-12-04T11:20:45.3464098Z [W1204 11:06:57.169069830 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3464103Z 
2025-12-04T11:20:45.3464624Z [W1204 11:06:57.169726148 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3464631Z 
2025-12-04T11:20:45.3465138Z [W1204 11:06:57.169919758 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3465142Z 
2025-12-04T11:20:45.3465668Z [W1204 11:06:57.259894073 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3465673Z 
2025-12-04T11:20:45.3466179Z [W1204 11:06:57.260717715 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3466184Z 
2025-12-04T11:20:45.3466701Z [W1204 11:06:57.260934182 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3466706Z 
2025-12-04T11:20:45.3467283Z [W1204 11:06:57.264928029 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3467291Z 
2025-12-04T11:20:45.3467811Z [W1204 11:06:57.265583236 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3467815Z 
2025-12-04T11:20:45.3468321Z [W1204 11:06:57.265782299 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3468357Z 
2025-12-04T11:20:45.3468866Z [W1204 11:06:57.271977896 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3468883Z 
2025-12-04T11:20:45.3469390Z [W1204 11:06:57.272834749 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3469395Z 
2025-12-04T11:20:45.3469906Z [W1204 11:06:57.273032791 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3469940Z 
2025-12-04T11:20:45.3470084Z ('RERUN', {'yellow': True}) [0.5733s] [100%]
2025-12-04T11:20:45.3471768Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 11:06:58.701941684 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3471780Z 
2025-12-04T11:20:45.3472308Z [W1204 11:06:58.702738583 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3472313Z 
2025-12-04T11:20:45.3472822Z [W1204 11:06:58.702956311 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3472826Z 
2025-12-04T11:20:45.3473351Z [W1204 11:06:58.707098368 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3473358Z 
2025-12-04T11:20:45.3473863Z [W1204 11:06:58.707781943 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3473868Z 
2025-12-04T11:20:45.3474393Z [W1204 11:06:58.707988164 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3474400Z 
2025-12-04T11:20:45.3474905Z [W1204 11:06:58.714324880 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3474910Z 
2025-12-04T11:20:45.3475416Z [W1204 11:06:58.715006697 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3475422Z 
2025-12-04T11:20:45.3475948Z [W1204 11:06:58.715203624 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3475956Z 
2025-12-04T11:20:45.3476465Z [W1204 11:06:58.809890180 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3476470Z 
2025-12-04T11:20:45.3476993Z [W1204 11:06:58.810733420 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3477000Z 
2025-12-04T11:20:45.3477511Z [W1204 11:06:58.810958579 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3477517Z 
2025-12-04T11:20:45.3478035Z [W1204 11:06:58.814946738 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3478040Z 
2025-12-04T11:20:45.3478680Z [W1204 11:06:58.815617626 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3478688Z 
2025-12-04T11:20:45.3479215Z [W1204 11:06:58.815819598 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3479220Z 
2025-12-04T11:20:45.3479727Z [W1204 11:06:58.822002787 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3479774Z 
2025-12-04T11:20:45.3480286Z [W1204 11:06:58.822861662 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3480304Z 
2025-12-04T11:20:45.3480811Z [W1204 11:06:58.823060882 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3480816Z 
2025-12-04T11:20:45.3480923Z FAILED [0.5485s] [100%]
2025-12-04T11:20:45.3480928Z 
2025-12-04T11:20:45.3481098Z ==================================== RERUNS ====================================
2025-12-04T11:20:45.3481674Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.3481813Z Traceback (most recent call last):
2025-12-04T11:20:45.3482335Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.3482571Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.3483049Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.3483215Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.3483749Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.3483971Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.3484111Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3484119Z 
2025-12-04T11:20:45.3484241Z Expected 1 but got 2.
2025-12-04T11:20:45.3484351Z Absolute difference: 1
2025-12-04T11:20:45.3484465Z Relative difference: 1.0
2025-12-04T11:20:45.3484470Z 
2025-12-04T11:20:45.3484698Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3485603Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.3485611Z 
2025-12-04T11:20:45.3485893Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3486115Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3486232Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3487139Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3487371Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3487485Z graph_break []
2025-12-04T11:20:45.3487705Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3488912Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.3489042Z   if out == self.unknown_value:
2025-12-04T11:20:45.3489770Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3489873Z   warnings.warn(
2025-12-04T11:20:45.3490689Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3490796Z   warnings.warn(
2025-12-04T11:20:45.3491318Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.3491445Z Traceback (most recent call last):
2025-12-04T11:20:45.3491987Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.3492234Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.3492695Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.3492876Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.3493416Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.3493625Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.3493805Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3493811Z 
2025-12-04T11:20:45.3493918Z Expected 1 but got 2.
2025-12-04T11:20:45.3494028Z Absolute difference: 1
2025-12-04T11:20:45.3494158Z Relative difference: 1.0
2025-12-04T11:20:45.3494163Z 
2025-12-04T11:20:45.3494383Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3495310Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.3495316Z 
2025-12-04T11:20:45.3495587Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3495809Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3495944Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3496905Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3497153Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3497255Z graph_break []
2025-12-04T11:20:45.3497476Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3498704Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.3498827Z   if out == self.unknown_value:
2025-12-04T11:20:45.3499570Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3499676Z   warnings.warn(
2025-12-04T11:20:45.3500395Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3500514Z   warnings.warn(
2025-12-04T11:20:45.3500736Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3500855Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3501096Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3501980Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3502097Z graph_break []
2025-12-04T11:20:45.3502316Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3503105Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3503222Z   warnings.warn(
2025-12-04T11:20:45.3503941Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3504091Z   warnings.warn(
2025-12-04T11:20:45.3504242Z =================================== FAILURES ===================================
2025-12-04T11:20:45.3504754Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.3504896Z Traceback (most recent call last):
2025-12-04T11:20:45.3505404Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.3505649Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.3506108Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.3506304Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.3506853Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.3507063Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.3507193Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3507200Z 
2025-12-04T11:20:45.3507322Z Expected 1 but got 2.
2025-12-04T11:20:45.3507430Z Absolute difference: 1
2025-12-04T11:20:45.3507554Z Relative difference: 1.0
2025-12-04T11:20:45.3507559Z 
2025-12-04T11:20:45.3507775Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3508690Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.3508698Z 
2025-12-04T11:20:45.3508979Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3509199Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3509333Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3510224Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3510452Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3510565Z graph_break []
2025-12-04T11:20:45.3510783Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3512003Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.3512128Z   if out == self.unknown_value:
2025-12-04T11:20:45.3512849Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3512966Z   warnings.warn(
2025-12-04T11:20:45.3513684Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3513786Z   warnings.warn(
2025-12-04T11:20:45.3514017Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3514133Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3514380Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3515333Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3515438Z graph_break []
2025-12-04T11:20:45.3515667Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3516390Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3516534Z   warnings.warn(
2025-12-04T11:20:45.3517251Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3517352Z   warnings.warn(
2025-12-04T11:20:45.3517581Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3517698Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3517930Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3518874Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3518973Z graph_break []
2025-12-04T11:20:45.3519204Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3519930Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3520030Z   warnings.warn(
2025-12-04T11:20:45.3520759Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3520859Z   warnings.warn(
2025-12-04T11:20:45.3521708Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9644b19a5203c0ee.xml -
2025-12-04T11:20:45.3521885Z =========================== short test summary info ============================
2025-12-04T11:20:45.3522834Z FAILED [0.5485s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3522842Z 
2025-12-04T11:20:45.3522964Z Expected 1 but got 2.
2025-12-04T11:20:45.3523073Z Absolute difference: 1
2025-12-04T11:20:45.3523199Z Relative difference: 1.0
2025-12-04T11:20:45.3523204Z 
2025-12-04T11:20:45.3523425Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3524339Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.3524346Z 
2025-12-04T11:20:45.3524629Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3524811Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:20:45.3525023Z ================== 1 failed, 13 deselected, 2 rerun in 20.79s ==================
2025-12-04T11:20:45.3525128Z Got exit code 1
2025-12-04T11:20:45.3525958Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.3526384Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T11:20:45.3526835Z W1204 11:07:10.337000 89195 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:20:45.3527569Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1a6f999e52eb1904.xml
2025-12-04T11:20:45.3527740Z ============================= test session starts ==============================
2025-12-04T11:20:45.3528092Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:20:45.3528218Z cachedir: .pytest_cache
2025-12-04T11:20:45.3528786Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:20:45.3528911Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:20:45.3529039Z configfile: pytest.ini
2025-12-04T11:20:45.3529579Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:20:45.3529808Z collecting ... collected 58 items / 5 deselected / 53 selected
2025-12-04T11:20:45.3529949Z stepcurrent: skipping 5 already run items.
2025-12-04T11:20:45.3530068Z Running 9 items in this shard
2025-12-04T11:20:45.3530105Z 
2025-12-04T11:20:45.3530996Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [4.3412s] [ 11%]
2025-12-04T11:20:45.3531869Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.8933s] [ 11%]
2025-12-04T11:20:45.3532672Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 FAILED [0.8957s] [ 11%]
2025-12-04T11:20:45.3532678Z 
2025-12-04T11:20:45.3532822Z ==================================== RERUNS ====================================
2025-12-04T11:20:45.3533338Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.3533476Z Traceback (most recent call last):
2025-12-04T11:20:45.3533989Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.3534233Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.3534692Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.3534858Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.3535410Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.3535618Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.3535765Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3535770Z 
2025-12-04T11:20:45.3535878Z Expected 1 but got 2.
2025-12-04T11:20:45.3535988Z Absolute difference: 1
2025-12-04T11:20:45.3536116Z Relative difference: 1.0
2025-12-04T11:20:45.3536123Z 
2025-12-04T11:20:45.3536417Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3537337Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.3537360Z 
2025-12-04T11:20:45.3537631Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3537853Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3537985Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3538515Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.3538743Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3538927Z graph_break []
2025-12-04T11:20:45.3539147Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3539894Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3539995Z   warnings.warn(
2025-12-04T11:20:45.3540743Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3540862Z   warnings.warn(
2025-12-04T11:20:45.3541378Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.3541500Z Traceback (most recent call last):
2025-12-04T11:20:45.3542018Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.3542256Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.3542755Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.3542920Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.3543455Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.3543677Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.3543812Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3543817Z 
2025-12-04T11:20:45.3543938Z Expected 1 but got 2.
2025-12-04T11:20:45.3544048Z Absolute difference: 1
2025-12-04T11:20:45.3544160Z Relative difference: 1.0
2025-12-04T11:20:45.3544165Z 
2025-12-04T11:20:45.3544395Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3545317Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.3545325Z 
2025-12-04T11:20:45.3545609Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3545830Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3545949Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3546496Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.3546727Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3546827Z graph_break []
2025-12-04T11:20:45.3547058Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3547790Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3547910Z   warnings.warn(
2025-12-04T11:20:45.3548631Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3548733Z   warnings.warn(
2025-12-04T11:20:45.3548963Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3549082Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3549310Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3549849Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.3549949Z graph_break []
2025-12-04T11:20:45.3550177Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3550970Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3551076Z   warnings.warn(
2025-12-04T11:20:45.3551807Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3551909Z   warnings.warn(
2025-12-04T11:20:45.3552091Z =================================== FAILURES ===================================
2025-12-04T11:20:45.3552621Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.3552744Z Traceback (most recent call last):
2025-12-04T11:20:45.3553271Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.3553503Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.3553965Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.3554177Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.3554715Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.3554942Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.3555079Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3555085Z 
2025-12-04T11:20:45.3555192Z Expected 1 but got 2.
2025-12-04T11:20:45.3555316Z Absolute difference: 1
2025-12-04T11:20:45.3555427Z Relative difference: 1.0
2025-12-04T11:20:45.3555432Z 
2025-12-04T11:20:45.3555646Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3556585Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.3556591Z 
2025-12-04T11:20:45.3556864Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3557100Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3557219Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3557745Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.3557992Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3558094Z graph_break []
2025-12-04T11:20:45.3558328Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3559057Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3559161Z   warnings.warn(
2025-12-04T11:20:45.3559894Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3560002Z   warnings.warn(
2025-12-04T11:20:45.3560221Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3560355Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3560593Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3561141Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.3561245Z graph_break []
2025-12-04T11:20:45.3561461Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3562201Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3562366Z   warnings.warn(
2025-12-04T11:20:45.3563102Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3563206Z   warnings.warn(
2025-12-04T11:20:45.3563420Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3563587Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3563816Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3564343Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.3564457Z graph_break []
2025-12-04T11:20:45.3564675Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3565411Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3565555Z   warnings.warn(
2025-12-04T11:20:45.3566268Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3566381Z   warnings.warn(
2025-12-04T11:20:45.3567214Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1a6f999e52eb1904.xml -
2025-12-04T11:20:45.3567409Z =========================== short test summary info ============================
2025-12-04T11:20:45.3568358Z FAILED [0.8957s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3568364Z 
2025-12-04T11:20:45.3568474Z Expected 1 but got 2.
2025-12-04T11:20:45.3568600Z Absolute difference: 1
2025-12-04T11:20:45.3568716Z Relative difference: 1.0
2025-12-04T11:20:45.3568723Z 
2025-12-04T11:20:45.3568953Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3569864Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.3569872Z 
2025-12-04T11:20:45.3570140Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3570331Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:20:45.3570528Z =================== 1 failed, 5 deselected, 2 rerun in 6.16s ===================
2025-12-04T11:20:45.3570629Z Got exit code 1
2025-12-04T11:20:45.3570750Z Retrying single test...
2025-12-04T11:20:45.3571530Z W1204 11:07:31.151000 89372 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:20:45.3572203Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-547414903ca204e9.xml
2025-12-04T11:20:45.3572373Z ============================= test session starts ==============================
2025-12-04T11:20:45.3572722Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:20:45.3572850Z cachedir: .pytest_cache
2025-12-04T11:20:45.3573375Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:20:45.3573516Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:20:45.3573625Z configfile: pytest.ini
2025-12-04T11:20:45.3574171Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:20:45.3574556Z collecting ... collected 58 items / 13 deselected / 45 selected
2025-12-04T11:20:45.3575562Z stepcurrent: skipping 5 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.3575682Z Running 1 items in this shard
2025-12-04T11:20:45.3575703Z 
2025-12-04T11:20:45.3577055Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 11:07:35.611505898 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3577111Z 
2025-12-04T11:20:45.3577632Z [W1204 11:07:51.542888694 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3577651Z 
2025-12-04T11:20:45.3578166Z [W1204 11:07:51.543149450 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3578216Z 
2025-12-04T11:20:45.3578728Z [W1204 11:07:51.550598929 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3578733Z 
2025-12-04T11:20:45.3579256Z [W1204 11:07:51.551419010 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3579264Z 
2025-12-04T11:20:45.3579777Z [W1204 11:07:51.551615413 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3579782Z 
2025-12-04T11:20:45.3580305Z [W1204 11:07:51.558867861 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3580310Z 
2025-12-04T11:20:45.3580824Z [W1204 11:07:51.559648206 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3580829Z 
2025-12-04T11:20:45.3581353Z [W1204 11:07:51.559847222 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3581357Z 
2025-12-04T11:20:45.3581864Z [W1204 11:07:53.562425796 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3581872Z 
2025-12-04T11:20:45.3582393Z [W1204 11:07:53.564202215 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3582398Z 
2025-12-04T11:20:45.3582906Z [W1204 11:07:53.564431693 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3582911Z 
2025-12-04T11:20:45.3583418Z [W1204 11:07:53.568497211 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3583426Z 
2025-12-04T11:20:45.3583949Z [W1204 11:07:53.569209727 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3583956Z 
2025-12-04T11:20:45.3584462Z [W1204 11:07:53.569413525 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3584469Z 
2025-12-04T11:20:45.3584989Z [W1204 11:07:53.575625034 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3584994Z 
2025-12-04T11:20:45.3585503Z [W1204 11:07:53.576332632 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3585508Z 
2025-12-04T11:20:45.3586028Z [W1204 11:07:53.576544452 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3586033Z 
2025-12-04T11:20:45.3586227Z ('RERUN', {'yellow': True}) [20.2628s] [100%]
2025-12-04T11:20:45.3587535Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 11:07:54.416233067 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3587572Z 
2025-12-04T11:20:45.3588084Z [W1204 11:07:54.417051198 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3588089Z 
2025-12-04T11:20:45.3588594Z [W1204 11:07:54.417264930 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3588611Z 
2025-12-04T11:20:45.3589120Z [W1204 11:07:54.421342055 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3589129Z 
2025-12-04T11:20:45.3589636Z [W1204 11:07:54.422217987 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3589669Z 
2025-12-04T11:20:45.3590190Z [W1204 11:07:54.422418389 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3590197Z 
2025-12-04T11:20:45.3590707Z [W1204 11:07:54.428624018 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3590711Z 
2025-12-04T11:20:45.3591234Z [W1204 11:07:54.429314103 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3591238Z 
2025-12-04T11:20:45.3591746Z [W1204 11:07:54.429506896 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3591751Z 
2025-12-04T11:20:45.3592276Z [W1204 11:07:54.521090188 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3592283Z 
2025-12-04T11:20:45.3592792Z [W1204 11:07:54.521900368 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3592797Z 
2025-12-04T11:20:45.3593322Z [W1204 11:07:54.522117599 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3593327Z 
2025-12-04T11:20:45.3593840Z [W1204 11:07:54.526209633 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3593845Z 
2025-12-04T11:20:45.3594349Z [W1204 11:07:54.526902162 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3594367Z 
2025-12-04T11:20:45.3594878Z [W1204 11:07:54.527100942 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3594884Z 
2025-12-04T11:20:45.3595394Z [W1204 11:07:54.533343262 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3595398Z 
2025-12-04T11:20:45.3595918Z [W1204 11:07:54.534283037 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3595925Z 
2025-12-04T11:20:45.3596432Z [W1204 11:07:54.534483535 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3596437Z 
2025-12-04T11:20:45.3596580Z ('RERUN', {'yellow': True}) [0.9181s] [100%]
2025-12-04T11:20:45.3597921Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 11:07:54.314101059 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3597929Z 
2025-12-04T11:20:45.3598455Z [W1204 11:07:54.314902345 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3598460Z 
2025-12-04T11:20:45.3598971Z [W1204 11:07:54.315103523 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3599007Z 
2025-12-04T11:20:45.3599531Z [W1204 11:07:54.319066398 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3599536Z 
2025-12-04T11:20:45.3600044Z [W1204 11:07:54.319718292 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3600050Z 
2025-12-04T11:20:45.3600564Z [W1204 11:07:54.319909131 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3600601Z 
2025-12-04T11:20:45.3601121Z [W1204 11:07:54.326016425 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3601126Z 
2025-12-04T11:20:45.3601631Z [W1204 11:07:54.326674109 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3601638Z 
2025-12-04T11:20:45.3602159Z [W1204 11:07:54.326861191 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3602164Z 
2025-12-04T11:20:45.3602672Z [W1204 11:07:55.416622721 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3602677Z 
2025-12-04T11:20:45.3603199Z [W1204 11:07:55.417412597 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3603206Z 
2025-12-04T11:20:45.3603714Z [W1204 11:07:55.417619963 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3603719Z 
2025-12-04T11:20:45.3604240Z [W1204 11:07:55.421646380 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3604247Z 
2025-12-04T11:20:45.3604753Z [W1204 11:07:55.422323954 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3604758Z 
2025-12-04T11:20:45.3605268Z [W1204 11:07:55.422523436 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3605285Z 
2025-12-04T11:20:45.3605797Z [W1204 11:07:55.428638163 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3605804Z 
2025-12-04T11:20:45.3606311Z [W1204 11:07:55.429479410 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3606316Z 
2025-12-04T11:20:45.3606831Z [W1204 11:07:55.429676683 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3606838Z 
2025-12-04T11:20:45.3606941Z FAILED [0.8927s] [100%]
2025-12-04T11:20:45.3606946Z 
2025-12-04T11:20:45.3607100Z ==================================== RERUNS ====================================
2025-12-04T11:20:45.3607617Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.3607741Z Traceback (most recent call last):
2025-12-04T11:20:45.3608327Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.3608562Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.3609042Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.3609205Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.3609744Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.3609999Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.3610134Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3610139Z 
2025-12-04T11:20:45.3610258Z Expected 1 but got 2.
2025-12-04T11:20:45.3610366Z Absolute difference: 1
2025-12-04T11:20:45.3610477Z Relative difference: 1.0
2025-12-04T11:20:45.3610482Z 
2025-12-04T11:20:45.3610710Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3611633Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.3611670Z 
2025-12-04T11:20:45.3611943Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3612180Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3612299Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3612841Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.3613070Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3613170Z graph_break []
2025-12-04T11:20:45.3613401Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3614618Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.3614750Z   if out == self.unknown_value:
2025-12-04T11:20:45.3615474Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3615583Z   warnings.warn(
2025-12-04T11:20:45.3616389Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3616496Z   warnings.warn(
2025-12-04T11:20:45.3617016Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.3617156Z Traceback (most recent call last):
2025-12-04T11:20:45.3617668Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.3617917Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.3618374Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.3618539Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.3619092Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.3619299Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.3619451Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3619456Z 
2025-12-04T11:20:45.3619564Z Expected 1 but got 2.
2025-12-04T11:20:45.3619673Z Absolute difference: 1
2025-12-04T11:20:45.3619801Z Relative difference: 1.0
2025-12-04T11:20:45.3619806Z 
2025-12-04T11:20:45.3620026Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3621033Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.3621056Z 
2025-12-04T11:20:45.3621329Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3621550Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3621716Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3622246Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.3622474Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3622588Z graph_break []
2025-12-04T11:20:45.3622802Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3624020Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.3624177Z   if out == self.unknown_value:
2025-12-04T11:20:45.3624896Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3625016Z   warnings.warn(
2025-12-04T11:20:45.3625736Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3625851Z   warnings.warn(
2025-12-04T11:20:45.3626069Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3626187Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3626433Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3626960Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.3627060Z graph_break []
2025-12-04T11:20:45.3627641Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3628478Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3628622Z   warnings.warn(
2025-12-04T11:20:45.3629444Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3629586Z   warnings.warn(
2025-12-04T11:20:45.3629749Z =================================== FAILURES ===================================
2025-12-04T11:20:45.3630420Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.3630586Z Traceback (most recent call last):
2025-12-04T11:20:45.3631142Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.3631468Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.3631968Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.3632234Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.3632823Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.3633067Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.3633304Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3633310Z 
2025-12-04T11:20:45.3633452Z Expected 1 but got 2.
2025-12-04T11:20:45.3633688Z Absolute difference: 1
2025-12-04T11:20:45.3633885Z Relative difference: 1.0
2025-12-04T11:20:45.3633891Z 
2025-12-04T11:20:45.3634169Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3635187Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.3635224Z 
2025-12-04T11:20:45.3635532Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3635841Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3635975Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3636579Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.3636931Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3637069Z graph_break []
2025-12-04T11:20:45.3637360Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3638657Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.3638794Z   if out == self.unknown_value:
2025-12-04T11:20:45.3639697Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3639842Z   warnings.warn(
2025-12-04T11:20:45.3640599Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3640785Z   warnings.warn(
2025-12-04T11:20:45.3641046Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3641282Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3641569Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3642135Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.3642325Z graph_break []
2025-12-04T11:20:45.3642582Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3643386Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3643569Z   warnings.warn(
2025-12-04T11:20:45.3644347Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3644534Z   warnings.warn(
2025-12-04T11:20:45.3644790Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3644941Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3645245Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3645843Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.3646056Z graph_break []
2025-12-04T11:20:45.3646311Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3647068Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3647267Z   warnings.warn(
2025-12-04T11:20:45.3648066Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3648309Z   warnings.warn(
2025-12-04T11:20:45.3649181Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-547414903ca204e9.xml -
2025-12-04T11:20:45.3649391Z =========================== short test summary info ============================
2025-12-04T11:20:45.3650478Z FAILED [0.8927s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3650486Z 
2025-12-04T11:20:45.3650632Z Expected 1 but got 2.
2025-12-04T11:20:45.3650842Z Absolute difference: 1
2025-12-04T11:20:45.3651012Z Relative difference: 1.0
2025-12-04T11:20:45.3651018Z 
2025-12-04T11:20:45.3651277Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3652296Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.3652340Z 
2025-12-04T11:20:45.3652648Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3652888Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:20:45.3653165Z ================== 1 failed, 13 deselected, 2 rerun in 22.11s ==================
2025-12-04T11:20:45.3653321Z Got exit code 1
2025-12-04T11:20:45.3653536Z Retrying single test...
2025-12-04T11:20:45.3654021Z W1204 11:08:06.780000 89554 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:20:45.3654762Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-41f0f199b083e6d2.xml
2025-12-04T11:20:45.3654948Z ============================= test session starts ==============================
2025-12-04T11:20:45.3655375Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:20:45.3655604Z cachedir: .pytest_cache
2025-12-04T11:20:45.3656167Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:20:45.3656479Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:20:45.3656629Z configfile: pytest.ini
2025-12-04T11:20:45.3657188Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:20:45.3657564Z collecting ... collected 58 items / 13 deselected / 45 selected
2025-12-04T11:20:45.3658611Z stepcurrent: skipping 5 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.3658767Z Running 1 items in this shard
2025-12-04T11:20:45.3658822Z 
2025-12-04T11:20:45.3660144Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 11:08:10.243397218 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3660153Z 
2025-12-04T11:20:45.3660710Z [W1204 11:08:26.993787869 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3660742Z 
2025-12-04T11:20:45.3661337Z [W1204 11:08:26.994072693 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3661344Z 
2025-12-04T11:20:45.3661992Z [W1204 11:08:26.001587353 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3661998Z 
2025-12-04T11:20:45.3662593Z [W1204 11:08:26.002347959 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3662598Z 
2025-12-04T11:20:45.3663144Z [W1204 11:08:26.002547272 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3663182Z 
2025-12-04T11:20:45.3663777Z [W1204 11:08:26.009701318 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3663782Z 
2025-12-04T11:20:45.3664315Z [W1204 11:08:26.010488744 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3664320Z 
2025-12-04T11:20:45.3664973Z [W1204 11:08:26.010694260 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3664984Z 
2025-12-04T11:20:45.3665530Z [W1204 11:08:28.017672412 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3665565Z 
2025-12-04T11:20:45.3666162Z [W1204 11:08:28.019424886 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3666170Z 
2025-12-04T11:20:45.3666715Z [W1204 11:08:28.019642223 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3666720Z 
2025-12-04T11:20:45.3667275Z [W1204 11:08:28.023893815 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3667306Z 
2025-12-04T11:20:45.3667890Z [W1204 11:08:28.024658586 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3667895Z 
2025-12-04T11:20:45.3668460Z [W1204 11:08:28.024871034 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3668467Z 
2025-12-04T11:20:45.3669061Z [W1204 11:08:28.031158266 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3669067Z 
2025-12-04T11:20:45.3669613Z [W1204 11:08:28.031887418 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3669618Z 
2025-12-04T11:20:45.3670221Z [W1204 11:08:28.032091799 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3670226Z 
2025-12-04T11:20:45.3670374Z ('RERUN', {'yellow': True}) [20.0741s] [100%]
2025-12-04T11:20:45.3672195Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 11:08:29.893022120 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3672206Z 
2025-12-04T11:20:45.3672760Z [W1204 11:08:29.893848526 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3672768Z 
2025-12-04T11:20:45.3673361Z [W1204 11:08:29.894062370 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3673366Z 
2025-12-04T11:20:45.3673921Z [W1204 11:08:29.898277431 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3673926Z 
2025-12-04T11:20:45.3674471Z [W1204 11:08:29.899185675 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3674503Z 
2025-12-04T11:20:45.3675215Z [W1204 11:08:29.899389102 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3675224Z 
2025-12-04T11:20:45.3675788Z [W1204 11:08:29.905728040 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3675843Z 
2025-12-04T11:20:45.3676441Z [W1204 11:08:29.906459251 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3676447Z 
2025-12-04T11:20:45.3676991Z [W1204 11:08:29.906659857 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3676997Z 
2025-12-04T11:20:45.3677591Z [W1204 11:08:29.000988940 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3677597Z 
2025-12-04T11:20:45.3678122Z [W1204 11:08:29.001828087 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3678188Z 
2025-12-04T11:20:45.3678847Z [W1204 11:08:29.002053371 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3678852Z 
2025-12-04T11:20:45.3679408Z [W1204 11:08:29.006248342 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3679417Z 
2025-12-04T11:20:45.3680010Z [W1204 11:08:29.006984842 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3680015Z 
2025-12-04T11:20:45.3680558Z [W1204 11:08:29.007199666 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3680563Z 
2025-12-04T11:20:45.3681140Z [W1204 11:08:29.013583811 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3681148Z 
2025-12-04T11:20:45.3681746Z [W1204 11:08:29.014574444 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3681752Z 
2025-12-04T11:20:45.3682324Z [W1204 11:08:29.014783817 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3682380Z 
2025-12-04T11:20:45.3682550Z ('RERUN', {'yellow': True}) [0.9434s] [100%]
2025-12-04T11:20:45.3683870Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 11:08:30.799798371 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3683877Z 
2025-12-04T11:20:45.3684472Z [W1204 11:08:30.800641898 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3684480Z 
2025-12-04T11:20:45.3685000Z [W1204 11:08:30.800858234 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3685005Z 
2025-12-04T11:20:45.3685670Z [W1204 11:08:30.804933634 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3685678Z 
2025-12-04T11:20:45.3686221Z [W1204 11:08:30.805606551 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3686226Z 
2025-12-04T11:20:45.3686825Z [W1204 11:08:30.805804789 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3686831Z 
2025-12-04T11:20:45.3687436Z [W1204 11:08:30.811985885 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3687444Z 
2025-12-04T11:20:45.3688017Z [W1204 11:08:30.812657803 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3688022Z 
2025-12-04T11:20:45.3688612Z [W1204 11:08:30.812852916 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3688647Z 
2025-12-04T11:20:45.3689208Z [W1204 11:08:30.901570427 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3689261Z 
2025-12-04T11:20:45.3689802Z [W1204 11:08:30.902319220 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3689807Z 
2025-12-04T11:20:45.3690359Z [W1204 11:08:30.902527550 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3690392Z 
2025-12-04T11:20:45.3690997Z [W1204 11:08:30.906516533 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3691002Z 
2025-12-04T11:20:45.3691524Z [W1204 11:08:30.907168882 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3691532Z 
2025-12-04T11:20:45.3692176Z [W1204 11:08:30.907369815 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3692182Z 
2025-12-04T11:20:45.3692726Z [W1204 11:08:30.913520642 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3692731Z 
2025-12-04T11:20:45.3693328Z [W1204 11:08:30.914374260 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3693333Z 
2025-12-04T11:20:45.3693890Z [W1204 11:08:30.914574983 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3693896Z 
2025-12-04T11:20:45.3694061Z FAILED [0.8960s] [100%]
2025-12-04T11:20:45.3694066Z 
2025-12-04T11:20:45.3694282Z ==================================== RERUNS ====================================
2025-12-04T11:20:45.3694856Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.3695067Z Traceback (most recent call last):
2025-12-04T11:20:45.3695620Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.3695946Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.3696519Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.3696771Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.3697413Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.3697658Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.3697925Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3697984Z 
2025-12-04T11:20:45.3698131Z Expected 1 but got 2.
2025-12-04T11:20:45.3698254Z Absolute difference: 1
2025-12-04T11:20:45.3698507Z Relative difference: 1.0
2025-12-04T11:20:45.3698512Z 
2025-12-04T11:20:45.3698767Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3699720Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.3699788Z 
2025-12-04T11:20:45.3700170Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3700438Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3700660Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3701250Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.3701565Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3701754Z graph_break []
2025-12-04T11:20:45.3702015Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3703283Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.3703482Z   if out == self.unknown_value:
2025-12-04T11:20:45.3704273Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3704498Z   warnings.warn(
2025-12-04T11:20:45.3705254Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3705448Z   warnings.warn(
2025-12-04T11:20:45.3705981Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.3706184Z Traceback (most recent call last):
2025-12-04T11:20:45.3706877Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.3707147Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.3707700Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.3707907Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.3718594Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.3718925Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.3719098Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3719106Z 
2025-12-04T11:20:45.3719220Z Expected 1 but got 2.
2025-12-04T11:20:45.3719328Z Absolute difference: 1
2025-12-04T11:20:45.3719455Z Relative difference: 1.0
2025-12-04T11:20:45.3719460Z 
2025-12-04T11:20:45.3719685Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3720625Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.3720647Z 
2025-12-04T11:20:45.3720924Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3721151Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3721283Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3721815Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.3722049Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3722166Z graph_break []
2025-12-04T11:20:45.3722386Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3723613Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.3723903Z   if out == self.unknown_value:
2025-12-04T11:20:45.3724639Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3724759Z   warnings.warn(
2025-12-04T11:20:45.3725482Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3725638Z   warnings.warn(
2025-12-04T11:20:45.3725862Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3725978Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3726223Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3726757Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.3726863Z graph_break []
2025-12-04T11:20:45.3727097Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3727866Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3727982Z   warnings.warn(
2025-12-04T11:20:45.3728702Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3728807Z   warnings.warn(
2025-12-04T11:20:45.3728969Z =================================== FAILURES ===================================
2025-12-04T11:20:45.3729485Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.3729626Z Traceback (most recent call last):
2025-12-04T11:20:45.3730145Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.3730381Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.3730852Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.3731016Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.3731553Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.3731770Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.3731905Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3731910Z 
2025-12-04T11:20:45.3732031Z Expected 1 but got 2.
2025-12-04T11:20:45.3732140Z Absolute difference: 1
2025-12-04T11:20:45.3732252Z Relative difference: 1.0
2025-12-04T11:20:45.3732257Z 
2025-12-04T11:20:45.3732488Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3733410Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.3733420Z 
2025-12-04T11:20:45.3733702Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3733924Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3734044Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3734592Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.3734825Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3734927Z graph_break []
2025-12-04T11:20:45.3735162Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3736548Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.3736692Z   if out == self.unknown_value:
2025-12-04T11:20:45.3737423Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3737563Z   warnings.warn(
2025-12-04T11:20:45.3738299Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3738402Z   warnings.warn(
2025-12-04T11:20:45.3738635Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3738753Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3738989Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3739567Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.3739668Z graph_break []
2025-12-04T11:20:45.3739885Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3740632Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3740739Z   warnings.warn(
2025-12-04T11:20:45.3741470Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3741573Z   warnings.warn(
2025-12-04T11:20:45.3741787Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3741914Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3742146Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3742680Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.3742796Z graph_break []
2025-12-04T11:20:45.3743012Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3743746Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3743850Z   warnings.warn(
2025-12-04T11:20:45.3744568Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3744683Z   warnings.warn(
2025-12-04T11:20:45.3745526Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-41f0f199b083e6d2.xml -
2025-12-04T11:20:45.3745716Z =========================== short test summary info ============================
2025-12-04T11:20:45.3746673Z FAILED [0.8960s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3746682Z 
2025-12-04T11:20:45.3746789Z Expected 1 but got 2.
2025-12-04T11:20:45.3746915Z Absolute difference: 1
2025-12-04T11:20:45.3747027Z Relative difference: 1.0
2025-12-04T11:20:45.3747033Z 
2025-12-04T11:20:45.3747265Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3748182Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.3748248Z 
2025-12-04T11:20:45.3748521Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3748718Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:20:45.3748919Z ================== 1 failed, 13 deselected, 2 rerun in 21.95s ==================
2025-12-04T11:20:45.3749034Z Got exit code 1
2025-12-04T11:20:45.3749897Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.3750310Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T11:20:45.3750770Z W1204 11:08:42.220000 89736 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:20:45.3751432Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-438f9d52209526cc.xml
2025-12-04T11:20:45.3751644Z ============================= test session starts ==============================
2025-12-04T11:20:45.3751998Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:20:45.3752109Z cachedir: .pytest_cache
2025-12-04T11:20:45.3752644Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:20:45.3752774Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:20:45.3752883Z configfile: pytest.ini
2025-12-04T11:20:45.3753437Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:20:45.3753655Z collecting ... collected 58 items / 6 deselected / 52 selected
2025-12-04T11:20:45.3753819Z stepcurrent: skipping 6 already run items.
2025-12-04T11:20:45.3753935Z Running 8 items in this shard
2025-12-04T11:20:45.3753946Z 
2025-12-04T11:20:45.3754804Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [3.8908s] [ 12%]
2025-12-04T11:20:45.3755669Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4674s] [ 12%]
2025-12-04T11:20:45.3756440Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 FAILED [0.4687s] [ 12%]
2025-12-04T11:20:45.3756446Z 
2025-12-04T11:20:45.3756601Z ==================================== RERUNS ====================================
2025-12-04T11:20:45.3757104Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T11:20:45.3757234Z Traceback (most recent call last):
2025-12-04T11:20:45.3757756Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.3757993Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.3758469Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.3758641Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.3759178Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.3759396Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.3759533Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3759538Z 
2025-12-04T11:20:45.3759663Z Expected 1 but got 2.
2025-12-04T11:20:45.3759773Z Absolute difference: 1
2025-12-04T11:20:45.3759886Z Relative difference: 1.0
2025-12-04T11:20:45.3759891Z 
2025-12-04T11:20:45.3760190Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3761094Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.3761100Z 
2025-12-04T11:20:45.3761382Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3761635Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3761754Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3762658Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3762886Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3762990Z graph_break []
2025-12-04T11:20:45.3763223Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3763991Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3764107Z   warnings.warn(
2025-12-04T11:20:45.3764827Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3764935Z   warnings.warn(
2025-12-04T11:20:45.3765451Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T11:20:45.3765575Z Traceback (most recent call last):
2025-12-04T11:20:45.3766094Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.3766334Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.3766797Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.3766975Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.3767515Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.3767724Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.3767871Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3767876Z 
2025-12-04T11:20:45.3767984Z Expected 1 but got 2.
2025-12-04T11:20:45.3768112Z Absolute difference: 1
2025-12-04T11:20:45.3768223Z Relative difference: 1.0
2025-12-04T11:20:45.3768228Z 
2025-12-04T11:20:45.3768447Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3769375Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.3769384Z 
2025-12-04T11:20:45.3769653Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3769892Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3770016Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3770906Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3771480Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3771595Z graph_break []
2025-12-04T11:20:45.3771816Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3772715Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3772828Z   warnings.warn(
2025-12-04T11:20:45.3773569Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3773672Z   warnings.warn(
2025-12-04T11:20:45.3773938Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3774071Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3774301Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3775201Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3775301Z graph_break []
2025-12-04T11:20:45.3775525Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3776377Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3776481Z   warnings.warn(
2025-12-04T11:20:45.3777199Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3777318Z   warnings.warn(
2025-12-04T11:20:45.3777466Z =================================== FAILURES ===================================
2025-12-04T11:20:45.3777987Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T11:20:45.3778120Z Traceback (most recent call last):
2025-12-04T11:20:45.3778632Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.3778884Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.3779346Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.3779526Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.3780062Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.3780272Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.3780420Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3780425Z 
2025-12-04T11:20:45.3780533Z Expected 1 but got 2.
2025-12-04T11:20:45.3780643Z Absolute difference: 1
2025-12-04T11:20:45.3780769Z Relative difference: 1.0
2025-12-04T11:20:45.3780774Z 
2025-12-04T11:20:45.3780989Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3781912Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.3781920Z 
2025-12-04T11:20:45.3782190Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3782411Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3782545Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3783433Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3783676Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3783775Z graph_break []
2025-12-04T11:20:45.3783993Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3784809Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3784915Z   warnings.warn(
2025-12-04T11:20:45.3785649Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3785780Z   warnings.warn(
2025-12-04T11:20:45.3785999Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3786132Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3786364Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3787249Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3787364Z graph_break []
2025-12-04T11:20:45.3787588Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3788355Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3788456Z   warnings.warn(
2025-12-04T11:20:45.3789171Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3789289Z   warnings.warn(
2025-12-04T11:20:45.3789505Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3789624Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3789866Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3790758Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3790873Z graph_break []
2025-12-04T11:20:45.3791091Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3791815Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3791934Z   warnings.warn(
2025-12-04T11:20:45.3792655Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3792772Z   warnings.warn(
2025-12-04T11:20:45.3793607Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-438f9d52209526cc.xml -
2025-12-04T11:20:45.3793784Z =========================== short test summary info ============================
2025-12-04T11:20:45.3794744Z FAILED [0.4687s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3794753Z 
2025-12-04T11:20:45.3794860Z Expected 1 but got 2.
2025-12-04T11:20:45.3794983Z Absolute difference: 1
2025-12-04T11:20:45.3795097Z Relative difference: 1.0
2025-12-04T11:20:45.3795102Z 
2025-12-04T11:20:45.3795319Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3796230Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.3796236Z 
2025-12-04T11:20:45.3796503Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3796764Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:20:45.3796964Z =================== 1 failed, 6 deselected, 2 rerun in 4.86s ===================
2025-12-04T11:20:45.3797066Z Got exit code 1
2025-12-04T11:20:45.3797186Z Retrying single test...
2025-12-04T11:20:45.3797632Z W1204 11:09:02.718000 89905 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:20:45.3798296Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-98df9406c6e0faf3.xml
2025-12-04T11:20:45.3798506Z ============================= test session starts ==============================
2025-12-04T11:20:45.3798856Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:20:45.3798982Z cachedir: .pytest_cache
2025-12-04T11:20:45.3799506Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:20:45.3799639Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:20:45.3799800Z configfile: pytest.ini
2025-12-04T11:20:45.3800342Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:20:45.3800578Z collecting ... collected 58 items / 13 deselected / 45 selected
2025-12-04T11:20:45.3801557Z stepcurrent: skipping 6 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.3801678Z Running 1 items in this shard
2025-12-04T11:20:45.3801683Z 
2025-12-04T11:20:45.3802967Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 11:09:08.632398257 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3802978Z 
2025-12-04T11:20:45.3803500Z [W1204 11:09:24.522037128 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3803508Z 
2025-12-04T11:20:45.3804039Z [W1204 11:09:24.522301560 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3804046Z 
2025-12-04T11:20:45.3804556Z [W1204 11:09:24.529822132 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3804562Z 
2025-12-04T11:20:45.3805087Z [W1204 11:09:24.530634636 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3805092Z 
2025-12-04T11:20:45.3805599Z [W1204 11:09:24.530843137 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3805604Z 
2025-12-04T11:20:45.3806127Z [W1204 11:09:24.538004044 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3806134Z 
2025-12-04T11:20:45.3806641Z [W1204 11:09:24.538870586 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3806645Z 
2025-12-04T11:20:45.3807156Z [W1204 11:09:24.539067005 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3807174Z 
2025-12-04T11:20:45.3807684Z [W1204 11:09:24.678692615 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3807689Z 
2025-12-04T11:20:45.3808195Z [W1204 11:09:24.680545050 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3808199Z 
2025-12-04T11:20:45.3808778Z [W1204 11:09:24.680775509 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3808786Z 
2025-12-04T11:20:45.3809293Z [W1204 11:09:24.684866570 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3809298Z 
2025-12-04T11:20:45.3809819Z [W1204 11:09:24.685542987 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3809859Z 
2025-12-04T11:20:45.3810366Z [W1204 11:09:24.685743610 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3810371Z 
2025-12-04T11:20:45.3810886Z [W1204 11:09:24.691988774 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3810890Z 
2025-12-04T11:20:45.3811405Z [W1204 11:09:24.692669621 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3811455Z 
2025-12-04T11:20:45.3811977Z [W1204 11:09:24.692869115 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3811982Z 
2025-12-04T11:20:45.3812115Z ('RERUN', {'yellow': True}) [19.8058s] [100%]
2025-12-04T11:20:45.3813383Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 11:09:24.099325879 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3813389Z 
2025-12-04T11:20:45.3813913Z [W1204 11:09:24.100114023 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3813917Z 
2025-12-04T11:20:45.3814428Z [W1204 11:09:24.100326655 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3814435Z 
2025-12-04T11:20:45.3814957Z [W1204 11:09:24.104376219 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3814962Z 
2025-12-04T11:20:45.3815469Z [W1204 11:09:24.105017510 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3815477Z 
2025-12-04T11:20:45.3816002Z [W1204 11:09:24.105210957 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3816008Z 
2025-12-04T11:20:45.3816594Z [W1204 11:09:24.111434917 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3816600Z 
2025-12-04T11:20:45.3817127Z [W1204 11:09:24.112066514 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3817134Z 
2025-12-04T11:20:45.3817646Z [W1204 11:09:24.112256475 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3817651Z 
2025-12-04T11:20:45.3818168Z [W1204 11:09:24.200773142 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3818191Z 
2025-12-04T11:20:45.3818700Z [W1204 11:09:24.201563106 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3818705Z 
2025-12-04T11:20:45.3819214Z [W1204 11:09:24.201775354 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3819218Z 
2025-12-04T11:20:45.3819812Z [W1204 11:09:24.205792632 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3819820Z 
2025-12-04T11:20:45.3820327Z [W1204 11:09:24.206450292 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3820332Z 
2025-12-04T11:20:45.3820853Z [W1204 11:09:24.206645326 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3820888Z 
2025-12-04T11:20:45.3821399Z [W1204 11:09:24.212802721 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3821404Z 
2025-12-04T11:20:45.3821927Z [W1204 11:09:24.213647513 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3821931Z 
2025-12-04T11:20:45.3822446Z [W1204 11:09:24.213846026 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3822451Z 
2025-12-04T11:20:45.3822631Z ('RERUN', {'yellow': True}) [0.4819s] [100%]
2025-12-04T11:20:45.3823899Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 11:09:25.557866919 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3823908Z 
2025-12-04T11:20:45.3824420Z [W1204 11:09:25.558592953 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3824441Z 
2025-12-04T11:20:45.3824955Z [W1204 11:09:25.558793269 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3824960Z 
2025-12-04T11:20:45.3825473Z [W1204 11:09:25.562866254 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3825481Z 
2025-12-04T11:20:45.3826002Z [W1204 11:09:25.563500657 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3826007Z 
2025-12-04T11:20:45.3826514Z [W1204 11:09:25.563693482 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3826521Z 
2025-12-04T11:20:45.3827041Z [W1204 11:09:25.569831369 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3827046Z 
2025-12-04T11:20:45.3827555Z [W1204 11:09:25.570534864 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3827560Z 
2025-12-04T11:20:45.3828085Z [W1204 11:09:25.570729093 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3828091Z 
2025-12-04T11:20:45.3828599Z [W1204 11:09:25.660974806 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3828603Z 
2025-12-04T11:20:45.3829123Z [W1204 11:09:25.661768144 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3829130Z 
2025-12-04T11:20:45.3829640Z [W1204 11:09:25.661985667 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3829645Z 
2025-12-04T11:20:45.3830149Z [W1204 11:09:25.666027162 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3830154Z 
2025-12-04T11:20:45.3830673Z [W1204 11:09:25.666709729 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3830744Z 
2025-12-04T11:20:45.3831254Z [W1204 11:09:25.666918706 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3831261Z 
2025-12-04T11:20:45.3831784Z [W1204 11:09:25.673107486 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3831818Z 
2025-12-04T11:20:45.3832324Z [W1204 11:09:25.673969783 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3832328Z 
2025-12-04T11:20:45.3832847Z [W1204 11:09:25.674170448 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3832851Z 
2025-12-04T11:20:45.3832956Z FAILED [0.4573s] [100%]
2025-12-04T11:20:45.3832961Z 
2025-12-04T11:20:45.3833115Z ==================================== RERUNS ====================================
2025-12-04T11:20:45.3833622Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T11:20:45.3833775Z Traceback (most recent call last):
2025-12-04T11:20:45.3834299Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.3834535Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.3835001Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.3835177Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.3835713Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.3835932Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.3836065Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3836071Z 
2025-12-04T11:20:45.3836181Z Expected 1 but got 2.
2025-12-04T11:20:45.3836304Z Absolute difference: 1
2025-12-04T11:20:45.3836416Z Relative difference: 1.0
2025-12-04T11:20:45.3836420Z 
2025-12-04T11:20:45.3836636Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3837545Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.3837554Z 
2025-12-04T11:20:45.3837824Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3838057Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3838175Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3839071Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3839313Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3839412Z graph_break []
2025-12-04T11:20:45.3839643Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3840851Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.3840973Z   if out == self.unknown_value:
2025-12-04T11:20:45.3841712Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3841814Z   warnings.warn(
2025-12-04T11:20:45.3842605Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3842713Z   warnings.warn(
2025-12-04T11:20:45.3843220Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T11:20:45.3843360Z Traceback (most recent call last):
2025-12-04T11:20:45.3843865Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.3844146Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.3844606Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.3844773Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.3845325Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.3845538Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.3845673Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3845707Z 
2025-12-04T11:20:45.3845828Z Expected 1 but got 2.
2025-12-04T11:20:45.3845936Z Absolute difference: 1
2025-12-04T11:20:45.3846060Z Relative difference: 1.0
2025-12-04T11:20:45.3846065Z 
2025-12-04T11:20:45.3846282Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3847189Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.3847195Z 
2025-12-04T11:20:45.3847478Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3847697Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3847828Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3848716Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3848946Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3849058Z graph_break []
2025-12-04T11:20:45.3849275Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3850502Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.3850620Z   if out == self.unknown_value:
2025-12-04T11:20:45.3851347Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3851464Z   warnings.warn(
2025-12-04T11:20:45.3852190Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3852296Z   warnings.warn(
2025-12-04T11:20:45.3852528Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3852647Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3852893Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3853778Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3853878Z graph_break []
2025-12-04T11:20:45.3854108Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3854899Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3855019Z   warnings.warn(
2025-12-04T11:20:45.3855738Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3855840Z   warnings.warn(
2025-12-04T11:20:45.3856003Z =================================== FAILURES ===================================
2025-12-04T11:20:45.3856640Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T11:20:45.3856794Z Traceback (most recent call last):
2025-12-04T11:20:45.3857310Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.3857544Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.3858024Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.3858844Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.3859384Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.3859608Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.3859748Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3859754Z 
2025-12-04T11:20:45.3859877Z Expected 1 but got 2.
2025-12-04T11:20:45.3859988Z Absolute difference: 1
2025-12-04T11:20:45.3860102Z Relative difference: 1.0
2025-12-04T11:20:45.3860107Z 
2025-12-04T11:20:45.3860339Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3861239Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.3861250Z 
2025-12-04T11:20:45.3861535Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3861761Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3861880Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3862787Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3863021Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3863135Z graph_break []
2025-12-04T11:20:45.3863355Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3864566Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.3864704Z   if out == self.unknown_value:
2025-12-04T11:20:45.3865429Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3865534Z   warnings.warn(
2025-12-04T11:20:45.3866266Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3866373Z   warnings.warn(
2025-12-04T11:20:45.3866608Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3866726Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3866956Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3867942Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3868045Z graph_break []
2025-12-04T11:20:45.3868276Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3868999Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3869174Z   warnings.warn(
2025-12-04T11:20:45.3869905Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3870007Z   warnings.warn(
2025-12-04T11:20:45.3870222Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3870352Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3870580Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3871822Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3872092Z graph_break []
2025-12-04T11:20:45.3872310Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3873053Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3873156Z   warnings.warn(
2025-12-04T11:20:45.3873884Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3873985Z   warnings.warn(
2025-12-04T11:20:45.3874826Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-98df9406c6e0faf3.xml -
2025-12-04T11:20:45.3875016Z =========================== short test summary info ============================
2025-12-04T11:20:45.3875950Z FAILED [0.4573s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3875958Z 
2025-12-04T11:20:45.3876081Z Expected 1 but got 2.
2025-12-04T11:20:45.3876188Z Absolute difference: 1
2025-12-04T11:20:45.3876301Z Relative difference: 1.0
2025-12-04T11:20:45.3876307Z 
2025-12-04T11:20:45.3876541Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3877441Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.3877447Z 
2025-12-04T11:20:45.3877734Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3877916Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:20:45.3878115Z ================== 1 failed, 13 deselected, 2 rerun in 20.78s ==================
2025-12-04T11:20:45.3878231Z Got exit code 1
2025-12-04T11:20:45.3878341Z Retrying single test...
2025-12-04T11:20:45.3878790Z W1204 11:09:36.960000 90079 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:20:45.3879467Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f706cf73cc88a5b8.xml
2025-12-04T11:20:45.3879633Z ============================= test session starts ==============================
2025-12-04T11:20:45.3879998Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:20:45.3880108Z cachedir: .pytest_cache
2025-12-04T11:20:45.3880732Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:20:45.3880877Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:20:45.3880990Z configfile: pytest.ini
2025-12-04T11:20:45.3881534Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:20:45.3881816Z collecting ... collected 58 items / 13 deselected / 45 selected
2025-12-04T11:20:45.3882794Z stepcurrent: skipping 6 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.3882925Z Running 1 items in this shard
2025-12-04T11:20:45.3882930Z 
2025-12-04T11:20:45.3884198Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 11:09:42.817963690 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3884235Z 
2025-12-04T11:20:45.3884768Z [W1204 11:09:58.589273741 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3884775Z 
2025-12-04T11:20:45.3885289Z [W1204 11:09:58.589530563 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3885294Z 
2025-12-04T11:20:45.3885817Z [W1204 11:09:58.596767050 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3885822Z 
2025-12-04T11:20:45.3886331Z [W1204 11:09:58.597478187 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3886337Z 
2025-12-04T11:20:45.3886849Z [W1204 11:09:58.597667138 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3886868Z 
2025-12-04T11:20:45.3887377Z [W1204 11:09:58.604592264 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3887382Z 
2025-12-04T11:20:45.3887891Z [W1204 11:09:58.605395222 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3887896Z 
2025-12-04T11:20:45.3888417Z [W1204 11:09:58.605584410 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3888422Z 
2025-12-04T11:20:45.3888927Z [W1204 11:09:58.737322492 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3888932Z 
2025-12-04T11:20:45.3889454Z [W1204 11:09:58.739102308 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3889462Z 
2025-12-04T11:20:45.3889968Z [W1204 11:09:58.739313618 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3889973Z 
2025-12-04T11:20:45.3890490Z [W1204 11:09:58.743320274 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3890497Z 
2025-12-04T11:20:45.3891006Z [W1204 11:09:58.744004994 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3891011Z 
2025-12-04T11:20:45.3891529Z [W1204 11:09:58.744197384 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3891534Z 
2025-12-04T11:20:45.3892104Z [W1204 11:09:58.750287548 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3892112Z 
2025-12-04T11:20:45.3892619Z [W1204 11:09:58.750920945 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3892624Z 
2025-12-04T11:20:45.3893140Z [W1204 11:09:58.751112652 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3893175Z 
2025-12-04T11:20:45.3893310Z ('RERUN', {'yellow': True}) [19.6417s] [100%]
2025-12-04T11:20:45.3894584Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 11:09:58.147624160 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3894590Z 
2025-12-04T11:20:45.3895108Z [W1204 11:09:58.148368731 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3895142Z 
2025-12-04T11:20:45.3895669Z [W1204 11:09:58.148596740 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3895675Z 
2025-12-04T11:20:45.3896183Z [W1204 11:09:58.152644818 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3896190Z 
2025-12-04T11:20:45.3896786Z [W1204 11:09:58.153281573 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3896792Z 
2025-12-04T11:20:45.3897304Z [W1204 11:09:58.153472488 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3897309Z 
2025-12-04T11:20:45.3897826Z [W1204 11:09:58.159508290 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3897848Z 
2025-12-04T11:20:45.3898357Z [W1204 11:09:58.160188026 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3898362Z 
2025-12-04T11:20:45.3898873Z [W1204 11:09:58.160375446 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3898880Z 
2025-12-04T11:20:45.3899404Z [W1204 11:09:58.247722610 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3899409Z 
2025-12-04T11:20:45.3899916Z [W1204 11:09:58.248476474 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3899921Z 
2025-12-04T11:20:45.3900449Z [W1204 11:09:58.248717039 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3900456Z 
2025-12-04T11:20:45.3900966Z [W1204 11:09:58.252625098 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3900971Z 
2025-12-04T11:20:45.3901494Z [W1204 11:09:58.253287563 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3901501Z 
2025-12-04T11:20:45.3902011Z [W1204 11:09:58.253483574 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3902016Z 
2025-12-04T11:20:45.3902531Z [W1204 11:09:58.259437688 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3902535Z 
2025-12-04T11:20:45.3903125Z [W1204 11:09:58.260259849 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3903131Z 
2025-12-04T11:20:45.3903640Z [W1204 11:09:58.260455271 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3903658Z 
2025-12-04T11:20:45.3903790Z ('RERUN', {'yellow': True}) [0.4701s] [100%]
2025-12-04T11:20:45.3905053Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 11:09:59.588684942 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3905091Z 
2025-12-04T11:20:45.3905611Z [W1204 11:09:59.589410498 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3905616Z 
2025-12-04T11:20:45.3906132Z [W1204 11:09:59.589607469 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3906138Z 
2025-12-04T11:20:45.3906696Z [W1204 11:09:59.593556166 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3906702Z 
2025-12-04T11:20:45.3907211Z [W1204 11:09:59.594191774 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3907219Z 
2025-12-04T11:20:45.3907739Z [W1204 11:09:59.594378065 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3907743Z 
2025-12-04T11:20:45.3908252Z [W1204 11:09:59.600336498 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3908257Z 
2025-12-04T11:20:45.3908763Z [W1204 11:09:59.600984188 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3908785Z 
2025-12-04T11:20:45.3909289Z [W1204 11:09:59.601171953 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3909296Z 
2025-12-04T11:20:45.3909799Z [W1204 11:09:59.687311302 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3909806Z 
2025-12-04T11:20:45.3910323Z [W1204 11:09:59.688080550 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3910328Z 
2025-12-04T11:20:45.3910834Z [W1204 11:09:59.688288618 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3910839Z 
2025-12-04T11:20:45.3911359Z [W1204 11:09:59.692247355 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3911364Z 
2025-12-04T11:20:45.3911875Z [W1204 11:09:59.692950519 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3911883Z 
2025-12-04T11:20:45.3912405Z [W1204 11:09:59.693151974 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3912410Z 
2025-12-04T11:20:45.3912921Z [W1204 11:09:59.699168630 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3912926Z 
2025-12-04T11:20:45.3913447Z [W1204 11:09:59.700040014 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3913452Z 
2025-12-04T11:20:45.3913957Z [W1204 11:09:59.700243755 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.3913962Z 
2025-12-04T11:20:45.3914125Z FAILED [0.4386s] [100%]
2025-12-04T11:20:45.3914131Z 
2025-12-04T11:20:45.3914291Z ==================================== RERUNS ====================================
2025-12-04T11:20:45.3914790Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T11:20:45.3914929Z Traceback (most recent call last):
2025-12-04T11:20:45.3915440Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.3915703Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.3916181Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.3916349Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.3916895Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.3917106Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.3917300Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3917306Z 
2025-12-04T11:20:45.3917424Z Expected 1 but got 2.
2025-12-04T11:20:45.3917532Z Absolute difference: 1
2025-12-04T11:20:45.3917642Z Relative difference: 1.0
2025-12-04T11:20:45.3917646Z 
2025-12-04T11:20:45.3917876Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3918780Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.3918787Z 
2025-12-04T11:20:45.3919070Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3919289Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3919405Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3920308Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3920538Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3920650Z graph_break []
2025-12-04T11:20:45.3920869Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3922074Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.3922209Z   if out == self.unknown_value:
2025-12-04T11:20:45.3922931Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3923052Z   warnings.warn(
2025-12-04T11:20:45.3923780Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3923885Z   warnings.warn(
2025-12-04T11:20:45.3924404Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T11:20:45.3924533Z Traceback (most recent call last):
2025-12-04T11:20:45.3925045Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.3925293Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.3925751Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.3925927Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.3926527Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.3926737Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.3926885Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3926890Z 
2025-12-04T11:20:45.3926998Z Expected 1 but got 2.
2025-12-04T11:20:45.3927123Z Absolute difference: 1
2025-12-04T11:20:45.3927268Z Relative difference: 1.0
2025-12-04T11:20:45.3927273Z 
2025-12-04T11:20:45.3927489Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3928395Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.3928400Z 
2025-12-04T11:20:45.3928670Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3928904Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3929022Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3929941Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3930182Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3930285Z graph_break []
2025-12-04T11:20:45.3930503Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3931720Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.3931839Z   if out == self.unknown_value:
2025-12-04T11:20:45.3932582Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3932688Z   warnings.warn(
2025-12-04T11:20:45.3933405Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3933520Z   warnings.warn(
2025-12-04T11:20:45.3933741Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3933872Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3934101Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3934989Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3935106Z graph_break []
2025-12-04T11:20:45.3935328Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3936068Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3936172Z   warnings.warn(
2025-12-04T11:20:45.3936995Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3937116Z   warnings.warn(
2025-12-04T11:20:45.3937267Z =================================== FAILURES ===================================
2025-12-04T11:20:45.3937772Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T11:20:45.3937913Z Traceback (most recent call last):
2025-12-04T11:20:45.3938422Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.3938737Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.3939202Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.3939368Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.3939919Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.3940161Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.3940298Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3940318Z 
2025-12-04T11:20:45.3940427Z Expected 1 but got 2.
2025-12-04T11:20:45.3940538Z Absolute difference: 1
2025-12-04T11:20:45.3940663Z Relative difference: 1.0
2025-12-04T11:20:45.3940668Z 
2025-12-04T11:20:45.3940885Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3941793Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.3941829Z 
2025-12-04T11:20:45.3942113Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3942336Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3942471Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3943357Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3943585Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3943700Z graph_break []
2025-12-04T11:20:45.3943914Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3945138Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.3945259Z   if out == self.unknown_value:
2025-12-04T11:20:45.3945983Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3946104Z   warnings.warn(
2025-12-04T11:20:45.3946820Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3946937Z   warnings.warn(
2025-12-04T11:20:45.3947155Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3947272Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3947519Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3948409Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3948512Z graph_break []
2025-12-04T11:20:45.3948740Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3949465Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3949580Z   warnings.warn(
2025-12-04T11:20:45.3950293Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3950394Z   warnings.warn(
2025-12-04T11:20:45.3950700Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3950820Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3951050Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3951943Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3952076Z graph_break []
2025-12-04T11:20:45.3952305Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3953031Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3953135Z   warnings.warn(
2025-12-04T11:20:45.3953869Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3953974Z   warnings.warn(
2025-12-04T11:20:45.3954859Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f706cf73cc88a5b8.xml -
2025-12-04T11:20:45.3955040Z =========================== short test summary info ============================
2025-12-04T11:20:45.3955972Z FAILED [0.4386s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3955993Z 
2025-12-04T11:20:45.3956102Z Expected 1 but got 2.
2025-12-04T11:20:45.3956214Z Absolute difference: 1
2025-12-04T11:20:45.3956340Z Relative difference: 1.0
2025-12-04T11:20:45.3956345Z 
2025-12-04T11:20:45.3956561Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3957460Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.3957468Z 
2025-12-04T11:20:45.3957750Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3957932Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:20:45.3958144Z ================== 1 failed, 13 deselected, 2 rerun in 20.58s ==================
2025-12-04T11:20:45.3958251Z Got exit code 1
2025-12-04T11:20:45.3959064Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.3959490Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T11:20:45.3959934Z W1204 11:10:10.810000 90253 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:20:45.3960614Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c67f05de6c39b0d8.xml
2025-12-04T11:20:45.3960784Z ============================= test session starts ==============================
2025-12-04T11:20:45.3961134Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:20:45.3961261Z cachedir: .pytest_cache
2025-12-04T11:20:45.3961781Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:20:45.3961909Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:20:45.3962031Z configfile: pytest.ini
2025-12-04T11:20:45.3962568Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:20:45.3962794Z collecting ... collected 58 items / 7 deselected / 51 selected
2025-12-04T11:20:45.3963001Z stepcurrent: skipping 7 already run items.
2025-12-04T11:20:45.3963121Z Running 7 items in this shard
2025-12-04T11:20:45.3963126Z 
2025-12-04T11:20:45.3964005Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [3.8417s] [ 14%]
2025-12-04T11:20:45.3964891Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4582s] [ 14%]
2025-12-04T11:20:45.3965675Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 FAILED [0.4535s] [ 14%]
2025-12-04T11:20:45.3965681Z 
2025-12-04T11:20:45.3965825Z ==================================== RERUNS ====================================
2025-12-04T11:20:45.3966332Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T11:20:45.3966500Z Traceback (most recent call last):
2025-12-04T11:20:45.3967014Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.3967260Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.3967724Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.3967890Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.3968440Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.3968645Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.3968796Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3968802Z 
2025-12-04T11:20:45.3968909Z Expected 1 but got 2.
2025-12-04T11:20:45.3969022Z Absolute difference: 1
2025-12-04T11:20:45.3969149Z Relative difference: 1.0
2025-12-04T11:20:45.3969154Z 
2025-12-04T11:20:45.3969369Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3970272Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.3970294Z 
2025-12-04T11:20:45.3970562Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3970781Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3970912Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3972191Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3972424Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3972543Z graph_break []
2025-12-04T11:20:45.3972763Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3973511Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3973617Z   warnings.warn(
2025-12-04T11:20:45.3974338Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3974453Z   warnings.warn(
2025-12-04T11:20:45.3974956Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T11:20:45.3975098Z Traceback (most recent call last):
2025-12-04T11:20:45.3975741Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.3975979Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.3976526Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.3976692Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.3977278Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.3977500Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.3977634Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3977639Z 
2025-12-04T11:20:45.3977759Z Expected 1 but got 2.
2025-12-04T11:20:45.3977866Z Absolute difference: 1
2025-12-04T11:20:45.3977977Z Relative difference: 1.0
2025-12-04T11:20:45.3977982Z 
2025-12-04T11:20:45.3978212Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3979113Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.3979166Z 
2025-12-04T11:20:45.3979448Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3979673Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3979790Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3980689Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3980916Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3981030Z graph_break []
2025-12-04T11:20:45.3981254Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3981987Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3982105Z   warnings.warn(
2025-12-04T11:20:45.3982824Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3982928Z   warnings.warn(
2025-12-04T11:20:45.3983158Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3983278Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3983519Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3984406Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3984511Z graph_break []
2025-12-04T11:20:45.3984739Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3985464Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3985580Z   warnings.warn(
2025-12-04T11:20:45.3986296Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3986397Z   warnings.warn(
2025-12-04T11:20:45.3986557Z =================================== FAILURES ===================================
2025-12-04T11:20:45.3987062Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T11:20:45.3987186Z Traceback (most recent call last):
2025-12-04T11:20:45.3987782Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.3988019Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.3988493Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.3988660Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.3989224Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.3989446Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.3989584Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.3989589Z 
2025-12-04T11:20:45.3989709Z Expected 1 but got 2.
2025-12-04T11:20:45.3989815Z Absolute difference: 1
2025-12-04T11:20:45.3989926Z Relative difference: 1.0
2025-12-04T11:20:45.3989931Z 
2025-12-04T11:20:45.3990162Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.3991113Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.3991119Z 
2025-12-04T11:20:45.3991393Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.3991629Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3991746Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3992647Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3992874Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3992973Z graph_break []
2025-12-04T11:20:45.3993207Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3993940Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3994056Z   warnings.warn(
2025-12-04T11:20:45.3994773Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3994878Z   warnings.warn(
2025-12-04T11:20:45.3995107Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3995224Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3995450Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3996359Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.3996461Z graph_break []
2025-12-04T11:20:45.3996689Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.3997410Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3997512Z   warnings.warn(
2025-12-04T11:20:45.3998251Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.3998353Z   warnings.warn(
2025-12-04T11:20:45.3998581Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.3998700Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.3998929Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.3999926Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.4000031Z graph_break []
2025-12-04T11:20:45.4000249Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4000989Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4001126Z   warnings.warn(
2025-12-04T11:20:45.4001862Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4001965Z   warnings.warn(
2025-12-04T11:20:45.4002807Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c67f05de6c39b0d8.xml -
2025-12-04T11:20:45.4003028Z =========================== short test summary info ============================
2025-12-04T11:20:45.4003970Z FAILED [0.4535s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4003978Z 
2025-12-04T11:20:45.4004101Z Expected 1 but got 2.
2025-12-04T11:20:45.4004213Z Absolute difference: 1
2025-12-04T11:20:45.4004327Z Relative difference: 1.0
2025-12-04T11:20:45.4004332Z 
2025-12-04T11:20:45.4004568Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4005470Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.4005476Z 
2025-12-04T11:20:45.4005761Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4005947Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:20:45.4006148Z =================== 1 failed, 7 deselected, 2 rerun in 4.78s ===================
2025-12-04T11:20:45.4006269Z Got exit code 1
2025-12-04T11:20:45.4006381Z Retrying single test...
2025-12-04T11:20:45.4006827Z W1204 11:10:31.085000 90429 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:20:45.4007511Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0e30a339afee7d22.xml
2025-12-04T11:20:45.4007679Z ============================= test session starts ==============================
2025-12-04T11:20:45.4008046Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:20:45.4008161Z cachedir: .pytest_cache
2025-12-04T11:20:45.4008687Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:20:45.4008833Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:20:45.4008944Z configfile: pytest.ini
2025-12-04T11:20:45.4009506Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:20:45.4009735Z collecting ... collected 58 items / 13 deselected / 45 selected
2025-12-04T11:20:45.4010724Z stepcurrent: skipping 7 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.4010856Z Running 1 items in this shard
2025-12-04T11:20:45.4010862Z 
2025-12-04T11:20:45.4012197Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 11:10:36.935468562 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4012207Z 
2025-12-04T11:20:45.4012737Z [W1204 11:10:52.692230925 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4012743Z 
2025-12-04T11:20:45.4013294Z [W1204 11:10:52.692493183 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4013299Z 
2025-12-04T11:20:45.4013825Z [W1204 11:10:52.699763058 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4013830Z 
2025-12-04T11:20:45.4014338Z [W1204 11:10:52.700480400 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4014343Z 
2025-12-04T11:20:45.4014870Z [W1204 11:10:52.700712984 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4014905Z 
2025-12-04T11:20:45.4015415Z [W1204 11:10:52.707564883 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4015420Z 
2025-12-04T11:20:45.4015924Z [W1204 11:10:52.708327166 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4015944Z 
2025-12-04T11:20:45.4016525Z [W1204 11:10:52.708509715 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4016531Z 
2025-12-04T11:20:45.4017041Z [W1204 11:10:52.846185718 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4017046Z 
2025-12-04T11:20:45.4017572Z [W1204 11:10:52.847942650 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4017579Z 
2025-12-04T11:20:45.4018088Z [W1204 11:10:52.848154970 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4018093Z 
2025-12-04T11:20:45.4018616Z [W1204 11:10:52.852234769 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4018623Z 
2025-12-04T11:20:45.4019131Z [W1204 11:10:52.852929893 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4019136Z 
2025-12-04T11:20:45.4019654Z [W1204 11:10:52.853127888 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4019659Z 
2025-12-04T11:20:45.4020167Z [W1204 11:10:52.859192631 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4020174Z 
2025-12-04T11:20:45.4020682Z [W1204 11:10:52.859840535 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4020701Z 
2025-12-04T11:20:45.4021206Z [W1204 11:10:52.860059104 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4021214Z 
2025-12-04T11:20:45.4021347Z ('RERUN', {'yellow': True}) [19.6302s] [100%]
2025-12-04T11:20:45.4022638Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 11:10:52.265044445 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4022644Z 
2025-12-04T11:20:45.4023217Z [W1204 11:10:52.265793263 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4023225Z 
2025-12-04T11:20:45.4023744Z [W1204 11:10:52.265989075 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4023750Z 
2025-12-04T11:20:45.4024257Z [W1204 11:10:52.270094116 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4024291Z 
2025-12-04T11:20:45.4024813Z [W1204 11:10:52.270733179 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4024817Z 
2025-12-04T11:20:45.4025323Z [W1204 11:10:52.270923080 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4025328Z 
2025-12-04T11:20:45.4025854Z [W1204 11:10:52.276989448 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4025890Z 
2025-12-04T11:20:45.4026400Z [W1204 11:10:52.277618443 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4026405Z 
2025-12-04T11:20:45.4026913Z [W1204 11:10:52.277822356 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4026935Z 
2025-12-04T11:20:45.4027448Z [W1204 11:10:52.364469886 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4027452Z 
2025-12-04T11:20:45.4027959Z [W1204 11:10:52.365250162 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4027964Z 
2025-12-04T11:20:45.4028490Z [W1204 11:10:52.365458275 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4028495Z 
2025-12-04T11:20:45.4029010Z [W1204 11:10:52.369355873 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4029016Z 
2025-12-04T11:20:45.4029536Z [W1204 11:10:52.369992733 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4029543Z 
2025-12-04T11:20:45.4030051Z [W1204 11:10:52.370208579 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4030056Z 
2025-12-04T11:20:45.4030581Z [W1204 11:10:52.376138353 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4030586Z 
2025-12-04T11:20:45.4031097Z [W1204 11:10:52.376956007 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4031105Z 
2025-12-04T11:20:45.4031633Z [W1204 11:10:52.377152691 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4031641Z 
2025-12-04T11:20:45.4031775Z ('RERUN', {'yellow': True}) [0.4768s] [100%]
2025-12-04T11:20:45.4033049Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 11:10:53.709083154 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4033058Z 
2025-12-04T11:20:45.4033578Z [W1204 11:10:53.709813503 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4033582Z 
2025-12-04T11:20:45.4034092Z [W1204 11:10:53.710029561 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4034162Z 
2025-12-04T11:20:45.4034682Z [W1204 11:10:53.713984779 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4034689Z 
2025-12-04T11:20:45.4035194Z [W1204 11:10:53.714586977 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4035245Z 
2025-12-04T11:20:45.4035765Z [W1204 11:10:53.714773689 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4035769Z 
2025-12-04T11:20:45.4036276Z [W1204 11:10:53.720886061 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4036281Z 
2025-12-04T11:20:45.4036801Z [W1204 11:10:53.721502405 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4036806Z 
2025-12-04T11:20:45.4037321Z [W1204 11:10:53.721692889 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4037358Z 
2025-12-04T11:20:45.4037866Z [W1204 11:10:53.808625099 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4037882Z 
2025-12-04T11:20:45.4038390Z [W1204 11:10:53.809409982 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4038395Z 
2025-12-04T11:20:45.4038898Z [W1204 11:10:53.809623646 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4038903Z 
2025-12-04T11:20:45.4039423Z [W1204 11:10:53.813634380 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4039428Z 
2025-12-04T11:20:45.4039940Z [W1204 11:10:53.814362333 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4039946Z 
2025-12-04T11:20:45.4040469Z [W1204 11:10:53.814564778 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4040474Z 
2025-12-04T11:20:45.4040981Z [W1204 11:10:53.820725633 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4040988Z 
2025-12-04T11:20:45.4041511Z [W1204 11:10:53.821614939 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4041516Z 
2025-12-04T11:20:45.4042027Z [W1204 11:10:53.821829828 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4042032Z 
2025-12-04T11:20:45.4042148Z FAILED [0.4435s] [100%]
2025-12-04T11:20:45.4042156Z 
2025-12-04T11:20:45.4042303Z ==================================== RERUNS ====================================
2025-12-04T11:20:45.4042814Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T11:20:45.4042955Z Traceback (most recent call last):
2025-12-04T11:20:45.4043465Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.4043701Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.4044182Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.4044348Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.4044897Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.4045167Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.4045305Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4045311Z 
2025-12-04T11:20:45.4045431Z Expected 1 but got 2.
2025-12-04T11:20:45.4045539Z Absolute difference: 1
2025-12-04T11:20:45.4045651Z Relative difference: 1.0
2025-12-04T11:20:45.4045669Z 
2025-12-04T11:20:45.4045884Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4046819Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.4046825Z 
2025-12-04T11:20:45.4047108Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4047329Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4047446Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4048353Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.4048612Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4048727Z graph_break []
2025-12-04T11:20:45.4048943Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4050157Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.4050287Z   if out == self.unknown_value:
2025-12-04T11:20:45.4051007Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4051126Z   warnings.warn(
2025-12-04T11:20:45.4051845Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4051950Z   warnings.warn(
2025-12-04T11:20:45.4052467Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T11:20:45.4052594Z Traceback (most recent call last):
2025-12-04T11:20:45.4053118Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.4053349Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.4053807Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.4053982Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.4054521Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.4054730Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.4054873Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4054878Z 
2025-12-04T11:20:45.4054984Z Expected 1 but got 2.
2025-12-04T11:20:45.4055104Z Absolute difference: 1
2025-12-04T11:20:45.4055216Z Relative difference: 1.0
2025-12-04T11:20:45.4055223Z 
2025-12-04T11:20:45.4055442Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4056433Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.4056440Z 
2025-12-04T11:20:45.4056709Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4056944Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4057145Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4058032Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.4058273Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4058482Z graph_break []
2025-12-04T11:20:45.4058717Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4059917Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.4060034Z   if out == self.unknown_value:
2025-12-04T11:20:45.4060777Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4060912Z   warnings.warn(
2025-12-04T11:20:45.4061648Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4061752Z   warnings.warn(
2025-12-04T11:20:45.4061972Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4062108Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4062339Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4063231Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.4063346Z graph_break []
2025-12-04T11:20:45.4063568Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4064308Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4064411Z   warnings.warn(
2025-12-04T11:20:45.4065129Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4065248Z   warnings.warn(
2025-12-04T11:20:45.4065396Z =================================== FAILURES ===================================
2025-12-04T11:20:45.4065912Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T11:20:45.4066040Z Traceback (most recent call last):
2025-12-04T11:20:45.4066547Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.4066793Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.4067254Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.4067419Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.4067972Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.4068183Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.4068334Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4068339Z 
2025-12-04T11:20:45.4068444Z Expected 1 but got 2.
2025-12-04T11:20:45.4068554Z Absolute difference: 1
2025-12-04T11:20:45.4068677Z Relative difference: 1.0
2025-12-04T11:20:45.4068682Z 
2025-12-04T11:20:45.4068897Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4069864Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.4069886Z 
2025-12-04T11:20:45.4070159Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4070378Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4070541Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4071763Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.4072009Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4072111Z graph_break []
2025-12-04T11:20:45.4072331Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4073556Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.4073753Z   if out == self.unknown_value:
2025-12-04T11:20:45.4074476Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4074603Z   warnings.warn(
2025-12-04T11:20:45.4075323Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4075442Z   warnings.warn(
2025-12-04T11:20:45.4075665Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4075786Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4076031Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4076928Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.4077050Z graph_break []
2025-12-04T11:20:45.4077268Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4077997Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4078115Z   warnings.warn(
2025-12-04T11:20:45.4078833Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4078936Z   warnings.warn(
2025-12-04T11:20:45.4079166Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4079288Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4079535Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4080425Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.4080531Z graph_break []
2025-12-04T11:20:45.4080764Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4081494Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4081614Z   warnings.warn(
2025-12-04T11:20:45.4082334Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4082543Z   warnings.warn(
2025-12-04T11:20:45.4083400Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0e30a339afee7d22.xml -
2025-12-04T11:20:45.4083580Z =========================== short test summary info ============================
2025-12-04T11:20:45.4084537Z FAILED [0.4435s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4084591Z 
2025-12-04T11:20:45.4084700Z Expected 1 but got 2.
2025-12-04T11:20:45.4084812Z Absolute difference: 1
2025-12-04T11:20:45.4084943Z Relative difference: 1.0
2025-12-04T11:20:45.4084948Z 
2025-12-04T11:20:45.4085170Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4086078Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.4086130Z 
2025-12-04T11:20:45.4086406Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4086589Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:20:45.4086802Z ================== 1 failed, 13 deselected, 2 rerun in 20.58s ==================
2025-12-04T11:20:45.4086906Z Got exit code 1
2025-12-04T11:20:45.4087014Z Retrying single test...
2025-12-04T11:20:45.4087473Z W1204 11:11:05.205000 90610 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:20:45.4088136Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4a14a2e6be65f97f.xml
2025-12-04T11:20:45.4088318Z ============================= test session starts ==============================
2025-12-04T11:20:45.4088673Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:20:45.4088786Z cachedir: .pytest_cache
2025-12-04T11:20:45.4089323Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:20:45.4089449Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:20:45.4089560Z configfile: pytest.ini
2025-12-04T11:20:45.4090115Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:20:45.4090333Z collecting ... collected 58 items / 13 deselected / 45 selected
2025-12-04T11:20:45.4091329Z stepcurrent: skipping 7 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.4091447Z Running 1 items in this shard
2025-12-04T11:20:45.4091457Z 
2025-12-04T11:20:45.4092747Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 11:11:10.047314338 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4092756Z 
2025-12-04T11:20:45.4093275Z [W1204 11:11:25.259216840 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4093283Z 
2025-12-04T11:20:45.4093793Z [W1204 11:11:25.259483766 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4093811Z 
2025-12-04T11:20:45.4094320Z [W1204 11:11:25.266839325 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4094325Z 
2025-12-04T11:20:45.4094904Z [W1204 11:11:25.267536827 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4094913Z 
2025-12-04T11:20:45.4095434Z [W1204 11:11:25.267723570 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4095440Z 
2025-12-04T11:20:45.4095945Z [W1204 11:11:25.274666725 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4095980Z 
2025-12-04T11:20:45.4096580Z [W1204 11:11:25.275439269 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4096586Z 
2025-12-04T11:20:45.4097094Z [W1204 11:11:25.275619551 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4097099Z 
2025-12-04T11:20:45.4097627Z [W1204 11:11:26.413524292 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4097670Z 
2025-12-04T11:20:45.4098182Z [W1204 11:11:26.415261131 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4098187Z 
2025-12-04T11:20:45.4098709Z [W1204 11:11:26.415475728 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4098716Z 
2025-12-04T11:20:45.4099231Z [W1204 11:11:26.419460348 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4099236Z 
2025-12-04T11:20:45.4099738Z [W1204 11:11:26.420137977 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4099743Z 
2025-12-04T11:20:45.4100272Z [W1204 11:11:26.420337083 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4100280Z 
2025-12-04T11:20:45.4100789Z [W1204 11:11:26.426438664 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4100794Z 
2025-12-04T11:20:45.4101317Z [W1204 11:11:26.427068125 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4101324Z 
2025-12-04T11:20:45.4101834Z [W1204 11:11:26.427257028 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4101839Z 
2025-12-04T11:20:45.4101990Z ('RERUN', {'yellow': True}) [19.0761s] [100%]
2025-12-04T11:20:45.4103271Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 11:11:26.831109857 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4103279Z 
2025-12-04T11:20:45.4103803Z [W1204 11:11:26.831833376 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4103808Z 
2025-12-04T11:20:45.4104319Z [W1204 11:11:26.832032507 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4104326Z 
2025-12-04T11:20:45.4104832Z [W1204 11:11:26.836069081 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4104849Z 
2025-12-04T11:20:45.4105357Z [W1204 11:11:26.836726476 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4105362Z 
2025-12-04T11:20:45.4105929Z [W1204 11:11:26.836921751 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4105935Z 
2025-12-04T11:20:45.4106457Z [W1204 11:11:26.843139546 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4106462Z 
2025-12-04T11:20:45.4106969Z [W1204 11:11:26.843750780 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4107006Z 
2025-12-04T11:20:45.4107525Z [W1204 11:11:26.843934467 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4107531Z 
2025-12-04T11:20:45.4108039Z [W1204 11:11:26.931833573 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4108043Z 
2025-12-04T11:20:45.4108564Z [W1204 11:11:26.932645973 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4108573Z 
2025-12-04T11:20:45.4109112Z [W1204 11:11:26.932860144 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4109117Z 
2025-12-04T11:20:45.4109639Z [W1204 11:11:26.936828804 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4109646Z 
2025-12-04T11:20:45.4110154Z [W1204 11:11:26.937487791 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4110160Z 
2025-12-04T11:20:45.4110666Z [W1204 11:11:26.937695296 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4110684Z 
2025-12-04T11:20:45.4111188Z [W1204 11:11:26.943771149 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4111193Z 
2025-12-04T11:20:45.4111705Z [W1204 11:11:26.944603657 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4111712Z 
2025-12-04T11:20:45.4112233Z [W1204 11:11:26.944798040 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4112238Z 
2025-12-04T11:20:45.4112373Z ('RERUN', {'yellow': True}) [0.4778s] [100%]
2025-12-04T11:20:45.4113648Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 11:11:26.278517032 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4113654Z 
2025-12-04T11:20:45.4114163Z [W1204 11:11:26.279223789 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4114173Z 
2025-12-04T11:20:45.4114692Z [W1204 11:11:26.279419340 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4114699Z 
2025-12-04T11:20:45.4115208Z [W1204 11:11:26.283470640 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4115214Z 
2025-12-04T11:20:45.4115724Z [W1204 11:11:26.284100521 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4115742Z 
2025-12-04T11:20:45.4116254Z [W1204 11:11:26.284306997 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4116259Z 
2025-12-04T11:20:45.4116768Z [W1204 11:11:26.290495118 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4116773Z 
2025-12-04T11:20:45.4117356Z [W1204 11:11:26.291118891 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4117364Z 
2025-12-04T11:20:45.4117870Z [W1204 11:11:26.291302088 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4117875Z 
2025-12-04T11:20:45.4118427Z [W1204 11:11:26.379308521 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4118432Z 
2025-12-04T11:20:45.4118940Z [W1204 11:11:26.380104609 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4118944Z 
2025-12-04T11:20:45.4119463Z [W1204 11:11:26.380314013 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4119468Z 
2025-12-04T11:20:45.4119980Z [W1204 11:11:26.384254662 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4120013Z 
2025-12-04T11:20:45.4120539Z [W1204 11:11:26.384946142 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4120544Z 
2025-12-04T11:20:45.4121052Z [W1204 11:11:26.385149299 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4121060Z 
2025-12-04T11:20:45.4121574Z [W1204 11:11:27.391215832 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4121594Z 
2025-12-04T11:20:45.4122101Z [W1204 11:11:27.392029307 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4122105Z 
2025-12-04T11:20:45.4122617Z [W1204 11:11:27.392224871 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4122625Z 
2025-12-04T11:20:45.4122744Z FAILED [0.4463s] [100%]
2025-12-04T11:20:45.4122749Z 
2025-12-04T11:20:45.4122897Z ==================================== RERUNS ====================================
2025-12-04T11:20:45.4123415Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T11:20:45.4123542Z Traceback (most recent call last):
2025-12-04T11:20:45.4124052Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.4124299Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.4124766Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.4124944Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.4125483Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.4125694Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.4125841Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4125846Z 
2025-12-04T11:20:45.4125954Z Expected 1 but got 2.
2025-12-04T11:20:45.4126065Z Absolute difference: 1
2025-12-04T11:20:45.4126189Z Relative difference: 1.0
2025-12-04T11:20:45.4126194Z 
2025-12-04T11:20:45.4126411Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4127325Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.4127330Z 
2025-12-04T11:20:45.4127600Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4127907Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4128044Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4128938Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.4129209Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4129311Z graph_break []
2025-12-04T11:20:45.4129528Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4130746Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.4130865Z   if out == self.unknown_value:
2025-12-04T11:20:45.4131617Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4131753Z   warnings.warn(
2025-12-04T11:20:45.4132478Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4132597Z   warnings.warn(
2025-12-04T11:20:45.4133100Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T11:20:45.4133224Z Traceback (most recent call last):
2025-12-04T11:20:45.4133742Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.4133976Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.4134450Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.4134616Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.4135156Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.4135374Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.4135508Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4135516Z 
2025-12-04T11:20:45.4135636Z Expected 1 but got 2.
2025-12-04T11:20:45.4135744Z Absolute difference: 1
2025-12-04T11:20:45.4135855Z Relative difference: 1.0
2025-12-04T11:20:45.4135861Z 
2025-12-04T11:20:45.4136088Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4137056Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.4137062Z 
2025-12-04T11:20:45.4137354Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4137576Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4137694Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4138598Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.4138832Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4138936Z graph_break []
2025-12-04T11:20:45.4139170Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4140453Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.4140592Z   if out == self.unknown_value:
2025-12-04T11:20:45.4141325Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4141428Z   warnings.warn(
2025-12-04T11:20:45.4142160Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4142295Z   warnings.warn(
2025-12-04T11:20:45.4142529Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4142646Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4142876Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4143785Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.4143917Z graph_break []
2025-12-04T11:20:45.4144132Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4144870Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4144978Z   warnings.warn(
2025-12-04T11:20:45.4145705Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4145807Z   warnings.warn(
2025-12-04T11:20:45.4145953Z =================================== FAILURES ===================================
2025-12-04T11:20:45.4146474Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T11:20:45.4146604Z Traceback (most recent call last):
2025-12-04T11:20:45.4147126Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.4147363Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.4147824Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.4148005Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.4148542Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.4148748Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.4148903Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4148908Z 
2025-12-04T11:20:45.4149016Z Expected 1 but got 2.
2025-12-04T11:20:45.4149140Z Absolute difference: 1
2025-12-04T11:20:45.4149252Z Relative difference: 1.0
2025-12-04T11:20:45.4149257Z 
2025-12-04T11:20:45.4149479Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4150402Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.4150407Z 
2025-12-04T11:20:45.4150677Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4150914Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4151034Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4151919Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.4152164Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4152325Z graph_break []
2025-12-04T11:20:45.4152556Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4153760Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.4153914Z   if out == self.unknown_value:
2025-12-04T11:20:45.4154652Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4154757Z   warnings.warn(
2025-12-04T11:20:45.4155490Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4155596Z   warnings.warn(
2025-12-04T11:20:45.4155822Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4155990Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4156222Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4157112Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.4157232Z graph_break []
2025-12-04T11:20:45.4157451Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4158191Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4158297Z   warnings.warn(
2025-12-04T11:20:45.4159019Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4159135Z   warnings.warn(
2025-12-04T11:20:45.4159355Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4159471Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4159713Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4160602Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.4160722Z graph_break []
2025-12-04T11:20:45.4160939Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4161667Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4161782Z   warnings.warn(
2025-12-04T11:20:45.4162498Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4162614Z   warnings.warn(
2025-12-04T11:20:45.4163454Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4a14a2e6be65f97f.xml -
2025-12-04T11:20:45.4163629Z =========================== short test summary info ============================
2025-12-04T11:20:45.4164581Z FAILED [0.4463s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4164588Z 
2025-12-04T11:20:45.4164695Z Expected 1 but got 2.
2025-12-04T11:20:45.4164816Z Absolute difference: 1
2025-12-04T11:20:45.4164927Z Relative difference: 1.0
2025-12-04T11:20:45.4164932Z 
2025-12-04T11:20:45.4165212Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4166134Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.4166139Z 
2025-12-04T11:20:45.4166408Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4166633Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:20:45.4166833Z ================== 1 failed, 13 deselected, 2 rerun in 20.03s ==================
2025-12-04T11:20:45.4166934Z Got exit code 1
2025-12-04T11:20:45.4167760Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.4168179Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T11:20:45.4168636Z W1204 11:11:38.682000 90791 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:20:45.4169348Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-82a3db4b14f41cd2.xml
2025-12-04T11:20:45.4169517Z ============================= test session starts ==============================
2025-12-04T11:20:45.4169886Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:20:45.4169998Z cachedir: .pytest_cache
2025-12-04T11:20:45.4170518Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:20:45.4170657Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:20:45.4170767Z configfile: pytest.ini
2025-12-04T11:20:45.4171665Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:20:45.4171891Z collecting ... collected 58 items / 8 deselected / 50 selected
2025-12-04T11:20:45.4172035Z stepcurrent: skipping 8 already run items.
2025-12-04T11:20:45.4172166Z Running 6 items in this shard
2025-12-04T11:20:45.4172171Z 
2025-12-04T11:20:45.4173030Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [3.8788s] [ 16%]
2025-12-04T11:20:45.4173901Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4250s] [ 16%]
2025-12-04T11:20:45.4174669Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 FAILED [0.4182s] [ 16%]
2025-12-04T11:20:45.4174674Z 
2025-12-04T11:20:45.4174823Z ==================================== RERUNS ====================================
2025-12-04T11:20:45.4175348Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T11:20:45.4175474Z Traceback (most recent call last):
2025-12-04T11:20:45.4175999Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.4176237Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.4176767Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.4176947Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.4177484Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.4177839Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.4177978Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4177986Z 
2025-12-04T11:20:45.4178092Z Expected 1 but got 2.
2025-12-04T11:20:45.4178214Z Absolute difference: 1
2025-12-04T11:20:45.4178327Z Relative difference: 1.0
2025-12-04T11:20:45.4178332Z 
2025-12-04T11:20:45.4178547Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4179513Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:20:45.4179520Z 
2025-12-04T11:20:45.4179788Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4180023Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4180141Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4180675Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4180967Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4181069Z graph_break []
2025-12-04T11:20:45.4181295Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4182028Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4182136Z   warnings.warn(
2025-12-04T11:20:45.4182872Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4182974Z   warnings.warn(
2025-12-04T11:20:45.4183489Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T11:20:45.4183620Z Traceback (most recent call last):
2025-12-04T11:20:45.4184135Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.4184383Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.4184842Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.4185012Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.4185560Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.4185769Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.4185916Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4185921Z 
2025-12-04T11:20:45.4186026Z Expected 1 but got 2.
2025-12-04T11:20:45.4186134Z Absolute difference: 1
2025-12-04T11:20:45.4186259Z Relative difference: 1.0
2025-12-04T11:20:45.4186268Z 
2025-12-04T11:20:45.4186485Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4187391Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:20:45.4187409Z 
2025-12-04T11:20:45.4187679Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4187898Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4188030Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4188556Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4188785Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4188898Z graph_break []
2025-12-04T11:20:45.4189178Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4189924Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4190027Z   warnings.warn(
2025-12-04T11:20:45.4190745Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4190891Z   warnings.warn(
2025-12-04T11:20:45.4191109Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4191225Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4191466Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4191994Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4192112Z graph_break []
2025-12-04T11:20:45.4192327Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4193084Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4193201Z   warnings.warn(
2025-12-04T11:20:45.4193917Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4194034Z   warnings.warn(
2025-12-04T11:20:45.4194180Z =================================== FAILURES ===================================
2025-12-04T11:20:45.4194683Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T11:20:45.4194820Z Traceback (most recent call last):
2025-12-04T11:20:45.4195332Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.4195566Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.4196035Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.4196202Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.4196749Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.4196955Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.4197088Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4197093Z 
2025-12-04T11:20:45.4197211Z Expected 1 but got 2.
2025-12-04T11:20:45.4197318Z Absolute difference: 1
2025-12-04T11:20:45.4197428Z Relative difference: 1.0
2025-12-04T11:20:45.4197447Z 
2025-12-04T11:20:45.4197664Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4198571Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:20:45.4198579Z 
2025-12-04T11:20:45.4198859Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4199078Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4199197Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4199737Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4199965Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4200079Z graph_break []
2025-12-04T11:20:45.4200298Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4201092Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4201211Z   warnings.warn(
2025-12-04T11:20:45.4201931Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4202076Z   warnings.warn(
2025-12-04T11:20:45.4202295Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4202416Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4202655Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4203185Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4203287Z graph_break []
2025-12-04T11:20:45.4203523Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4204242Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4204387Z   warnings.warn(
2025-12-04T11:20:45.4205105Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4205208Z   warnings.warn(
2025-12-04T11:20:45.4205435Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4205550Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4205776Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4206312Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4206411Z graph_break []
2025-12-04T11:20:45.4206644Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4207366Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4207467Z   warnings.warn(
2025-12-04T11:20:45.4208196Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4208299Z   warnings.warn(
2025-12-04T11:20:45.4209147Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-82a3db4b14f41cd2.xml -
2025-12-04T11:20:45.4209323Z =========================== short test summary info ============================
2025-12-04T11:20:45.4210259Z FAILED [0.4182s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4210267Z 
2025-12-04T11:20:45.4210389Z Expected 1 but got 2.
2025-12-04T11:20:45.4210498Z Absolute difference: 1
2025-12-04T11:20:45.4210621Z Relative difference: 1.0
2025-12-04T11:20:45.4210626Z 
2025-12-04T11:20:45.4210841Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4211748Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:20:45.4211754Z 
2025-12-04T11:20:45.4212034Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4212215Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:20:45.4212426Z =================== 1 failed, 8 deselected, 2 rerun in 4.75s ===================
2025-12-04T11:20:45.4212593Z Got exit code 1
2025-12-04T11:20:45.4212704Z Retrying single test...
2025-12-04T11:20:45.4213165Z W1204 11:11:59.267000 90960 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:20:45.4213833Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a02c7191ab69f431.xml
2025-12-04T11:20:45.4214048Z ============================= test session starts ==============================
2025-12-04T11:20:45.4214415Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:20:45.4214528Z cachedir: .pytest_cache
2025-12-04T11:20:45.4215066Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:20:45.4215195Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:20:45.4215307Z configfile: pytest.ini
2025-12-04T11:20:45.4215870Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:20:45.4216129Z collecting ... collected 58 items / 13 deselected / 45 selected
2025-12-04T11:20:45.4217209Z stepcurrent: skipping 8 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:20:45.4217337Z Running 1 items in this shard
2025-12-04T11:20:45.4217342Z 
2025-12-04T11:20:45.4218611Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 11:12:02.272761954 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4218617Z 
2025-12-04T11:20:45.4219155Z [W1204 11:12:19.518301637 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4219162Z 
2025-12-04T11:20:45.4219675Z [W1204 11:12:19.518566200 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4219681Z 
2025-12-04T11:20:45.4220207Z [W1204 11:12:19.525909295 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4220214Z 
2025-12-04T11:20:45.4220724Z [W1204 11:12:19.526648895 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4220729Z 
2025-12-04T11:20:45.4221253Z [W1204 11:12:19.526840399 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4221258Z 
2025-12-04T11:20:45.4221771Z [W1204 11:12:19.533862900 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4221776Z 
2025-12-04T11:20:45.4222302Z [W1204 11:12:19.534543891 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4222306Z 
2025-12-04T11:20:45.4222817Z [W1204 11:12:19.534734527 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4222824Z 
2025-12-04T11:20:45.4223330Z [W1204 11:12:21.542108370 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4223350Z 
2025-12-04T11:20:45.4223860Z [W1204 11:12:21.543931835 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4223865Z 
2025-12-04T11:20:45.4224372Z [W1204 11:12:21.544156187 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4224452Z 
2025-12-04T11:20:45.4224978Z [W1204 11:12:21.548325892 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4224985Z 
2025-12-04T11:20:45.4225491Z [W1204 11:12:21.549090184 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4225527Z 
2025-12-04T11:20:45.4226049Z [W1204 11:12:21.549297089 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4226054Z 
2025-12-04T11:20:45.4226562Z [W1204 11:12:21.555640488 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4226567Z 
2025-12-04T11:20:45.4227086Z [W1204 11:12:21.556418740 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4227091Z 
2025-12-04T11:20:45.4227602Z [W1204 11:12:21.556639491 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4227639Z 
2025-12-04T11:20:45.4227789Z ('RERUN', {'yellow': True}) [20.1203s] [100%]
2025-12-04T11:20:45.4229046Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 11:12:21.920799249 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4229055Z 
2025-12-04T11:20:45.4229568Z [W1204 11:12:21.921611810 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4229587Z 
2025-12-04T11:20:45.4230101Z [W1204 11:12:21.921825920 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4230106Z 
2025-12-04T11:20:45.4230623Z [W1204 11:12:21.925823311 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4230630Z 
2025-12-04T11:20:45.4231159Z [W1204 11:12:21.926625021 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4231163Z 
2025-12-04T11:20:45.4231672Z [W1204 11:12:21.926818700 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4231677Z 
2025-12-04T11:20:45.4232200Z [W1204 11:12:21.932910440 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4232205Z 
2025-12-04T11:20:45.4232712Z [W1204 11:12:21.933538858 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4232716Z 
2025-12-04T11:20:45.4233235Z [W1204 11:12:21.933725485 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4233241Z 
2025-12-04T11:20:45.4233749Z [W1204 11:12:21.023781718 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4233754Z 
2025-12-04T11:20:45.4234272Z [W1204 11:12:21.024578135 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4234279Z 
2025-12-04T11:20:45.4234785Z [W1204 11:12:21.024784370 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4234790Z 
2025-12-04T11:20:45.4235295Z [W1204 11:12:21.028760273 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4235300Z 
2025-12-04T11:20:45.4235892Z [W1204 11:12:21.029423515 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4235900Z 
2025-12-04T11:20:45.4236412Z [W1204 11:12:21.029621400 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4236417Z 
2025-12-04T11:20:45.4236939Z [W1204 11:12:21.035762795 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4236981Z 
2025-12-04T11:20:45.4237490Z [W1204 11:12:21.036668746 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4237496Z 
2025-12-04T11:20:45.4238016Z [W1204 11:12:21.036867140 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4238020Z 
2025-12-04T11:20:45.4238152Z ('RERUN', {'yellow': True}) [0.4392s] [100%]
2025-12-04T11:20:45.4239434Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 11:12:21.333555728 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4239473Z 
2025-12-04T11:20:45.4239985Z [W1204 11:12:21.334346692 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4239993Z 
2025-12-04T11:20:45.4240500Z [W1204 11:12:21.334550788 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4240520Z 
2025-12-04T11:20:45.4241031Z [W1204 11:12:21.338602848 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4241036Z 
2025-12-04T11:20:45.4241547Z [W1204 11:12:21.339440911 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4241554Z 
2025-12-04T11:20:45.4242073Z [W1204 11:12:21.339633496 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4242079Z 
2025-12-04T11:20:45.4242583Z [W1204 11:12:21.345780229 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4242590Z 
2025-12-04T11:20:45.4243110Z [W1204 11:12:21.346478713 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4243114Z 
2025-12-04T11:20:45.4243622Z [W1204 11:12:21.346669243 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4243626Z 
2025-12-04T11:20:45.4244156Z [W1204 11:12:22.436304789 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4244163Z 
2025-12-04T11:20:45.4244673Z [W1204 11:12:22.437120886 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4244677Z 
2025-12-04T11:20:45.4245197Z [W1204 11:12:22.437329853 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4245204Z 
2025-12-04T11:20:45.4245714Z [W1204 11:12:22.441441114 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4245719Z 
2025-12-04T11:20:45.4246225Z [W1204 11:12:22.442126413 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4246244Z 
2025-12-04T11:20:45.4246900Z [W1204 11:12:22.442321726 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4246905Z 
2025-12-04T11:20:45.4247416Z [W1204 11:12:22.448543890 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4247420Z 
2025-12-04T11:20:45.4247945Z [W1204 11:12:22.449474833 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4247981Z 
2025-12-04T11:20:45.4248490Z [W1204 11:12:22.449679283 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4248494Z 
2025-12-04T11:20:45.4248612Z FAILED [0.4117s] [100%]
2025-12-04T11:20:45.4248617Z 
2025-12-04T11:20:45.4248762Z ==================================== RERUNS ====================================
2025-12-04T11:20:45.4249268Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T11:20:45.4249411Z Traceback (most recent call last):
2025-12-04T11:20:45.4249957Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.4250205Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.4250674Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.4250840Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.4251388Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.4251595Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.4251741Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4251746Z 
2025-12-04T11:20:45.4251853Z Expected 1 but got 2.
2025-12-04T11:20:45.4251962Z Absolute difference: 1
2025-12-04T11:20:45.4252088Z Relative difference: 1.0
2025-12-04T11:20:45.4252098Z 
2025-12-04T11:20:45.4252313Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4253214Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:20:45.4253231Z 
2025-12-04T11:20:45.4253504Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4253728Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4253860Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4254387Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4254617Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4254732Z graph_break []
2025-12-04T11:20:45.4254955Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4256184Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.4256371Z   if out == self.unknown_value:
2025-12-04T11:20:45.4257107Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4257227Z   warnings.warn(
2025-12-04T11:20:45.4257945Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4258065Z   warnings.warn(
2025-12-04T11:20:45.4258651Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T11:20:45.4258779Z Traceback (most recent call last):
2025-12-04T11:20:45.4259302Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.4259532Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.4259990Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.4260198Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.4260731Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.4260954Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.4261088Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4261093Z 
2025-12-04T11:20:45.4261201Z Expected 1 but got 2.
2025-12-04T11:20:45.4261326Z Absolute difference: 1
2025-12-04T11:20:45.4261443Z Relative difference: 1.0
2025-12-04T11:20:45.4261481Z 
2025-12-04T11:20:45.4261699Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4262617Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:20:45.4262625Z 
2025-12-04T11:20:45.4262893Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4263125Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4263245Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4263772Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4264013Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4264113Z graph_break []
2025-12-04T11:20:45.4264344Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4265558Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.4265679Z   if out == self.unknown_value:
2025-12-04T11:20:45.4266415Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4266519Z   warnings.warn(
2025-12-04T11:20:45.4267249Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4267352Z   warnings.warn(
2025-12-04T11:20:45.4267573Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4267706Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4267936Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4268465Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4268578Z graph_break []
2025-12-04T11:20:45.4268796Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4269532Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4269633Z   warnings.warn(
2025-12-04T11:20:45.4270349Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4270462Z   warnings.warn(
2025-12-04T11:20:45.4270680Z =================================== FAILURES ===================================
2025-12-04T11:20:45.4271537Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T11:20:45.4271672Z Traceback (most recent call last):
2025-12-04T11:20:45.4272186Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.4272512Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.4272971Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.4273133Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.4273683Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.4273895Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.4274047Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4274105Z 
2025-12-04T11:20:45.4274215Z Expected 1 but got 2.
2025-12-04T11:20:45.4274325Z Absolute difference: 1
2025-12-04T11:20:45.4274450Z Relative difference: 1.0
2025-12-04T11:20:45.4274455Z 
2025-12-04T11:20:45.4274672Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4275592Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:20:45.4275598Z 
2025-12-04T11:20:45.4275869Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4276088Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4276221Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4276752Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4276985Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4277104Z graph_break []
2025-12-04T11:20:45.4277319Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4278535Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.4278661Z   if out == self.unknown_value:
2025-12-04T11:20:45.4279387Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4279504Z   warnings.warn(
2025-12-04T11:20:45.4280226Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4280344Z   warnings.warn(
2025-12-04T11:20:45.4280562Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4280680Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4280922Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4281458Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4281558Z graph_break []
2025-12-04T11:20:45.4281784Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4282506Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4282620Z   warnings.warn(
2025-12-04T11:20:45.4283444Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4283550Z   warnings.warn(
2025-12-04T11:20:45.4283782Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4283899Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4284176Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4284704Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4284804Z graph_break []
2025-12-04T11:20:45.4285043Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4285766Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4285874Z   warnings.warn(
2025-12-04T11:20:45.4286638Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4286743Z   warnings.warn(
2025-12-04T11:20:45.4287593Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a02c7191ab69f431.xml -
2025-12-04T11:20:45.4287772Z =========================== short test summary info ============================
2025-12-04T11:20:45.4288705Z FAILED [0.4117s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4288725Z 
2025-12-04T11:20:45.4288837Z Expected 1 but got 2.
2025-12-04T11:20:45.4288948Z Absolute difference: 1
2025-12-04T11:20:45.4289082Z Relative difference: 1.0
2025-12-04T11:20:45.4289087Z 
2025-12-04T11:20:45.4289307Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4290208Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:20:45.4290214Z 
2025-12-04T11:20:45.4290504Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4290687Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:20:45.4290902Z ================== 1 failed, 13 deselected, 2 rerun in 21.00s ==================
2025-12-04T11:20:45.4291006Z Got exit code 1
2025-12-04T11:20:45.4291111Z Retrying single test...
2025-12-04T11:20:45.4291572Z W1204 11:12:33.782000 91134 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:20:45.4292237Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e37b8ebc7938792f.xml
2025-12-04T11:20:45.4292420Z ============================= test session starts ==============================
2025-12-04T11:20:45.4292772Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:20:45.4292886Z cachedir: .pytest_cache
2025-12-04T11:20:45.4293423Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:20:45.4293551Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:20:45.4293663Z configfile: pytest.ini
2025-12-04T11:20:45.4294225Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:20:45.4294446Z collecting ... collected 58 items / 13 deselected / 45 selected
2025-12-04T11:20:45.4295501Z stepcurrent: skipping 8 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:20:45.4295627Z Running 1 items in this shard
2025-12-04T11:20:45.4295633Z 
2025-12-04T11:20:45.4296969Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 11:12:37.776359949 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4297027Z 
2025-12-04T11:20:45.4297549Z [W1204 11:12:53.862439089 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4297554Z 
2025-12-04T11:20:45.4298064Z [W1204 11:12:53.862700188 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4298070Z 
2025-12-04T11:20:45.4298601Z [W1204 11:12:53.870192683 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4298636Z 
2025-12-04T11:20:45.4299146Z [W1204 11:12:53.870916311 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4299151Z 
2025-12-04T11:20:45.4299679Z [W1204 11:12:53.871106470 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4299684Z 
2025-12-04T11:20:45.4300196Z [W1204 11:12:53.878050672 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4300201Z 
2025-12-04T11:20:45.4300726Z [W1204 11:12:53.878708784 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4300731Z 
2025-12-04T11:20:45.4301244Z [W1204 11:12:53.878893636 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4301252Z 
2025-12-04T11:20:45.4301778Z [W1204 11:12:55.885365278 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4301784Z 
2025-12-04T11:20:45.4302292Z [W1204 11:12:55.887132558 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4302299Z 
2025-12-04T11:20:45.4302815Z [W1204 11:12:55.887353773 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4302834Z 
2025-12-04T11:20:45.4303341Z [W1204 11:12:55.891470759 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4303346Z 
2025-12-04T11:20:45.4303860Z [W1204 11:12:55.892194656 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4303867Z 
2025-12-04T11:20:45.4304396Z [W1204 11:12:55.892402346 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4304401Z 
2025-12-04T11:20:45.4304908Z [W1204 11:12:55.898681018 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4304915Z 
2025-12-04T11:20:45.4305438Z [W1204 11:12:55.899398288 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4305443Z 
2025-12-04T11:20:45.4305948Z [W1204 11:12:55.899600662 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4305953Z 
2025-12-04T11:20:45.4306104Z ('RERUN', {'yellow': True}) [19.9477s] [100%]
2025-12-04T11:20:45.4307451Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 11:12:55.264413976 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4307460Z 
2025-12-04T11:20:45.4307970Z [W1204 11:12:55.265230197 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4308018Z 
2025-12-04T11:20:45.4308529Z [W1204 11:12:55.265435750 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4308534Z 
2025-12-04T11:20:45.4309041Z [W1204 11:12:55.269464401 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4309046Z 
2025-12-04T11:20:45.4309568Z [W1204 11:12:55.270342425 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4309608Z 
2025-12-04T11:20:45.4310116Z [W1204 11:12:55.270544744 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4310121Z 
2025-12-04T11:20:45.4310642Z [W1204 11:12:55.276711460 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4310649Z 
2025-12-04T11:20:45.4311161Z [W1204 11:12:55.277351336 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4311166Z 
2025-12-04T11:20:45.4311688Z [W1204 11:12:55.277539164 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4311693Z 
2025-12-04T11:20:45.4312206Z [W1204 11:12:55.366923654 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4312213Z 
2025-12-04T11:20:45.4312731Z [W1204 11:12:55.367697961 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4312736Z 
2025-12-04T11:20:45.4313240Z [W1204 11:12:55.367903647 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4313248Z 
2025-12-04T11:20:45.4313758Z [W1204 11:12:55.371897849 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4313779Z 
2025-12-04T11:20:45.4314285Z [W1204 11:12:55.372563885 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4314289Z 
2025-12-04T11:20:45.4314799Z [W1204 11:12:55.372758507 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4314804Z 
2025-12-04T11:20:45.4315327Z [W1204 11:12:55.378813179 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4315332Z 
2025-12-04T11:20:45.4315841Z [W1204 11:12:55.379635546 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4315848Z 
2025-12-04T11:20:45.4316369Z [W1204 11:12:55.379830822 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4316374Z 
2025-12-04T11:20:45.4316506Z ('RERUN', {'yellow': True}) [0.4399s] [100%]
2025-12-04T11:20:45.4317840Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 11:12:56.687201557 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4317849Z 
2025-12-04T11:20:45.4318360Z [W1204 11:12:56.688009277 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4318364Z 
2025-12-04T11:20:45.4318888Z [W1204 11:12:56.688220048 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4318923Z 
2025-12-04T11:20:45.4319431Z [W1204 11:12:56.692334916 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4319435Z 
2025-12-04T11:20:45.4319943Z [W1204 11:12:56.693197701 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4319948Z 
2025-12-04T11:20:45.4320475Z [W1204 11:12:56.693394377 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4320480Z 
2025-12-04T11:20:45.4321020Z [W1204 11:12:56.699524104 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4321025Z 
2025-12-04T11:20:45.4321544Z [W1204 11:12:56.700200727 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4321552Z 
2025-12-04T11:20:45.4322058Z [W1204 11:12:56.700404924 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4322063Z 
2025-12-04T11:20:45.4322581Z [W1204 11:12:56.789785195 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4322587Z 
2025-12-04T11:20:45.4323093Z [W1204 11:12:56.790620210 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4323102Z 
2025-12-04T11:20:45.4323623Z [W1204 11:12:56.790838473 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4323630Z 
2025-12-04T11:20:45.4324135Z [W1204 11:12:56.794914062 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4324142Z 
2025-12-04T11:20:45.4324649Z [W1204 11:12:56.795600285 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4324665Z 
2025-12-04T11:20:45.4325170Z [W1204 11:12:56.795799349 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4325175Z 
2025-12-04T11:20:45.4325681Z [W1204 11:12:56.802091790 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4325686Z 
2025-12-04T11:20:45.4326211Z [W1204 11:12:56.803047940 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4326218Z 
2025-12-04T11:20:45.4326724Z [W1204 11:12:56.803254091 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4326728Z 
2025-12-04T11:20:45.4326847Z FAILED [0.4217s] [100%]
2025-12-04T11:20:45.4326852Z 
2025-12-04T11:20:45.4326995Z ==================================== RERUNS ====================================
2025-12-04T11:20:45.4327511Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T11:20:45.4327637Z Traceback (most recent call last):
2025-12-04T11:20:45.4328152Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.4328397Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.4328932Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.4329102Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.4329652Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.4329893Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.4330041Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4330046Z 
2025-12-04T11:20:45.4330154Z Expected 1 but got 2.
2025-12-04T11:20:45.4330264Z Absolute difference: 1
2025-12-04T11:20:45.4330390Z Relative difference: 1.0
2025-12-04T11:20:45.4330395Z 
2025-12-04T11:20:45.4330610Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4331524Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:20:45.4331575Z 
2025-12-04T11:20:45.4331847Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4332068Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4332201Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4332731Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4332959Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4333073Z graph_break []
2025-12-04T11:20:45.4333293Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4334517Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.4334637Z   if out == self.unknown_value:
2025-12-04T11:20:45.4335362Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4335478Z   warnings.warn(
2025-12-04T11:20:45.4336200Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4336398Z   warnings.warn(
2025-12-04T11:20:45.4336903Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T11:20:45.4337027Z Traceback (most recent call last):
2025-12-04T11:20:45.4337548Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.4337784Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.4338244Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.4338424Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.4338958Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.4339181Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.4339316Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4339321Z 
2025-12-04T11:20:45.4339430Z Expected 1 but got 2.
2025-12-04T11:20:45.4339555Z Absolute difference: 1
2025-12-04T11:20:45.4339667Z Relative difference: 1.0
2025-12-04T11:20:45.4339673Z 
2025-12-04T11:20:45.4339901Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4340880Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:20:45.4340889Z 
2025-12-04T11:20:45.4341160Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4341395Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4341510Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4342083Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4342311Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4342410Z graph_break []
2025-12-04T11:20:45.4342640Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4343846Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.4344009Z   if out == self.unknown_value:
2025-12-04T11:20:45.4344744Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4344849Z   warnings.warn(
2025-12-04T11:20:45.4345576Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4345679Z   warnings.warn(
2025-12-04T11:20:45.4345896Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4346025Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4346257Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4346801Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4346905Z graph_break []
2025-12-04T11:20:45.4347122Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4347862Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4347966Z   warnings.warn(
2025-12-04T11:20:45.4348680Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4348796Z   warnings.warn(
2025-12-04T11:20:45.4348942Z =================================== FAILURES ===================================
2025-12-04T11:20:45.4349456Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T11:20:45.4349585Z Traceback (most recent call last):
2025-12-04T11:20:45.4350101Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.4350347Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.4350806Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.4350985Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.4351527Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.4351734Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.4351879Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4351884Z 
2025-12-04T11:20:45.4351991Z Expected 1 but got 2.
2025-12-04T11:20:45.4352098Z Absolute difference: 1
2025-12-04T11:20:45.4352223Z Relative difference: 1.0
2025-12-04T11:20:45.4352289Z 
2025-12-04T11:20:45.4352507Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4353426Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:20:45.4353432Z 
2025-12-04T11:20:45.4353733Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4353951Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4359240Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4359839Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4360074Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4360195Z graph_break []
2025-12-04T11:20:45.4360437Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4361754Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.4361876Z   if out == self.unknown_value:
2025-12-04T11:20:45.4362606Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4362725Z   warnings.warn(
2025-12-04T11:20:45.4363443Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4363560Z   warnings.warn(
2025-12-04T11:20:45.4363782Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4363905Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4364154Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4364684Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4364784Z graph_break []
2025-12-04T11:20:45.4365015Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4365742Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4365857Z   warnings.warn(
2025-12-04T11:20:45.4366572Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4366673Z   warnings.warn(
2025-12-04T11:20:45.4366912Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4367031Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4367261Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4367805Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4367908Z graph_break []
2025-12-04T11:20:45.4368139Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4368862Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4368966Z   warnings.warn(
2025-12-04T11:20:45.4369695Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4369797Z   warnings.warn(
2025-12-04T11:20:45.4370726Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e37b8ebc7938792f.xml -
2025-12-04T11:20:45.4370906Z =========================== short test summary info ============================
2025-12-04T11:20:45.4372220Z FAILED [0.4217s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4372310Z 
2025-12-04T11:20:45.4372436Z Expected 1 but got 2.
2025-12-04T11:20:45.4372549Z Absolute difference: 1
2025-12-04T11:20:45.4372664Z Relative difference: 1.0
2025-12-04T11:20:45.4372689Z 
2025-12-04T11:20:45.4372910Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4373826Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:20:45.4373888Z 
2025-12-04T11:20:45.4374174Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4374358Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:20:45.4374558Z ================== 1 failed, 13 deselected, 2 rerun in 20.84s ==================
2025-12-04T11:20:45.4374677Z Got exit code 1
2025-12-04T11:20:45.4375496Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:20:45.4375925Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T11:20:45.4376443Z W1204 11:13:08.355000 91308 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:20:45.4377107Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ee37665d187f9309.xml
2025-12-04T11:20:45.4377293Z ============================= test session starts ==============================
2025-12-04T11:20:45.4377649Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:20:45.4377779Z cachedir: .pytest_cache
2025-12-04T11:20:45.4378298Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:20:45.4378426Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:20:45.4378552Z configfile: pytest.ini
2025-12-04T11:20:45.4379093Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:20:45.4379328Z collecting ... collected 58 items / 9 deselected / 49 selected
2025-12-04T11:20:45.4379478Z stepcurrent: skipping 9 already run items.
2025-12-04T11:20:45.4379597Z Running 5 items in this shard
2025-12-04T11:20:45.4379605Z 
2025-12-04T11:20:45.4380492Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [3.9433s] [ 20%]
2025-12-04T11:20:45.4381353Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5052s] [ 20%]
2025-12-04T11:20:45.4382139Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 FAILED [0.5059s] [ 20%]
2025-12-04T11:20:45.4382145Z 
2025-12-04T11:20:45.4382289Z ==================================== RERUNS ====================================
2025-12-04T11:20:45.4382898Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.4383045Z Traceback (most recent call last):
2025-12-04T11:20:45.4383564Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.4383811Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.4384277Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.4384483Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.4385041Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.4385252Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.4385387Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4385407Z 
2025-12-04T11:20:45.4385515Z Expected 1 but got 2.
2025-12-04T11:20:45.4385631Z Absolute difference: 1
2025-12-04T11:20:45.4385759Z Relative difference: 1.0
2025-12-04T11:20:45.4385797Z 
2025-12-04T11:20:45.4386016Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4386921Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.4386929Z 
2025-12-04T11:20:45.4387209Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4387430Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4387560Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4388091Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4388318Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4388436Z graph_break []
2025-12-04T11:20:45.4388656Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4389384Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4389502Z   warnings.warn(
2025-12-04T11:20:45.4390226Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4390340Z   warnings.warn(
2025-12-04T11:20:45.4390850Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.4390975Z Traceback (most recent call last):
2025-12-04T11:20:45.4391496Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.4391729Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.4392202Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.4392369Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.4392906Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.4393127Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.4393263Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4393268Z 
2025-12-04T11:20:45.4393375Z Expected 1 but got 2.
2025-12-04T11:20:45.4393495Z Absolute difference: 1
2025-12-04T11:20:45.4393605Z Relative difference: 1.0
2025-12-04T11:20:45.4393611Z 
2025-12-04T11:20:45.4393837Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4394804Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.4394812Z 
2025-12-04T11:20:45.4395082Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4395314Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4395485Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4396024Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4396251Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4396348Z graph_break []
2025-12-04T11:20:45.4396582Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4397316Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4397465Z   warnings.warn(
2025-12-04T11:20:45.4398184Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4398289Z   warnings.warn(
2025-12-04T11:20:45.4398522Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4398641Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4398868Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4399408Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4399512Z graph_break []
2025-12-04T11:20:45.4399740Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4400464Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4400570Z   warnings.warn(
2025-12-04T11:20:45.4401300Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4401405Z   warnings.warn(
2025-12-04T11:20:45.4401553Z =================================== FAILURES ===================================
2025-12-04T11:20:45.4402073Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.4402197Z Traceback (most recent call last):
2025-12-04T11:20:45.4402719Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.4402952Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.4403416Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.4403596Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.4404135Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.4404355Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.4404493Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4404499Z 
2025-12-04T11:20:45.4404609Z Expected 1 but got 2.
2025-12-04T11:20:45.4404733Z Absolute difference: 1
2025-12-04T11:20:45.4404844Z Relative difference: 1.0
2025-12-04T11:20:45.4404850Z 
2025-12-04T11:20:45.4405065Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4406054Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.4406063Z 
2025-12-04T11:20:45.4406336Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4406565Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4406685Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4407215Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4407486Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4407585Z graph_break []
2025-12-04T11:20:45.4407815Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4408545Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4408649Z   warnings.warn(
2025-12-04T11:20:45.4409382Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4409515Z   warnings.warn(
2025-12-04T11:20:45.4409732Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4409862Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4410091Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4410630Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4410731Z graph_break []
2025-12-04T11:20:45.4410951Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4411692Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4411794Z   warnings.warn(
2025-12-04T11:20:45.4412525Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4412627Z   warnings.warn(
2025-12-04T11:20:45.4412844Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4412975Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4413202Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4413728Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4413841Z graph_break []
2025-12-04T11:20:45.4414055Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4414794Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4414900Z   warnings.warn(
2025-12-04T11:20:45.4415618Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4415732Z   warnings.warn(
2025-12-04T11:20:45.4416645Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ee37665d187f9309.xml -
2025-12-04T11:20:45.4416827Z =========================== short test summary info ============================
2025-12-04T11:20:45.4417787Z FAILED [0.5059s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4417794Z 
2025-12-04T11:20:45.4417901Z Expected 1 but got 2.
2025-12-04T11:20:45.4418099Z Absolute difference: 1
2025-12-04T11:20:45.4418215Z Relative difference: 1.0
2025-12-04T11:20:45.4418220Z 
2025-12-04T11:20:45.4418436Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4419367Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.4419405Z 
2025-12-04T11:20:45.4419673Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4419869Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:20:45.4420069Z =================== 1 failed, 9 deselected, 2 rerun in 4.99s ===================
2025-12-04T11:20:45.4420170Z Got exit code 1
2025-12-04T11:20:45.4420295Z Retrying single test...
2025-12-04T11:20:45.4420746Z W1204 11:13:29.214000 91485 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:20:45.4421450Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-511047743df1b08e.xml
2025-12-04T11:20:45.4421615Z ============================= test session starts ==============================
2025-12-04T11:20:45.4421967Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:20:45.4422093Z cachedir: .pytest_cache
2025-12-04T11:20:45.4422616Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:20:45.4422756Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:20:45.4422864Z configfile: pytest.ini
2025-12-04T11:20:45.4423409Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:20:45.4423650Z collecting ... collected 58 items / 13 deselected / 45 selected
2025-12-04T11:20:45.4424638Z stepcurrent: skipping 9 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.4424760Z Running 1 items in this shard
2025-12-04T11:20:45.4424765Z 
2025-12-04T11:20:45.4426057Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 11:13:32.324728314 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4426066Z 
2025-12-04T11:20:45.4426582Z [W1204 11:13:48.177217314 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4426588Z 
2025-12-04T11:20:45.4427116Z [W1204 11:13:48.177487155 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4427124Z 
2025-12-04T11:20:45.4427634Z [W1204 11:13:48.184942108 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4427639Z 
2025-12-04T11:20:45.4428164Z [W1204 11:13:48.185660341 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4428172Z 
2025-12-04T11:20:45.4428682Z [W1204 11:13:48.185853065 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4428687Z 
2025-12-04T11:20:45.4429205Z [W1204 11:13:48.192923348 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4429210Z 
2025-12-04T11:20:45.4429784Z [W1204 11:13:48.193656377 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4429792Z 
2025-12-04T11:20:45.4430316Z [W1204 11:13:48.193846145 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4430321Z 
2025-12-04T11:20:45.4430829Z [W1204 11:13:50.203421010 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4430864Z 
2025-12-04T11:20:45.4431370Z [W1204 11:13:50.205217182 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4431387Z 
2025-12-04T11:20:45.4431893Z [W1204 11:13:50.205438097 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4431898Z 
2025-12-04T11:20:45.4432411Z [W1204 11:13:50.209618157 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4432416Z 
2025-12-04T11:20:45.4432968Z [W1204 11:13:50.210373750 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4432973Z 
2025-12-04T11:20:45.4433481Z [W1204 11:13:50.210586578 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4433489Z 
2025-12-04T11:20:45.4434007Z [W1204 11:13:50.216930399 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4434011Z 
2025-12-04T11:20:45.4434515Z [W1204 11:13:50.217656831 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4434520Z 
2025-12-04T11:20:45.4435041Z [W1204 11:13:50.217861035 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4435051Z 
2025-12-04T11:20:45.4435186Z ('RERUN', {'yellow': True}) [19.8161s] [100%]
2025-12-04T11:20:45.4436472Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 11:13:51.671095345 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4436492Z 
2025-12-04T11:20:45.4437003Z [W1204 11:13:51.671917773 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4437008Z 
2025-12-04T11:20:45.4437513Z [W1204 11:13:51.672133362 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4437531Z 
2025-12-04T11:20:45.4438038Z [W1204 11:13:51.676236417 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4438048Z 
2025-12-04T11:20:45.4438559Z [W1204 11:13:51.677104652 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4438566Z 
2025-12-04T11:20:45.4439088Z [W1204 11:13:51.677303515 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4439096Z 
2025-12-04T11:20:45.4439603Z [W1204 11:13:51.683637243 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4439607Z 
2025-12-04T11:20:45.4440130Z [W1204 11:13:51.684351557 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4440135Z 
2025-12-04T11:20:45.4440644Z [W1204 11:13:51.684562181 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4440649Z 
2025-12-04T11:20:45.4441345Z [W1204 11:13:51.776615271 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4441354Z 
2025-12-04T11:20:45.4441864Z [W1204 11:13:51.777460977 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4441869Z 
2025-12-04T11:20:45.4442413Z [W1204 11:13:51.777679184 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4442432Z 
2025-12-04T11:20:45.4442938Z [W1204 11:13:51.781905628 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4442943Z 
2025-12-04T11:20:45.4443450Z [W1204 11:13:51.782660157 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4443454Z 
2025-12-04T11:20:45.4443979Z [W1204 11:13:51.782871494 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4444015Z 
2025-12-04T11:20:45.4444525Z [W1204 11:13:51.789254631 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4444530Z 
2025-12-04T11:20:45.4445049Z [W1204 11:13:51.790269828 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4445056Z 
2025-12-04T11:20:45.4445567Z [W1204 11:13:51.790487312 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4445571Z 
2025-12-04T11:20:45.4445717Z ('RERUN', {'yellow': True}) [0.5326s] [100%]
2025-12-04T11:20:45.4446994Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 11:13:51.181560488 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4447002Z 
2025-12-04T11:20:45.4447529Z [W1204 11:13:51.182369855 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4447533Z 
2025-12-04T11:20:45.4448046Z [W1204 11:13:51.182582838 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4448054Z 
2025-12-04T11:20:45.4448563Z [W1204 11:13:51.186718560 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4448582Z 
2025-12-04T11:20:45.4449091Z [W1204 11:13:51.187573010 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4449096Z 
2025-12-04T11:20:45.4449610Z [W1204 11:13:51.187772665 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4449617Z 
2025-12-04T11:20:45.4450138Z [W1204 11:13:51.194081282 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4450143Z 
2025-12-04T11:20:45.4450652Z [W1204 11:13:51.194772556 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4450659Z 
2025-12-04T11:20:45.4451182Z [W1204 11:13:51.194966725 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4451187Z 
2025-12-04T11:20:45.4451696Z [W1204 11:13:51.285730282 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4451701Z 
2025-12-04T11:20:45.4452283Z [W1204 11:13:51.286540875 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4452291Z 
2025-12-04T11:20:45.4452803Z [W1204 11:13:51.286764599 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4452808Z 
2025-12-04T11:20:45.4453332Z [W1204 11:13:51.293119480 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4453367Z 
2025-12-04T11:20:45.4453875Z [W1204 11:13:51.294095312 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4453880Z 
2025-12-04T11:20:45.4454390Z [W1204 11:13:51.294312514 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4454408Z 
2025-12-04T11:20:45.4454922Z [W1204 11:13:51.301740277 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4454962Z 
2025-12-04T11:20:45.4455468Z [W1204 11:13:51.302660516 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4455473Z 
2025-12-04T11:20:45.4455996Z [W1204 11:13:51.302874272 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4456004Z 
2025-12-04T11:20:45.4456111Z FAILED [0.5119s] [100%]
2025-12-04T11:20:45.4456116Z 
2025-12-04T11:20:45.4456272Z ==================================== RERUNS ====================================
2025-12-04T11:20:45.4456863Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.4456991Z Traceback (most recent call last):
2025-12-04T11:20:45.4457527Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.4457759Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.4458248Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.4458414Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.4458955Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.4459180Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.4459315Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4459321Z 
2025-12-04T11:20:45.4459426Z Expected 1 but got 2.
2025-12-04T11:20:45.4459548Z Absolute difference: 1
2025-12-04T11:20:45.4459659Z Relative difference: 1.0
2025-12-04T11:20:45.4459664Z 
2025-12-04T11:20:45.4459892Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4460809Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.4460817Z 
2025-12-04T11:20:45.4461087Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4461322Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4461443Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4461984Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4462212Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4462311Z graph_break []
2025-12-04T11:20:45.4462540Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4463827Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.4463962Z   if out == self.unknown_value:
2025-12-04T11:20:45.4464685Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4464824Z   warnings.warn(
2025-12-04T11:20:45.4465551Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4465657Z   warnings.warn(
2025-12-04T11:20:45.4466163Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.4466300Z Traceback (most recent call last):
2025-12-04T11:20:45.4466813Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.4467090Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.4467548Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.4467711Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.4468260Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.4468465Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.4468614Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4468619Z 
2025-12-04T11:20:45.4468726Z Expected 1 but got 2.
2025-12-04T11:20:45.4468833Z Absolute difference: 1
2025-12-04T11:20:45.4468957Z Relative difference: 1.0
2025-12-04T11:20:45.4468963Z 
2025-12-04T11:20:45.4469179Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4470091Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.4470111Z 
2025-12-04T11:20:45.4470382Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4470601Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4470731Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4471591Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4471821Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4471937Z graph_break []
2025-12-04T11:20:45.4472158Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4473392Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.4473513Z   if out == self.unknown_value:
2025-12-04T11:20:45.4474237Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4474356Z   warnings.warn(
2025-12-04T11:20:45.4475076Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4475192Z   warnings.warn(
2025-12-04T11:20:45.4475411Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4475533Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4475916Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4476443Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4476547Z graph_break []
2025-12-04T11:20:45.4476780Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4477498Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4477656Z   warnings.warn(
2025-12-04T11:20:45.4478371Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4478473Z   warnings.warn(
2025-12-04T11:20:45.4478635Z =================================== FAILURES ===================================
2025-12-04T11:20:45.4479148Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.4479324Z Traceback (most recent call last):
2025-12-04T11:20:45.4479850Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.4480081Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.4480555Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.4480720Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.4481256Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.4481477Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.4481611Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4481617Z 
2025-12-04T11:20:45.4481738Z Expected 1 but got 2.
2025-12-04T11:20:45.4481853Z Absolute difference: 1
2025-12-04T11:20:45.4481969Z Relative difference: 1.0
2025-12-04T11:20:45.4481975Z 
2025-12-04T11:20:45.4482207Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4483115Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.4483124Z 
2025-12-04T11:20:45.4483405Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4483626Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4483744Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4484286Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4484520Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4484622Z graph_break []
2025-12-04T11:20:45.4484854Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4486053Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.4486188Z   if out == self.unknown_value:
2025-12-04T11:20:45.4486913Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4487015Z   warnings.warn(
2025-12-04T11:20:45.4487749Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4487851Z   warnings.warn(
2025-12-04T11:20:45.4488164Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4488284Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4488515Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4489054Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4489183Z graph_break []
2025-12-04T11:20:45.4489399Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4490134Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4490239Z   warnings.warn(
2025-12-04T11:20:45.4490974Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4491081Z   warnings.warn(
2025-12-04T11:20:45.4491329Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4491461Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4491687Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4492213Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4492330Z graph_break []
2025-12-04T11:20:45.4492548Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4493280Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4493381Z   warnings.warn(
2025-12-04T11:20:45.4494104Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4494221Z   warnings.warn(
2025-12-04T11:20:45.4495055Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-511047743df1b08e.xml -
2025-12-04T11:20:45.4495244Z =========================== short test summary info ============================
2025-12-04T11:20:45.4496186Z FAILED [0.5119s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4496192Z 
2025-12-04T11:20:45.4496389Z Expected 1 but got 2.
2025-12-04T11:20:45.4496503Z Absolute difference: 1
2025-12-04T11:20:45.4496616Z Relative difference: 1.0
2025-12-04T11:20:45.4496622Z 
2025-12-04T11:20:45.4496858Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4497776Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.4497784Z 
2025-12-04T11:20:45.4498070Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4498253Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:20:45.4498455Z ================== 1 failed, 13 deselected, 2 rerun in 20.89s ==================
2025-12-04T11:20:45.4498576Z Got exit code 1
2025-12-04T11:20:45.4498685Z Retrying single test...
2025-12-04T11:20:45.4499130Z W1204 11:14:03.817000 91667 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:20:45.4499801Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4d9221d5ac70ff44.xml
2025-12-04T11:20:45.4500040Z ============================= test session starts ==============================
2025-12-04T11:20:45.4500410Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:20:45.4500524Z cachedir: .pytest_cache
2025-12-04T11:20:45.4501046Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:20:45.4501223Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:20:45.4501333Z configfile: pytest.ini
2025-12-04T11:20:45.4501892Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:20:45.4502116Z collecting ... collected 58 items / 13 deselected / 45 selected
2025-12-04T11:20:45.4503112Z stepcurrent: skipping 9 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.4503247Z Running 1 items in this shard
2025-12-04T11:20:45.4503288Z 
2025-12-04T11:20:45.4504570Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 11:14:07.927166044 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4504579Z 
2025-12-04T11:20:45.4505111Z [W1204 11:14:22.362494888 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4505117Z 
2025-12-04T11:20:45.4505636Z [W1204 11:14:22.362758240 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4505642Z 
2025-12-04T11:20:45.4506171Z [W1204 11:14:22.370243206 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4506176Z 
2025-12-04T11:20:45.4506686Z [W1204 11:14:22.370974257 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4506691Z 
2025-12-04T11:20:45.4507198Z [W1204 11:14:22.371173540 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4507223Z 
2025-12-04T11:20:45.4507736Z [W1204 11:14:22.378261286 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4507741Z 
2025-12-04T11:20:45.4508249Z [W1204 11:14:22.378952097 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4508254Z 
2025-12-04T11:20:45.4508774Z [W1204 11:14:22.379143732 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4508783Z 
2025-12-04T11:20:45.4509294Z [W1204 11:14:24.392987944 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4509302Z 
2025-12-04T11:20:45.4509823Z [W1204 11:14:24.394908312 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4509830Z 
2025-12-04T11:20:45.4510342Z [W1204 11:14:24.395140158 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4510347Z 
2025-12-04T11:20:45.4510870Z [W1204 11:14:24.399919835 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4510875Z 
2025-12-04T11:20:45.4511381Z [W1204 11:14:25.400867765 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4511386Z 
2025-12-04T11:20:45.4511978Z [W1204 11:14:25.401111230 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4511986Z 
2025-12-04T11:20:45.4512495Z [W1204 11:14:25.408117913 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4512499Z 
2025-12-04T11:20:45.4513033Z [W1204 11:14:25.409083366 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4513050Z 
2025-12-04T11:20:45.4513558Z [W1204 11:14:25.409309795 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4513563Z 
2025-12-04T11:20:45.4513698Z ('RERUN', {'yellow': True}) [19.4004s] [100%]
2025-12-04T11:20:45.4514987Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 11:14:25.870695527 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4515029Z 
2025-12-04T11:20:45.4515539Z [W1204 11:14:25.871554081 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4515544Z 
2025-12-04T11:20:45.4516070Z [W1204 11:14:25.871771369 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4516075Z 
2025-12-04T11:20:45.4516580Z [W1204 11:14:25.875981991 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4516585Z 
2025-12-04T11:20:45.4517105Z [W1204 11:14:25.876924062 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4517109Z 
2025-12-04T11:20:45.4517620Z [W1204 11:14:25.877128531 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4517627Z 
2025-12-04T11:20:45.4518145Z [W1204 11:14:25.883484872 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4518150Z 
2025-12-04T11:20:45.4518657Z [W1204 11:14:25.884223704 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4518664Z 
2025-12-04T11:20:45.4519175Z [W1204 11:14:25.884423197 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4519180Z 
2025-12-04T11:20:45.4519697Z [W1204 11:14:25.978156104 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4519702Z 
2025-12-04T11:20:45.4520212Z [W1204 11:14:25.979008592 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4520219Z 
2025-12-04T11:20:45.4520739Z [W1204 11:14:25.979232045 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4520743Z 
2025-12-04T11:20:45.4521254Z [W1204 11:14:25.983484653 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4521261Z 
2025-12-04T11:20:45.4521780Z [W1204 11:14:25.984229613 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4521785Z 
2025-12-04T11:20:45.4522294Z [W1204 11:14:25.984437175 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4522299Z 
2025-12-04T11:20:45.4522893Z [W1204 11:14:25.990863838 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4522900Z 
2025-12-04T11:20:45.4523409Z [W1204 11:14:25.991829601 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4523413Z 
2025-12-04T11:20:45.4523922Z [W1204 11:14:25.992034568 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4523973Z 
2025-12-04T11:20:45.4524107Z ('RERUN', {'yellow': True}) [0.5425s] [100%]
2025-12-04T11:20:45.4525382Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 11:14:25.389335682 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4525388Z 
2025-12-04T11:20:45.4525918Z [W1204 11:14:25.390173142 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4525977Z 
2025-12-04T11:20:45.4526485Z [W1204 11:14:25.390393733 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4526490Z 
2025-12-04T11:20:45.4527018Z [W1204 11:14:25.394554876 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4527025Z 
2025-12-04T11:20:45.4527536Z [W1204 11:14:25.395441919 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4527541Z 
2025-12-04T11:20:45.4528060Z [W1204 11:14:25.395642465 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4528064Z 
2025-12-04T11:20:45.4528579Z [W1204 11:14:26.402027556 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4528586Z 
2025-12-04T11:20:45.4529106Z [W1204 11:14:26.402785796 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4529111Z 
2025-12-04T11:20:45.4529618Z [W1204 11:14:26.402984402 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4529625Z 
2025-12-04T11:20:45.4530131Z [W1204 11:14:26.498959614 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4530149Z 
2025-12-04T11:20:45.4530653Z [W1204 11:14:26.500144064 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4530658Z 
2025-12-04T11:20:45.4531167Z [W1204 11:14:26.500368497 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4531172Z 
2025-12-04T11:20:45.4531694Z [W1204 11:14:26.505317630 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4531699Z 
2025-12-04T11:20:45.4532205Z [W1204 11:14:26.506204860 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4532212Z 
2025-12-04T11:20:45.4532729Z [W1204 11:14:26.506405324 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4532734Z 
2025-12-04T11:20:45.4533242Z [W1204 11:14:26.512714866 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4533247Z 
2025-12-04T11:20:45.4533768Z [W1204 11:14:26.513373224 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4533837Z 
2025-12-04T11:20:45.4534345Z [W1204 11:14:26.513567590 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4534357Z 
2025-12-04T11:20:45.4534461Z FAILED [0.5177s] [100%]
2025-12-04T11:20:45.4534478Z 
2025-12-04T11:20:45.4534623Z ==================================== RERUNS ====================================
2025-12-04T11:20:45.4535167Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.4535306Z Traceback (most recent call last):
2025-12-04T11:20:45.4535820Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.4536054Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.4536618Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.4536791Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.4537382Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.4537591Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.4537727Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4537736Z 
2025-12-04T11:20:45.4537857Z Expected 1 but got 2.
2025-12-04T11:20:45.4537966Z Absolute difference: 1
2025-12-04T11:20:45.4538079Z Relative difference: 1.0
2025-12-04T11:20:45.4538084Z 
2025-12-04T11:20:45.4538320Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4539231Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.4539237Z 
2025-12-04T11:20:45.4539526Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4539753Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4539871Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4540417Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4540650Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4540763Z graph_break []
2025-12-04T11:20:45.4540981Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4542193Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.4542325Z   if out == self.unknown_value:
2025-12-04T11:20:45.4543054Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4543173Z   warnings.warn(
2025-12-04T11:20:45.4543896Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4543999Z   warnings.warn(
2025-12-04T11:20:45.4544520Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.4544643Z Traceback (most recent call last):
2025-12-04T11:20:45.4545148Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.4545393Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.4545918Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.4546096Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.4546631Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.4546838Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.4547018Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4547024Z 
2025-12-04T11:20:45.4547130Z Expected 1 but got 2.
2025-12-04T11:20:45.4547251Z Absolute difference: 1
2025-12-04T11:20:45.4547361Z Relative difference: 1.0
2025-12-04T11:20:45.4547367Z 
2025-12-04T11:20:45.4547585Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4548504Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.4548514Z 
2025-12-04T11:20:45.4548783Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4549045Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4549162Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4549689Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4549934Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4550035Z graph_break []
2025-12-04T11:20:45.4550251Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4551471Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.4551592Z   if out == self.unknown_value:
2025-12-04T11:20:45.4552334Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4552438Z   warnings.warn(
2025-12-04T11:20:45.4553155Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4553272Z   warnings.warn(
2025-12-04T11:20:45.4553488Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4553617Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4553847Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4554374Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4554487Z graph_break []
2025-12-04T11:20:45.4554706Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4555429Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4555543Z   warnings.warn(
2025-12-04T11:20:45.4556263Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4556381Z   warnings.warn(
2025-12-04T11:20:45.4556531Z =================================== FAILURES ===================================
2025-12-04T11:20:45.4557039Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.4557175Z Traceback (most recent call last):
2025-12-04T11:20:45.4557756Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.4558010Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.4558473Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.4558637Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.4559185Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.4559430Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.4559568Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4559574Z 
2025-12-04T11:20:45.4559699Z Expected 1 but got 2.
2025-12-04T11:20:45.4559809Z Absolute difference: 1
2025-12-04T11:20:45.4559935Z Relative difference: 1.0
2025-12-04T11:20:45.4559941Z 
2025-12-04T11:20:45.4560157Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4561073Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.4561111Z 
2025-12-04T11:20:45.4561395Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4561616Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4561749Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4562273Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4562502Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4562616Z graph_break []
2025-12-04T11:20:45.4562832Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4564045Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.4564179Z   if out == self.unknown_value:
2025-12-04T11:20:45.4564900Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4565015Z   warnings.warn(
2025-12-04T11:20:45.4565736Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4565839Z   warnings.warn(
2025-12-04T11:20:45.4566071Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4566187Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4566426Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4566955Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4567056Z graph_break []
2025-12-04T11:20:45.4567284Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4568012Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4568116Z   warnings.warn(
2025-12-04T11:20:45.4568847Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4568946Z   warnings.warn(
2025-12-04T11:20:45.4569173Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4569291Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4569587Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4570132Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4570232Z graph_break []
2025-12-04T11:20:45.4570449Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4571694Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4571802Z   warnings.warn(
2025-12-04T11:20:45.4572537Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4572641Z   warnings.warn(
2025-12-04T11:20:45.4573487Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4d9221d5ac70ff44.xml -
2025-12-04T11:20:45.4573765Z =========================== short test summary info ============================
2025-12-04T11:20:45.4574713Z FAILED [0.5177s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4574722Z 
2025-12-04T11:20:45.4574847Z Expected 1 but got 2.
2025-12-04T11:20:45.4574958Z Absolute difference: 1
2025-12-04T11:20:45.4575075Z Relative difference: 1.0
2025-12-04T11:20:45.4575080Z 
2025-12-04T11:20:45.4575314Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4576227Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.4576233Z 
2025-12-04T11:20:45.4576593Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4576782Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:20:45.4576981Z ================== 1 failed, 13 deselected, 2 rerun in 20.49s ==================
2025-12-04T11:20:45.4577099Z Got exit code 1
2025-12-04T11:20:45.4577922Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.4578350Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T11:20:45.4578797Z W1204 11:14:37.939000 91849 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:20:45.4579455Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-af9a500a606c950b.xml
2025-12-04T11:20:45.4579651Z ============================= test session starts ==============================
2025-12-04T11:20:45.4580009Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:20:45.4580136Z cachedir: .pytest_cache
2025-12-04T11:20:45.4580661Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:20:45.4580787Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:20:45.4580912Z configfile: pytest.ini
2025-12-04T11:20:45.4581454Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:20:45.4581672Z collecting ... collected 58 items / 10 deselected / 48 selected
2025-12-04T11:20:45.4581833Z stepcurrent: skipping 10 already run items.
2025-12-04T11:20:45.4581951Z Running 4 items in this shard
2025-12-04T11:20:45.4581956Z 
2025-12-04T11:20:45.4582930Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [3.9109s] [ 25%]
2025-12-04T11:20:45.4583785Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4683s] [ 25%]
2025-12-04T11:20:45.4584596Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 FAILED [0.4681s] [ 25%]
2025-12-04T11:20:45.4584617Z 
2025-12-04T11:20:45.4584762Z ==================================== RERUNS ====================================
2025-12-04T11:20:45.4585264Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T11:20:45.4585407Z Traceback (most recent call last):
2025-12-04T11:20:45.4585922Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.4586192Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.4586663Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.4586827Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.4587376Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.4587581Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.4587713Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4587718Z 
2025-12-04T11:20:45.4587836Z Expected 1 but got 2.
2025-12-04T11:20:45.4587944Z Absolute difference: 1
2025-12-04T11:20:45.4588055Z Relative difference: 1.0
2025-12-04T11:20:45.4588074Z 
2025-12-04T11:20:45.4588295Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4589191Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.4589198Z 
2025-12-04T11:20:45.4589475Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4589698Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4589834Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4590724Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.4590954Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4591068Z graph_break []
2025-12-04T11:20:45.4591290Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4592023Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4592136Z   warnings.warn(
2025-12-04T11:20:45.4592850Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4592969Z   warnings.warn(
2025-12-04T11:20:45.4593467Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T11:20:45.4593591Z Traceback (most recent call last):
2025-12-04T11:20:45.4594116Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.4594348Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.4594894Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.4595064Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.4595597Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.4595856Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.4595990Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4595995Z 
2025-12-04T11:20:45.4596102Z Expected 1 but got 2.
2025-12-04T11:20:45.4596224Z Absolute difference: 1
2025-12-04T11:20:45.4596341Z Relative difference: 1.0
2025-12-04T11:20:45.4596346Z 
2025-12-04T11:20:45.4596572Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4597475Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.4597513Z 
2025-12-04T11:20:45.4597784Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4598014Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4598130Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4599030Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.4599260Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4599360Z graph_break []
2025-12-04T11:20:45.4599589Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4600325Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4600446Z   warnings.warn(
2025-12-04T11:20:45.4601166Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4601269Z   warnings.warn(
2025-12-04T11:20:45.4601497Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4601618Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4601846Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4602751Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.4602850Z graph_break []
2025-12-04T11:20:45.4603082Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4603803Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4603908Z   warnings.warn(
2025-12-04T11:20:45.4604642Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4604746Z   warnings.warn(
2025-12-04T11:20:45.4604894Z =================================== FAILURES ===================================
2025-12-04T11:20:45.4605409Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T11:20:45.4605535Z Traceback (most recent call last):
2025-12-04T11:20:45.4606054Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.4606357Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.4606824Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.4607006Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.4607539Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.4607796Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.4607930Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4607936Z 
2025-12-04T11:20:45.4608041Z Expected 1 but got 2.
2025-12-04T11:20:45.4608165Z Absolute difference: 1
2025-12-04T11:20:45.4608279Z Relative difference: 1.0
2025-12-04T11:20:45.4608284Z 
2025-12-04T11:20:45.4608501Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4609425Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.4609463Z 
2025-12-04T11:20:45.4609735Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4609967Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4610083Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4610972Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.4611217Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4611318Z graph_break []
2025-12-04T11:20:45.4611548Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4612284Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4612389Z   warnings.warn(
2025-12-04T11:20:45.4613124Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4613225Z   warnings.warn(
2025-12-04T11:20:45.4613457Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4613574Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4613799Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4614697Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.4614797Z graph_break []
2025-12-04T11:20:45.4615020Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4615758Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4615860Z   warnings.warn(
2025-12-04T11:20:45.4616666Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4616775Z   warnings.warn(
2025-12-04T11:20:45.4616990Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4617120Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4617347Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4618358Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.4618460Z graph_break []
2025-12-04T11:20:45.4618675Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4619417Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4619552Z   warnings.warn(
2025-12-04T11:20:45.4620269Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4620384Z   warnings.warn(
2025-12-04T11:20:45.4621219Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-af9a500a606c950b.xml -
2025-12-04T11:20:45.4621408Z =========================== short test summary info ============================
2025-12-04T11:20:45.4622345Z FAILED [0.4681s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4622463Z 
2025-12-04T11:20:45.4622572Z Expected 1 but got 2.
2025-12-04T11:20:45.4622701Z Absolute difference: 1
2025-12-04T11:20:45.4622814Z Relative difference: 1.0
2025-12-04T11:20:45.4622822Z 
2025-12-04T11:20:45.4623056Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4623958Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.4623963Z 
2025-12-04T11:20:45.4624230Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4624432Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:20:45.4624637Z ================== 1 failed, 10 deselected, 2 rerun in 4.88s ===================
2025-12-04T11:20:45.4624756Z Got exit code 1
2025-12-04T11:20:45.4624867Z Retrying single test...
2025-12-04T11:20:45.4625317Z W1204 11:14:58.701000 92018 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:20:45.4625988Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e3ba96547605fc4e.xml
2025-12-04T11:20:45.4626158Z ============================= test session starts ==============================
2025-12-04T11:20:45.4626523Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:20:45.4626637Z cachedir: .pytest_cache
2025-12-04T11:20:45.4627160Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:20:45.4627299Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:20:45.4627413Z configfile: pytest.ini
2025-12-04T11:20:45.4627955Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:20:45.4628188Z collecting ... collected 58 items / 13 deselected / 45 selected
2025-12-04T11:20:45.4629169Z stepcurrent: skipping 10 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.4629304Z Running 1 items in this shard
2025-12-04T11:20:45.4629309Z 
2025-12-04T11:20:45.4630585Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 11:15:04.612942651 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4630592Z 
2025-12-04T11:20:45.4631199Z [W1204 11:15:20.656322297 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4631207Z 
2025-12-04T11:20:45.4631720Z [W1204 11:15:20.656607339 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4631726Z 
2025-12-04T11:20:45.4632283Z [W1204 11:15:20.664107438 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4632302Z 
2025-12-04T11:20:45.4632813Z [W1204 11:15:20.664894640 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4632818Z 
2025-12-04T11:20:45.4633328Z [W1204 11:15:20.665092521 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4633333Z 
2025-12-04T11:20:45.4633857Z [W1204 11:15:20.672167739 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4633898Z 
2025-12-04T11:20:45.4634408Z [W1204 11:15:20.673012766 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4634413Z 
2025-12-04T11:20:45.4634934Z [W1204 11:15:20.673200949 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4634941Z 
2025-12-04T11:20:45.4635447Z [W1204 11:15:20.813031816 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4635452Z 
2025-12-04T11:20:45.4635976Z [W1204 11:15:20.814898318 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4635981Z 
2025-12-04T11:20:45.4636504Z [W1204 11:15:20.815116540 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4636511Z 
2025-12-04T11:20:45.4637032Z [W1204 11:15:20.819175902 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4637037Z 
2025-12-04T11:20:45.4637545Z [W1204 11:15:20.819866706 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4637554Z 
2025-12-04T11:20:45.4638066Z [W1204 11:15:20.820112805 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4638071Z 
2025-12-04T11:20:45.4638601Z [W1204 11:15:20.826289842 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4638606Z 
2025-12-04T11:20:45.4639122Z [W1204 11:15:20.826978992 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4639129Z 
2025-12-04T11:20:45.4639654Z [W1204 11:15:20.827176986 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4639659Z 
2025-12-04T11:20:45.4639794Z ('RERUN', {'yellow': True}) [19.9647s] [100%]
2025-12-04T11:20:45.4641072Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 11:15:20.249121143 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4641078Z 
2025-12-04T11:20:45.4641593Z [W1204 11:15:20.249967141 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4641597Z 
2025-12-04T11:20:45.4642185Z [W1204 11:15:20.250208854 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4642193Z 
2025-12-04T11:20:45.4642702Z [W1204 11:15:20.254575877 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4642707Z 
2025-12-04T11:20:45.4643213Z [W1204 11:15:20.255344958 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4643265Z 
2025-12-04T11:20:45.4643778Z [W1204 11:15:20.255554601 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4643783Z 
2025-12-04T11:20:45.4644288Z [W1204 11:15:20.262250594 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4644293Z 
2025-12-04T11:20:45.4644822Z [W1204 11:15:20.263066920 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4644856Z 
2025-12-04T11:20:45.4645365Z [W1204 11:15:20.263273581 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4645370Z 
2025-12-04T11:20:45.4645892Z [W1204 11:15:20.357260292 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4645899Z 
2025-12-04T11:20:45.4646410Z [W1204 11:15:20.358106008 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4646415Z 
2025-12-04T11:20:45.4646937Z [W1204 11:15:20.358327520 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4646942Z 
2025-12-04T11:20:45.4647456Z [W1204 11:15:20.362574241 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4647461Z 
2025-12-04T11:20:45.4647989Z [W1204 11:15:20.363336869 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4647994Z 
2025-12-04T11:20:45.4648508Z [W1204 11:15:20.363547270 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4648515Z 
2025-12-04T11:20:45.4649025Z [W1204 11:15:20.369934140 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4649047Z 
2025-12-04T11:20:45.4649563Z [W1204 11:15:20.370967990 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4649568Z 
2025-12-04T11:20:45.4650075Z [W1204 11:15:20.371184552 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4650085Z 
2025-12-04T11:20:45.4650235Z ('RERUN', {'yellow': True}) [0.5058s] [100%]
2025-12-04T11:20:45.4651502Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 11:15:21.731677620 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4651510Z 
2025-12-04T11:20:45.4652040Z [W1204 11:15:21.732491375 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4652044Z 
2025-12-04T11:20:45.4652554Z [W1204 11:15:21.732716373 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4652559Z 
2025-12-04T11:20:45.4653086Z [W1204 11:15:21.737022251 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4653150Z 
2025-12-04T11:20:45.4653663Z [W1204 11:15:21.737730042 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4653670Z 
2025-12-04T11:20:45.4654173Z [W1204 11:15:21.737932915 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4654225Z 
2025-12-04T11:20:45.4654737Z [W1204 11:15:21.744431222 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4654742Z 
2025-12-04T11:20:45.4655252Z [W1204 11:15:21.745191043 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4655257Z 
2025-12-04T11:20:45.4655775Z [W1204 11:15:21.745393256 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4655780Z 
2025-12-04T11:20:45.4656364Z [W1204 11:15:21.838075303 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4656407Z 
2025-12-04T11:20:45.4656931Z [W1204 11:15:21.838871878 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4656936Z 
2025-12-04T11:20:45.4657446Z [W1204 11:15:21.839082754 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4657450Z 
2025-12-04T11:20:45.4657970Z [W1204 11:15:21.843124574 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4657975Z 
2025-12-04T11:20:45.4658479Z [W1204 11:15:21.843813584 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4658484Z 
2025-12-04T11:20:45.4659009Z [W1204 11:15:21.844017226 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4659016Z 
2025-12-04T11:20:45.4659524Z [W1204 11:15:21.850178145 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4659529Z 
2025-12-04T11:20:45.4660037Z [W1204 11:15:21.851063212 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4660058Z 
2025-12-04T11:20:45.4660565Z [W1204 11:15:21.851263279 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4660570Z 
2025-12-04T11:20:45.4660672Z FAILED [0.4765s] [100%]
2025-12-04T11:20:45.4660678Z 
2025-12-04T11:20:45.4660840Z ==================================== RERUNS ====================================
2025-12-04T11:20:45.4661346Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T11:20:45.4661486Z Traceback (most recent call last):
2025-12-04T11:20:45.4661999Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.4662231Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.4662711Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.4662878Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.4663414Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.4663634Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.4663769Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4663774Z 
2025-12-04T11:20:45.4663892Z Expected 1 but got 2.
2025-12-04T11:20:45.4664081Z Absolute difference: 1
2025-12-04T11:20:45.4664201Z Relative difference: 1.0
2025-12-04T11:20:45.4664206Z 
2025-12-04T11:20:45.4664437Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4665338Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.4665374Z 
2025-12-04T11:20:45.4665656Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4665880Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4665998Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4666903Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.4667142Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4667278Z graph_break []
2025-12-04T11:20:45.4667511Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4668721Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.4668856Z   if out == self.unknown_value:
2025-12-04T11:20:45.4669580Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4669684Z   warnings.warn(
2025-12-04T11:20:45.4670422Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4670528Z   warnings.warn(
2025-12-04T11:20:45.4671303Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T11:20:45.4671503Z Traceback (most recent call last):
2025-12-04T11:20:45.4672023Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.4672272Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.4672733Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.4672911Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.4673447Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.4673654Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.4673806Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4673815Z 
2025-12-04T11:20:45.4673921Z Expected 1 but got 2.
2025-12-04T11:20:45.4674032Z Absolute difference: 1
2025-12-04T11:20:45.4674159Z Relative difference: 1.0
2025-12-04T11:20:45.4674164Z 
2025-12-04T11:20:45.4674381Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4675294Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.4675302Z 
2025-12-04T11:20:45.4675574Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4675796Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4675930Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4676955Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.4677204Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4677306Z graph_break []
2025-12-04T11:20:45.4677526Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4678805Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.4678926Z   if out == self.unknown_value:
2025-12-04T11:20:45.4679664Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4679770Z   warnings.warn(
2025-12-04T11:20:45.4680494Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4680674Z   warnings.warn(
2025-12-04T11:20:45.4680893Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4681009Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4681253Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4682145Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.4682261Z graph_break []
2025-12-04T11:20:45.4682478Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4683205Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4683326Z   warnings.warn(
2025-12-04T11:20:45.4684045Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4684164Z   warnings.warn(
2025-12-04T11:20:45.4684314Z =================================== FAILURES ===================================
2025-12-04T11:20:45.4684817Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T11:20:45.4684956Z Traceback (most recent call last):
2025-12-04T11:20:45.4685465Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.4685697Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.4686172Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.4686335Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.4686886Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.4687092Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.4687225Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4687233Z 
2025-12-04T11:20:45.4687352Z Expected 1 but got 2.
2025-12-04T11:20:45.4687463Z Absolute difference: 1
2025-12-04T11:20:45.4687574Z Relative difference: 1.0
2025-12-04T11:20:45.4687593Z 
2025-12-04T11:20:45.4687813Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4688719Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.4688724Z 
2025-12-04T11:20:45.4689075Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4689301Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4689433Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4690321Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.4690582Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4690699Z graph_break []
2025-12-04T11:20:45.4690916Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4692131Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.4692264Z   if out == self.unknown_value:
2025-12-04T11:20:45.4693023Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4693138Z   warnings.warn(
2025-12-04T11:20:45.4693861Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4693967Z   warnings.warn(
2025-12-04T11:20:45.4694197Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4694315Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4694558Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4695447Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.4695548Z graph_break []
2025-12-04T11:20:45.4695774Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4696572Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4696692Z   warnings.warn(
2025-12-04T11:20:45.4697411Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4697512Z   warnings.warn(
2025-12-04T11:20:45.4697745Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4697861Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4698088Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4698999Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.4699103Z graph_break []
2025-12-04T11:20:45.4699332Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4700054Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4700161Z   warnings.warn(
2025-12-04T11:20:45.4700895Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4700997Z   warnings.warn(
2025-12-04T11:20:45.4701923Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e3ba96547605fc4e.xml -
2025-12-04T11:20:45.4702103Z =========================== short test summary info ============================
2025-12-04T11:20:45.4703044Z FAILED [0.4765s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4703088Z 
2025-12-04T11:20:45.4703214Z Expected 1 but got 2.
2025-12-04T11:20:45.4703326Z Absolute difference: 1
2025-12-04T11:20:45.4703438Z Relative difference: 1.0
2025-12-04T11:20:45.4703457Z 
2025-12-04T11:20:45.4703678Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4704578Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.4704584Z 
2025-12-04T11:20:45.4704872Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4705108Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:20:45.4705320Z ================== 1 failed, 13 deselected, 2 rerun in 20.98s ==================
2025-12-04T11:20:45.4705421Z Got exit code 1
2025-12-04T11:20:45.4705530Z Retrying single test...
2025-12-04T11:20:45.4705986Z W1204 11:15:33.271000 92193 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:20:45.4706648Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ce470e45644e1cc6.xml
2025-12-04T11:20:45.4706815Z ============================= test session starts ==============================
2025-12-04T11:20:45.4707181Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:20:45.4707294Z cachedir: .pytest_cache
2025-12-04T11:20:45.4707830Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:20:45.4707959Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:20:45.4708070Z configfile: pytest.ini
2025-12-04T11:20:45.4708625Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:20:45.4708847Z collecting ... collected 58 items / 13 deselected / 45 selected
2025-12-04T11:20:45.4709832Z stepcurrent: skipping 10 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.4709961Z Running 1 items in this shard
2025-12-04T11:20:45.4709966Z 
2025-12-04T11:20:45.4711239Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 11:15:38.196016043 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4711249Z 
2025-12-04T11:20:45.4711784Z [W1204 11:15:54.368207465 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4711789Z 
2025-12-04T11:20:45.4712303Z [W1204 11:15:54.368466367 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4712310Z 
2025-12-04T11:20:45.4712835Z [W1204 11:15:54.375807917 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4712841Z 
2025-12-04T11:20:45.4713353Z [W1204 11:15:54.376551806 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4713358Z 
2025-12-04T11:20:45.4713955Z [W1204 11:15:54.376747358 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4713962Z 
2025-12-04T11:20:45.4714472Z [W1204 11:15:54.383710601 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4714477Z 
2025-12-04T11:20:45.4714999Z [W1204 11:15:54.384504814 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4715036Z 
2025-12-04T11:20:45.4715544Z [W1204 11:15:54.384708070 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4715549Z 
2025-12-04T11:20:45.4716053Z [W1204 11:15:55.521059950 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4716071Z 
2025-12-04T11:20:45.4716588Z [W1204 11:15:55.522823241 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4716626Z 
2025-12-04T11:20:45.4717134Z [W1204 11:15:55.523036471 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4717139Z 
2025-12-04T11:20:45.4717663Z [W1204 11:15:55.527037880 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4717671Z 
2025-12-04T11:20:45.4718179Z [W1204 11:15:55.527706923 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4718184Z 
2025-12-04T11:20:45.4718707Z [W1204 11:15:55.527907937 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4718711Z 
2025-12-04T11:20:45.4719226Z [W1204 11:15:55.534007333 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4719233Z 
2025-12-04T11:20:45.4719757Z [W1204 11:15:55.534702336 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4719762Z 
2025-12-04T11:20:45.4720268Z [W1204 11:15:55.534900175 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4720275Z 
2025-12-04T11:20:45.4720416Z ('RERUN', {'yellow': True}) [20.0760s] [100%]
2025-12-04T11:20:45.4721697Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 11:15:55.937847188 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4721703Z 
2025-12-04T11:20:45.4722222Z [W1204 11:15:55.938631694 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4722229Z 
2025-12-04T11:20:45.4722757Z [W1204 11:15:55.938835962 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4722761Z 
2025-12-04T11:20:45.4723274Z [W1204 11:15:55.942949636 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4723282Z 
2025-12-04T11:20:45.4723807Z [W1204 11:15:55.943604907 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4723812Z 
2025-12-04T11:20:45.4724321Z [W1204 11:15:55.943799420 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4724326Z 
2025-12-04T11:20:45.4724913Z [W1204 11:15:55.949947866 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4724921Z 
2025-12-04T11:20:45.4725428Z [W1204 11:15:55.950689143 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4725433Z 
2025-12-04T11:20:45.4725954Z [W1204 11:15:55.950886649 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4725990Z 
2025-12-04T11:20:45.4726495Z [W1204 11:15:55.041243801 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4726500Z 
2025-12-04T11:20:45.4727010Z [W1204 11:15:55.042043151 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4727028Z 
2025-12-04T11:20:45.4727535Z [W1204 11:15:55.042257820 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4727544Z 
2025-12-04T11:20:45.4728088Z [W1204 11:15:55.046260191 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4728093Z 
2025-12-04T11:20:45.4728612Z [W1204 11:15:55.046930396 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4728619Z 
2025-12-04T11:20:45.4729131Z [W1204 11:15:55.047133715 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4729137Z 
2025-12-04T11:20:45.4729656Z [W1204 11:15:55.053277850 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4729661Z 
2025-12-04T11:20:45.4730168Z [W1204 11:15:55.054146214 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4730173Z 
2025-12-04T11:20:45.4730698Z [W1204 11:15:55.054344140 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4730706Z 
2025-12-04T11:20:45.4730838Z ('RERUN', {'yellow': True}) [0.4799s] [100%]
2025-12-04T11:20:45.4732111Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 11:15:55.393978117 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4732131Z 
2025-12-04T11:20:45.4732640Z [W1204 11:15:55.394745204 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4732645Z 
2025-12-04T11:20:45.4733153Z [W1204 11:15:55.394950183 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4733163Z 
2025-12-04T11:20:45.4733678Z [W1204 11:15:55.399053287 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4733688Z 
2025-12-04T11:20:45.4734200Z [W1204 11:15:55.399713580 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4734207Z 
2025-12-04T11:20:45.4734724Z [W1204 11:15:55.399909314 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4734729Z 
2025-12-04T11:20:45.4735236Z [W1204 11:15:56.406172827 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4735241Z 
2025-12-04T11:20:45.4735761Z [W1204 11:15:56.406843800 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4735766Z 
2025-12-04T11:20:45.4736415Z [W1204 11:15:56.407037737 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4736425Z 
2025-12-04T11:20:45.4736953Z [W1204 11:15:56.498214108 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4736959Z 
2025-12-04T11:20:45.4737496Z [W1204 11:15:56.499030838 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4737500Z 
2025-12-04T11:20:45.4738007Z [W1204 11:15:56.499248261 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4738027Z 
2025-12-04T11:20:45.4738534Z [W1204 11:15:56.503367065 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4738539Z 
2025-12-04T11:20:45.4739049Z [W1204 11:15:56.504077510 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4739085Z 
2025-12-04T11:20:45.4739606Z [W1204 11:15:56.504282020 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4739611Z 
2025-12-04T11:20:45.4740119Z [W1204 11:15:56.510458116 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4740126Z 
2025-12-04T11:20:45.4740787Z [W1204 11:15:56.511346507 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4740792Z 
2025-12-04T11:20:45.4741302Z [W1204 11:15:56.511549959 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4741307Z 
2025-12-04T11:20:45.4741427Z FAILED [0.4558s] [100%]
2025-12-04T11:20:45.4741437Z 
2025-12-04T11:20:45.4741580Z ==================================== RERUNS ====================================
2025-12-04T11:20:45.4742084Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T11:20:45.4742226Z Traceback (most recent call last):
2025-12-04T11:20:45.4742738Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.4742988Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.4743454Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.4743620Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.4744170Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.4744383Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.4744520Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4744540Z 
2025-12-04T11:20:45.4744648Z Expected 1 but got 2.
2025-12-04T11:20:45.4744757Z Absolute difference: 1
2025-12-04T11:20:45.4744884Z Relative difference: 1.0
2025-12-04T11:20:45.4744889Z 
2025-12-04T11:20:45.4745109Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4746010Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.4746016Z 
2025-12-04T11:20:45.4746301Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4746521Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4746651Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4747612Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.4747845Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4747957Z graph_break []
2025-12-04T11:20:45.4748173Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4749436Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.4749554Z   if out == self.unknown_value:
2025-12-04T11:20:45.4750279Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4750398Z   warnings.warn(
2025-12-04T11:20:45.4751123Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4751275Z   warnings.warn(
2025-12-04T11:20:45.4751778Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T11:20:45.4751903Z Traceback (most recent call last):
2025-12-04T11:20:45.4752429Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.4752662Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.4753120Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.4753299Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.4753836Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.4754059Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.4754193Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4754198Z 
2025-12-04T11:20:45.4754306Z Expected 1 but got 2.
2025-12-04T11:20:45.4754430Z Absolute difference: 1
2025-12-04T11:20:45.4754542Z Relative difference: 1.0
2025-12-04T11:20:45.4754550Z 
2025-12-04T11:20:45.4754767Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4755677Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.4755683Z 
2025-12-04T11:20:45.4755953Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4756186Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4756307Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4757194Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.4757436Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4757537Z graph_break []
2025-12-04T11:20:45.4757766Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4758975Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.4759092Z   if out == self.unknown_value:
2025-12-04T11:20:45.4759914Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4760020Z   warnings.warn(
2025-12-04T11:20:45.4760754Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4760859Z   warnings.warn(
2025-12-04T11:20:45.4761077Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4761241Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4761467Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4762374Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.4762473Z graph_break []
2025-12-04T11:20:45.4762694Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4763431Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4763575Z   warnings.warn(
2025-12-04T11:20:45.4764292Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4764410Z   warnings.warn(
2025-12-04T11:20:45.4764556Z =================================== FAILURES ===================================
2025-12-04T11:20:45.4765068Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T11:20:45.4765193Z Traceback (most recent call last):
2025-12-04T11:20:45.4765701Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.4765950Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.4766409Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.4766586Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.4767123Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.4767334Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.4767480Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4767484Z 
2025-12-04T11:20:45.4767591Z Expected 1 but got 2.
2025-12-04T11:20:45.4767698Z Absolute difference: 1
2025-12-04T11:20:45.4767820Z Relative difference: 1.0
2025-12-04T11:20:45.4767825Z 
2025-12-04T11:20:45.4768039Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4768962Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.4768969Z 
2025-12-04T11:20:45.4769240Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4769458Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4769587Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4770474Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.4770713Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4770814Z graph_break []
2025-12-04T11:20:45.4771263Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4772764Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.4772889Z   if out == self.unknown_value:
2025-12-04T11:20:45.4773628Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4773778Z   warnings.warn(
2025-12-04T11:20:45.4774494Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4774611Z   warnings.warn(
2025-12-04T11:20:45.4774831Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4774947Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4775191Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4776084Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.4776249Z graph_break []
2025-12-04T11:20:45.4776537Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4777271Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4777391Z   warnings.warn(
2025-12-04T11:20:45.4778111Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4778228Z   warnings.warn(
2025-12-04T11:20:45.4778447Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4778569Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4778814Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4779698Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.4779800Z graph_break []
2025-12-04T11:20:45.4780031Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4780753Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4780866Z   warnings.warn(
2025-12-04T11:20:45.4781585Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4781692Z   warnings.warn(
2025-12-04T11:20:45.4782545Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ce470e45644e1cc6.xml -
2025-12-04T11:20:45.4782721Z =========================== short test summary info ============================
2025-12-04T11:20:45.4783684Z FAILED [0.4558s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4783693Z 
2025-12-04T11:20:45.4783800Z Expected 1 but got 2.
2025-12-04T11:20:45.4783911Z Absolute difference: 1
2025-12-04T11:20:45.4784036Z Relative difference: 1.0
2025-12-04T11:20:45.4784040Z 
2025-12-04T11:20:45.4784259Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4785779Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.4785788Z 
2025-12-04T11:20:45.4786065Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4786259Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:20:45.4786478Z ================== 1 failed, 13 deselected, 2 rerun in 21.05s ==================
2025-12-04T11:20:45.4786618Z Got exit code 1
2025-12-04T11:20:45.4787453Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.4787867Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T11:20:45.4788319Z W1204 11:16:08.082000 92367 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:20:45.4789006Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cece0bb00c5477e6.xml
2025-12-04T11:20:45.4789207Z ============================= test session starts ==============================
2025-12-04T11:20:45.4789571Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:20:45.4789688Z cachedir: .pytest_cache
2025-12-04T11:20:45.4790211Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:20:45.4790352Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:20:45.4790463Z configfile: pytest.ini
2025-12-04T11:20:45.4791006Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:20:45.4791240Z collecting ... collected 58 items / 11 deselected / 47 selected
2025-12-04T11:20:45.4791391Z stepcurrent: skipping 11 already run items.
2025-12-04T11:20:45.4791528Z Running 3 items in this shard
2025-12-04T11:20:45.4791535Z 
2025-12-04T11:20:45.4792407Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [4.0197s] [ 33%]
2025-12-04T11:20:45.4793271Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5183s] [ 33%]
2025-12-04T11:20:45.4794071Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 FAILED [0.5161s] [ 33%]
2025-12-04T11:20:45.4794076Z 
2025-12-04T11:20:45.4794221Z ==================================== RERUNS ====================================
2025-12-04T11:20:45.4794754Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.4794884Z Traceback (most recent call last):
2025-12-04T11:20:45.4795402Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.4795652Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.4796118Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.4796298Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.4796835Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.4797042Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.4797193Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4797199Z 
2025-12-04T11:20:45.4797308Z Expected 1 but got 2.
2025-12-04T11:20:45.4797499Z Absolute difference: 1
2025-12-04T11:20:45.4797630Z Relative difference: 1.0
2025-12-04T11:20:45.4797637Z 
2025-12-04T11:20:45.4797853Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4798773Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.4798813Z 
2025-12-04T11:20:45.4799083Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4799302Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4799432Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4799964Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4800212Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4800316Z graph_break []
2025-12-04T11:20:45.4800570Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4801316Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4801426Z   warnings.warn(
2025-12-04T11:20:45.4802160Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4802268Z   warnings.warn(
2025-12-04T11:20:45.4802775Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.4802912Z Traceback (most recent call last):
2025-12-04T11:20:45.4803425Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.4803661Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.4804136Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.4804302Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.4804846Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.4805056Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.4805191Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4805197Z 
2025-12-04T11:20:45.4805317Z Expected 1 but got 2.
2025-12-04T11:20:45.4805425Z Absolute difference: 1
2025-12-04T11:20:45.4805536Z Relative difference: 1.0
2025-12-04T11:20:45.4805554Z 
2025-12-04T11:20:45.4805771Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4806683Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.4806691Z 
2025-12-04T11:20:45.4806978Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4807198Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4807320Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4807864Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4808097Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4808212Z graph_break []
2025-12-04T11:20:45.4808428Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4809224Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4809343Z   warnings.warn(
2025-12-04T11:20:45.4810067Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4810187Z   warnings.warn(
2025-12-04T11:20:45.4810402Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4810634Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4810875Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4811403Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4811503Z graph_break []
2025-12-04T11:20:45.4811737Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4812466Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4812611Z   warnings.warn(
2025-12-04T11:20:45.4813328Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4813432Z   warnings.warn(
2025-12-04T11:20:45.4813594Z =================================== FAILURES ===================================
2025-12-04T11:20:45.4814106Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.4814245Z Traceback (most recent call last):
2025-12-04T11:20:45.4814753Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.4814985Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.4815463Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.4815630Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.4816164Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.4816466Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.4816610Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4816616Z 
2025-12-04T11:20:45.4816739Z Expected 1 but got 2.
2025-12-04T11:20:45.4816851Z Absolute difference: 1
2025-12-04T11:20:45.4816965Z Relative difference: 1.0
2025-12-04T11:20:45.4816970Z 
2025-12-04T11:20:45.4817205Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4818124Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.4818131Z 
2025-12-04T11:20:45.4818417Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4818642Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4818759Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4819304Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4819534Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4819635Z graph_break []
2025-12-04T11:20:45.4819864Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4820595Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4820712Z   warnings.warn(
2025-12-04T11:20:45.4821511Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4821618Z   warnings.warn(
2025-12-04T11:20:45.4821850Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4821966Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4822230Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4822775Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4822878Z graph_break []
2025-12-04T11:20:45.4823109Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4823833Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4823936Z   warnings.warn(
2025-12-04T11:20:45.4824707Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4824807Z   warnings.warn(
2025-12-04T11:20:45.4825035Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4825152Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4825379Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4825916Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4826015Z graph_break []
2025-12-04T11:20:45.4826228Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4826964Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4827068Z   warnings.warn(
2025-12-04T11:20:45.4827794Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4827894Z   warnings.warn(
2025-12-04T11:20:45.4828738Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cece0bb00c5477e6.xml -
2025-12-04T11:20:45.4828925Z =========================== short test summary info ============================
2025-12-04T11:20:45.4829871Z FAILED [0.5161s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4829877Z 
2025-12-04T11:20:45.4829995Z Expected 1 but got 2.
2025-12-04T11:20:45.4830109Z Absolute difference: 1
2025-12-04T11:20:45.4830223Z Relative difference: 1.0
2025-12-04T11:20:45.4830228Z 
2025-12-04T11:20:45.4830458Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4831369Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.4831376Z 
2025-12-04T11:20:45.4831656Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4831837Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:20:45.4832036Z ================== 1 failed, 11 deselected, 2 rerun in 5.09s ===================
2025-12-04T11:20:45.4832150Z Got exit code 1
2025-12-04T11:20:45.4832258Z Retrying single test...
2025-12-04T11:20:45.4832772Z W1204 11:16:28.792000 92544 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:20:45.4833447Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4e672e5e3ae6046c.xml
2025-12-04T11:20:45.4833615Z ============================= test session starts ==============================
2025-12-04T11:20:45.4833977Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:20:45.4834122Z cachedir: .pytest_cache
2025-12-04T11:20:45.4834638Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:20:45.4834778Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:20:45.4834887Z configfile: pytest.ini
2025-12-04T11:20:45.4835438Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:20:45.4835663Z collecting ... collected 58 items / 13 deselected / 45 selected
2025-12-04T11:20:45.4836687Z stepcurrent: skipping 11 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.4836818Z Running 1 items in this shard
2025-12-04T11:20:45.4836824Z 
2025-12-04T11:20:45.4838109Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 11:16:32.915856585 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4838115Z 
2025-12-04T11:20:45.4838647Z [W1204 11:16:48.167318131 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4838652Z 
2025-12-04T11:20:45.4839170Z [W1204 11:16:48.167579688 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4839177Z 
2025-12-04T11:20:45.4839699Z [W1204 11:16:48.175158765 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4839705Z 
2025-12-04T11:20:45.4840215Z [W1204 11:16:48.175921119 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4840222Z 
2025-12-04T11:20:45.4840728Z [W1204 11:16:48.176117512 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4840747Z 
2025-12-04T11:20:45.4841253Z [W1204 11:16:48.183213596 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4841258Z 
2025-12-04T11:20:45.4841769Z [W1204 11:16:48.183899664 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4841776Z 
2025-12-04T11:20:45.4842298Z [W1204 11:16:48.184087193 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4842303Z 
2025-12-04T11:20:45.4842810Z [W1204 11:16:50.190786562 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4842818Z 
2025-12-04T11:20:45.4843338Z [W1204 11:16:50.192556017 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4843344Z 
2025-12-04T11:20:45.4843853Z [W1204 11:16:50.192776658 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4843857Z 
2025-12-04T11:20:45.4844465Z [W1204 11:16:50.196826901 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4844471Z 
2025-12-04T11:20:45.4844982Z [W1204 11:16:50.197502434 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4844988Z 
2025-12-04T11:20:45.4845508Z [W1204 11:16:50.197705888 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4845542Z 
2025-12-04T11:20:45.4846051Z [W1204 11:16:50.203877411 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4846056Z 
2025-12-04T11:20:45.4846562Z [W1204 11:16:50.204554953 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4846580Z 
2025-12-04T11:20:45.4847087Z [W1204 11:16:50.204760846 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4847096Z 
2025-12-04T11:20:45.4847234Z ('RERUN', {'yellow': True}) [20.2529s] [100%]
2025-12-04T11:20:45.4848554Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 11:16:51.657032587 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4848562Z 
2025-12-04T11:20:45.4849073Z [W1204 11:16:51.657874289 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4849078Z 
2025-12-04T11:20:45.4849601Z [W1204 11:16:51.658088740 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4849606Z 
2025-12-04T11:20:45.4850112Z [W1204 11:16:51.662327076 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4850121Z 
2025-12-04T11:20:45.4850642Z [W1204 11:16:51.663247874 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4850647Z 
2025-12-04T11:20:45.4851155Z [W1204 11:16:51.663455990 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4851162Z 
2025-12-04T11:20:45.4851678Z [W1204 11:16:51.669747936 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4851683Z 
2025-12-04T11:20:45.4852189Z [W1204 11:16:51.670484810 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4852195Z 
2025-12-04T11:20:45.4852702Z [W1204 11:16:51.670689017 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4852724Z 
2025-12-04T11:20:45.4853232Z [W1204 11:16:51.763533023 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4853239Z 
2025-12-04T11:20:45.4853744Z [W1204 11:16:51.764363973 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4853752Z 
2025-12-04T11:20:45.4854280Z [W1204 11:16:51.764592492 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4854286Z 
2025-12-04T11:20:45.4854794Z [W1204 11:16:51.768705393 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4854799Z 
2025-12-04T11:20:45.4855320Z [W1204 11:16:51.769389916 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4855324Z 
2025-12-04T11:20:45.4855891Z [W1204 11:16:51.769587122 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4855899Z 
2025-12-04T11:20:45.4856514Z [W1204 11:16:51.775848620 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4856520Z 
2025-12-04T11:20:45.4857070Z [W1204 11:16:51.776789704 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4857075Z 
2025-12-04T11:20:45.4857584Z [W1204 11:16:51.776990675 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4857603Z 
2025-12-04T11:20:45.4857739Z ('RERUN', {'yellow': True}) [0.5341s] [100%]
2025-12-04T11:20:45.4859022Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 11:16:51.166470451 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4859058Z 
2025-12-04T11:20:45.4859588Z [W1204 11:16:51.167279693 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4859592Z 
2025-12-04T11:20:45.4860108Z [W1204 11:16:51.167491178 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4860113Z 
2025-12-04T11:20:45.4860636Z [W1204 11:16:51.171624226 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4860640Z 
2025-12-04T11:20:45.4861151Z [W1204 11:16:51.172466790 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4861156Z 
2025-12-04T11:20:45.4861688Z [W1204 11:16:51.172674957 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4861696Z 
2025-12-04T11:20:45.4862204Z [W1204 11:16:51.178838112 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4862209Z 
2025-12-04T11:20:45.4862726Z [W1204 11:16:51.179496581 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4862734Z 
2025-12-04T11:20:45.4863241Z [W1204 11:16:51.179685508 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4863246Z 
2025-12-04T11:20:45.4863758Z [W1204 11:16:51.271568151 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4863779Z 
2025-12-04T11:20:45.4864291Z [W1204 11:16:51.273564027 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4864298Z 
2025-12-04T11:20:45.4864802Z [W1204 11:16:51.273791397 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4864807Z 
2025-12-04T11:20:45.4865331Z [W1204 11:16:51.279240334 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4865338Z 
2025-12-04T11:20:45.4865850Z [W1204 11:16:51.280365366 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4865855Z 
2025-12-04T11:20:45.4866382Z [W1204 11:16:51.280609663 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4866387Z 
2025-12-04T11:20:45.4866957Z [W1204 11:16:51.287511313 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4866964Z 
2025-12-04T11:20:45.4867485Z [W1204 11:16:51.288800517 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4867490Z 
2025-12-04T11:20:45.4868000Z [W1204 11:16:51.289001309 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4868037Z 
2025-12-04T11:20:45.4868143Z FAILED [0.5095s] [100%]
2025-12-04T11:20:45.4868163Z 
2025-12-04T11:20:45.4868310Z ==================================== RERUNS ====================================
2025-12-04T11:20:45.4868830Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.4868968Z Traceback (most recent call last):
2025-12-04T11:20:45.4869487Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.4869718Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.4870233Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.4870396Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.4871140Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.4871453Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.4871614Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4871619Z 
2025-12-04T11:20:45.4871744Z Expected 1 but got 2.
2025-12-04T11:20:45.4871857Z Absolute difference: 1
2025-12-04T11:20:45.4871968Z Relative difference: 1.0
2025-12-04T11:20:45.4871973Z 
2025-12-04T11:20:45.4872207Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4873123Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.4873131Z 
2025-12-04T11:20:45.4873416Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4873640Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4873760Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4874305Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4874534Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4874648Z graph_break []
2025-12-04T11:20:45.4874864Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4876077Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.4876210Z   if out == self.unknown_value:
2025-12-04T11:20:45.4876933Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4877052Z   warnings.warn(
2025-12-04T11:20:45.4877773Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4877878Z   warnings.warn(
2025-12-04T11:20:45.4878406Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.4878530Z Traceback (most recent call last):
2025-12-04T11:20:45.4879178Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.4879426Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.4879886Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.4880071Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.4880651Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.4880859Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.4881007Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4881012Z 
2025-12-04T11:20:45.4881121Z Expected 1 but got 2.
2025-12-04T11:20:45.4881242Z Absolute difference: 1
2025-12-04T11:20:45.4881354Z Relative difference: 1.0
2025-12-04T11:20:45.4881359Z 
2025-12-04T11:20:45.4881580Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4882510Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.4882585Z 
2025-12-04T11:20:45.4882858Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4883096Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4883216Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4883744Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4883989Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4884091Z graph_break []
2025-12-04T11:20:45.4884308Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4885534Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.4885654Z   if out == self.unknown_value:
2025-12-04T11:20:45.4886391Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4886497Z   warnings.warn(
2025-12-04T11:20:45.4887218Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4887334Z   warnings.warn(
2025-12-04T11:20:45.4887553Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4887682Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4887915Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4888446Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4888559Z graph_break []
2025-12-04T11:20:45.4888774Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4889499Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4889617Z   warnings.warn(
2025-12-04T11:20:45.4890336Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4890450Z   warnings.warn(
2025-12-04T11:20:45.4890597Z =================================== FAILURES ===================================
2025-12-04T11:20:45.4891178Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.4891322Z Traceback (most recent call last):
2025-12-04T11:20:45.4891830Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.4892080Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.4892574Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.4892739Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.4893291Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.4893497Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.4893631Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4893636Z 
2025-12-04T11:20:45.4893763Z Expected 1 but got 2.
2025-12-04T11:20:45.4893874Z Absolute difference: 1
2025-12-04T11:20:45.4894030Z Relative difference: 1.0
2025-12-04T11:20:45.4894035Z 
2025-12-04T11:20:45.4894251Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4895156Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.4895165Z 
2025-12-04T11:20:45.4895449Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4895668Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4895798Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4896396Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4896634Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4896751Z graph_break []
2025-12-04T11:20:45.4896966Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4898175Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.4898310Z   if out == self.unknown_value:
2025-12-04T11:20:45.4899033Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4899151Z   warnings.warn(
2025-12-04T11:20:45.4899867Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4899974Z   warnings.warn(
2025-12-04T11:20:45.4900208Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4900328Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4900570Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4901100Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4901202Z graph_break []
2025-12-04T11:20:45.4901433Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4902152Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4902257Z   warnings.warn(
2025-12-04T11:20:45.4903053Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4903156Z   warnings.warn(
2025-12-04T11:20:45.4903389Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4903505Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4903733Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4904271Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4904404Z graph_break []
2025-12-04T11:20:45.4904619Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4905362Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4905464Z   warnings.warn(
2025-12-04T11:20:45.4906199Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4906334Z   warnings.warn(
2025-12-04T11:20:45.4907167Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4e672e5e3ae6046c.xml -
2025-12-04T11:20:45.4907351Z =========================== short test summary info ============================
2025-12-04T11:20:45.4908304Z FAILED [0.5095s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4908311Z 
2025-12-04T11:20:45.4908431Z Expected 1 but got 2.
2025-12-04T11:20:45.4908540Z Absolute difference: 1
2025-12-04T11:20:45.4908650Z Relative difference: 1.0
2025-12-04T11:20:45.4908656Z 
2025-12-04T11:20:45.4908885Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4909801Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.4909810Z 
2025-12-04T11:20:45.4910093Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4910274Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:20:45.4910476Z ================== 1 failed, 13 deselected, 2 rerun in 21.33s ==================
2025-12-04T11:20:45.4910592Z Got exit code 1
2025-12-04T11:20:45.4910701Z Retrying single test...
2025-12-04T11:20:45.4911159Z W1204 11:17:03.738000 92726 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:20:45.4911816Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-65775801d71c7290.xml
2025-12-04T11:20:45.4911987Z ============================= test session starts ==============================
2025-12-04T11:20:45.4912359Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:20:45.4912473Z cachedir: .pytest_cache
2025-12-04T11:20:45.4912994Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:20:45.4913139Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:20:45.4913250Z configfile: pytest.ini
2025-12-04T11:20:45.4913809Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:20:45.4914028Z collecting ... collected 58 items / 13 deselected / 45 selected
2025-12-04T11:20:45.4915091Z stepcurrent: skipping 11 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.4915226Z Running 1 items in this shard
2025-12-04T11:20:45.4915231Z 
2025-12-04T11:20:45.4916521Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 11:17:07.833268456 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4916559Z 
2025-12-04T11:20:45.4917093Z [W1204 11:17:22.152782250 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4917099Z 
2025-12-04T11:20:45.4917612Z [W1204 11:17:22.153052159 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4917618Z 
2025-12-04T11:20:45.4918142Z [W1204 11:17:22.160552917 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4918178Z 
2025-12-04T11:20:45.4918690Z [W1204 11:17:22.161351484 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4918696Z 
2025-12-04T11:20:45.4919219Z [W1204 11:17:22.161551623 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4919226Z 
2025-12-04T11:20:45.4919733Z [W1204 11:17:22.168665291 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4919739Z 
2025-12-04T11:20:45.4920246Z [W1204 11:17:22.169395548 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4920264Z 
2025-12-04T11:20:45.4920777Z [W1204 11:17:22.169589756 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4920782Z 
2025-12-04T11:20:45.4921291Z [W1204 11:17:24.182102377 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4921296Z 
2025-12-04T11:20:45.4921821Z [W1204 11:17:24.183860691 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4921829Z 
2025-12-04T11:20:45.4922337Z [W1204 11:17:24.184081254 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4922342Z 
2025-12-04T11:20:45.4922865Z [W1204 11:17:24.188096646 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4922869Z 
2025-12-04T11:20:45.4923375Z [W1204 11:17:24.188791973 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4923385Z 
2025-12-04T11:20:45.4923905Z [W1204 11:17:24.188994605 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4923913Z 
2025-12-04T11:20:45.4924420Z [W1204 11:17:24.195201623 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4924428Z 
2025-12-04T11:20:45.4924959Z [W1204 11:17:24.195891719 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4924963Z 
2025-12-04T11:20:45.4925470Z [W1204 11:17:24.196087581 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4925476Z 
2025-12-04T11:20:45.4925611Z ('RERUN', {'yellow': True}) [19.2846s] [100%]
2025-12-04T11:20:45.4926975Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 11:17:25.656719457 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4926984Z 
2025-12-04T11:20:45.4927496Z [W1204 11:17:25.657625909 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4927549Z 
2025-12-04T11:20:45.4928070Z [W1204 11:17:25.657842586 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4928075Z 
2025-12-04T11:20:45.4928584Z [W1204 11:17:25.661947320 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4928589Z 
2025-12-04T11:20:45.4929112Z [W1204 11:17:25.662844559 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4929117Z 
2025-12-04T11:20:45.4929630Z [W1204 11:17:25.663045970 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4929666Z 
2025-12-04T11:20:45.4930189Z [W1204 11:17:25.669201584 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4930194Z 
2025-12-04T11:20:45.4930706Z [W1204 11:17:25.669893848 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4930712Z 
2025-12-04T11:20:45.4931237Z [W1204 11:17:25.670122148 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4931241Z 
2025-12-04T11:20:45.4931750Z [W1204 11:17:25.762623881 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4931755Z 
2025-12-04T11:20:45.4932271Z [W1204 11:17:25.763462481 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4932278Z 
2025-12-04T11:20:45.4932803Z [W1204 11:17:25.763678765 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4932808Z 
2025-12-04T11:20:45.4933321Z [W1204 11:17:25.767772694 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4933328Z 
2025-12-04T11:20:45.4933856Z [W1204 11:17:25.768498541 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4933861Z 
2025-12-04T11:20:45.4934373Z [W1204 11:17:25.768717848 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4934378Z 
2025-12-04T11:20:45.4934908Z [W1204 11:17:25.774964078 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4934914Z 
2025-12-04T11:20:45.4935424Z [W1204 11:17:25.775899218 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4935429Z 
2025-12-04T11:20:45.4935955Z [W1204 11:17:25.776102383 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4935962Z 
2025-12-04T11:20:45.4936098Z ('RERUN', {'yellow': True}) [0.5392s] [100%]
2025-12-04T11:20:45.4937464Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 11:17:25.161046626 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4937485Z 
2025-12-04T11:20:45.4938078Z [W1204 11:17:25.161779941 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4938086Z 
2025-12-04T11:20:45.4938603Z [W1204 11:17:25.161978888 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4938608Z 
2025-12-04T11:20:45.4939134Z [W1204 11:17:25.165866020 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4939169Z 
2025-12-04T11:20:45.4939683Z [W1204 11:17:25.166630533 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4939688Z 
2025-12-04T11:20:45.4940211Z [W1204 11:17:25.166819565 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4940216Z 
2025-12-04T11:20:45.4940732Z [W1204 11:17:25.172831807 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4940770Z 
2025-12-04T11:20:45.4941295Z [W1204 11:17:25.173463488 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4941300Z 
2025-12-04T11:20:45.4941812Z [W1204 11:17:25.173651250 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4941819Z 
2025-12-04T11:20:45.4942344Z [W1204 11:17:25.262561730 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4942349Z 
2025-12-04T11:20:45.4942860Z [W1204 11:17:25.263330167 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4942865Z 
2025-12-04T11:20:45.4943380Z [W1204 11:17:25.263535548 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4943401Z 
2025-12-04T11:20:45.4943916Z [W1204 11:17:25.269107917 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4943921Z 
2025-12-04T11:20:45.4944437Z [W1204 11:17:25.270021935 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4944444Z 
2025-12-04T11:20:45.4944972Z [W1204 11:17:25.270228730 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4944977Z 
2025-12-04T11:20:45.4945487Z [W1204 11:17:25.277704694 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4945492Z 
2025-12-04T11:20:45.4946024Z [W1204 11:17:25.278451894 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4946029Z 
2025-12-04T11:20:45.4946542Z [W1204 11:17:25.278648270 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.4946548Z 
2025-12-04T11:20:45.4946669Z FAILED [0.5035s] [100%]
2025-12-04T11:20:45.4946674Z 
2025-12-04T11:20:45.4946822Z ==================================== RERUNS ====================================
2025-12-04T11:20:45.4947341Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.4947478Z Traceback (most recent call last):
2025-12-04T11:20:45.4947993Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.4948235Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.4948784Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.4948951Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.4949509Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.4949716Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.4949849Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4949887Z 
2025-12-04T11:20:45.4950006Z Expected 1 but got 2.
2025-12-04T11:20:45.4950115Z Absolute difference: 1
2025-12-04T11:20:45.4950241Z Relative difference: 1.0
2025-12-04T11:20:45.4950246Z 
2025-12-04T11:20:45.4950461Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4951374Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.4951379Z 
2025-12-04T11:20:45.4951669Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4951931Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4952065Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4952595Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4952826Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4952943Z graph_break []
2025-12-04T11:20:45.4953161Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4954370Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.4954514Z   if out == self.unknown_value:
2025-12-04T11:20:45.4955244Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4955367Z   warnings.warn(
2025-12-04T11:20:45.4956090Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4956199Z   warnings.warn(
2025-12-04T11:20:45.4956728Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.4956853Z Traceback (most recent call last):
2025-12-04T11:20:45.4957375Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.4957607Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.4958071Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.4958253Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.4958788Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.4959006Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.4959141Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4959146Z 
2025-12-04T11:20:45.4959253Z Expected 1 but got 2.
2025-12-04T11:20:45.4959374Z Absolute difference: 1
2025-12-04T11:20:45.4959485Z Relative difference: 1.0
2025-12-04T11:20:45.4959490Z 
2025-12-04T11:20:45.4959706Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4960724Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.4960731Z 
2025-12-04T11:20:45.4961004Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4961236Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4961354Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4961887Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4962166Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4962267Z graph_break []
2025-12-04T11:20:45.4962498Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4963722Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.4963841Z   if out == self.unknown_value:
2025-12-04T11:20:45.4964617Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4964722Z   warnings.warn(
2025-12-04T11:20:45.4965456Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4965564Z   warnings.warn(
2025-12-04T11:20:45.4965783Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4965914Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4966144Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4966675Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4966794Z graph_break []
2025-12-04T11:20:45.4967012Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4967758Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4967859Z   warnings.warn(
2025-12-04T11:20:45.4968579Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4968693Z   warnings.warn(
2025-12-04T11:20:45.4968840Z =================================== FAILURES ===================================
2025-12-04T11:20:45.4969350Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.4969493Z Traceback (most recent call last):
2025-12-04T11:20:45.4970009Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.4970256Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.4970715Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.4970880Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.4971780Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.4971991Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.4972141Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4972146Z 
2025-12-04T11:20:45.4972254Z Expected 1 but got 2.
2025-12-04T11:20:45.4972364Z Absolute difference: 1
2025-12-04T11:20:45.4972490Z Relative difference: 1.0
2025-12-04T11:20:45.4972495Z 
2025-12-04T11:20:45.4972711Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4973803Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.4973828Z 
2025-12-04T11:20:45.4974100Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4974366Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4974500Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4975029Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4975259Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4975375Z graph_break []
2025-12-04T11:20:45.4975593Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4976880Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.4977061Z   if out == self.unknown_value:
2025-12-04T11:20:45.4977795Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4977915Z   warnings.warn(
2025-12-04T11:20:45.4978638Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4978756Z   warnings.warn(
2025-12-04T11:20:45.4978974Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4979090Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4979336Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4979867Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4979966Z graph_break []
2025-12-04T11:20:45.4980197Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4980920Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4981037Z   warnings.warn(
2025-12-04T11:20:45.4981756Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4981857Z   warnings.warn(
2025-12-04T11:20:45.4982086Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4982202Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4982434Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4982977Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4983077Z graph_break []
2025-12-04T11:20:45.4983305Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4984029Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4984130Z   warnings.warn(
2025-12-04T11:20:45.4984862Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.4984965Z   warnings.warn(
2025-12-04T11:20:45.4985869Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-65775801d71c7290.xml -
2025-12-04T11:20:45.4986047Z =========================== short test summary info ============================
2025-12-04T11:20:45.4987004Z FAILED [0.5035s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4987043Z 
2025-12-04T11:20:45.4987167Z Expected 1 but got 2.
2025-12-04T11:20:45.4987281Z Absolute difference: 1
2025-12-04T11:20:45.4987393Z Relative difference: 1.0
2025-12-04T11:20:45.4987411Z 
2025-12-04T11:20:45.4987631Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4988543Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.4988553Z 
2025-12-04T11:20:45.4988832Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4989050Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:20:45.4989263Z ================== 1 failed, 13 deselected, 2 rerun in 20.36s ==================
2025-12-04T11:20:45.4989364Z Got exit code 1
2025-12-04T11:20:45.4990198Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.4990624Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T11:20:45.4991070Z W1204 11:17:37.556000 92908 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:20:45.4991732Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9ed754aaaf490f98.xml
2025-12-04T11:20:45.4991913Z ============================= test session starts ==============================
2025-12-04T11:20:45.4997953Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:20:45.4998124Z cachedir: .pytest_cache
2025-12-04T11:20:45.4998689Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:20:45.4998828Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:20:45.4998937Z configfile: pytest.ini
2025-12-04T11:20:45.4999497Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:20:45.4999717Z collecting ... collected 58 items / 12 deselected / 46 selected
2025-12-04T11:20:45.4999881Z stepcurrent: skipping 12 already run items.
2025-12-04T11:20:45.5000001Z Running 2 items in this shard
2025-12-04T11:20:45.5000015Z 
2025-12-04T11:20:45.5000910Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [4.3530s] [ 50%]
2025-12-04T11:20:45.5001796Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.9081s] [ 50%]
2025-12-04T11:20:45.5002587Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 FAILED [0.9217s] [ 50%]
2025-12-04T11:20:45.5002594Z 
2025-12-04T11:20:45.5002755Z ==================================== RERUNS ====================================
2025-12-04T11:20:45.5003271Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.5003617Z Traceback (most recent call last):
2025-12-04T11:20:45.5004148Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.5004384Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.5004863Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.5005073Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.5005609Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.5005832Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.5005967Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.5005973Z 
2025-12-04T11:20:45.5006097Z Expected 1 but got 2.
2025-12-04T11:20:45.5006206Z Absolute difference: 1
2025-12-04T11:20:45.5006321Z Relative difference: 1.0
2025-12-04T11:20:45.5006332Z 
2025-12-04T11:20:45.5006562Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.5007526Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.5007533Z 
2025-12-04T11:20:45.5007820Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.5008046Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.5008164Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.5008711Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.5008941Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.5009041Z graph_break []
2025-12-04T11:20:45.5009278Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.5010022Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5010139Z   warnings.warn(
2025-12-04T11:20:45.5010859Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5010965Z   warnings.warn(
2025-12-04T11:20:45.5011495Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.5011619Z Traceback (most recent call last):
2025-12-04T11:20:45.5012143Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.5012374Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.5012838Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.5013019Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.5013553Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.5013763Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.5013909Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.5013914Z 
2025-12-04T11:20:45.5014021Z Expected 1 but got 2.
2025-12-04T11:20:45.5014143Z Absolute difference: 1
2025-12-04T11:20:45.5014252Z Relative difference: 1.0
2025-12-04T11:20:45.5014257Z 
2025-12-04T11:20:45.5014474Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.5015478Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.5015487Z 
2025-12-04T11:20:45.5015755Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.5015989Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.5016106Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.5016742Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.5017027Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.5017128Z graph_break []
2025-12-04T11:20:45.5017347Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.5018093Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5018201Z   warnings.warn(
2025-12-04T11:20:45.5018936Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5019070Z   warnings.warn(
2025-12-04T11:20:45.5019289Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.5019425Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.5019654Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.5020182Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.5020298Z graph_break []
2025-12-04T11:20:45.5020513Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.5021248Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5021352Z   warnings.warn(
2025-12-04T11:20:45.5022066Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5022179Z   warnings.warn(
2025-12-04T11:20:45.5022326Z =================================== FAILURES ===================================
2025-12-04T11:20:45.5022858Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.5022982Z Traceback (most recent call last):
2025-12-04T11:20:45.5023492Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.5023733Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.5024195Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.5024360Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.5024910Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.5025116Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.5025261Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.5025269Z 
2025-12-04T11:20:45.5025375Z Expected 1 but got 2.
2025-12-04T11:20:45.5025484Z Absolute difference: 1
2025-12-04T11:20:45.5025611Z Relative difference: 1.0
2025-12-04T11:20:45.5025617Z 
2025-12-04T11:20:45.5025832Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.5026762Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.5026768Z 
2025-12-04T11:20:45.5027096Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.5027323Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.5027452Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.5027981Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.5028253Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.5028352Z graph_break []
2025-12-04T11:20:45.5028567Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.5029308Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5029416Z   warnings.warn(
2025-12-04T11:20:45.5030140Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5030288Z   warnings.warn(
2025-12-04T11:20:45.5030504Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.5030634Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.5030863Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.5031397Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.5031509Z graph_break []
2025-12-04T11:20:45.5031726Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.5032448Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5032564Z   warnings.warn(
2025-12-04T11:20:45.5033285Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5033401Z   warnings.warn(
2025-12-04T11:20:45.5033620Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.5033735Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.5033975Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.5034502Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.5034600Z graph_break []
2025-12-04T11:20:45.5034827Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.5035552Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5035671Z   warnings.warn(
2025-12-04T11:20:45.5036391Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5036490Z   warnings.warn(
2025-12-04T11:20:45.5037345Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9ed754aaaf490f98.xml -
2025-12-04T11:20:45.5037521Z =========================== short test summary info ============================
2025-12-04T11:20:45.5038477Z FAILED [0.9217s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:20:45.5038483Z 
2025-12-04T11:20:45.5038591Z Expected 1 but got 2.
2025-12-04T11:20:45.5038701Z Absolute difference: 1
2025-12-04T11:20:45.5038886Z Relative difference: 1.0
2025-12-04T11:20:45.5038892Z 
2025-12-04T11:20:45.5039114Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.5040041Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.5040095Z 
2025-12-04T11:20:45.5040366Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.5040550Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:20:45.5040767Z ================== 1 failed, 12 deselected, 2 rerun in 6.22s ===================
2025-12-04T11:20:45.5040870Z Got exit code 1
2025-12-04T11:20:45.5040992Z Retrying single test...
2025-12-04T11:20:45.5041438Z W1204 11:17:58.451000 93085 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:20:45.5042105Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-36a3a8a6a9d0a436.xml
2025-12-04T11:20:45.5042320Z ============================= test session starts ==============================
2025-12-04T11:20:45.5042674Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:20:45.5042788Z cachedir: .pytest_cache
2025-12-04T11:20:45.5043321Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:20:45.5043448Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:20:45.5043572Z configfile: pytest.ini
2025-12-04T11:20:45.5044114Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:20:45.5044334Z collecting ... collected 58 items / 13 deselected / 45 selected
2025-12-04T11:20:45.5045352Z stepcurrent: skipping 12 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.5045471Z Running 1 items in this shard
2025-12-04T11:20:45.5045476Z 
2025-12-04T11:20:45.5046795Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 11:18:02.909364710 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5046804Z 
2025-12-04T11:20:45.5047322Z [W1204 11:18:18.165604660 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5047327Z 
2025-12-04T11:20:45.5047851Z [W1204 11:18:18.165880384 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5047861Z 
2025-12-04T11:20:45.5048370Z [W1204 11:18:18.173373199 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5048378Z 
2025-12-04T11:20:45.5048885Z [W1204 11:18:18.174099024 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5048906Z 
2025-12-04T11:20:45.5049413Z [W1204 11:18:18.174291668 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5049418Z 
2025-12-04T11:20:45.5049927Z [W1204 11:18:18.181446283 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5049932Z 
2025-12-04T11:20:45.5050455Z [W1204 11:18:18.182214576 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5050460Z 
2025-12-04T11:20:45.5051035Z [W1204 11:18:18.182410824 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5051044Z 
2025-12-04T11:20:45.5051562Z [W1204 11:18:20.185362378 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5051567Z 
2025-12-04T11:20:45.5052101Z [W1204 11:18:20.187051096 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5052106Z 
2025-12-04T11:20:45.5052624Z [W1204 11:18:20.187268919 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5052629Z 
2025-12-04T11:20:45.5053136Z [W1204 11:18:20.191256448 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5053140Z 
2025-12-04T11:20:45.5053664Z [W1204 11:18:20.191945160 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5053701Z 
2025-12-04T11:20:45.5054206Z [W1204 11:18:20.192143429 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5054211Z 
2025-12-04T11:20:45.5054719Z [W1204 11:18:20.198289814 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5054739Z 
2025-12-04T11:20:45.5055244Z [W1204 11:18:20.198946772 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5055249Z 
2025-12-04T11:20:45.5055757Z [W1204 11:18:20.199143134 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5055763Z 
2025-12-04T11:20:45.5055913Z ('RERUN', {'yellow': True}) [20.5887s] [100%]
2025-12-04T11:20:45.5057272Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 11:18:21.038661417 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5057282Z 
2025-12-04T11:20:45.5057811Z [W1204 11:18:21.039464655 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5057816Z 
2025-12-04T11:20:45.5058325Z [W1204 11:18:21.039673592 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5058330Z 
2025-12-04T11:20:45.5058855Z [W1204 11:18:21.043809712 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5058860Z 
2025-12-04T11:20:45.5059372Z [W1204 11:18:21.044702754 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5059380Z 
2025-12-04T11:20:45.5059901Z [W1204 11:18:21.044905447 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5059906Z 
2025-12-04T11:20:45.5060415Z [W1204 11:18:21.051125928 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5060422Z 
2025-12-04T11:20:45.5060932Z [W1204 11:18:21.051817298 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5060937Z 
2025-12-04T11:20:45.5061458Z [W1204 11:18:21.052012962 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5061462Z 
2025-12-04T11:20:45.5062038Z [W1204 11:18:21.142290976 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5062046Z 
2025-12-04T11:20:45.5062568Z [W1204 11:18:21.143065618 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5062573Z 
2025-12-04T11:20:45.5063083Z [W1204 11:18:21.143271750 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5063121Z 
2025-12-04T11:20:45.5063645Z [W1204 11:18:21.147245585 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5063650Z 
2025-12-04T11:20:45.5064161Z [W1204 11:18:21.147908558 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5064167Z 
2025-12-04T11:20:45.5064689Z [W1204 11:18:21.148105840 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5064733Z 
2025-12-04T11:20:45.5065240Z [W1204 11:18:21.154247234 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5065245Z 
2025-12-04T11:20:45.5065754Z [W1204 11:18:21.155099158 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5065776Z 
2025-12-04T11:20:45.5066286Z [W1204 11:18:21.155298718 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5066290Z 
2025-12-04T11:20:45.5066423Z ('RERUN', {'yellow': True}) [0.9173s] [100%]
2025-12-04T11:20:45.5067726Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 11:18:22.932878588 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5067734Z 
2025-12-04T11:20:45.5068244Z [W1204 11:18:22.933683468 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5068248Z 
2025-12-04T11:20:45.5068768Z [W1204 11:18:22.933888083 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5068775Z 
2025-12-04T11:20:45.5069281Z [W1204 11:18:22.937899275 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5069286Z 
2025-12-04T11:20:45.5069803Z [W1204 11:18:22.938552881 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5069808Z 
2025-12-04T11:20:45.5070321Z [W1204 11:18:22.938749165 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5070328Z 
2025-12-04T11:20:45.5070849Z [W1204 11:18:22.944951671 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5070854Z 
2025-12-04T11:20:45.5071766Z [W1204 11:18:22.945606932 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5071776Z 
2025-12-04T11:20:45.5072285Z [W1204 11:18:22.945800661 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5072303Z 
2025-12-04T11:20:45.5072813Z [W1204 11:18:22.036116308 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5072818Z 
2025-12-04T11:20:45.5073483Z [W1204 11:18:22.036923075 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5073488Z 
2025-12-04T11:20:45.5074014Z [W1204 11:18:22.037134899 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5074019Z 
2025-12-04T11:20:45.5074525Z [W1204 11:18:22.041178483 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5074572Z 
2025-12-04T11:20:45.5075100Z [W1204 11:18:22.041856946 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5075105Z 
2025-12-04T11:20:45.5075615Z [W1204 11:18:22.042058583 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5075619Z 
2025-12-04T11:20:45.5076142Z [W1204 11:18:22.048169858 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5076151Z 
2025-12-04T11:20:45.5076660Z [W1204 11:18:22.049033772 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5076719Z 
2025-12-04T11:20:45.5077239Z [W1204 11:18:22.049233391 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5077246Z 
2025-12-04T11:20:45.5077351Z FAILED [0.8916s] [100%]
2025-12-04T11:20:45.5077356Z 
2025-12-04T11:20:45.5077502Z ==================================== RERUNS ====================================
2025-12-04T11:20:45.5078037Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.5078163Z Traceback (most recent call last):
2025-12-04T11:20:45.5078679Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.5078931Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.5079397Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.5079575Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.5080111Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.5080323Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.5080473Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.5080478Z 
2025-12-04T11:20:45.5080586Z Expected 1 but got 2.
2025-12-04T11:20:45.5080710Z Absolute difference: 1
2025-12-04T11:20:45.5080822Z Relative difference: 1.0
2025-12-04T11:20:45.5080827Z 
2025-12-04T11:20:45.5081042Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.5081974Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.5081983Z 
2025-12-04T11:20:45.5082255Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.5082495Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.5082614Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.5083144Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.5083387Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.5083488Z graph_break []
2025-12-04T11:20:45.5083707Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.5085016Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.5085138Z   if out == self.unknown_value:
2025-12-04T11:20:45.5085880Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5086015Z   warnings.warn(
2025-12-04T11:20:45.5086733Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5086850Z   warnings.warn(
2025-12-04T11:20:45.5087368Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.5087505Z Traceback (most recent call last):
2025-12-04T11:20:45.5088016Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.5088253Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.5088763Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.5088928Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.5089465Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.5089689Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.5089821Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.5089826Z 
2025-12-04T11:20:45.5089944Z Expected 1 but got 2.
2025-12-04T11:20:45.5090052Z Absolute difference: 1
2025-12-04T11:20:45.5090161Z Relative difference: 1.0
2025-12-04T11:20:45.5090166Z 
2025-12-04T11:20:45.5090392Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.5091313Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.5091321Z 
2025-12-04T11:20:45.5091601Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.5091821Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.5091941Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.5092482Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.5092710Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.5092824Z graph_break []
2025-12-04T11:20:45.5093042Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.5094256Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.5094391Z   if out == self.unknown_value:
2025-12-04T11:20:45.5095115Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5095222Z   warnings.warn(
2025-12-04T11:20:45.5095954Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5096056Z   warnings.warn(
2025-12-04T11:20:45.5096360Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.5096480Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.5096710Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.5097328Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.5097432Z graph_break []
2025-12-04T11:20:45.5097648Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.5098385Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5098518Z   warnings.warn(
2025-12-04T11:20:45.5099244Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5099347Z   warnings.warn(
2025-12-04T11:20:45.5099494Z =================================== FAILURES ===================================
2025-12-04T11:20:45.5100029Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.5100155Z Traceback (most recent call last):
2025-12-04T11:20:45.5100710Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.5100942Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.5101405Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.5101588Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.5102123Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.5102344Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.5102482Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.5102488Z 
2025-12-04T11:20:45.5102593Z Expected 1 but got 2.
2025-12-04T11:20:45.5102717Z Absolute difference: 1
2025-12-04T11:20:45.5102836Z Relative difference: 1.0
2025-12-04T11:20:45.5102844Z 
2025-12-04T11:20:45.5103060Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.5103991Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.5104000Z 
2025-12-04T11:20:45.5104270Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.5104502Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.5104620Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.5105149Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.5105394Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.5105501Z graph_break []
2025-12-04T11:20:45.5105732Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.5106944Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.5107065Z   if out == self.unknown_value:
2025-12-04T11:20:45.5107801Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5107907Z   warnings.warn(
2025-12-04T11:20:45.5108635Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5108738Z   warnings.warn(
2025-12-04T11:20:45.5109015Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.5109149Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.5109377Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.5109905Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.5110046Z graph_break []
2025-12-04T11:20:45.5110262Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.5110996Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5111098Z   warnings.warn(
2025-12-04T11:20:45.5111817Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5111929Z   warnings.warn(
2025-12-04T11:20:45.5112150Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.5112299Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.5112542Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.5113069Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.5113185Z graph_break []
2025-12-04T11:20:45.5113401Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.5114124Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5114241Z   warnings.warn(
2025-12-04T11:20:45.5114959Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5115062Z   warnings.warn(
2025-12-04T11:20:45.5115917Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-36a3a8a6a9d0a436.xml -
2025-12-04T11:20:45.5116092Z =========================== short test summary info ============================
2025-12-04T11:20:45.5117060Z FAILED [0.8916s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:20:45.5117066Z 
2025-12-04T11:20:45.5117174Z Expected 1 but got 2.
2025-12-04T11:20:45.5117296Z Absolute difference: 1
2025-12-04T11:20:45.5117408Z Relative difference: 1.0
2025-12-04T11:20:45.5117413Z 
2025-12-04T11:20:45.5117631Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.5118566Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.5118574Z 
2025-12-04T11:20:45.5118841Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.5119035Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:20:45.5119237Z ================== 1 failed, 13 deselected, 2 rerun in 22.43s ==================
2025-12-04T11:20:45.5119336Z Got exit code 1
2025-12-04T11:20:45.5119458Z Retrying single test...
2025-12-04T11:20:45.5119902Z W1204 11:18:34.482000 93267 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:20:45.5120563Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f55b1076fbed9be9.xml
2025-12-04T11:20:45.5120806Z ============================= test session starts ==============================
2025-12-04T11:20:45.5121157Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:20:45.5121286Z cachedir: .pytest_cache
2025-12-04T11:20:45.5121801Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:20:45.5121929Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:20:45.5122087Z configfile: pytest.ini
2025-12-04T11:20:45.5122629Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:20:45.5122848Z collecting ... collected 58 items / 13 deselected / 45 selected
2025-12-04T11:20:45.5123857Z stepcurrent: skipping 12 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.5123977Z Running 1 items in this shard
2025-12-04T11:20:45.5124013Z 
2025-12-04T11:20:45.5125314Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 11:18:38.922872674 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5125324Z 
2025-12-04T11:20:45.5125844Z [W1204 11:18:55.597035519 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5125850Z 
2025-12-04T11:20:45.5126371Z [W1204 11:18:55.597300996 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5126377Z 
2025-12-04T11:20:45.5126884Z [W1204 11:18:55.604629255 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5126889Z 
2025-12-04T11:20:45.5127413Z [W1204 11:18:55.605319901 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5127421Z 
2025-12-04T11:20:45.5127924Z [W1204 11:18:55.605509309 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5127929Z 
2025-12-04T11:20:45.5128439Z [W1204 11:18:55.612389954 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5128455Z 
2025-12-04T11:20:45.5128959Z [W1204 11:18:55.613048860 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5128964Z 
2025-12-04T11:20:45.5129472Z [W1204 11:18:55.613234657 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5129477Z 
2025-12-04T11:20:45.5130005Z [W1204 11:18:57.616100451 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5130012Z 
2025-12-04T11:20:45.5130520Z [W1204 11:18:57.617812130 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5130525Z 
2025-12-04T11:20:45.5131046Z [W1204 11:18:57.618019855 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5131053Z 
2025-12-04T11:20:45.5131557Z [W1204 11:18:57.622013724 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5131562Z 
2025-12-04T11:20:45.5132082Z [W1204 11:18:57.622677822 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5132087Z 
2025-12-04T11:20:45.5132673Z [W1204 11:18:57.622877379 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5132680Z 
2025-12-04T11:20:45.5133198Z [W1204 11:18:57.628933356 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5133203Z 
2025-12-04T11:20:45.5133709Z [W1204 11:18:57.629573525 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5133744Z 
2025-12-04T11:20:45.5134253Z [W1204 11:18:57.629767077 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5134270Z 
2025-12-04T11:20:45.5134403Z ('RERUN', {'yellow': True}) [20.9861s] [100%]
2025-12-04T11:20:45.5135690Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 11:18:58.466471619 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5135727Z 
2025-12-04T11:20:45.5136252Z [W1204 11:18:58.467254105 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5136257Z 
2025-12-04T11:20:45.5136836Z [W1204 11:18:58.467455498 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5136850Z 
2025-12-04T11:20:45.5137375Z [W1204 11:18:58.471497101 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5137380Z 
2025-12-04T11:20:45.5137884Z [W1204 11:18:58.472340780 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5137889Z 
2025-12-04T11:20:45.5138408Z [W1204 11:18:58.472554044 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5138415Z 
2025-12-04T11:20:45.5138935Z [W1204 11:18:58.478622130 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5138940Z 
2025-12-04T11:20:45.5139467Z [W1204 11:18:58.479272667 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5139475Z 
2025-12-04T11:20:45.5139988Z [W1204 11:18:58.479462688 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5139993Z 
2025-12-04T11:20:45.5140517Z [W1204 11:18:58.569732496 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5140522Z 
2025-12-04T11:20:45.5141035Z [W1204 11:18:58.570572735 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5141042Z 
2025-12-04T11:20:45.5141568Z [W1204 11:18:58.570797768 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5141572Z 
2025-12-04T11:20:45.5142082Z [W1204 11:18:58.574782774 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5142090Z 
2025-12-04T11:20:45.5142612Z [W1204 11:18:58.575467540 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5142617Z 
2025-12-04T11:20:45.5143128Z [W1204 11:18:58.575673876 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5143134Z 
2025-12-04T11:20:45.5143710Z [W1204 11:18:58.581796544 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5143732Z 
2025-12-04T11:20:45.5144241Z [W1204 11:18:58.582676720 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5144246Z 
2025-12-04T11:20:45.5144754Z [W1204 11:18:58.582877120 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5144791Z 
2025-12-04T11:20:45.5144936Z ('RERUN', {'yellow': True}) [0.9144s] [100%]
2025-12-04T11:20:45.5146239Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 11:18:58.361247648 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5146244Z 
2025-12-04T11:20:45.5146775Z [W1204 11:18:58.362031139 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5146808Z 
2025-12-04T11:20:45.5147317Z [W1204 11:18:58.362238581 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5147322Z 
2025-12-04T11:20:45.5147846Z [W1204 11:18:58.366209287 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5147853Z 
2025-12-04T11:20:45.5148366Z [W1204 11:18:58.366867750 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5148371Z 
2025-12-04T11:20:45.5148882Z [W1204 11:18:58.367060670 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5148900Z 
2025-12-04T11:20:45.5149411Z [W1204 11:18:58.373175522 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5149416Z 
2025-12-04T11:20:45.5149929Z [W1204 11:18:58.373831842 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5149934Z 
2025-12-04T11:20:45.5150456Z [W1204 11:18:58.374022365 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5150463Z 
2025-12-04T11:20:45.5150975Z [W1204 11:18:59.463241692 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5150981Z 
2025-12-04T11:20:45.5151502Z [W1204 11:18:59.464028036 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5151507Z 
2025-12-04T11:20:45.5152018Z [W1204 11:18:59.464246357 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5152027Z 
2025-12-04T11:20:45.5152549Z [W1204 11:18:59.468192380 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5152556Z 
2025-12-04T11:20:45.5153067Z [W1204 11:18:59.468860486 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5153073Z 
2025-12-04T11:20:45.5153597Z [W1204 11:18:59.469059107 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5153602Z 
2025-12-04T11:20:45.5154109Z [W1204 11:18:59.475103885 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5154114Z 
2025-12-04T11:20:45.5154627Z [W1204 11:18:59.475920375 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5154643Z 
2025-12-04T11:20:45.5155214Z [W1204 11:18:59.476115587 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5155222Z 
2025-12-04T11:20:45.5155331Z FAILED [0.8904s] [100%]
2025-12-04T11:20:45.5155336Z 
2025-12-04T11:20:45.5155495Z ==================================== RERUNS ====================================
2025-12-04T11:20:45.5156045Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.5156183Z Traceback (most recent call last):
2025-12-04T11:20:45.5156696Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.5156926Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.5157404Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.5157573Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.5158141Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.5158360Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.5158492Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.5158497Z 
2025-12-04T11:20:45.5158618Z Expected 1 but got 2.
2025-12-04T11:20:45.5158725Z Absolute difference: 1
2025-12-04T11:20:45.5158835Z Relative difference: 1.0
2025-12-04T11:20:45.5158841Z 
2025-12-04T11:20:45.5159066Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.5159986Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.5159992Z 
2025-12-04T11:20:45.5160278Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.5160500Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.5160618Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.5161158Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.5161386Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.5161488Z graph_break []
2025-12-04T11:20:45.5161724Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.5162932Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.5163063Z   if out == self.unknown_value:
2025-12-04T11:20:45.5163797Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5163903Z   warnings.warn(
2025-12-04T11:20:45.5164644Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5164749Z   warnings.warn(
2025-12-04T11:20:45.5165279Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.5165410Z Traceback (most recent call last):
2025-12-04T11:20:45.5165915Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.5166161Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.5166686Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.5166866Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.5167403Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.5167610Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.5167756Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.5167792Z 
2025-12-04T11:20:45.5167901Z Expected 1 but got 2.
2025-12-04T11:20:45.5168009Z Absolute difference: 1
2025-12-04T11:20:45.5168137Z Relative difference: 1.0
2025-12-04T11:20:45.5168142Z 
2025-12-04T11:20:45.5168358Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.5169281Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.5169287Z 
2025-12-04T11:20:45.5169561Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.5169830Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.5169960Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.5170483Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.5170723Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.5170825Z graph_break []
2025-12-04T11:20:45.5171313Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.5172596Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.5172721Z   if out == self.unknown_value:
2025-12-04T11:20:45.5173463Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5173570Z   warnings.warn(
2025-12-04T11:20:45.5174289Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5174409Z   warnings.warn(
2025-12-04T11:20:45.5174629Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.5174745Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.5174988Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.5175515Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.5175631Z graph_break []
2025-12-04T11:20:45.5175854Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.5176678Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5176799Z   warnings.warn(
2025-12-04T11:20:45.5177518Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5177624Z   warnings.warn(
2025-12-04T11:20:45.5177789Z =================================== FAILURES ===================================
2025-12-04T11:20:45.5178307Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.5178448Z Traceback (most recent call last):
2025-12-04T11:20:45.5179102Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.5179343Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.5179822Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.5179989Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.5180541Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.5180798Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.5180932Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.5180938Z 
2025-12-04T11:20:45.5181063Z Expected 1 but got 2.
2025-12-04T11:20:45.5181171Z Absolute difference: 1
2025-12-04T11:20:45.5181282Z Relative difference: 1.0
2025-12-04T11:20:45.5181287Z 
2025-12-04T11:20:45.5181514Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.5182437Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.5182492Z 
2025-12-04T11:20:45.5182777Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.5182995Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.5183115Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.5183656Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.5183884Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.5183996Z graph_break []
2025-12-04T11:20:45.5184216Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.5185428Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.5185563Z   if out == self.unknown_value:
2025-12-04T11:20:45.5186290Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5186411Z   warnings.warn(
2025-12-04T11:20:45.5187128Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5187230Z   warnings.warn(
2025-12-04T11:20:45.5187461Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.5187580Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.5187811Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.5188361Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.5188462Z graph_break []
2025-12-04T11:20:45.5188692Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.5189420Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5189527Z   warnings.warn(
2025-12-04T11:20:45.5190255Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5190356Z   warnings.warn(
2025-12-04T11:20:45.5190571Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.5190698Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.5191070Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.5191613Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.5191713Z graph_break []
2025-12-04T11:20:45.5191930Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.5192704Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5192805Z   warnings.warn(
2025-12-04T11:20:45.5193533Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5193637Z   warnings.warn(
2025-12-04T11:20:45.5194486Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f55b1076fbed9be9.xml -
2025-12-04T11:20:45.5194714Z =========================== short test summary info ============================
2025-12-04T11:20:45.5195677Z FAILED [0.8904s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:20:45.5195686Z 
2025-12-04T11:20:45.5195808Z Expected 1 but got 2.
2025-12-04T11:20:45.5195916Z Absolute difference: 1
2025-12-04T11:20:45.5196028Z Relative difference: 1.0
2025-12-04T11:20:45.5196033Z 
2025-12-04T11:20:45.5196263Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.5197187Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.5197193Z 
2025-12-04T11:20:45.5197481Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.5197668Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:20:45.5197869Z ================== 1 failed, 13 deselected, 2 rerun in 22.82s ==================
2025-12-04T11:20:45.5197985Z Got exit code 1
2025-12-04T11:20:45.5198817Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.5199230Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T11:20:45.5199695Z W1204 11:19:10.799000 93449 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:20:45.5200352Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6062b5e411b734f8.xml
2025-12-04T11:20:45.5200538Z ============================= test session starts ==============================
2025-12-04T11:20:45.5200892Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:20:45.5201003Z cachedir: .pytest_cache
2025-12-04T11:20:45.5201537Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:20:45.5201666Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:20:45.5201787Z configfile: pytest.ini
2025-12-04T11:20:45.5202329Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:20:45.5202547Z collecting ... collected 58 items / 13 deselected / 45 selected
2025-12-04T11:20:45.5202703Z stepcurrent: skipping 13 already run items.
2025-12-04T11:20:45.5202821Z Running 1 items in this shard
2025-12-04T11:20:45.5202826Z 
2025-12-04T11:20:45.5204142Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 W1204 11:19:16.460000 93449 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:20:45.5204283Z ('RERUN', {'yellow': True}) [4.0054s] [100%]
2025-12-04T11:20:45.5205146Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5344s] [100%]
2025-12-04T11:20:45.5205976Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 FAILED [0.5294s] [100%]
2025-12-04T11:20:45.5205981Z 
2025-12-04T11:20:45.5206124Z ==================================== RERUNS ====================================
2025-12-04T11:20:45.5206650Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T11:20:45.5206805Z Traceback (most recent call last):
2025-12-04T11:20:45.5207316Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.5207561Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.5208027Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.5208213Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.5208750Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.5208961Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.5209108Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.5209113Z 
2025-12-04T11:20:45.5209221Z Expected 1 but got 0.
2025-12-04T11:20:45.5209335Z Absolute difference: 1
2025-12-04T11:20:45.5209462Z Relative difference: 1.0
2025-12-04T11:20:45.5209470Z 
2025-12-04T11:20:45.5209686Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.5210609Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.5210617Z 
2025-12-04T11:20:45.5210884Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.5211109Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.5211242Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.5211949Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.5212197Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.5212301Z graph_break []
2025-12-04T11:20:45.5212430Z aten_mm_info [('aten.mm_256_72_1024', 2)]
2025-12-04T11:20:45.5212661Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.5213400Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5213509Z   warnings.warn(
2025-12-04T11:20:45.5214246Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5214352Z   warnings.warn(
2025-12-04T11:20:45.5214878Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T11:20:45.5215004Z Traceback (most recent call last):
2025-12-04T11:20:45.5215587Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.5215840Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.5216370Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.5216555Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.5217151Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.5217361Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.5217510Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.5217516Z 
2025-12-04T11:20:45.5217625Z Expected 1 but got 0.
2025-12-04T11:20:45.5217733Z Absolute difference: 1
2025-12-04T11:20:45.5217860Z Relative difference: 1.0
2025-12-04T11:20:45.5217866Z 
2025-12-04T11:20:45.5218083Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.5219016Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.5219055Z 
2025-12-04T11:20:45.5219327Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.5219550Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.5219684Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.5220376Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.5220615Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.5220715Z graph_break []
2025-12-04T11:20:45.5220842Z aten_mm_info [('aten.mm_256_72_1024', 2)]
2025-12-04T11:20:45.5221078Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.5221811Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5221914Z   warnings.warn(
2025-12-04T11:20:45.5222645Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5222750Z   warnings.warn(
2025-12-04T11:20:45.5222981Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.5223097Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.5223325Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.5224034Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.5224137Z graph_break []
2025-12-04T11:20:45.5224261Z aten_mm_info [('aten.mm_256_72_1024', 2)]
2025-12-04T11:20:45.5224494Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.5225212Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5225329Z   warnings.warn(
2025-12-04T11:20:45.5226052Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5226156Z   warnings.warn(
2025-12-04T11:20:45.5226320Z =================================== FAILURES ===================================
2025-12-04T11:20:45.5226831Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T11:20:45.5226969Z Traceback (most recent call last):
2025-12-04T11:20:45.5227552Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.5227788Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.5228265Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.5228463Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.5229001Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.5229225Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.5229358Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.5229363Z 
2025-12-04T11:20:45.5229485Z Expected 1 but got 0.
2025-12-04T11:20:45.5229594Z Absolute difference: 1
2025-12-04T11:20:45.5229705Z Relative difference: 1.0
2025-12-04T11:20:45.5229709Z 
2025-12-04T11:20:45.5229944Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.5230878Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.5230884Z 
2025-12-04T11:20:45.5231164Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.5231385Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.5231502Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.5232211Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.5232439Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.5232551Z graph_break []
2025-12-04T11:20:45.5232678Z aten_mm_info [('aten.mm_256_72_1024', 2)]
2025-12-04T11:20:45.5232895Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.5233640Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5233742Z   warnings.warn(
2025-12-04T11:20:45.5234462Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5234581Z   warnings.warn(
2025-12-04T11:20:45.5234797Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.5234926Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.5235154Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.5235855Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.5235972Z graph_break []
2025-12-04T11:20:45.5236096Z aten_mm_info [('aten.mm_256_72_1024', 2)]
2025-12-04T11:20:45.5236310Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.5237048Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5237153Z   warnings.warn(
2025-12-04T11:20:45.5237881Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5237983Z   warnings.warn(
2025-12-04T11:20:45.5238202Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.5238333Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.5238620Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.5239314Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.5239429Z graph_break []
2025-12-04T11:20:45.5239554Z aten_mm_info [('aten.mm_256_72_1024', 2)]
2025-12-04T11:20:45.5239782Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.5240537Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5240638Z   warnings.warn(
2025-12-04T11:20:45.5241365Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5241466Z   warnings.warn(
2025-12-04T11:20:45.5242317Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6062b5e411b734f8.xml -
2025-12-04T11:20:45.5242525Z =========================== short test summary info ============================
2025-12-04T11:20:45.5243468Z FAILED [0.5294s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:20:45.5243476Z 
2025-12-04T11:20:45.5243597Z Expected 1 but got 0.
2025-12-04T11:20:45.5243706Z Absolute difference: 1
2025-12-04T11:20:45.5243829Z Relative difference: 1.0
2025-12-04T11:20:45.5243834Z 
2025-12-04T11:20:45.5244049Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.5244959Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.5244970Z 
2025-12-04T11:20:45.5245248Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.5245431Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:20:45.5245642Z ================== 1 failed, 13 deselected, 2 rerun in 5.10s ===================
2025-12-04T11:20:45.5245742Z Got exit code 1
2025-12-04T11:20:45.5245852Z Retrying single test...
2025-12-04T11:20:45.5246309Z W1204 11:19:31.510000 93626 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:20:45.5246964Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-21fa07c752a411ad.xml
2025-12-04T11:20:45.5247129Z ============================= test session starts ==============================
2025-12-04T11:20:45.5247490Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:20:45.5247606Z cachedir: .pytest_cache
2025-12-04T11:20:45.5248138Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:20:45.5248262Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:20:45.5248371Z configfile: pytest.ini
2025-12-04T11:20:45.5248923Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:20:45.5249145Z collecting ... collected 58 items / 13 deselected / 45 selected
2025-12-04T11:20:45.5250142Z stepcurrent: skipping 13 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.5250269Z Running 1 items in this shard
2025-12-04T11:20:45.5250275Z 
2025-12-04T11:20:45.5251620Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 11:19:37.474114044 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5251629Z 
2025-12-04T11:20:45.5252160Z [W1204 11:19:52.322354498 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5252197Z 
2025-12-04T11:20:45.5252712Z [W1204 11:19:52.322613110 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5252718Z 
2025-12-04T11:20:45.5253243Z [W1204 11:19:52.330983752 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5253248Z 
2025-12-04T11:20:45.5253753Z [W1204 11:19:52.331924649 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5253762Z 
2025-12-04T11:20:45.5254282Z [W1204 11:19:52.332122589 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5254321Z 
2025-12-04T11:20:45.5254832Z [W1204 11:19:52.339912127 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5254839Z 
2025-12-04T11:20:45.5255352Z [W1204 11:19:52.340729993 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5255356Z 
2025-12-04T11:20:45.5255869Z [W1204 11:19:52.340934308 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5255873Z 
2025-12-04T11:20:45.5256406Z W1204 11:19:53.064000 93626 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:20:45.5256933Z [W1204 11:19:53.539850131 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5256940Z 
2025-12-04T11:20:45.5257450Z [W1204 11:19:53.541683967 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5257455Z 
2025-12-04T11:20:45.5257976Z [W1204 11:19:53.541912926 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5257984Z 
2025-12-04T11:20:45.5258490Z [W1204 11:19:53.546785248 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5258495Z 
2025-12-04T11:20:45.5259014Z [W1204 11:19:53.547522672 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5259019Z 
2025-12-04T11:20:45.5259530Z [W1204 11:19:53.547731238 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5259537Z 
2025-12-04T11:20:45.5260057Z [W1204 11:19:53.554767633 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5260062Z 
2025-12-04T11:20:45.5260568Z [W1204 11:19:53.555599210 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5260575Z 
2025-12-04T11:20:45.5261085Z [W1204 11:19:53.555812084 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5261103Z 
2025-12-04T11:20:45.5261237Z ('RERUN', {'yellow': True}) [19.8935s] [100%]
2025-12-04T11:20:45.5262602Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 11:19:53.021321403 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5262612Z 
2025-12-04T11:20:45.5263135Z [W1204 11:19:53.022113670 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5263139Z 
2025-12-04T11:20:45.5263656Z [W1204 11:19:53.022326042 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5263690Z 
2025-12-04T11:20:45.5264216Z [W1204 11:19:53.027291959 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5264221Z 
2025-12-04T11:20:45.5264727Z [W1204 11:19:53.027971346 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5264732Z 
2025-12-04T11:20:45.5265257Z [W1204 11:19:53.028165777 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5265293Z 
2025-12-04T11:20:45.5265803Z [W1204 11:19:53.035049819 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5265808Z 
2025-12-04T11:20:45.5266331Z [W1204 11:19:53.035714476 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5266340Z 
2025-12-04T11:20:45.5266849Z [W1204 11:19:53.035906472 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5266854Z 
2025-12-04T11:20:45.5267361Z [W1204 11:19:53.144804588 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5267381Z 
2025-12-04T11:20:45.5267895Z [W1204 11:19:53.145593970 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5267902Z 
2025-12-04T11:20:45.5268408Z [W1204 11:19:53.145804810 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5268414Z 
2025-12-04T11:20:45.5268936Z [W1204 11:19:53.150576939 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5268943Z 
2025-12-04T11:20:45.5269451Z [W1204 11:19:53.151252477 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5269456Z 
2025-12-04T11:20:45.5269977Z [W1204 11:19:53.151454687 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5269982Z 
2025-12-04T11:20:45.5270491Z [W1204 11:19:53.158214770 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5270496Z 
2025-12-04T11:20:45.5271254Z [W1204 11:19:53.158895195 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5271261Z 
2025-12-04T11:20:45.5271857Z [W1204 11:19:53.159095257 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5271866Z 
2025-12-04T11:20:45.5272000Z ('RERUN', {'yellow': True}) [0.5623s] [100%]
2025-12-04T11:20:45.5273294Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 11:19:54.561837973 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5273300Z 
2025-12-04T11:20:45.5273947Z [W1204 11:19:54.562656934 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5273953Z 
2025-12-04T11:20:45.5274483Z [W1204 11:19:54.562873174 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5274487Z 
2025-12-04T11:20:45.5274999Z [W1204 11:19:54.567928937 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5275045Z 
2025-12-04T11:20:45.5275564Z [W1204 11:19:54.568658042 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5275569Z 
2025-12-04T11:20:45.5276075Z [W1204 11:19:54.568862515 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5276080Z 
2025-12-04T11:20:45.5276599Z [W1204 11:19:54.575772272 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5276608Z 
2025-12-04T11:20:45.5277117Z [W1204 11:19:54.576518516 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5277167Z 
2025-12-04T11:20:45.5277688Z [W1204 11:19:54.576735583 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5277696Z 
2025-12-04T11:20:45.5278204Z [W1204 11:19:54.692060738 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5278209Z 
2025-12-04T11:20:45.5278716Z [W1204 11:19:54.692876096 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5278737Z 
2025-12-04T11:20:45.5279256Z [W1204 11:19:54.693093523 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5279261Z 
2025-12-04T11:20:45.5279780Z [W1204 11:19:54.697875315 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5279787Z 
2025-12-04T11:20:45.5280307Z [W1204 11:19:54.698564970 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5280311Z 
2025-12-04T11:20:45.5280828Z [W1204 11:19:54.698766571 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5280832Z 
2025-12-04T11:20:45.5281356Z [W1204 11:19:54.705772301 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5281361Z 
2025-12-04T11:20:45.5281869Z [W1204 11:19:54.706468384 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5281874Z 
2025-12-04T11:20:45.5282402Z [W1204 11:19:54.706667911 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5282409Z 
2025-12-04T11:20:45.5282515Z FAILED [0.5452s] [100%]
2025-12-04T11:20:45.5282519Z 
2025-12-04T11:20:45.5282664Z ==================================== RERUNS ====================================
2025-12-04T11:20:45.5283191Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T11:20:45.5283320Z Traceback (most recent call last):
2025-12-04T11:20:45.5283850Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.5284085Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.5284552Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.5284798Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.5285337Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.5285563Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.5285698Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.5285704Z 
2025-12-04T11:20:45.5285812Z Expected 1 but got 0.
2025-12-04T11:20:45.5285973Z Absolute difference: 1
2025-12-04T11:20:45.5286088Z Relative difference: 1.0
2025-12-04T11:20:45.5286093Z 
2025-12-04T11:20:45.5286310Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.5287240Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.5287245Z 
2025-12-04T11:20:45.5287515Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.5287759Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.5287912Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.5288608Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.5288850Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.5288956Z graph_break []
2025-12-04T11:20:45.5289097Z aten_mm_info [('aten.mm_256_72_1024', 2)]
2025-12-04T11:20:45.5289317Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.5290526Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.5290665Z   if out == self.unknown_value:
2025-12-04T11:20:45.5291393Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5291517Z   warnings.warn(
2025-12-04T11:20:45.5292237Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5292344Z   warnings.warn(
2025-12-04T11:20:45.5292865Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T11:20:45.5292993Z Traceback (most recent call last):
2025-12-04T11:20:45.5293500Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.5293747Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.5294205Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.5294386Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.5294919Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.5295125Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.5295273Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.5295278Z 
2025-12-04T11:20:45.5295387Z Expected 1 but got 0.
2025-12-04T11:20:45.5295495Z Absolute difference: 1
2025-12-04T11:20:45.5295617Z Relative difference: 1.0
2025-12-04T11:20:45.5295622Z 
2025-12-04T11:20:45.5295835Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.5296911Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.5296919Z 
2025-12-04T11:20:45.5297192Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.5297416Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.5297548Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.5298244Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.5298520Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.5298620Z graph_break []
2025-12-04T11:20:45.5298747Z aten_mm_info [('aten.mm_256_72_1024', 2)]
2025-12-04T11:20:45.5298984Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.5300193Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.5300360Z   if out == self.unknown_value:
2025-12-04T11:20:45.5301086Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5301191Z   warnings.warn(
2025-12-04T11:20:45.5301924Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5302030Z   warnings.warn(
2025-12-04T11:20:45.5302249Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.5302379Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.5302608Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.5303318Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.5303422Z graph_break []
2025-12-04T11:20:45.5303546Z aten_mm_info [('aten.mm_256_72_1024', 2)]
2025-12-04T11:20:45.5303775Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.5304500Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5304618Z   warnings.warn(
2025-12-04T11:20:45.5305338Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5305441Z   warnings.warn(
2025-12-04T11:20:45.5305604Z =================================== FAILURES ===================================
2025-12-04T11:20:45.5306118Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T11:20:45.5306244Z Traceback (most recent call last):
2025-12-04T11:20:45.5306763Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.5306993Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.5307462Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.5307628Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.5308161Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.5308381Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.5308513Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.5308518Z 
2025-12-04T11:20:45.5308637Z Expected 1 but got 0.
2025-12-04T11:20:45.5308827Z Absolute difference: 1
2025-12-04T11:20:45.5308941Z Relative difference: 1.0
2025-12-04T11:20:45.5308949Z 
2025-12-04T11:20:45.5309182Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.5310086Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.5310123Z 
2025-12-04T11:20:45.5310393Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.5310627Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.5310747Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.5311462Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.5311697Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.5311799Z graph_break []
2025-12-04T11:20:45.5311974Z aten_mm_info [('aten.mm_256_72_1024', 2)]
2025-12-04T11:20:45.5312194Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.5313421Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.5313542Z   if out == self.unknown_value:
2025-12-04T11:20:45.5314268Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5314385Z   warnings.warn(
2025-12-04T11:20:45.5315104Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5315211Z   warnings.warn(
2025-12-04T11:20:45.5315444Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.5315563Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.5315803Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.5316500Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.5316601Z graph_break []
2025-12-04T11:20:45.5316739Z aten_mm_info [('aten.mm_256_72_1024', 2)]
2025-12-04T11:20:45.5316954Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.5317695Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5317798Z   warnings.warn(
2025-12-04T11:20:45.5318520Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5318637Z   warnings.warn(
2025-12-04T11:20:45.5318853Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.5318969Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.5319209Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.5319902Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.5320015Z graph_break []
2025-12-04T11:20:45.5320139Z aten_mm_info [('aten.mm_256_72_1024', 2)]
2025-12-04T11:20:45.5320354Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.5321154Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5321258Z   warnings.warn(
2025-12-04T11:20:45.5321972Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5322086Z   warnings.warn(
2025-12-04T11:20:45.5322924Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-21fa07c752a411ad.xml -
2025-12-04T11:20:45.5323144Z =========================== short test summary info ============================
2025-12-04T11:20:45.5324082Z FAILED [0.5452s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:20:45.5324088Z 
2025-12-04T11:20:45.5324209Z Expected 1 but got 0.
2025-12-04T11:20:45.5324325Z Absolute difference: 1
2025-12-04T11:20:45.5324469Z Relative difference: 1.0
2025-12-04T11:20:45.5324474Z 
2025-12-04T11:20:45.5324702Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.5325608Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.5325616Z 
2025-12-04T11:20:45.5325883Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.5326075Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:20:45.5326276Z ================== 1 failed, 13 deselected, 2 rerun in 21.03s ==================
2025-12-04T11:20:45.5326390Z Got exit code 1
2025-12-04T11:20:45.5326498Z Retrying single test...
2025-12-04T11:20:45.5326950Z W1204 11:20:06.141000 93808 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:20:45.5327622Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4c09ee7a97c51183.xml
2025-12-04T11:20:45.5327791Z ============================= test session starts ==============================
2025-12-04T11:20:45.5328152Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:20:45.5328265Z cachedir: .pytest_cache
2025-12-04T11:20:45.5328786Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:20:45.5328924Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:20:45.5329033Z configfile: pytest.ini
2025-12-04T11:20:45.5329575Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:20:45.5329811Z collecting ... collected 58 items / 13 deselected / 45 selected
2025-12-04T11:20:45.5330807Z stepcurrent: skipping 13 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.5330938Z Running 1 items in this shard
2025-12-04T11:20:45.5330944Z 
2025-12-04T11:20:45.5332236Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 11:20:11.096681931 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5332245Z 
2025-12-04T11:20:45.5332777Z [W1204 11:20:28.186452159 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5332782Z 
2025-12-04T11:20:45.5333364Z [W1204 11:20:28.186716034 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5333373Z 
2025-12-04T11:20:45.5333883Z [W1204 11:20:28.194944748 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5333901Z 
2025-12-04T11:20:45.5334414Z [W1204 11:20:28.195851251 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5334449Z 
2025-12-04T11:20:45.5334958Z [W1204 11:20:28.196052020 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5334963Z 
2025-12-04T11:20:45.5335484Z [W1204 11:20:28.203651391 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5335489Z 
2025-12-04T11:20:45.5336002Z [W1204 11:20:28.204319762 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5336007Z 
2025-12-04T11:20:45.5336641Z [W1204 11:20:28.204510086 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5336647Z 
2025-12-04T11:20:45.5337108Z W1204 11:20:28.926000 93808 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:20:45.5337635Z [W1204 11:20:29.400484846 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5337640Z 
2025-12-04T11:20:45.5338149Z [W1204 11:20:29.402257914 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5338154Z 
2025-12-04T11:20:45.5338663Z [W1204 11:20:29.402479468 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5338682Z 
2025-12-04T11:20:45.5339193Z [W1204 11:20:29.407235324 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5339201Z 
2025-12-04T11:20:45.5339708Z [W1204 11:20:29.407969242 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5339712Z 
2025-12-04T11:20:45.5340235Z [W1204 11:20:29.408182677 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5340240Z 
2025-12-04T11:20:45.5340748Z [W1204 11:20:29.415007990 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5340753Z 
2025-12-04T11:20:45.5341277Z [W1204 11:20:29.415715316 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5341283Z 
2025-12-04T11:20:45.5341793Z [W1204 11:20:29.415920865 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5341800Z 
2025-12-04T11:20:45.5341949Z ('RERUN', {'yellow': True}) [21.1250s] [100%]
2025-12-04T11:20:45.5343226Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 11:20:29.876668578 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5343236Z 
2025-12-04T11:20:45.5343758Z [W1204 11:20:29.877453862 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5343764Z 
2025-12-04T11:20:45.5344271Z [W1204 11:20:29.877661954 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5344276Z 
2025-12-04T11:20:45.5344849Z [W1204 11:20:29.882693558 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5344874Z 
2025-12-04T11:20:45.5345387Z [W1204 11:20:29.883378160 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5345392Z 
2025-12-04T11:20:45.5345903Z [W1204 11:20:29.883584684 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5345938Z 
2025-12-04T11:20:45.5346462Z [W1204 11:20:29.890428222 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5346466Z 
2025-12-04T11:20:45.5346973Z [W1204 11:20:29.891119052 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5346978Z 
2025-12-04T11:20:45.5347498Z [W1204 11:20:29.891316433 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5347550Z 
2025-12-04T11:20:45.5348061Z [W1204 11:20:29.002453388 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5348066Z 
2025-12-04T11:20:45.5348584Z [W1204 11:20:29.003289758 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5348591Z 
2025-12-04T11:20:45.5349096Z [W1204 11:20:29.003516496 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5349101Z 
2025-12-04T11:20:45.5349622Z [W1204 11:20:29.008294928 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5349627Z 
2025-12-04T11:20:45.5350138Z [W1204 11:20:29.009042062 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5350145Z 
2025-12-04T11:20:45.5350655Z [W1204 11:20:29.009251483 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5350659Z 
2025-12-04T11:20:45.5351182Z [W1204 11:20:29.016184178 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5351189Z 
2025-12-04T11:20:45.5351697Z [W1204 11:20:29.016918274 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5351703Z 
2025-12-04T11:20:45.5352229Z [W1204 11:20:29.017122621 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5352234Z 
2025-12-04T11:20:45.5352369Z ('RERUN', {'yellow': True}) [0.5616s] [100%]
2025-12-04T11:20:45.5353660Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 11:20:30.414269816 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5353668Z 
2025-12-04T11:20:45.5354181Z [W1204 11:20:30.415084846 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5354188Z 
2025-12-04T11:20:45.5354711Z [W1204 11:20:30.415308496 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5354715Z 
2025-12-04T11:20:45.5355224Z [W1204 11:20:30.420329618 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5355229Z 
2025-12-04T11:20:45.5355801Z [W1204 11:20:30.421069777 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5355824Z 
2025-12-04T11:20:45.5356333Z [W1204 11:20:30.421283110 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5356337Z 
2025-12-04T11:20:45.5356849Z [W1204 11:20:30.428131946 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5356885Z 
2025-12-04T11:20:45.5357411Z [W1204 11:20:30.428869829 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5357416Z 
2025-12-04T11:20:45.5357930Z [W1204 11:20:30.429078765 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5357934Z 
2025-12-04T11:20:45.5358461Z [W1204 11:20:30.544890644 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5358466Z 
2025-12-04T11:20:45.5359006Z [W1204 11:20:30.545722248 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5359011Z 
2025-12-04T11:20:45.5359529Z [W1204 11:20:30.545945278 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5359536Z 
2025-12-04T11:20:45.5360046Z [W1204 11:20:30.550844335 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5360051Z 
2025-12-04T11:20:45.5360573Z [W1204 11:20:30.551600596 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5360578Z 
2025-12-04T11:20:45.5361089Z [W1204 11:20:30.551808446 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5361098Z 
2025-12-04T11:20:45.5361610Z [W1204 11:20:30.558984282 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5361635Z 
2025-12-04T11:20:45.5362147Z [W1204 11:20:30.559735594 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5362154Z 
2025-12-04T11:20:45.5362662Z [W1204 11:20:30.559941065 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:20:45.5362668Z 
2025-12-04T11:20:45.5362789Z FAILED [0.5422s] [100%]
2025-12-04T11:20:45.5362794Z 
2025-12-04T11:20:45.5362941Z ==================================== RERUNS ====================================
2025-12-04T11:20:45.5363468Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T11:20:45.5363597Z Traceback (most recent call last):
2025-12-04T11:20:45.5364110Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.5364359Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.5364827Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.5364996Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.5365543Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.5365751Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.5365900Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.5365905Z 
2025-12-04T11:20:45.5366012Z Expected 1 but got 0.
2025-12-04T11:20:45.5366124Z Absolute difference: 1
2025-12-04T11:20:45.5366247Z Relative difference: 1.0
2025-12-04T11:20:45.5366252Z 
2025-12-04T11:20:45.5366539Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.5367464Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.5367469Z 
2025-12-04T11:20:45.5367738Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.5367992Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.5368121Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.5368819Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.5369060Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.5369160Z graph_break []
2025-12-04T11:20:45.5369289Z aten_mm_info [('aten.mm_256_72_1024', 2)]
2025-12-04T11:20:45.5369519Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.5370843Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.5371251Z   if out == self.unknown_value:
2025-12-04T11:20:45.5372055Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5372161Z   warnings.warn(
2025-12-04T11:20:45.5372889Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5372993Z   warnings.warn(
2025-12-04T11:20:45.5373510Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T11:20:45.5373657Z Traceback (most recent call last):
2025-12-04T11:20:45.5374162Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.5374411Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.5374876Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.5375041Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.5375592Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.5375800Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.5375932Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.5375938Z 
2025-12-04T11:20:45.5376058Z Expected 1 but got 0.
2025-12-04T11:20:45.5376173Z Absolute difference: 1
2025-12-04T11:20:45.5376364Z Relative difference: 1.0
2025-12-04T11:20:45.5376370Z 
2025-12-04T11:20:45.5376591Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.5377498Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.5377507Z 
2025-12-04T11:20:45.5377792Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.5378014Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.5378149Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.5378848Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.5379229Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.5379347Z graph_break []
2025-12-04T11:20:45.5379472Z aten_mm_info [('aten.mm_256_72_1024', 2)]
2025-12-04T11:20:45.5379691Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.5380913Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.5381077Z   if out == self.unknown_value:
2025-12-04T11:20:45.5381820Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5381923Z   warnings.warn(
2025-12-04T11:20:45.5382639Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5382804Z   warnings.warn(
2025-12-04T11:20:45.5383025Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.5383157Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.5383388Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.5384084Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.5384202Z graph_break []
2025-12-04T11:20:45.5384326Z aten_mm_info [('aten.mm_256_72_1024', 2)]
2025-12-04T11:20:45.5384547Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.5385287Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5385390Z   warnings.warn(
2025-12-04T11:20:45.5386122Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5386226Z   warnings.warn(
2025-12-04T11:20:45.5386376Z =================================== FAILURES ===================================
2025-12-04T11:20:45.5386897Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T11:20:45.5387024Z Traceback (most recent call last):
2025-12-04T11:20:45.5387548Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.5387782Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.5388244Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.5388425Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.5388960Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.5389170Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:20:45.5389317Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.5389322Z 
2025-12-04T11:20:45.5389429Z Expected 1 but got 0.
2025-12-04T11:20:45.5389553Z Absolute difference: 1
2025-12-04T11:20:45.5389665Z Relative difference: 1.0
2025-12-04T11:20:45.5389670Z 
2025-12-04T11:20:45.5389885Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.5390810Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.5390816Z 
2025-12-04T11:20:45.5391084Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.5391384Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.5391506Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.5392202Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.5392441Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.5392642Z graph_break []
2025-12-04T11:20:45.5392765Z aten_mm_info [('aten.mm_256_72_1024', 2)]
2025-12-04T11:20:45.5392996Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.5394198Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.5394335Z   if out == self.unknown_value:
2025-12-04T11:20:45.5395060Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5395195Z   warnings.warn(
2025-12-04T11:20:45.5395930Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5396035Z   warnings.warn(
2025-12-04T11:20:45.5396270Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.5396387Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.5396615Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.5397324Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.5397426Z graph_break []
2025-12-04T11:20:45.5397554Z aten_mm_info [('aten.mm_256_72_1024', 2)]
2025-12-04T11:20:45.5397785Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.5398505Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5398620Z   warnings.warn(
2025-12-04T11:20:45.5399339Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5399440Z   warnings.warn(
2025-12-04T11:20:45.5399667Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.5399784Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.5400011Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.5400719Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:20:45.5400820Z graph_break []
2025-12-04T11:20:45.5400957Z aten_mm_info [('aten.mm_256_72_1024', 2)]
2025-12-04T11:20:45.5401172Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.5401892Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5402007Z   warnings.warn(
2025-12-04T11:20:45.5402720Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:20:45.5402834Z   warnings.warn(
2025-12-04T11:20:45.5403673Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4c09ee7a97c51183.xml -
2025-12-04T11:20:45.5403906Z =========================== short test summary info ============================
2025-12-04T11:20:45.5404861Z FAILED [0.5422s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:20:45.5404868Z 
2025-12-04T11:20:45.5405004Z Expected 1 but got 0.
2025-12-04T11:20:45.5405127Z Absolute difference: 1
2025-12-04T11:20:45.5405243Z Relative difference: 1.0
2025-12-04T11:20:45.5405248Z 
2025-12-04T11:20:45.5405466Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.5406389Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.5406394Z 
2025-12-04T11:20:45.5406667Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.5406864Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:20:45.5407097Z ================== 1 failed, 13 deselected, 2 rerun in 22.26s ==================
2025-12-04T11:20:45.5407199Z Got exit code 1
2025-12-04T11:20:45.5408037Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:20:45.5408451Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T11:20:45.5408914Z W1204 11:20:42.127000 93990 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:20:45.5409570Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c713b1dc3f3923ec.xml
2025-12-04T11:20:45.5409743Z ============================= test session starts ==============================
2025-12-04T11:20:45.5410108Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:20:45.5410220Z cachedir: .pytest_cache
2025-12-04T11:20:45.5410741Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:20:45.5410884Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:20:45.5410994Z configfile: pytest.ini
2025-12-04T11:20:45.5411546Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:20:45.5411766Z collecting ... collected 58 items / 14 deselected / 44 selected
2025-12-04T11:20:45.5411910Z stepcurrent: skipping 14 already run items.
2025-12-04T11:20:45.5412037Z Running 0 items in this shard
2025-12-04T11:20:45.5412042Z 
2025-12-04T11:20:45.5412886Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c713b1dc3f3923ec.xml -
2025-12-04T11:20:45.5413068Z ============================ 14 deselected in 0.02s ============================
2025-12-04T11:20:45.5424371Z The following tests failed consistently: ['test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16', 
'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16']
2025-12-04T11:20:45.5424461Z 
2025-12-04T11:20:45.5425100Z FINISHED PRINTING LOG FILE of inductor/test_cuda_select_algorithm 3/5 (test/test-reports/inductor.test_cuda_select_algorithm_3.5_e3565bc7025c1889_.log)
2025-12-04T11:20:45.5425119Z 
2025-12-04T11:20:45.5425518Z Finished inductor/test_cuda_select_algorithm 3/5 ... [2025-12-04 11:20:45.178343][7673.561228993], took 21.29min
2025-12-04T11:20:45.5426470Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-74cab4bdcde89184.xml
2025-12-04T11:20:45.5427392Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-77e37a2f8b75b3d9.xml
2025-12-04T11:20:45.5428283Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3ba19b390afd5854.xml
2025-12-04T11:20:45.5429182Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4ad317a243ecdd30.xml
2025-12-04T11:20:45.5430083Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f482798b2b39d897.xml
2025-12-04T11:20:45.5430997Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cbe2514f89eef609.xml
2025-12-04T11:20:45.5431886Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3707d31910126ebf.xml
2025-12-04T11:20:45.5432781Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-dedaec5daecec784.xml
2025-12-04T11:20:45.5433682Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2f4f0e9c4ac682e4.xml
2025-12-04T11:20:45.5434626Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-580d25229e34cb07.xml
2025-12-04T11:20:45.5445728Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9d15e1ab064c4537.xml
2025-12-04T11:20:45.5736362Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e6d909bcc6975bf8.xml
2025-12-04T11:20:45.6049956Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0a612698d44183a1.xml
2025-12-04T11:20:45.6308071Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-90f2ceb88314c75a.xml
2025-12-04T11:20:45.6564277Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9644b19a5203c0ee.xml
2025-12-04T11:20:45.6867653Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1a6f999e52eb1904.xml
2025-12-04T11:20:45.7150757Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-547414903ca204e9.xml
2025-12-04T11:20:45.7441766Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-41f0f199b083e6d2.xml
2025-12-04T11:20:45.7753867Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-438f9d52209526cc.xml
2025-12-04T11:20:45.8065430Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-98df9406c6e0faf3.xml
2025-12-04T11:20:45.8387162Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f706cf73cc88a5b8.xml
2025-12-04T11:20:45.8677489Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c67f05de6c39b0d8.xml
2025-12-04T11:20:45.8947568Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0e30a339afee7d22.xml
2025-12-04T11:20:45.9237064Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4a14a2e6be65f97f.xml
2025-12-04T11:20:45.9752942Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-82a3db4b14f41cd2.xml
2025-12-04T11:20:46.0061331Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a02c7191ab69f431.xml
2025-12-04T11:20:46.0348172Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e37b8ebc7938792f.xml
2025-12-04T11:20:46.0639208Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ee37665d187f9309.xml
2025-12-04T11:20:46.0933923Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-511047743df1b08e.xml
2025-12-04T11:20:46.1250098Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4d9221d5ac70ff44.xml
2025-12-04T11:20:46.1529189Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-af9a500a606c950b.xml
2025-12-04T11:20:46.1835364Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e3ba96547605fc4e.xml
2025-12-04T11:20:46.2124390Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ce470e45644e1cc6.xml
2025-12-04T11:20:46.2416478Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cece0bb00c5477e6.xml
2025-12-04T11:20:46.2722837Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4e672e5e3ae6046c.xml
2025-12-04T11:20:46.3021077Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-65775801d71c7290.xml
2025-12-04T11:20:46.3319095Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9ed754aaaf490f98.xml
2025-12-04T11:20:46.3591063Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-36a3a8a6a9d0a436.xml
2025-12-04T11:20:46.3912580Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f55b1076fbed9be9.xml
2025-12-04T11:20:46.4191952Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6062b5e411b734f8.xml
2025-12-04T11:20:46.4457339Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-21fa07c752a411ad.xml
2025-12-04T11:20:46.4736840Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4c09ee7a97c51183.xml
2025-12-04T11:20:46.5005309Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c713b1dc3f3923ec.xml
2025-12-04T11:20:46.7587574Z Uploading logs for 57119749248 to S3
2025-12-04T11:20:46.8460948Z Uploading artifacts took 0.32 seconds
2025-12-04T11:20:46.8461383Z inductor/test_cuda_select_algorithm 3/5 failed!
2025-12-04T11:20:46.8466097Z Running inductor/test_compile_subprocess 3/3 ... [2025-12-04 11:20:46.846430][7675.229324441]
2025-12-04T11:20:46.8466742Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T11:20:46.8471701Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_compile_subprocess.py', '--shard-id=3', '--num-shards=3', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:20:46.846893]
2025-12-04T11:29:55.3087917Z 
2025-12-04T11:29:55.3091520Z PRINTING LOG FILE of inductor/test_compile_subprocess 3/3 (test/test-reports/inductor.test_compile_subprocess_3.3_92ce494afd455b37_.log)
2025-12-04T11:29:55.3093795Z W1204 11:20:56.511000 94107 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:29:55.3095442Z Test results will be stored in test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-84a2c5e5cdda7bdd.xml
2025-12-04T11:29:55.3096708Z ============================= test session starts ==============================
2025-12-04T11:29:55.3097995Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:29:55.3098610Z cachedir: .pytest_cache
2025-12-04T11:29:55.3099325Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:29:55.3100555Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:29:55.3101096Z configfile: pytest.ini
2025-12-04T11:29:55.3102323Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:29:55.3103719Z collecting ... collected 879 items
2025-12-04T11:29:55.3104548Z stepcurrent: Cannot find last run test, not skipping
2025-12-04T11:29:55.3234029Z Running 288 items in this shard: test/inductor/test_compile_subprocess.py::GPUTests::test__dyn_quant_matmul_4bit_bf16_input_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test__unsafe_masked_index_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test__unsafe_masked_index_put_accumulate_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_adaptive_avg_pool2d_low_prec_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_adaptive_avg_pool_errors_with_long_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_add_complex4_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_add_complex7_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_add_complex8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_add_complex_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_addmv_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_aoti_eager_dtype_device_layout_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_aoti_eager_override_registration_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_aoti_eager_support_out_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_arange2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_arange5_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_argmax_argmin_with_duplicates_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_argmax_argmin_with_nan_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_argmax_to_float_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_avg_pool2d2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_avg_pool2d3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_avg_pool2d6_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_avg_pool2d_backward2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_avg_pool2d_backward_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_avg_pool3d_backward2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_baddbmm_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_batch_norm_2d_2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bernoulli2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bitwise2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bitwise_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bmm1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_broadcast_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_default_kwargs_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int32_uint8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int64_int16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int64_int32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int64_uint8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int8_int32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int8_int64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_uint8_int16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_uint8_int32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_uint8_int8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_uint8_uint8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_buffer_batch_norm_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_buffer_copied_in_graph_with_different_shapes_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_buffer_use_after_remove_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_builtins_round_float_ndigits_neg_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_builtins_round_float_ndigits_pos_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_builtins_round_float_ndigits_zero_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_builtins_round_int_ndigits_zero_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cat_of_loops_and_extern_kernel_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cat_single_empty_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cat_uint8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cat_upcasting_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cauchy_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_chunk_recompiles_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_computed_buffer_inlining_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_consecutive_split_cumprod_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_const_int32_to_float_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_constant_pad_1d_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_constant_pad_float64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_constant_pad_nd_inplace_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_conv_functional_bn_fuse_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_convolution4_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_copy_non_blocking_is_pinned_use_cat_True_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_copy_with_scalar_src_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cpu_scalar_with_cpu_scalar_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cpu_scalar_with_gpu_tensor_cpp_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cpu_scalar_with_gpu_tensor_dynamic_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cumsum_inf_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_cumsum_no_mask_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_custom_op_2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_custom_op_fixed_layout_sequential_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_custom_scan_op_multi_input_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_data_type_propogation_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dense_mask_index_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_div1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_div8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dropout_trivial_1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float16_int8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float32_float32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float32_float64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float64_int64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int16_float16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int16_float32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int16_uint8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int32_int8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int64_int8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int64_uint8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int8_float16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int8_float32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int8_float64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int8_int16_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_uint8_int16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_uint8_uint8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_elu_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_erfc_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_erfinv_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_exp_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_expanded_reduction_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_expm1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fallback_mutable_op_with_return_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fill1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_flip_cat_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_flip_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_float_index_expression_type_promotion_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_floordiv_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fmod_zero_dim_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_forced_buffer_realize_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fractional_max_pool2d2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_full_like_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fuse_large_params_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fusing_write_into_disjoint_read_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_gather1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_gather_scatter_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_generate_rand_fp8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_glu_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_graph_partition_both_scalars_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_graph_partition_constant_tensor2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_graph_partition_mutation_real_name_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_graph_partition_no_inputs_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_graph_partition_pad_dynamic_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_graph_partition_refcount_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_grid_sampler_2d_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_dynamic_shapes_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_propagation_abs_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_propagation_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_propagation_floordiv_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_propagation_remainder_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_put_failed_reinplace_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_put_fallback1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_put_index_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_select_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_indirect_load_broadcast_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_inductor_layout_optimization_input_mutations_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_inner_fn_str_and_stride_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_inplace_add_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_input_mutation2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_input_mutation3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_input_mutation4_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_insignificant_strides_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_isinf2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_isinf_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_issue102546_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_kernel_names_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_l1_loss_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_large_grid_use_block_ptr_False_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_layer_norm_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_leaky_relu_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_lgamma_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_like_rands2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_linear_mixed_dtype_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_lite_regional_compile_repeated_blocks_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_lite_triton_kernel_wrapper_functional_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_log2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_log_fp64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_logcumsumexp_zero_dim_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_low_memory_max_pool_dilation_1_dim_2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_low_memory_max_pool_dilation_2_dim_3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_mark_dynamic_with_hint_override_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_masked_fill_promotion_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_matmul_layer_norm_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_max_min_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_max_pool2d3_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_max_pool2d_with_indices_backward2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_max_pool2d_with_indices_backward4_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_max_pool2d_with_indices_backward5_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_max_pool2d_with_indices_backward_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_mean_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_min_max_reduction_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_misaligned_address_issue1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_mixed_mm2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_mixed_mm3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_mm_mixed_dtype_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_mul_index_expr_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_multi_gpu_device_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_multi_threading_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_multilayer_any_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_multilayer_sum_low_prec_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_multilayer_var_lowp_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_mutable_custom_op_fixed_layout2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_nan_sort_stable_False_descending_False_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_new_empty_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_nll_loss_backward_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_nll_loss_forward_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_one_hot_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pattern_matcher_unbacked_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_bessel_j0_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_bessel_y0_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_bessel_y1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_chebyshev_polynomial_t_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_chebyshev_polynomial_w_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_entr_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_erfcx_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_expm1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_gammaincc_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_gammaln_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_laguerre_polynomial_l_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_log_ndtr_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_logit_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_multigammaln_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_ndtri_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_psi_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_scaled_modified_bessel_k1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pow3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pow_symfloat_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_prod_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_progressive, test/inductor/test_compile_subprocess.py::GPUTests::test_rand_like_deterministic_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_randint_distribution_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_randn_generator_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_randn_like_empty_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_reduction2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_reduction5_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_remainder_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_remove_noop_slice_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_remove_noop_slice_scatter_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_repeat_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_repeat_interleave_decomposition_has_clamp_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_require_stride_expanded_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_resize_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_roi_align_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_roll_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_round_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_rsqrt_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_rsqrt_dynamic_shapes_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_scaled_dot_product_attention_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_scatter3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_scatter4_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_scatter_add3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_scatter_reduce3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sdpa_prefer_nd_tiling_False_use_block_ptr_True_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sdpa_prefer_nd_tiling_True_use_block_ptr_False_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sdpa_unaligned_mask_freezing_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_shape_padding_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_signbit_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_silu_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sin_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_slice1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_slice2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_slice_mutation1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_slice_scatter5_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sort_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sort_stable_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sort_transpose_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_special_polygamma_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_split_cumprod_low_prec_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_split_cumsum_low_prec_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_split_reduction_with_int64_size_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_split_with_unbacked_symints_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sqrt_dynamic_shapes_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_squeeze1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_squeeze2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_squeeze_varargs_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_stack_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_std_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_strided_inputs_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sum2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sum3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sum4_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sum5_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_sum_keepdims_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_tanh_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_tmp_not_defined_issue1_use_block_ptr_True_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_tmp_not_defined_issue3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_to_dtype_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_triton_argmin_argmax_transpose_logical_index_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_uint4x2_mixed_mm_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_unbacked_floordiv_simplify_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_unbacked_floordiv_simplify_errors_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_unroll_small_reduction_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_unspec_inputs_float16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_unspec_inputs_int32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_unspec_inputs_int8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_upsample_nearest2d_backward_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_var_correction_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_var_mean_div_by_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_var_mean_tile_reduction_False_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_var_mean_tile_reduction_True_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_vertical_fusion1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_view_as_complex_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_views2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_weight_norm_bwd_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_weight_norm_conv2d_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_where_broadcast_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_where_with_logical_op_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_xblock_divides_xnumel_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_zero_dim_reductions_cuda
2025-12-04T11:29:55.3347862Z 
2025-12-04T11:29:55.3348787Z inductor/test_compile_subprocess.py::GPUTests::test__dyn_quant_matmul_4bit_bf16_input_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0047s] (No _dyn_quant_matmul_4bit implementation on CUDA) [  0%]
2025-12-04T11:29:55.3350562Z inductor/test_compile_subprocess.py::GPUTests::test__unsafe_masked_index_cuda <- test/inductor/test_torchinductor.py PASSED [18.4052s] [  0%]
2025-12-04T11:29:55.3351971Z inductor/test_compile_subprocess.py::GPUTests::test__unsafe_masked_index_put_accumulate_cuda <- test/inductor/test_torchinductor.py PASSED [0.9283s] [  1%]
2025-12-04T11:29:55.3353394Z inductor/test_compile_subprocess.py::GPUTests::test_adaptive_avg_pool2d_low_prec_cuda <- test/inductor/test_torchinductor.py PASSED [0.6602s] [  1%]
2025-12-04T11:29:55.3354861Z inductor/test_compile_subprocess.py::GPUTests::test_adaptive_avg_pool_errors_with_long_cuda <- test/inductor/test_torchinductor.py PASSED [0.6870s] [  1%]
2025-12-04T11:29:55.3356196Z inductor/test_compile_subprocess.py::GPUTests::test_add_complex4_cuda <- test/inductor/test_torchinductor.py PASSED [1.5006s] [  2%]
2025-12-04T11:29:55.3357432Z inductor/test_compile_subprocess.py::GPUTests::test_add_complex7_cuda <- test/inductor/test_torchinductor.py PASSED [0.6564s] [  2%]
2025-12-04T11:29:55.3358756Z inductor/test_compile_subprocess.py::GPUTests::test_add_complex8_cuda <- test/inductor/test_torchinductor.py PASSED [0.6202s] [  2%]
2025-12-04T11:29:55.3359991Z inductor/test_compile_subprocess.py::GPUTests::test_add_complex_cuda <- test/inductor/test_torchinductor.py PASSED [0.6110s] [  3%]
2025-12-04T11:29:55.3361698Z inductor/test_compile_subprocess.py::GPUTests::test_addmv_cuda <- test/inductor/test_torchinductor.py W1204 11:21:22.664000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.3363317Z W1204 11:21:22.664000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.3364818Z W1204 11:21:22.664000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.3366216Z W1204 11:21:22.664000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.3367581Z W1204 11:21:22.664000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.3369146Z W1204 11:21:22.664000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.3370666Z W1204 11:21:22.664000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.3372207Z W1204 11:21:22.664000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.3373580Z W1204 11:21:22.664000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.3375103Z W1204 11:21:22.664000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.3376890Z W1204 11:21:22.664000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.3378305Z W1204 11:21:22.664000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.3379691Z W1204 11:21:22.664000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.3381199Z W1204 11:21:22.664000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.3382655Z W1204 11:21:22.664000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.3384116Z W1204 11:21:22.664000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.3385577Z W1204 11:21:22.664000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.3387102Z W1204 11:21:22.664000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.3388592Z W1204 11:21:22.664000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.3390098Z W1204 11:21:22.664000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.3391630Z W1204 11:21:22.664000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.3393198Z W1204 11:21:22.664000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.3394859Z W1204 11:21:22.664000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default
2025-12-04T11:29:55.3396023Z PASSED [1.0175s] [  3%]
2025-12-04T11:29:55.3396943Z inductor/test_compile_subprocess.py::GPUTests::test_aoti_eager_dtype_device_layout_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0035s] (Requires sm80) [  3%]
2025-12-04T11:29:55.3398536Z inductor/test_compile_subprocess.py::GPUTests::test_aoti_eager_override_registration_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0031s] (Requires sm80) [  4%]
2025-12-04T11:29:55.3400075Z inductor/test_compile_subprocess.py::GPUTests::test_aoti_eager_support_out_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0031s] (Requires sm80) [  4%]
2025-12-04T11:29:55.3401915Z inductor/test_compile_subprocess.py::GPUTests::test_arange2_cuda <- test/inductor/test_torchinductor.py W1204 11:21:23.683000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.3403543Z W1204 11:21:23.683000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.3405030Z W1204 11:21:23.683000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.3406439Z W1204 11:21:23.683000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.3407860Z W1204 11:21:23.683000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.3409413Z W1204 11:21:23.683000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.3410909Z W1204 11:21:23.683000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.3412285Z W1204 11:21:23.683000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.3413660Z W1204 11:21:23.683000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.3415170Z W1204 11:21:23.683000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.3416769Z W1204 11:21:23.683000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.3418220Z W1204 11:21:23.683000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.3419606Z W1204 11:21:23.683000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.3421061Z W1204 11:21:23.683000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.3422532Z W1204 11:21:23.683000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.3423999Z W1204 11:21:23.683000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.3425481Z W1204 11:21:23.683000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.3426974Z W1204 11:21:23.683000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.3428446Z W1204 11:21:23.683000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.3429961Z W1204 11:21:23.683000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.3431489Z W1204 11:21:23.683000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.3433071Z W1204 11:21:23.683000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.3434661Z W1204 11:21:23.683000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.iota.default
2025-12-04T11:29:55.3435728Z PASSED [0.5444s] [  4%]
2025-12-04T11:29:55.3436952Z inductor/test_compile_subprocess.py::GPUTests::test_arange5_cuda <- test/inductor/test_torchinductor.py W1204 11:21:24.219000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.3438653Z W1204 11:21:24.219000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.3440156Z W1204 11:21:24.219000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.3441557Z W1204 11:21:24.219000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.3445944Z W1204 11:21:24.219000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.3447556Z W1204 11:21:24.219000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.3449069Z W1204 11:21:24.219000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.3450394Z W1204 11:21:24.219000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.3451849Z W1204 11:21:24.219000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.3453379Z W1204 11:21:24.219000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.3454941Z W1204 11:21:24.219000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.3456346Z W1204 11:21:24.219000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.3457840Z W1204 11:21:24.219000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.3459310Z W1204 11:21:24.219000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.3460766Z W1204 11:21:24.219000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.3462228Z W1204 11:21:24.219000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.3463674Z W1204 11:21:24.219000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.3465165Z W1204 11:21:24.219000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.3466656Z W1204 11:21:24.219000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.3468152Z W1204 11:21:24.219000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.3469670Z W1204 11:21:24.219000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.3471469Z W1204 11:21:24.219000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.3473174Z W1204 11:21:24.219000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.iota.default
2025-12-04T11:29:55.3474655Z W1204 11:21:24.744000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/1] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.3475758Z W1204 11:21:24.744000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/1] Traceback (most recent call last):
2025-12-04T11:29:55.3477372Z W1204 11:21:24.744000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.3478767Z W1204 11:21:24.744000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/1]     ).serialize()
2025-12-04T11:29:55.3480123Z W1204 11:21:24.744000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.3481749Z W1204 11:21:24.744000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/1]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.3483252Z W1204 11:21:24.744000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.3484563Z W1204 11:21:24.744000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/1]     pickler.dump(obj)
2025-12-04T11:29:55.3485944Z W1204 11:21:24.744000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.3487464Z W1204 11:21:24.744000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/1]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.3488983Z W1204 11:21:24.744000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.3490389Z W1204 11:21:24.744000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/1]     cls(obj, pickler.options),
2025-12-04T11:29:55.3491764Z W1204 11:21:24.744000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.3493227Z W1204 11:21:24.744000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/1]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.3494685Z W1204 11:21:24.744000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.3496149Z W1204 11:21:24.744000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/1]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.3497678Z W1204 11:21:24.744000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.3499183Z W1204 11:21:24.744000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/1]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.3500681Z W1204 11:21:24.744000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.3502185Z W1204 11:21:24.744000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/1]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.3503758Z W1204 11:21:24.744000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.3505317Z W1204 11:21:24.744000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/1]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.3506908Z W1204 11:21:24.744000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/1] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.iota.default
2025-12-04T11:29:55.3508017Z PASSED [0.7799s] [  5%]
2025-12-04T11:29:55.3508918Z inductor/test_compile_subprocess.py::GPUTests::test_argmax_argmin_with_duplicates_cuda <- test/inductor/test_torchinductor.py PASSED [2.3291s] [  5%]
2025-12-04T11:29:55.3510274Z inductor/test_compile_subprocess.py::GPUTests::test_argmax_argmin_with_nan_cuda <- test/inductor/test_torchinductor.py PASSED [4.0337s] [  5%]
2025-12-04T11:29:55.3512051Z inductor/test_compile_subprocess.py::GPUTests::test_argmax_to_float_cuda <- test/inductor/test_torchinductor.py W1204 11:21:31.382000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.3513737Z W1204 11:21:31.382000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.3515225Z W1204 11:21:31.382000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.3516635Z W1204 11:21:31.382000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.3517979Z W1204 11:21:31.382000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.3519526Z W1204 11:21:31.382000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.3521029Z W1204 11:21:31.382000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.3522359Z W1204 11:21:31.382000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.3523727Z W1204 11:21:31.382000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.3525246Z W1204 11:21:31.382000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.3526762Z W1204 11:21:31.382000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.3528172Z W1204 11:21:31.382000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.3529552Z W1204 11:21:31.382000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.3530996Z W1204 11:21:31.382000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.3532448Z W1204 11:21:31.382000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.3533933Z W1204 11:21:31.382000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.3535401Z W1204 11:21:31.382000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.3536978Z W1204 11:21:31.382000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.3538522Z W1204 11:21:31.382000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.3540024Z W1204 11:21:31.382000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.3541557Z W1204 11:21:31.382000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.3543162Z W1204 11:21:31.382000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.3544833Z W1204 11:21:31.382000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default
2025-12-04T11:29:55.3545969Z PASSED [1.0764s] [  6%]
2025-12-04T11:29:55.3546712Z inductor/test_compile_subprocess.py::GPUTests::test_avg_pool2d2_cuda <- test/inductor/test_torchinductor.py PASSED [1.1000s] [  6%]
2025-12-04T11:29:55.3547936Z inductor/test_compile_subprocess.py::GPUTests::test_avg_pool2d3_cuda <- test/inductor/test_torchinductor.py PASSED [1.6083s] [  6%]
2025-12-04T11:29:55.3549165Z inductor/test_compile_subprocess.py::GPUTests::test_avg_pool2d6_cuda <- test/inductor/test_torchinductor.py PASSED [0.7689s] [  7%]
2025-12-04T11:29:55.3550429Z inductor/test_compile_subprocess.py::GPUTests::test_avg_pool2d_backward2_cuda <- test/inductor/test_torchinductor.py PASSED [10.1314s] [  7%]
2025-12-04T11:29:55.3551741Z inductor/test_compile_subprocess.py::GPUTests::test_avg_pool2d_backward_cuda <- test/inductor/test_torchinductor.py PASSED [1.6173s] [  7%]
2025-12-04T11:29:55.3553332Z inductor/test_compile_subprocess.py::GPUTests::test_avg_pool3d_backward2_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0005s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [  8%]
2025-12-04T11:29:55.3555308Z inductor/test_compile_subprocess.py::GPUTests::test_baddbmm_cuda <- test/inductor/test_torchinductor.py W1204 11:21:48.565000 94292 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:29:55.3556515Z PASSED [1.8050s] [  8%]
2025-12-04T11:29:55.3557263Z inductor/test_compile_subprocess.py::GPUTests::test_batch_norm_2d_2_cuda <- test/inductor/test_torchinductor.py PASSED [2.8427s] [  9%]
2025-12-04T11:29:55.3558992Z inductor/test_compile_subprocess.py::GPUTests::test_bernoulli2_cuda <- test/inductor/test_torchinductor.py W1204 11:21:52.375000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.3560636Z W1204 11:21:52.375000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.3562273Z W1204 11:21:52.375000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.3563667Z W1204 11:21:52.375000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.3565107Z W1204 11:21:52.375000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.3566666Z W1204 11:21:52.375000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.3568176Z W1204 11:21:52.375000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.3569536Z W1204 11:21:52.375000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.3571145Z W1204 11:21:52.375000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.3572725Z W1204 11:21:52.375000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.3574250Z W1204 11:21:52.375000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.3575755Z W1204 11:21:52.375000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.3577216Z W1204 11:21:52.375000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.3578687Z W1204 11:21:52.375000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.3580154Z W1204 11:21:52.375000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.3581619Z W1204 11:21:52.375000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.3583065Z W1204 11:21:52.375000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.3584569Z W1204 11:21:52.375000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.3586062Z W1204 11:21:52.375000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.3587559Z W1204 11:21:52.375000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.3589083Z W1204 11:21:52.375000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.3590646Z W1204 11:21:52.375000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.3592278Z W1204 11:21:52.375000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.inductor_seeds.default
2025-12-04T11:29:55.3593800Z W1204 11:21:53.501000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.3594904Z W1204 11:21:53.501000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.3596480Z W1204 11:21:53.501000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.3597882Z W1204 11:21:53.501000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.3599238Z W1204 11:21:53.501000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.3600845Z W1204 11:21:53.501000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.3602437Z W1204 11:21:53.501000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.3603759Z W1204 11:21:53.501000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.3605131Z W1204 11:21:53.501000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.3606680Z W1204 11:21:53.501000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.3608205Z W1204 11:21:53.501000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.3609610Z W1204 11:21:53.501000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.3610986Z W1204 11:21:53.501000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.3612444Z W1204 11:21:53.501000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.3613915Z W1204 11:21:53.501000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.3615373Z W1204 11:21:53.501000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.3616929Z W1204 11:21:53.501000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.3618417Z W1204 11:21:53.501000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.3619904Z W1204 11:21:53.501000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.3621404Z W1204 11:21:53.501000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.3622920Z W1204 11:21:53.501000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.3624495Z W1204 11:21:53.501000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.3626130Z W1204 11:21:53.501000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.inductor_seeds.default
2025-12-04T11:29:55.3627293Z PASSED [1.4791s] [  9%]
2025-12-04T11:29:55.3628030Z inductor/test_compile_subprocess.py::GPUTests::test_bitwise2_cuda <- test/inductor/test_torchinductor.py PASSED [0.3629s] [  9%]
2025-12-04T11:29:55.3629241Z inductor/test_compile_subprocess.py::GPUTests::test_bitwise_cuda <- test/inductor/test_torchinductor.py PASSED [0.3315s] [ 10%]
2025-12-04T11:29:55.3630402Z inductor/test_compile_subprocess.py::GPUTests::test_bmm1_cuda <- test/inductor/test_torchinductor.py PASSED [0.6155s] [ 10%]
2025-12-04T11:29:55.3631719Z inductor/test_compile_subprocess.py::GPUTests::test_bucketize_broadcast_cuda <- test/inductor/test_torchinductor.py PASSED [0.5180s] [ 10%]
2025-12-04T11:29:55.3633063Z inductor/test_compile_subprocess.py::GPUTests::test_bucketize_default_kwargs_cuda <- test/inductor/test_torchinductor.py PASSED [0.2226s] [ 11%]
2025-12-04T11:29:55.3634427Z inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int32_uint8_cuda <- test/inductor/test_torchinductor.py PASSED [1.5085s] [ 11%]
2025-12-04T11:29:55.3635764Z inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int64_int16_cuda <- test/inductor/test_torchinductor.py PASSED [1.5006s] [ 11%]
2025-12-04T11:29:55.3637151Z inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int64_int32_cuda <- test/inductor/test_torchinductor.py PASSED [1.5240s] [ 12%]
2025-12-04T11:29:55.3638504Z inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int64_uint8_cuda <- test/inductor/test_torchinductor.py PASSED [1.5327s] [ 12%]
2025-12-04T11:29:55.3639850Z inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int8_int32_cuda <- test/inductor/test_torchinductor.py PASSED [1.5015s] [ 12%]
2025-12-04T11:29:55.3641172Z inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int8_int64_cuda <- test/inductor/test_torchinductor.py PASSED [1.5204s] [ 13%]
2025-12-04T11:29:55.3642516Z inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_uint8_int16_cuda <- test/inductor/test_torchinductor.py PASSED [1.4869s] [ 13%]
2025-12-04T11:29:55.3643868Z inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_uint8_int32_cuda <- test/inductor/test_torchinductor.py PASSED [1.4842s] [ 13%]
2025-12-04T11:29:55.3645211Z inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_uint8_int8_cuda <- test/inductor/test_torchinductor.py PASSED [1.8127s] [ 14%]
2025-12-04T11:29:55.3646562Z inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_uint8_uint8_cuda <- test/inductor/test_torchinductor.py PASSED [1.5066s] [ 14%]
2025-12-04T11:29:55.3647865Z inductor/test_compile_subprocess.py::GPUTests::test_buffer_batch_norm_cuda <- test/inductor/test_torchinductor.py PASSED [1.4812s] [ 14%]
2025-12-04T11:29:55.3649264Z inductor/test_compile_subprocess.py::GPUTests::test_buffer_copied_in_graph_with_different_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.4896s] [ 15%]
2025-12-04T11:29:55.3650709Z inductor/test_compile_subprocess.py::GPUTests::test_buffer_use_after_remove_cuda <- test/inductor/test_torchinductor.py PASSED [2.6171s] [ 15%]
2025-12-04T11:29:55.3652083Z inductor/test_compile_subprocess.py::GPUTests::test_builtins_round_float_ndigits_neg_cuda <- test/inductor/test_torchinductor.py PASSED [0.3194s] [ 15%]
2025-12-04T11:29:55.3653495Z inductor/test_compile_subprocess.py::GPUTests::test_builtins_round_float_ndigits_pos_cuda <- test/inductor/test_torchinductor.py PASSED [0.2651s] [ 16%]
2025-12-04T11:29:55.3654925Z inductor/test_compile_subprocess.py::GPUTests::test_builtins_round_float_ndigits_zero_cuda <- test/inductor/test_torchinductor.py PASSED [0.2627s] [ 16%]
2025-12-04T11:29:55.3656420Z inductor/test_compile_subprocess.py::GPUTests::test_builtins_round_int_ndigits_zero_cuda <- test/inductor/test_torchinductor.py PASSED [0.2038s] [ 17%]
2025-12-04T11:29:55.3658370Z inductor/test_compile_subprocess.py::GPUTests::test_cat_of_loops_and_extern_kernel_cuda <- test/inductor/test_torchinductor.py W1204 11:22:17.025000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.3660092Z W1204 11:22:17.025000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.3661564Z W1204 11:22:17.025000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.3663115Z W1204 11:22:17.025000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.3664537Z W1204 11:22:17.025000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.3666087Z W1204 11:22:17.025000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.3667580Z W1204 11:22:17.025000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.3668942Z W1204 11:22:17.025000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.3670319Z W1204 11:22:17.025000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.3672084Z W1204 11:22:17.025000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.3673603Z W1204 11:22:17.025000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.3674994Z W1204 11:22:17.025000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.3676386Z W1204 11:22:17.025000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.3677983Z W1204 11:22:17.025000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.3679447Z W1204 11:22:17.025000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.3680911Z W1204 11:22:17.025000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.3682365Z W1204 11:22:17.025000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.3683857Z W1204 11:22:17.025000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.3685347Z W1204 11:22:17.025000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.3686855Z W1204 11:22:17.025000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.3688509Z W1204 11:22:17.025000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.3690189Z W1204 11:22:17.025000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.3691922Z W1204 11:22:17.025000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims._low_memory_max_pool_with_offsets.default
2025-12-04T11:29:55.3693132Z PASSED [1.1611s] [ 17%]
2025-12-04T11:29:55.3693908Z inductor/test_compile_subprocess.py::GPUTests::test_cat_single_empty_cuda <- test/inductor/test_torchinductor.py PASSED [0.2635s] [ 17%]
2025-12-04T11:29:55.3695254Z inductor/test_compile_subprocess.py::GPUTests::test_cat_uint8_cuda <- test/inductor/test_torchinductor.py PASSED [0.3904s] [ 18%]
2025-12-04T11:29:55.3696558Z inductor/test_compile_subprocess.py::GPUTests::test_cat_upcasting_cuda <- test/inductor/test_torchinductor.py PASSED [0.6337s] [ 18%]
2025-12-04T11:29:55.3697780Z inductor/test_compile_subprocess.py::GPUTests::test_cauchy_cuda <- test/inductor/test_torchinductor.py PASSED [0.2784s] [ 18%]
2025-12-04T11:29:55.3699024Z inductor/test_compile_subprocess.py::GPUTests::test_chunk_recompiles_cuda <- test/inductor/test_torchinductor.py PASSED [1.1241s] [ 19%]
2025-12-04T11:29:55.3700885Z inductor/test_compile_subprocess.py::GPUTests::test_computed_buffer_inlining_cuda <- test/inductor/test_torchinductor.py W1204 11:22:20.795000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.3702605Z W1204 11:22:20.795000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.3704097Z W1204 11:22:20.795000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.3705504Z W1204 11:22:20.795000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.3706871Z W1204 11:22:20.795000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.3708421Z W1204 11:22:20.795000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.3709934Z W1204 11:22:20.795000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.3711267Z W1204 11:22:20.795000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.3712648Z W1204 11:22:20.795000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.3714154Z W1204 11:22:20.795000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.3715667Z W1204 11:22:20.795000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.3717076Z W1204 11:22:20.795000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.3718480Z W1204 11:22:20.795000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.3719940Z W1204 11:22:20.795000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.3721420Z W1204 11:22:20.795000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.3722892Z W1204 11:22:20.795000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.3724358Z W1204 11:22:20.795000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.3725951Z W1204 11:22:20.795000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.3727436Z W1204 11:22:20.795000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.3728925Z W1204 11:22:20.795000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.3730477Z W1204 11:22:20.795000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.3732038Z W1204 11:22:20.795000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.3733631Z W1204 11:22:20.795000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.iota.default
2025-12-04T11:29:55.3734688Z PASSED [0.2243s] [ 19%]
2025-12-04T11:29:55.3735499Z inductor/test_compile_subprocess.py::GPUTests::test_consecutive_split_cumprod_cuda <- test/inductor/test_torchinductor.py PASSED [0.4692s] [ 19%]
2025-12-04T11:29:55.3737393Z inductor/test_compile_subprocess.py::GPUTests::test_const_int32_to_float_cuda <- test/inductor/test_torchinductor.py W1204 11:22:21.494000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.3739067Z W1204 11:22:21.494000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.3740552Z W1204 11:22:21.494000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.3741948Z W1204 11:22:21.494000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.3743303Z W1204 11:22:21.494000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.3744851Z W1204 11:22:21.494000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.3746349Z W1204 11:22:21.494000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.3747663Z W1204 11:22:21.494000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.3749054Z W1204 11:22:21.494000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.3750574Z W1204 11:22:21.494000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.3752130Z W1204 11:22:21.494000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.3753536Z W1204 11:22:21.494000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.3754907Z W1204 11:22:21.494000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.3756398Z W1204 11:22:21.494000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.3757893Z W1204 11:22:21.494000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.3759353Z W1204 11:22:21.494000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.3760934Z W1204 11:22:21.494000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.3762452Z W1204 11:22:21.494000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.3763938Z W1204 11:22:21.494000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.3765434Z W1204 11:22:21.494000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.3766951Z W1204 11:22:21.494000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.3768506Z W1204 11:22:21.494000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.3770187Z W1204 11:22:21.494000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default
2025-12-04T11:29:55.3771572Z PASSED [0.8194s] [ 20%]
2025-12-04T11:29:55.3772340Z inductor/test_compile_subprocess.py::GPUTests::test_constant_pad_1d_cuda <- test/inductor/test_torchinductor.py PASSED [0.6795s] [ 20%]
2025-12-04T11:29:55.3773630Z inductor/test_compile_subprocess.py::GPUTests::test_constant_pad_float64_cuda <- test/inductor/test_torchinductor.py PASSED [0.3179s] [ 20%]
2025-12-04T11:29:55.3774939Z inductor/test_compile_subprocess.py::GPUTests::test_constant_pad_nd_inplace_cuda <- test/inductor/test_torchinductor.py PASSED [0.1827s] [ 21%]
2025-12-04T11:29:55.3776506Z inductor/test_compile_subprocess.py::GPUTests::test_conv_functional_bn_fuse_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0035s] (only support cpu conv bn test) [ 21%]
2025-12-04T11:29:55.3777970Z inductor/test_compile_subprocess.py::GPUTests::test_convolution4_cuda <- test/inductor/test_torchinductor.py PASSED [0.5776s] [ 21%]
2025-12-04T11:29:55.3779820Z inductor/test_compile_subprocess.py::GPUTests::test_copy_non_blocking_is_pinned_use_cat_True_cuda <- test/inductor/test_torchinductor.py W1204 11:22:25.058000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.3781571Z W1204 11:22:25.058000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.3783141Z W1204 11:22:25.058000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.3784553Z W1204 11:22:25.058000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.3785918Z W1204 11:22:25.058000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.3787508Z W1204 11:22:25.058000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.3789043Z W1204 11:22:25.058000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.3790372Z W1204 11:22:25.058000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.3791755Z W1204 11:22:25.058000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.3793329Z W1204 11:22:25.058000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.3794845Z W1204 11:22:25.058000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.3796244Z W1204 11:22:25.058000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.3797631Z W1204 11:22:25.058000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.3799093Z W1204 11:22:25.058000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.3800552Z W1204 11:22:25.058000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.3802005Z W1204 11:22:25.058000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.3803468Z W1204 11:22:25.058000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.3804954Z W1204 11:22:25.058000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.3806441Z W1204 11:22:25.058000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.3807950Z W1204 11:22:25.058000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.3809460Z W1204 11:22:25.058000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.3811042Z W1204 11:22:25.058000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.3812662Z W1204 11:22:25.058000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.device_put.default
2025-12-04T11:29:55.3814088Z W1204 11:22:25.236000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3815026Z W1204 11:22:25.237000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3815939Z W1204 11:22:25.238000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3816925Z W1204 11:22:25.239000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3817884Z W1204 11:22:25.240000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3818827Z W1204 11:22:25.240000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3819748Z W1204 11:22:25.241000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3820668Z W1204 11:22:25.242000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3821585Z W1204 11:22:25.242000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3822524Z W1204 11:22:25.243000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3823446Z W1204 11:22:25.244000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3824368Z W1204 11:22:25.244000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3825291Z W1204 11:22:25.245000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3826204Z W1204 11:22:25.246000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3827132Z W1204 11:22:25.246000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3828059Z W1204 11:22:25.247000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3828985Z W1204 11:22:25.248000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3829899Z W1204 11:22:25.249000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3830824Z W1204 11:22:25.249000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3831756Z W1204 11:22:25.250000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3832685Z W1204 11:22:25.251000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3833593Z W1204 11:22:25.251000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3834520Z W1204 11:22:25.252000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3835450Z W1204 11:22:25.253000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3836377Z W1204 11:22:25.253000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3837290Z W1204 11:22:25.254000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3838215Z W1204 11:22:25.255000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3839140Z W1204 11:22:25.255000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3840064Z W1204 11:22:25.256000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3840973Z W1204 11:22:25.257000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3841956Z W1204 11:22:25.257000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3842881Z W1204 11:22:25.258000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3843793Z W1204 11:22:25.259000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3844710Z W1204 11:22:25.259000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3846345Z W1204 11:22:25.260000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3847304Z W1204 11:22:25.261000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3848222Z W1204 11:22:25.262000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3849126Z W1204 11:22:25.262000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3850048Z W1204 11:22:25.263000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3851002Z W1204 11:22:25.264000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3851906Z W1204 11:22:25.264000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3852830Z W1204 11:22:25.265000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3853752Z W1204 11:22:25.266000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3854675Z W1204 11:22:25.266000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3855587Z W1204 11:22:25.267000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3856586Z W1204 11:22:25.268000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3857506Z W1204 11:22:25.268000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3858431Z W1204 11:22:25.269000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3859338Z W1204 11:22:25.270000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3860263Z W1204 11:22:25.271000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3861187Z W1204 11:22:25.271000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3862227Z W1204 11:22:25.272000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3863130Z W1204 11:22:25.273000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3864056Z W1204 11:22:25.273000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3864976Z W1204 11:22:25.274000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3865896Z W1204 11:22:25.275000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3866807Z W1204 11:22:25.275000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3867738Z W1204 11:22:25.276000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3868662Z W1204 11:22:25.277000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3869579Z W1204 11:22:25.277000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3870489Z W1204 11:22:25.278000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3872153Z W1204 11:22:25.279000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3873094Z W1204 11:22:25.279000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3874016Z W1204 11:22:25.280000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3874984Z W1204 11:22:25.281000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3875976Z W1204 11:22:25.282000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3876897Z W1204 11:22:25.282000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3877803Z W1204 11:22:25.283000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3878724Z W1204 11:22:25.284000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3879783Z W1204 11:22:25.284000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3880706Z W1204 11:22:25.285000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3881610Z W1204 11:22:25.286000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3882535Z W1204 11:22:25.286000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3883461Z W1204 11:22:25.287000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3884386Z W1204 11:22:25.288000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3885289Z W1204 11:22:25.288000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3886208Z W1204 11:22:25.289000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3887122Z W1204 11:22:25.290000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3888043Z W1204 11:22:25.291000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3888953Z W1204 11:22:25.291000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3889870Z W1204 11:22:25.292000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3890785Z W1204 11:22:25.293000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3891699Z W1204 11:22:25.294000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3892605Z W1204 11:22:25.294000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3893524Z W1204 11:22:25.295000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3894449Z W1204 11:22:25.296000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3895366Z W1204 11:22:25.296000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3896279Z W1204 11:22:25.297000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3897285Z W1204 11:22:25.298000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3898210Z W1204 11:22:25.298000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3899135Z W1204 11:22:25.299000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3900091Z W1204 11:22:25.300000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3901015Z W1204 11:22:25.301000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3901933Z W1204 11:22:25.301000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3902890Z W1204 11:22:25.302000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3903831Z W1204 11:22:25.303000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3904750Z W1204 11:22:25.303000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3905668Z W1204 11:22:25.304000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3906579Z W1204 11:22:25.305000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3907534Z W1204 11:22:25.305000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3908192Z PASSED [3.4798s] [ 22%]
2025-12-04T11:29:55.3908986Z inductor/test_compile_subprocess.py::GPUTests::test_copy_with_scalar_src_cuda <- test/inductor/test_torchinductor.py PASSED [0.5524s] [ 22%]
2025-12-04T11:29:55.3910312Z inductor/test_compile_subprocess.py::GPUTests::test_cpu_scalar_with_cpu_scalar_cuda <- test/inductor/test_torchinductor.py PASSED [8.1973s] [ 22%]
2025-12-04T11:29:55.3912077Z inductor/test_compile_subprocess.py::GPUTests::test_cpu_scalar_with_gpu_tensor_cpp_cuda <- test/inductor/test_torchinductor.py W1204 11:22:36.342000 94292 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3913350Z PASSED [6.8863s] [ 23%]
2025-12-04T11:29:55.3914569Z inductor/test_compile_subprocess.py::GPUTests::test_cpu_scalar_with_gpu_tensor_dynamic_cuda <- test/inductor/test_torchinductor.py W1204 11:22:43.268000 94292 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.3915846Z PASSED [0.2600s] [ 23%]
2025-12-04T11:29:55.3916561Z inductor/test_compile_subprocess.py::GPUTests::test_cumsum_inf_cuda <- test/inductor/test_torchinductor.py PASSED [0.6998s] [ 23%]
2025-12-04T11:29:55.3917791Z inductor/test_compile_subprocess.py::GPUTests::test_cumsum_no_mask_cuda <- test/inductor/test_torchinductor.py PASSED [0.9193s] [ 24%]
2025-12-04T11:29:55.3919519Z inductor/test_compile_subprocess.py::GPUTests::test_custom_op_2_cuda <- test/inductor/test_torchinductor.py W1204 11:22:45.095000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.3921151Z W1204 11:22:45.095000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.3922632Z W1204 11:22:45.095000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.3924040Z W1204 11:22:45.095000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.3925402Z W1204 11:22:45.095000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.3926960Z W1204 11:22:45.095000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.3928448Z W1204 11:22:45.095000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.3929823Z W1204 11:22:45.095000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.3931200Z W1204 11:22:45.095000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.3932722Z W1204 11:22:45.095000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.3934312Z W1204 11:22:45.095000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.3935702Z W1204 11:22:45.095000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.3937190Z W1204 11:22:45.095000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.3938688Z W1204 11:22:45.095000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.3940144Z W1204 11:22:45.095000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.3941610Z W1204 11:22:45.095000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.3943049Z W1204 11:22:45.095000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.3944545Z W1204 11:22:45.095000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.3946022Z W1204 11:22:45.095000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.3947523Z W1204 11:22:45.095000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.3949033Z W1204 11:22:45.095000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.3950597Z W1204 11:22:45.095000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.3952179Z W1204 11:22:45.095000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.test.foo2.default
2025-12-04T11:29:55.3953240Z PASSED [0.2772s] [ 24%]
2025-12-04T11:29:55.3954576Z inductor/test_compile_subprocess.py::GPUTests::test_custom_op_fixed_layout_sequential_cuda <- test/inductor/test_torchinductor.py W1204 11:22:45.383000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.3956302Z W1204 11:22:45.383000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.3957787Z W1204 11:22:45.383000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.3959182Z W1204 11:22:45.383000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.3960572Z W1204 11:22:45.383000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.3962127Z W1204 11:22:45.383000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.3963613Z W1204 11:22:45.383000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.3964973Z W1204 11:22:45.383000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.3966435Z W1204 11:22:45.383000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.3967965Z W1204 11:22:45.383000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.3969460Z W1204 11:22:45.383000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.3970907Z W1204 11:22:45.383000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.3972530Z W1204 11:22:45.383000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.3973994Z W1204 11:22:45.383000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.3975456Z W1204 11:22:45.383000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.3976966Z W1204 11:22:45.383000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.3978425Z W1204 11:22:45.383000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.3979917Z W1204 11:22:45.383000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.3981396Z W1204 11:22:45.383000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.3982901Z W1204 11:22:45.383000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.3984411Z W1204 11:22:45.383000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.3985986Z W1204 11:22:45.383000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.3987561Z W1204 11:22:45.383000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.test.bar.default
2025-12-04T11:29:55.3988971Z W1204 11:22:45.399000 94107 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:29:55.3989658Z PASSED [0.3617s] [ 25%]
2025-12-04T11:29:55.3990460Z inductor/test_compile_subprocess.py::GPUTests::test_custom_scan_op_multi_input_cuda <- test/inductor/test_torchinductor.py PASSED [0.1730s] [ 25%]
2025-12-04T11:29:55.3992028Z inductor/test_compile_subprocess.py::GPUTests::test_data_type_propogation_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0035s] (triton not supported) [ 25%]
2025-12-04T11:29:55.3993456Z inductor/test_compile_subprocess.py::GPUTests::test_dense_mask_index_cuda <- test/inductor/test_torchinductor.py PASSED [0.5806s] [ 26%]
2025-12-04T11:29:55.3994652Z inductor/test_compile_subprocess.py::GPUTests::test_div1_cuda <- test/inductor/test_torchinductor.py PASSED [0.6825s] [ 26%]
2025-12-04T11:29:55.3995854Z inductor/test_compile_subprocess.py::GPUTests::test_div8_cuda <- test/inductor/test_torchinductor.py PASSED [0.8310s] [ 26%]
2025-12-04T11:29:55.3997122Z inductor/test_compile_subprocess.py::GPUTests::test_dropout_trivial_1_cuda <- test/inductor/test_torchinductor.py PASSED [0.2748s] [ 27%]
2025-12-04T11:29:55.3998621Z inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float16_int8_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0034s] (uses bfloat16 which requires SM >= 80) [ 27%]
2025-12-04T11:29:55.4000351Z inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float32_float32_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0031s] (uses bfloat16 which requires SM >= 80) [ 27%]
2025-12-04T11:29:55.4002122Z inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float32_float64_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0030s] (uses bfloat16 which requires SM >= 80) [ 28%]
2025-12-04T11:29:55.4003854Z inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float64_int64_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0030s] (uses bfloat16 which requires SM >= 80) [ 28%]
2025-12-04T11:29:55.4005567Z inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int16_float16_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0032s] (uses bfloat16 which requires SM >= 80) [ 28%]
2025-12-04T11:29:55.4007278Z inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int16_float32_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0030s] (uses bfloat16 which requires SM >= 80) [ 29%]
2025-12-04T11:29:55.4008971Z inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int16_uint8_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0030s] (uses bfloat16 which requires SM >= 80) [ 29%]
2025-12-04T11:29:55.4010649Z inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int32_int8_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0030s] (uses bfloat16 which requires SM >= 80) [ 29%]
2025-12-04T11:29:55.4012333Z inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int64_int8_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0030s] (uses bfloat16 which requires SM >= 80) [ 30%]
2025-12-04T11:29:55.4014014Z inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int64_uint8_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0030s] (uses bfloat16 which requires SM >= 80) [ 30%]
2025-12-04T11:29:55.4015717Z inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int8_float16_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0030s] (uses bfloat16 which requires SM >= 80) [ 30%]
2025-12-04T11:29:55.4017485Z inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int8_float32_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0030s] (uses bfloat16 which requires SM >= 80) [ 31%]
2025-12-04T11:29:55.4019175Z inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int8_float64_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0030s] (uses bfloat16 which requires SM >= 80) [ 31%]
2025-12-04T11:29:55.4020872Z inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int8_int16_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0030s] (uses bfloat16 which requires SM >= 80) [ 31%]
2025-12-04T11:29:55.4022566Z inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_uint8_int16_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0030s] (uses bfloat16 which requires SM >= 80) [ 32%]
2025-12-04T11:29:55.4024296Z inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_uint8_uint8_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0032s] (uses bfloat16 which requires SM >= 80) [ 32%]
2025-12-04T11:29:55.4026208Z inductor/test_compile_subprocess.py::GPUTests::test_elu_cuda <- test/inductor/test_torchinductor.py W1204 11:22:48.854000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.4027829Z W1204 11:22:48.854000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.4029356Z W1204 11:22:48.854000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.4030752Z W1204 11:22:48.854000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.4032113Z W1204 11:22:48.854000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.4033680Z W1204 11:22:48.854000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.4035179Z W1204 11:22:48.854000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.4036517Z W1204 11:22:48.854000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.4037889Z W1204 11:22:48.854000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.4039412Z W1204 11:22:48.854000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.4040915Z W1204 11:22:48.854000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.4042315Z W1204 11:22:48.854000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.4043707Z W1204 11:22:48.854000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.4045166Z W1204 11:22:48.854000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.4046611Z W1204 11:22:48.854000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.4048072Z W1204 11:22:48.854000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.4049536Z W1204 11:22:48.854000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.4051026Z W1204 11:22:48.854000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.4052516Z W1204 11:22:48.854000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.4054006Z W1204 11:22:48.854000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.4055567Z W1204 11:22:48.854000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.4057210Z W1204 11:22:48.854000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.4058924Z W1204 11:22:48.854000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default
2025-12-04T11:29:55.4060109Z PASSED [0.9188s] [ 32%]
2025-12-04T11:29:55.4060805Z inductor/test_compile_subprocess.py::GPUTests::test_erfc_cuda <- test/inductor/test_torchinductor.py PASSED [0.7078s] [ 33%]
2025-12-04T11:29:55.4061988Z inductor/test_compile_subprocess.py::GPUTests::test_erfinv_cuda <- test/inductor/test_torchinductor.py PASSED [0.7006s] [ 33%]
2025-12-04T11:29:55.4063163Z inductor/test_compile_subprocess.py::GPUTests::test_exp_cuda <- test/inductor/test_torchinductor.py PASSED [0.5491s] [ 34%]
2025-12-04T11:29:55.4064429Z inductor/test_compile_subprocess.py::GPUTests::test_expanded_reduction_cuda <- test/inductor/test_torchinductor.py PASSED [0.8582s] [ 34%]
2025-12-04T11:29:55.4065665Z inductor/test_compile_subprocess.py::GPUTests::test_expm1_cuda <- test/inductor/test_torchinductor.py PASSED [4.0652s] [ 34%]
2025-12-04T11:29:55.4067450Z inductor/test_compile_subprocess.py::GPUTests::test_fallback_mutable_op_with_return_cuda <- test/inductor/test_torchinductor.py W1204 11:22:56.135000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.4069145Z W1204 11:22:56.135000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] Traceback (most recent call last):
2025-12-04T11:29:55.4070612Z W1204 11:22:56.135000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.4072155Z W1204 11:22:56.135000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493]     ).serialize()
2025-12-04T11:29:55.4073497Z W1204 11:22:56.135000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.4075036Z W1204 11:22:56.135000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.4076521Z W1204 11:22:56.135000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.4077813Z W1204 11:22:56.135000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493]     pickler.dump(obj)
2025-12-04T11:29:55.4079142Z W1204 11:22:56.135000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.4080639Z W1204 11:22:56.135000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.4082129Z W1204 11:22:56.135000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.4083507Z W1204 11:22:56.135000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493]     cls(obj, pickler.options),
2025-12-04T11:29:55.4084843Z W1204 11:22:56.135000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.4086380Z W1204 11:22:56.135000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.4087820Z W1204 11:22:56.135000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.4089259Z W1204 11:22:56.135000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.4090814Z W1204 11:22:56.135000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.4092274Z W1204 11:22:56.135000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.4093741Z W1204 11:22:56.135000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.4095263Z W1204 11:22:56.135000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.4096833Z W1204 11:22:56.135000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.4098386Z W1204 11:22:56.135000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.4099958Z W1204 11:22:56.135000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.mylib.inplace_.default
2025-12-04T11:29:55.4101035Z PASSED [0.0600s] [ 35%]
2025-12-04T11:29:55.4101748Z inductor/test_compile_subprocess.py::GPUTests::test_fill1_cuda <- test/inductor/test_torchinductor.py PASSED [0.5711s] [ 35%]
2025-12-04T11:29:55.4103428Z inductor/test_compile_subprocess.py::GPUTests::test_flip_cat_cuda <- test/inductor/test_torchinductor.py W1204 11:22:56.858000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.4105023Z W1204 11:22:56.858000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.4106511Z W1204 11:22:56.858000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.4107909Z W1204 11:22:56.858000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.4109268Z W1204 11:22:56.858000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.4110819Z W1204 11:22:56.858000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.4112305Z W1204 11:22:56.858000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.4113631Z W1204 11:22:56.858000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.4115015Z W1204 11:22:56.858000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.4116539Z W1204 11:22:56.858000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.4118094Z W1204 11:22:56.858000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.4119504Z W1204 11:22:56.858000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.4120925Z W1204 11:22:56.858000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.4122411Z W1204 11:22:56.858000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.4123871Z W1204 11:22:56.858000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.4125318Z W1204 11:22:56.858000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.4126811Z W1204 11:22:56.858000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.4128297Z W1204 11:22:56.858000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.4129786Z W1204 11:22:56.858000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.4131283Z W1204 11:22:56.858000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.4132790Z W1204 11:22:56.858000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.4134360Z W1204 11:22:56.858000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.4135940Z W1204 11:22:56.858000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.rev.default
2025-12-04T11:29:55.4137474Z W1204 11:22:57.183000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.4138565Z W1204 11:22:57.183000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.4140046Z W1204 11:22:57.183000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.4141445Z W1204 11:22:57.183000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.4142797Z W1204 11:22:57.183000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.4144344Z W1204 11:22:57.183000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.4145824Z W1204 11:22:57.183000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.4147152Z W1204 11:22:57.183000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.4148604Z W1204 11:22:57.183000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.4150134Z W1204 11:22:57.183000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.4151676Z W1204 11:22:57.183000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.4153117Z W1204 11:22:57.183000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.4154506Z W1204 11:22:57.183000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.4155967Z W1204 11:22:57.183000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.4157478Z W1204 11:22:57.183000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.4158921Z W1204 11:22:57.183000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.4160380Z W1204 11:22:57.183000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.4161865Z W1204 11:22:57.183000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.4163350Z W1204 11:22:57.183000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.4164846Z W1204 11:22:57.183000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.4166351Z W1204 11:22:57.183000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.4167930Z W1204 11:22:57.183000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.4169506Z W1204 11:22:57.183000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.rev.default
2025-12-04T11:29:55.4170570Z PASSED [0.6333s] [ 35%]
2025-12-04T11:29:55.4171986Z inductor/test_compile_subprocess.py::GPUTests::test_flip_cuda <- test/inductor/test_torchinductor.py W1204 11:22:57.451000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.4173601Z W1204 11:22:57.451000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.4175113Z W1204 11:22:57.451000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.4176581Z W1204 11:22:57.451000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.4178034Z W1204 11:22:57.451000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.4179575Z W1204 11:22:57.451000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.4181083Z W1204 11:22:57.451000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.4182465Z W1204 11:22:57.451000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.4183902Z W1204 11:22:57.451000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.4185429Z W1204 11:22:57.451000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.4186937Z W1204 11:22:57.451000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.4188397Z W1204 11:22:57.451000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.4189787Z W1204 11:22:57.451000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.4191252Z W1204 11:22:57.451000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.4192696Z W1204 11:22:57.451000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.4194159Z W1204 11:22:57.451000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.4195629Z W1204 11:22:57.451000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.4197119Z W1204 11:22:57.451000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.4198608Z W1204 11:22:57.451000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.4200094Z W1204 11:22:57.451000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.4201615Z W1204 11:22:57.451000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.4203190Z W1204 11:22:57.451000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.4204781Z W1204 11:22:57.451000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.rev.default
2025-12-04T11:29:55.4206252Z W1204 11:22:57.725000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.4207338Z W1204 11:22:57.725000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.4208889Z W1204 11:22:57.725000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.4210297Z W1204 11:22:57.725000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.4211651Z W1204 11:22:57.725000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.4213227Z W1204 11:22:57.725000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.4214761Z W1204 11:22:57.725000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.4216086Z W1204 11:22:57.725000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.4217534Z W1204 11:22:57.725000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.4219109Z W1204 11:22:57.725000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.4220610Z W1204 11:22:57.725000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.4222021Z W1204 11:22:57.725000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.4223399Z W1204 11:22:57.725000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.4224857Z W1204 11:22:57.725000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.4226307Z W1204 11:22:57.725000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.4227762Z W1204 11:22:57.725000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.4229227Z W1204 11:22:57.725000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.4230716Z W1204 11:22:57.725000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.4232197Z W1204 11:22:57.725000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.4233678Z W1204 11:22:57.725000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.4235196Z W1204 11:22:57.725000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.4236770Z W1204 11:22:57.725000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.4238348Z W1204 11:22:57.725000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.rev.default
2025-12-04T11:29:55.4239414Z PASSED [0.4836s] [ 36%]
2025-12-04T11:29:55.4240317Z inductor/test_compile_subprocess.py::GPUTests::test_float_index_expression_type_promotion_cuda <- test/inductor/test_torchinductor.py PASSED [0.2572s] [ 36%]
2025-12-04T11:29:55.4241671Z inductor/test_compile_subprocess.py::GPUTests::test_floordiv_cuda <- test/inductor/test_torchinductor.py PASSED [0.6685s] [ 36%]
2025-12-04T11:29:55.4242896Z inductor/test_compile_subprocess.py::GPUTests::test_fmod_zero_dim_cuda <- test/inductor/test_torchinductor.py PASSED [1.0238s] [ 37%]
2025-12-04T11:29:55.4244734Z inductor/test_compile_subprocess.py::GPUTests::test_forced_buffer_realize_cuda <- test/inductor/test_torchinductor.py W1204 11:22:59.873000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.4246413Z W1204 11:22:59.873000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.4247907Z W1204 11:22:59.873000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.4249334Z W1204 11:22:59.873000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.4250700Z W1204 11:22:59.873000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.4252259Z W1204 11:22:59.873000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.4253741Z W1204 11:22:59.873000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.4255068Z W1204 11:22:59.873000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.4256521Z W1204 11:22:59.873000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.4258047Z W1204 11:22:59.873000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.4259550Z W1204 11:22:59.873000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.4260950Z W1204 11:22:59.873000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.4262342Z W1204 11:22:59.873000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.4263798Z W1204 11:22:59.873000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.4265249Z W1204 11:22:59.873000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.4266833Z W1204 11:22:59.873000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.4268293Z W1204 11:22:59.873000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.4269781Z W1204 11:22:59.873000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.4271544Z W1204 11:22:59.873000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.4273049Z W1204 11:22:59.873000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.4274601Z W1204 11:22:59.873000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.4276215Z W1204 11:22:59.873000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.4277858Z W1204 11:22:59.873000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops._inductor_test.realize.default
2025-12-04T11:29:55.4279371Z W1204 11:23:00.039000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.4280511Z W1204 11:23:00.039000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.4281991Z W1204 11:23:00.039000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.4283392Z W1204 11:23:00.039000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.4284749Z W1204 11:23:00.039000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.4286303Z W1204 11:23:00.039000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.4287794Z W1204 11:23:00.039000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.4289115Z W1204 11:23:00.039000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.4290502Z W1204 11:23:00.039000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.4292022Z W1204 11:23:00.039000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.4293545Z W1204 11:23:00.039000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.4294934Z W1204 11:23:00.039000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.4296316Z W1204 11:23:00.039000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.4297833Z W1204 11:23:00.039000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.4299302Z W1204 11:23:00.039000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.4300887Z W1204 11:23:00.039000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.4302493Z W1204 11:23:00.039000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.4304195Z W1204 11:23:00.039000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.4306099Z W1204 11:23:00.039000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.4307748Z W1204 11:23:00.039000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.4309388Z W1204 11:23:00.039000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.4311143Z W1204 11:23:00.039000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.4312940Z W1204 11:23:00.039000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops._inductor_test.realize.default
2025-12-04T11:29:55.4314119Z PASSED [0.3363s] [ 37%]
2025-12-04T11:29:55.4315110Z inductor/test_compile_subprocess.py::GPUTests::test_fractional_max_pool2d2_cuda <- test/inductor/test_torchinductor.py PASSED [1.4649s] [ 37%]
2025-12-04T11:29:55.4316518Z inductor/test_compile_subprocess.py::GPUTests::test_full_like_cuda <- test/inductor/test_torchinductor.py PASSED [0.3850s] [ 38%]
2025-12-04T11:29:55.4318183Z inductor/test_compile_subprocess.py::GPUTests::test_fuse_large_params_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0036s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 38%]
2025-12-04T11:29:55.4320472Z inductor/test_compile_subprocess.py::GPUTests::test_fusing_write_into_disjoint_read_cuda <- test/inductor/test_torchinductor.py W1204 11:23:02.084000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.4344808Z W1204 11:23:02.084000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.4346348Z W1204 11:23:02.084000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.4347731Z W1204 11:23:02.084000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.4349077Z W1204 11:23:02.084000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.4350629Z W1204 11:23:02.084000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.4352127Z W1204 11:23:02.084000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.4353455Z W1204 11:23:02.084000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.4354840Z W1204 11:23:02.084000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.4356350Z W1204 11:23:02.084000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.4357971Z W1204 11:23:02.084000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.4359379Z W1204 11:23:02.084000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.4360808Z W1204 11:23:02.084000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.4362296Z W1204 11:23:02.084000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.4363758Z W1204 11:23:02.084000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.4365219Z W1204 11:23:02.084000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.4366723Z W1204 11:23:02.084000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.4368221Z W1204 11:23:02.084000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.4369697Z W1204 11:23:02.084000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.4371414Z W1204 11:23:02.084000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.4372955Z W1204 11:23:02.084000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.4374530Z W1204 11:23:02.084000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.4376117Z W1204 11:23:02.084000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.rev.default
2025-12-04T11:29:55.4377645Z W1204 11:23:02.273000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.4378745Z W1204 11:23:02.273000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.4380236Z W1204 11:23:02.273000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.4381641Z W1204 11:23:02.273000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.4382987Z W1204 11:23:02.273000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.4384547Z W1204 11:23:02.273000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.4386057Z W1204 11:23:02.273000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.4387480Z W1204 11:23:02.273000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.4388868Z W1204 11:23:02.273000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.4390393Z W1204 11:23:02.273000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.4392010Z W1204 11:23:02.273000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.4393419Z W1204 11:23:02.273000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.4394798Z W1204 11:23:02.273000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.4396265Z W1204 11:23:02.273000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.4397756Z W1204 11:23:02.273000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.4399214Z W1204 11:23:02.273000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.4400673Z W1204 11:23:02.273000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.4402168Z W1204 11:23:02.273000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.4403636Z W1204 11:23:02.273000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.4405136Z W1204 11:23:02.273000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.4406658Z W1204 11:23:02.273000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.4408233Z W1204 11:23:02.273000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.4409809Z W1204 11:23:02.273000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.rev.default
2025-12-04T11:29:55.4410867Z PASSED [1.7494s] [ 38%]
2025-12-04T11:29:55.4411607Z inductor/test_compile_subprocess.py::GPUTests::test_gather1_cuda <- test/inductor/test_torchinductor.py PASSED [0.8960s] [ 39%]
2025-12-04T11:29:55.4412844Z inductor/test_compile_subprocess.py::GPUTests::test_gather_scatter_cuda <- test/inductor/test_torchinductor.py PASSED [0.5518s] [ 39%]
2025-12-04T11:29:55.4414105Z inductor/test_compile_subprocess.py::GPUTests::test_generate_rand_fp8_cuda <- test/inductor/test_torchinductor.py PASSED [0.0036s] [ 39%]
2025-12-04T11:29:55.4415301Z inductor/test_compile_subprocess.py::GPUTests::test_glu_cuda <- test/inductor/test_torchinductor.py PASSED [0.8363s] [ 40%]
2025-12-04T11:29:55.4416611Z inductor/test_compile_subprocess.py::GPUTests::test_graph_partition_both_scalars_cuda <- test/inductor/test_torchinductor.py PASSED [0.7717s] [ 40%]
2025-12-04T11:29:55.4418074Z inductor/test_compile_subprocess.py::GPUTests::test_graph_partition_constant_tensor2_cuda <- test/inductor/test_torchinductor.py PASSED [0.2118s] [ 40%]
2025-12-04T11:29:55.4420017Z inductor/test_compile_subprocess.py::GPUTests::test_graph_partition_mutation_real_name_cuda <- test/inductor/test_torchinductor.py W1204 11:23:07.210000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.4421776Z W1204 11:23:07.210000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.4423314Z W1204 11:23:07.210000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.4424701Z W1204 11:23:07.210000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.4426071Z W1204 11:23:07.210000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.4427668Z W1204 11:23:07.210000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.4429157Z W1204 11:23:07.210000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.4430474Z W1204 11:23:07.210000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.4431841Z W1204 11:23:07.210000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.4433356Z W1204 11:23:07.210000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.4434865Z W1204 11:23:07.210000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.4436257Z W1204 11:23:07.210000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.4437646Z W1204 11:23:07.210000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.4439112Z W1204 11:23:07.210000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.4440550Z W1204 11:23:07.210000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.4441998Z W1204 11:23:07.210000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.4443440Z W1204 11:23:07.210000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.4444919Z W1204 11:23:07.210000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.4446401Z W1204 11:23:07.210000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.4447899Z W1204 11:23:07.210000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.4449450Z W1204 11:23:07.210000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.4451033Z W1204 11:23:07.210000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.4452706Z W1204 11:23:07.210000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default
2025-12-04T11:29:55.4454242Z W1204 11:23:07.238000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.4455180Z W1204 11:23:07.239000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T11:29:55.4455817Z PASSED [0.3561s] [ 41%]
2025-12-04T11:29:55.4457174Z inductor/test_compile_subprocess.py::GPUTests::test_graph_partition_no_inputs_cuda <- test/inductor/test_torchinductor.py W1204 11:23:07.508000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [1/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.4458911Z W1204 11:23:07.508000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [1/0] Traceback (most recent call last):
2025-12-04T11:29:55.4460381Z W1204 11:23:07.508000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [1/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.4461766Z W1204 11:23:07.508000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [1/0]     ).serialize()
2025-12-04T11:29:55.4463108Z W1204 11:23:07.508000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [1/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.4464650Z W1204 11:23:07.508000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [1/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.4466148Z W1204 11:23:07.508000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [1/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.4467463Z W1204 11:23:07.508000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [1/0]     pickler.dump(obj)
2025-12-04T11:29:55.4468821Z W1204 11:23:07.508000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [1/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.4470338Z W1204 11:23:07.508000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [1/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.4472047Z W1204 11:23:07.508000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [1/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.4473439Z W1204 11:23:07.508000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [1/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.4474800Z W1204 11:23:07.508000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [1/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.4476241Z W1204 11:23:07.508000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [1/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.4477687Z W1204 11:23:07.508000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [1/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.4479142Z W1204 11:23:07.508000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [1/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.4480701Z W1204 11:23:07.508000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [1/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.4482188Z W1204 11:23:07.508000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [1/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.4483722Z W1204 11:23:07.508000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [1/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.4485294Z W1204 11:23:07.508000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [1/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.4486797Z W1204 11:23:07.508000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [1/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.4488361Z W1204 11:23:07.508000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [1/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.4490042Z W1204 11:23:07.508000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [1/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.inductor_seeds.default
2025-12-04T11:29:55.4491154Z PASSED [0.8199s] [ 41%]
2025-12-04T11:29:55.4491968Z inductor/test_compile_subprocess.py::GPUTests::test_graph_partition_pad_dynamic_cuda <- test/inductor/test_torchinductor.py PASSED [3.9940s] [ 42%]
2025-12-04T11:29:55.4493330Z inductor/test_compile_subprocess.py::GPUTests::test_graph_partition_refcount_cuda <- test/inductor/test_torchinductor.py PASSED [5.2116s] [ 42%]
2025-12-04T11:29:55.4495132Z inductor/test_compile_subprocess.py::GPUTests::test_grid_sampler_2d_cuda <- test/inductor/test_torchinductor.py W1204 11:23:17.892000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.4496844Z W1204 11:23:17.892000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.4498325Z W1204 11:23:17.892000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.4499713Z W1204 11:23:17.892000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.4501057Z W1204 11:23:17.892000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.4502591Z W1204 11:23:17.892000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.4504089Z W1204 11:23:17.892000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.4505408Z W1204 11:23:17.892000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.4506774Z W1204 11:23:17.892000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.4508287Z W1204 11:23:17.892000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.4509784Z W1204 11:23:17.892000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.4511241Z W1204 11:23:17.892000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.4512612Z W1204 11:23:17.892000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.4514050Z W1204 11:23:17.892000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.4515564Z W1204 11:23:17.892000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.4517016Z W1204 11:23:17.892000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.4518476Z W1204 11:23:17.892000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.4519982Z W1204 11:23:17.892000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.4521470Z W1204 11:23:17.892000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.4522976Z W1204 11:23:17.892000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.4524490Z W1204 11:23:17.892000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.4526059Z W1204 11:23:17.892000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.4527638Z W1204 11:23:17.892000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.iota.default
2025-12-04T11:29:55.4528710Z PASSED [3.4367s] [ 42%]
2025-12-04T11:29:55.4529436Z inductor/test_compile_subprocess.py::GPUTests::test_index2_cuda <- test/inductor/test_torchinductor.py PASSED [0.9474s] [ 43%]
2025-12-04T11:29:55.4531177Z inductor/test_compile_subprocess.py::GPUTests::test_index_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py W1204 11:23:22.015000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.4532846Z W1204 11:23:22.015000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.4534337Z W1204 11:23:22.015000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.4535737Z W1204 11:23:22.015000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.4537196Z W1204 11:23:22.015000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.4538753Z W1204 11:23:22.015000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.4540245Z W1204 11:23:22.015000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.4541622Z W1204 11:23:22.015000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.4543008Z W1204 11:23:22.015000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.4544531Z W1204 11:23:22.015000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.4546107Z W1204 11:23:22.015000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.4547499Z W1204 11:23:22.015000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.4548880Z W1204 11:23:22.015000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.4550371Z W1204 11:23:22.015000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.4551840Z W1204 11:23:22.015000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.4553285Z W1204 11:23:22.015000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.4554735Z W1204 11:23:22.015000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.4556228Z W1204 11:23:22.015000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.4557715Z W1204 11:23:22.015000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.4559218Z W1204 11:23:22.015000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.4560732Z W1204 11:23:22.015000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.4562299Z W1204 11:23:22.015000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.4563893Z W1204 11:23:22.015000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.iota.default
2025-12-04T11:29:55.4565363Z W1204 11:23:22.707000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.4566462Z W1204 11:23:22.707000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.4567935Z W1204 11:23:22.707000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.4569337Z W1204 11:23:22.707000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.4570690Z W1204 11:23:22.707000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.4572527Z W1204 11:23:22.707000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.4574027Z W1204 11:23:22.707000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.4575347Z W1204 11:23:22.707000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.4576896Z W1204 11:23:22.707000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.4578422Z W1204 11:23:22.707000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.4579935Z W1204 11:23:22.707000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.4581373Z W1204 11:23:22.707000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.4582754Z W1204 11:23:22.707000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.4584219Z W1204 11:23:22.707000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.4585680Z W1204 11:23:22.707000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.4587143Z W1204 11:23:22.707000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.4588595Z W1204 11:23:22.707000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.4590093Z W1204 11:23:22.707000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.4591586Z W1204 11:23:22.707000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.4593078Z W1204 11:23:22.707000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.4594585Z W1204 11:23:22.707000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.4596162Z W1204 11:23:22.707000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.4597758Z W1204 11:23:22.707000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.iota.default
2025-12-04T11:29:55.4598834Z PASSED [1.3900s] [ 43%]
2025-12-04T11:29:55.4600113Z inductor/test_compile_subprocess.py::GPUTests::test_index_propagation_abs_cuda <- test/inductor/test_torchinductor.py W1204 11:23:23.291000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.4601802Z W1204 11:23:23.291000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.4603333Z W1204 11:23:23.291000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.4604730Z W1204 11:23:23.291000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.4606085Z W1204 11:23:23.291000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.4607697Z W1204 11:23:23.291000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.4609220Z W1204 11:23:23.291000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.4610550Z W1204 11:23:23.291000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.4611955Z W1204 11:23:23.291000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.4613478Z W1204 11:23:23.291000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.4615008Z W1204 11:23:23.291000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.4616494Z W1204 11:23:23.291000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.4617872Z W1204 11:23:23.291000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.4619338Z W1204 11:23:23.291000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.4620805Z W1204 11:23:23.291000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.4622266Z W1204 11:23:23.291000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.4623730Z W1204 11:23:23.291000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.4625211Z W1204 11:23:23.291000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.4626705Z W1204 11:23:23.291000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.4628212Z W1204 11:23:23.291000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.4629744Z W1204 11:23:23.291000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.4631321Z W1204 11:23:23.291000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.4632941Z W1204 11:23:23.291000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.iota.default
2025-12-04T11:29:55.4634018Z PASSED [0.2126s] [ 43%]
2025-12-04T11:29:55.4635283Z inductor/test_compile_subprocess.py::GPUTests::test_index_propagation_cuda <- test/inductor/test_torchinductor.py W1204 11:23:23.496000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.4636959Z W1204 11:23:23.496000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.4638506Z W1204 11:23:23.496000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.4639921Z W1204 11:23:23.496000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.4641282Z W1204 11:23:23.496000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.4642861Z W1204 11:23:23.496000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.4644358Z W1204 11:23:23.496000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.4645672Z W1204 11:23:23.496000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.4647058Z W1204 11:23:23.496000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.4648587Z W1204 11:23:23.496000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.4650099Z W1204 11:23:23.496000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.4651507Z W1204 11:23:23.496000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.4652879Z W1204 11:23:23.496000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.4654332Z W1204 11:23:23.496000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.4655798Z W1204 11:23:23.496000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.4657327Z W1204 11:23:23.496000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.4658778Z W1204 11:23:23.496000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.4660270Z W1204 11:23:23.496000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.4661758Z W1204 11:23:23.496000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.4663256Z W1204 11:23:23.496000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.4664821Z W1204 11:23:23.496000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.4666381Z W1204 11:23:23.496000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.4668001Z W1204 11:23:23.496000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.iota.default
2025-12-04T11:29:55.4669103Z PASSED [0.1764s] [ 44%]
2025-12-04T11:29:55.4670412Z inductor/test_compile_subprocess.py::GPUTests::test_index_propagation_floordiv_cuda <- test/inductor/test_torchinductor.py W1204 11:23:23.681000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.4672299Z W1204 11:23:23.681000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.4673867Z W1204 11:23:23.681000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.4675277Z W1204 11:23:23.681000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.4676651Z W1204 11:23:23.681000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.4678209Z W1204 11:23:23.681000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.4679701Z W1204 11:23:23.681000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.4681034Z W1204 11:23:23.681000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.4682416Z W1204 11:23:23.681000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.4683938Z W1204 11:23:23.681000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.4685450Z W1204 11:23:23.681000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.4686846Z W1204 11:23:23.681000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.4688226Z W1204 11:23:23.681000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.4689682Z W1204 11:23:23.681000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.4691147Z W1204 11:23:23.681000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.4692592Z W1204 11:23:23.681000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.4694039Z W1204 11:23:23.681000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.4695610Z W1204 11:23:23.681000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.4697154Z W1204 11:23:23.681000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.4698708Z W1204 11:23:23.681000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.4700258Z W1204 11:23:23.681000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.4701831Z W1204 11:23:23.681000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.4703419Z W1204 11:23:23.681000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.iota.default
2025-12-04T11:29:55.4704535Z PASSED [0.2917s] [ 44%]
2025-12-04T11:29:55.4705853Z inductor/test_compile_subprocess.py::GPUTests::test_index_propagation_remainder_cuda <- test/inductor/test_torchinductor.py W1204 11:23:23.971000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.4707568Z W1204 11:23:23.971000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.4709055Z W1204 11:23:23.971000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.4710459Z W1204 11:23:23.971000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.4711813Z W1204 11:23:23.971000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.4713347Z W1204 11:23:23.971000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.4714854Z W1204 11:23:23.971000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.4716177Z W1204 11:23:23.971000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.4717557Z W1204 11:23:23.971000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.4719080Z W1204 11:23:23.971000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.4720581Z W1204 11:23:23.971000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.4721993Z W1204 11:23:23.971000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.4723373Z W1204 11:23:23.971000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.4724824Z W1204 11:23:23.971000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.4726316Z W1204 11:23:23.971000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.4727765Z W1204 11:23:23.971000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.4729228Z W1204 11:23:23.971000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.4730782Z W1204 11:23:23.971000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.4732267Z W1204 11:23:23.971000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.4733752Z W1204 11:23:23.971000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.4735299Z W1204 11:23:23.971000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.4736938Z W1204 11:23:23.971000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.4738531Z W1204 11:23:23.971000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.iota.default
2025-12-04T11:29:55.4739598Z PASSED [0.2955s] [ 44%]
2025-12-04T11:29:55.4740395Z inductor/test_compile_subprocess.py::GPUTests::test_index_put_failed_reinplace_cuda <- test/inductor/test_torchinductor.py PASSED [0.6007s] [ 45%]
2025-12-04T11:29:55.4741746Z inductor/test_compile_subprocess.py::GPUTests::test_index_put_fallback1_cuda <- test/inductor/test_torchinductor.py PASSED [0.7557s] [ 45%]
2025-12-04T11:29:55.4743034Z inductor/test_compile_subprocess.py::GPUTests::test_index_put_index_cuda <- test/inductor/test_torchinductor.py PASSED [0.6052s] [ 45%]
2025-12-04T11:29:55.4744293Z inductor/test_compile_subprocess.py::GPUTests::test_index_select_cuda <- test/inductor/test_torchinductor.py PASSED [2.0063s] [ 46%]
2025-12-04T11:29:55.4745578Z inductor/test_compile_subprocess.py::GPUTests::test_indirect_load_broadcast_cuda <- test/inductor/test_torchinductor.py PASSED [1.8674s] [ 46%]
2025-12-04T11:29:55.4747047Z inductor/test_compile_subprocess.py::GPUTests::test_inductor_layout_optimization_input_mutations_cuda <- test/inductor/test_torchinductor.py PASSED [0.5789s] [ 46%]
2025-12-04T11:29:55.4748988Z inductor/test_compile_subprocess.py::GPUTests::test_inner_fn_str_and_stride_cuda <- test/inductor/test_torchinductor.py W1204 11:23:30.682000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.4750666Z W1204 11:23:30.682000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.4752158Z W1204 11:23:30.682000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.4753545Z W1204 11:23:30.682000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.4754907Z W1204 11:23:30.682000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.4756507Z W1204 11:23:30.682000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.4758012Z W1204 11:23:30.682000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.4759326Z W1204 11:23:30.682000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.4760747Z W1204 11:23:30.682000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.4762314Z W1204 11:23:30.682000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.4763837Z W1204 11:23:30.682000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.4765241Z W1204 11:23:30.682000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.4766640Z W1204 11:23:30.682000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.4768107Z W1204 11:23:30.682000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.4769573Z W1204 11:23:30.682000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.4771248Z W1204 11:23:30.682000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.4772721Z W1204 11:23:30.682000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.4774198Z W1204 11:23:30.682000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.4775689Z W1204 11:23:30.682000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.4777256Z W1204 11:23:30.682000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.4778773Z W1204 11:23:30.682000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.4780336Z W1204 11:23:30.682000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.4781980Z W1204 11:23:30.682000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops._inductor_test.realize.default
2025-12-04T11:29:55.4783111Z PASSED [0.1885s] [ 47%]
2025-12-04T11:29:55.4783856Z inductor/test_compile_subprocess.py::GPUTests::test_inplace_add_cuda <- test/inductor/test_torchinductor.py PASSED [0.2163s] [ 47%]
2025-12-04T11:29:55.4785101Z inductor/test_compile_subprocess.py::GPUTests::test_input_mutation2_cuda <- test/inductor/test_torchinductor.py PASSED [0.2752s] [ 47%]
2025-12-04T11:29:55.4786371Z inductor/test_compile_subprocess.py::GPUTests::test_input_mutation3_cuda <- test/inductor/test_torchinductor.py PASSED [0.2832s] [ 48%]
2025-12-04T11:29:55.4787739Z inductor/test_compile_subprocess.py::GPUTests::test_input_mutation4_cuda <- test/inductor/test_torchinductor.py PASSED [0.4367s] [ 48%]
2025-12-04T11:29:55.4789049Z inductor/test_compile_subprocess.py::GPUTests::test_insignificant_strides_cuda <- test/inductor/test_torchinductor.py PASSED [0.2029s] [ 48%]
2025-12-04T11:29:55.4790308Z inductor/test_compile_subprocess.py::GPUTests::test_isinf2_cuda <- test/inductor/test_torchinductor.py PASSED [0.4578s] [ 49%]
2025-12-04T11:29:55.4791611Z inductor/test_compile_subprocess.py::GPUTests::test_isinf_cuda <- test/inductor/test_torchinductor.py ('RERUN', {'yellow': True}) [0.8521s] [ 49%]
2025-12-04T11:29:55.4793021Z inductor/test_compile_subprocess.py::GPUTests::test_isinf_cuda <- test/inductor/test_torchinductor.py ('RERUN', {'yellow': True}) [0.8322s] [ 49%]
2025-12-04T11:29:55.4794287Z inductor/test_compile_subprocess.py::GPUTests::test_isinf_cuda <- test/inductor/test_torchinductor.py FAILED [0.8161s] [ 49%]
2025-12-04T11:29:55.4794935Z 
2025-12-04T11:29:55.4795098Z ==================================== RERUNS ====================================
2025-12-04T11:29:55.4795602Z ___________________________ GPUTests.test_isinf_cuda ___________________________
2025-12-04T11:29:55.4795808Z Traceback (most recent call last):
2025-12-04T11:29:55.4796221Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 14842, in new_test
2025-12-04T11:29:55.4796329Z     return value(self)
2025-12-04T11:29:55.4796743Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 8265, in test_isinf
2025-12-04T11:29:55.4796994Z     self.common(fn, [torch.tensor(values, dtype=dtype)], check_lowp=False)
2025-12-04T11:29:55.4797291Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T11:29:55.4797410Z     return func(*args, **kwds)
2025-12-04T11:29:55.4797842Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 692, in check_model_gpu
2025-12-04T11:29:55.4797957Z     check_model(
2025-12-04T11:29:55.4798359Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 514, in check_model
2025-12-04T11:29:55.4798495Z     actual = run(*example_inputs, **kwargs)
2025-12-04T11:29:55.4798993Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T11:29:55.4799241Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T11:29:55.4799766Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T11:29:55.4799965Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T11:29:55.4800481Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T11:29:55.4800646Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T11:29:55.4801178Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T11:29:55.4801515Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T11:29:55.4802054Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 422, in codegen_and_compile
2025-12-04T11:29:55.4802264Z     output = self._send_to_child(inputs).deserialize(constants)
2025-12-04T11:29:55.4802785Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 596, in _send_to_child
2025-12-04T11:29:55.4802894Z     return f.result()
2025-12-04T11:29:55.4803254Z   File "/opt/conda/envs/py_3.10/lib/python3.10/concurrent/futures/_base.py", line 458, in result
2025-12-04T11:29:55.4803384Z     return self.__get_result()
2025-12-04T11:29:55.4803774Z   File "/opt/conda/envs/py_3.10/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
2025-12-04T11:29:55.4803899Z     raise self._exception
2025-12-04T11:29:55.4804324Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T11:29:55.4804332Z 
2025-12-04T11:29:55.4804434Z Name=<unknown>
2025-12-04T11:29:55.4804571Z Traceback (most recent call last):
2025-12-04T11:29:55.4805109Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T11:29:55.4805209Z     result = job()
2025-12-04T11:29:55.4805829Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_subproc.py", line 92, in _run_in_child_subprocess
2025-12-04T11:29:55.4806009Z     result = cls._run_in_child(pickled_input, extra_env)
2025-12-04T11:29:55.4806641Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 543, in _run_in_child
2025-12-04T11:29:55.4806851Z     output_graph = _InProcessFxCompile().codegen_and_compile(
2025-12-04T11:29:55.4807372Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T11:29:55.4807516Z     _check_triton_bf16_support(graph)
2025-12-04T11:29:55.4808093Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T11:29:55.4808234Z     warn_and_skip(node.get_device())
2025-12-04T11:29:55.4808718Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T11:29:55.4808863Z     raise SkipFrame("BF16 is not supported")
2025-12-04T11:29:55.4809051Z torch._dynamo.exc.SkipFrame: BF16 is not supported
2025-12-04T11:29:55.4809058Z 
2025-12-04T11:29:55.4809065Z 
2025-12-04T11:29:55.4809787Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T11:29:55.4809794Z 
2025-12-04T11:29:55.4809798Z 
2025-12-04T11:29:55.4810032Z To execute this test, run the following from the base repo dir:
2025-12-04T11:29:55.4810476Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_compile_subprocess.py GPUTests.test_isinf_cuda
2025-12-04T11:29:55.4810483Z 
2025-12-04T11:29:55.4810758Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:29:55.4810998Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:29:55.4811160Z stats [('calls_captured', 8), ('unique_graphs', 3)]
2025-12-04T11:29:55.4811535Z aot_autograd [('total', 4), ('autograd_cache_miss', 4), ('autograd_cache_saved', 3), ('ok', 3), ('not_ok', 1)]
2025-12-04T11:29:55.4812079Z inductor [('triton_bundler_save_kernel', 24), ('fxgraph_cache_miss', 4), ('async_compile_cache_miss', 3), ('triton_bundler_save_static_autotuner', 3)]
2025-12-04T11:29:55.4812304Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:29:55.4813056Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.4813340Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4814076Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.4814353Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4815080Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.4815366Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4816083Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.4816501Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4817229Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.4817503Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4818239Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.4818575Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4819311Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.4819586Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4819807Z ___________________________ GPUTests.test_isinf_cuda ___________________________
2025-12-04T11:29:55.4819980Z Traceback (most recent call last):
2025-12-04T11:29:55.4820384Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 14842, in new_test
2025-12-04T11:29:55.4820505Z     return value(self)
2025-12-04T11:29:55.4820909Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 8265, in test_isinf
2025-12-04T11:29:55.4821162Z     self.common(fn, [torch.tensor(values, dtype=dtype)], check_lowp=False)
2025-12-04T11:29:55.4821453Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T11:29:55.4821571Z     return func(*args, **kwds)
2025-12-04T11:29:55.4822001Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 692, in check_model_gpu
2025-12-04T11:29:55.4822113Z     check_model(
2025-12-04T11:29:55.4822515Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 514, in check_model
2025-12-04T11:29:55.4822665Z     actual = run(*example_inputs, **kwargs)
2025-12-04T11:29:55.4823156Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T11:29:55.4823404Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T11:29:55.4823932Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T11:29:55.4824128Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T11:29:55.4824644Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T11:29:55.4824807Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T11:29:55.4825341Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T11:29:55.4825677Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T11:29:55.4826212Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 422, in codegen_and_compile
2025-12-04T11:29:55.4826421Z     output = self._send_to_child(inputs).deserialize(constants)
2025-12-04T11:29:55.4826941Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 596, in _send_to_child
2025-12-04T11:29:55.4827049Z     return f.result()
2025-12-04T11:29:55.4827420Z   File "/opt/conda/envs/py_3.10/lib/python3.10/concurrent/futures/_base.py", line 458, in result
2025-12-04T11:29:55.4827539Z     return self.__get_result()
2025-12-04T11:29:55.4827928Z   File "/opt/conda/envs/py_3.10/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
2025-12-04T11:29:55.4828050Z     raise self._exception
2025-12-04T11:29:55.4828432Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T11:29:55.4828438Z 
2025-12-04T11:29:55.4828575Z Name=<unknown>
2025-12-04T11:29:55.4828713Z Traceback (most recent call last):
2025-12-04T11:29:55.4829258Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T11:29:55.4829372Z     result = job()
2025-12-04T11:29:55.4829949Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_subproc.py", line 92, in _run_in_child_subprocess
2025-12-04T11:29:55.4830158Z     result = cls._run_in_child(pickled_input, extra_env)
2025-12-04T11:29:55.4830706Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 543, in _run_in_child
2025-12-04T11:29:55.4830914Z     output_graph = _InProcessFxCompile().codegen_and_compile(
2025-12-04T11:29:55.4831435Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T11:29:55.4831577Z     _check_triton_bf16_support(graph)
2025-12-04T11:29:55.4832126Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T11:29:55.4832303Z     warn_and_skip(node.get_device())
2025-12-04T11:29:55.4832788Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T11:29:55.4832928Z     raise SkipFrame("BF16 is not supported")
2025-12-04T11:29:55.4833123Z torch._dynamo.exc.SkipFrame: BF16 is not supported
2025-12-04T11:29:55.4833129Z 
2025-12-04T11:29:55.4833134Z 
2025-12-04T11:29:55.4833855Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T11:29:55.4833861Z 
2025-12-04T11:29:55.4833866Z 
2025-12-04T11:29:55.4834100Z To execute this test, run the following from the base repo dir:
2025-12-04T11:29:55.4834542Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_compile_subprocess.py GPUTests.test_isinf_cuda
2025-12-04T11:29:55.4834549Z 
2025-12-04T11:29:55.4834836Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:29:55.4835060Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:29:55.4835221Z stats [('calls_captured', 8), ('unique_graphs', 3)]
2025-12-04T11:29:55.4835598Z aot_autograd [('total', 4), ('autograd_cache_miss', 4), ('autograd_cache_saved', 3), ('ok', 3), ('not_ok', 1)]
2025-12-04T11:29:55.4836147Z inductor [('triton_bundler_save_kernel', 24), ('fxgraph_cache_miss', 4), ('async_compile_cache_miss', 3), ('triton_bundler_save_static_autotuner', 3)]
2025-12-04T11:29:55.4836371Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:29:55.4837129Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.4837416Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4838157Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.4838436Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4839162Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.4839459Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4840180Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.4840470Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4841231Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.4841511Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4842249Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.4842558Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4843343Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.4843618Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4843841Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:29:55.4844020Z stats [('calls_captured', 8), ('unique_graphs', 3)]
2025-12-04T11:29:55.4844384Z aot_autograd [('total', 4), ('autograd_cache_miss', 4), ('autograd_cache_saved', 3), ('ok', 3), ('not_ok', 1)]
2025-12-04T11:29:55.4844976Z inductor [('triton_bundler_save_kernel', 24), ('fxgraph_cache_miss', 4), ('async_compile_cache_miss', 3), ('triton_bundler_save_static_autotuner', 3)]
2025-12-04T11:29:55.4845194Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:29:55.4845920Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.4846212Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4846933Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.4847223Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4847947Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.4848227Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4848966Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.4849243Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4849979Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.4850256Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4850980Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.4851268Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4851985Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.4852277Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4852426Z =================================== FAILURES ===================================
2025-12-04T11:29:55.4852645Z ___________________________ GPUTests.test_isinf_cuda ___________________________
2025-12-04T11:29:55.4852786Z Traceback (most recent call last):
2025-12-04T11:29:55.4853188Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 14842, in new_test
2025-12-04T11:29:55.4853291Z     return value(self)
2025-12-04T11:29:55.4853759Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 8265, in test_isinf
2025-12-04T11:29:55.4854010Z     self.common(fn, [torch.tensor(values, dtype=dtype)], check_lowp=False)
2025-12-04T11:29:55.4854306Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T11:29:55.4854424Z     return func(*args, **kwds)
2025-12-04T11:29:55.4854854Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 692, in check_model_gpu
2025-12-04T11:29:55.4855000Z     check_model(
2025-12-04T11:29:55.4855401Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 514, in check_model
2025-12-04T11:29:55.4855571Z     actual = run(*example_inputs, **kwargs)
2025-12-04T11:29:55.4856075Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T11:29:55.4856324Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T11:29:55.4856936Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T11:29:55.4857171Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T11:29:55.4857681Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T11:29:55.4857843Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T11:29:55.4858381Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T11:29:55.4858717Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T11:29:55.4859249Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 422, in codegen_and_compile
2025-12-04T11:29:55.4859457Z     output = self._send_to_child(inputs).deserialize(constants)
2025-12-04T11:29:55.4859982Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 596, in _send_to_child
2025-12-04T11:29:55.4860091Z     return f.result()
2025-12-04T11:29:55.4860462Z   File "/opt/conda/envs/py_3.10/lib/python3.10/concurrent/futures/_base.py", line 458, in result
2025-12-04T11:29:55.4860581Z     return self.__get_result()
2025-12-04T11:29:55.4860973Z   File "/opt/conda/envs/py_3.10/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
2025-12-04T11:29:55.4861099Z     raise self._exception
2025-12-04T11:29:55.4861482Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T11:29:55.4861491Z 
2025-12-04T11:29:55.4861592Z Name=<unknown>
2025-12-04T11:29:55.4861731Z Traceback (most recent call last):
2025-12-04T11:29:55.4862268Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T11:29:55.4862381Z     result = job()
2025-12-04T11:29:55.4862962Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_subproc.py", line 92, in _run_in_child_subprocess
2025-12-04T11:29:55.4863141Z     result = cls._run_in_child(pickled_input, extra_env)
2025-12-04T11:29:55.4863659Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 543, in _run_in_child
2025-12-04T11:29:55.4863865Z     output_graph = _InProcessFxCompile().codegen_and_compile(
2025-12-04T11:29:55.4864385Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T11:29:55.4864529Z     _check_triton_bf16_support(graph)
2025-12-04T11:29:55.4865073Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T11:29:55.4865209Z     warn_and_skip(node.get_device())
2025-12-04T11:29:55.4865739Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T11:29:55.4865883Z     raise SkipFrame("BF16 is not supported")
2025-12-04T11:29:55.4866073Z torch._dynamo.exc.SkipFrame: BF16 is not supported
2025-12-04T11:29:55.4866078Z 
2025-12-04T11:29:55.4866083Z 
2025-12-04T11:29:55.4866797Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T11:29:55.4866839Z 
2025-12-04T11:29:55.4866843Z 
2025-12-04T11:29:55.4867076Z To execute this test, run the following from the base repo dir:
2025-12-04T11:29:55.4867560Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_compile_subprocess.py GPUTests.test_isinf_cuda
2025-12-04T11:29:55.4867566Z 
2025-12-04T11:29:55.4867854Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:29:55.4868075Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:29:55.4868242Z stats [('calls_captured', 8), ('unique_graphs', 3)]
2025-12-04T11:29:55.4868648Z aot_autograd [('total', 4), ('autograd_cache_miss', 4), ('autograd_cache_saved', 3), ('ok', 3), ('not_ok', 1)]
2025-12-04T11:29:55.4869194Z inductor [('triton_bundler_save_kernel', 24), ('fxgraph_cache_miss', 4), ('async_compile_cache_miss', 3), ('triton_bundler_save_static_autotuner', 3)]
2025-12-04T11:29:55.4869414Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:29:55.4870170Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.4870454Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4871391Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.4871678Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4872406Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.4872697Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4873421Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.4873714Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4874437Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.4874714Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4875455Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.4875731Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4876472Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.4876748Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4876967Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:29:55.4877143Z stats [('calls_captured', 8), ('unique_graphs', 3)]
2025-12-04T11:29:55.4877504Z aot_autograd [('total', 4), ('autograd_cache_miss', 4), ('autograd_cache_saved', 3), ('ok', 3), ('not_ok', 1)]
2025-12-04T11:29:55.4878149Z inductor [('triton_bundler_save_kernel', 24), ('fxgraph_cache_miss', 4), ('async_compile_cache_miss', 3), ('triton_bundler_save_static_autotuner', 3)]
2025-12-04T11:29:55.4878369Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:29:55.4879099Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.4879388Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4880168Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.4880496Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4881219Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.4881499Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4882235Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.4882554Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4883290Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.4883568Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4884294Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.4884582Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4885310Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.4885597Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4885812Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:29:55.4885973Z stats [('calls_captured', 8), ('unique_graphs', 3)]
2025-12-04T11:29:55.4886342Z aot_autograd [('total', 4), ('autograd_cache_miss', 4), ('autograd_cache_saved', 3), ('ok', 3), ('not_ok', 1)]
2025-12-04T11:29:55.4886885Z inductor [('triton_bundler_save_kernel', 24), ('fxgraph_cache_miss', 4), ('async_compile_cache_miss', 3), ('triton_bundler_save_static_autotuner', 3)]
2025-12-04T11:29:55.4887116Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:29:55.4887838Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.4888117Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4888859Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.4889133Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4889865Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.4890145Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4890876Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.4891163Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4891924Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.4892216Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4892937Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.4893262Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4894029Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.4894305Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4895140Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-84a2c5e5cdda7bdd.xml -
2025-12-04T11:29:55.4895314Z =========================== short test summary info ============================
2025-12-04T11:29:55.4896121Z FAILED [0.8161s] inductor/test_compile_subprocess.py::GPUTests::test_isinf_cuda - torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T11:29:55.4896128Z 
2025-12-04T11:29:55.4896247Z Name=<unknown>
2025-12-04T11:29:55.4896443Z Traceback (most recent call last):
2025-12-04T11:29:55.4896992Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T11:29:55.4897112Z     result = job()
2025-12-04T11:29:55.4897689Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_subproc.py", line 92, in _run_in_child_subprocess
2025-12-04T11:29:55.4897882Z     result = cls._run_in_child(pickled_input, extra_env)
2025-12-04T11:29:55.4898390Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 543, in _run_in_child
2025-12-04T11:29:55.4898601Z     output_graph = _InProcessFxCompile().codegen_and_compile(
2025-12-04T11:29:55.4899133Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T11:29:55.4899260Z     _check_triton_bf16_support(graph)
2025-12-04T11:29:55.4899822Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T11:29:55.4899948Z     warn_and_skip(node.get_device())
2025-12-04T11:29:55.4900432Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T11:29:55.4900588Z     raise SkipFrame("BF16 is not supported")
2025-12-04T11:29:55.4900760Z torch._dynamo.exc.SkipFrame: BF16 is not supported
2025-12-04T11:29:55.4900765Z 
2025-12-04T11:29:55.4900770Z 
2025-12-04T11:29:55.4901496Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T11:29:55.4901504Z 
2025-12-04T11:29:55.4901508Z 
2025-12-04T11:29:55.4901729Z To execute this test, run the following from the base repo dir:
2025-12-04T11:29:55.4902168Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_compile_subprocess.py GPUTests.test_isinf_cuda
2025-12-04T11:29:55.4902189Z 
2025-12-04T11:29:55.4902462Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:29:55.4902647Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:29:55.4902893Z ======== 1 failed, 118 passed, 24 skipped, 2 rerun in 156.79s (0:02:36) ========
2025-12-04T11:29:55.4902995Z Got exit code 1
2025-12-04T11:29:55.4903104Z Retrying single test...
2025-12-04T11:29:55.4903610Z W1204 11:23:50.228000 98844 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:29:55.4904257Z Test results will be stored in test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-97e49e1b6070e822.xml
2025-12-04T11:29:55.4904438Z ============================= test session starts ==============================
2025-12-04T11:29:55.4904792Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:29:55.4904939Z cachedir: .pytest_cache
2025-12-04T11:29:55.4905473Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:29:55.4905631Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:29:55.4905743Z configfile: pytest.ini
2025-12-04T11:29:55.4906299Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:29:55.4906532Z collecting ... collected 879 items / 287 deselected / 592 selected
2025-12-04T11:29:55.4907085Z stepcurrent: skipping 142 already run items. Running only test/inductor/test_compile_subprocess.py::GPUTests::test_isinf_cuda
2025-12-04T11:29:55.4907237Z Running 1 items in this shard
2025-12-04T11:29:55.4907242Z 
2025-12-04T11:29:55.4907860Z inductor/test_compile_subprocess.py::GPUTests::test_isinf_cuda <- test/inductor/test_torchinductor.py ('RERUN', {'yellow': True}) [18.6471s] [100%]
2025-12-04T11:29:55.4908486Z inductor/test_compile_subprocess.py::GPUTests::test_isinf_cuda <- test/inductor/test_torchinductor.py ('RERUN', {'yellow': True}) [1.1840s] [100%]
2025-12-04T11:29:55.4909000Z inductor/test_compile_subprocess.py::GPUTests::test_isinf_cuda <- test/inductor/test_torchinductor.py FAILED [1.1873s] [100%]
2025-12-04T11:29:55.4909006Z 
2025-12-04T11:29:55.4909167Z ==================================== RERUNS ====================================
2025-12-04T11:29:55.4909389Z ___________________________ GPUTests.test_isinf_cuda ___________________________
2025-12-04T11:29:55.4909517Z Traceback (most recent call last):
2025-12-04T11:29:55.4909940Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 14842, in new_test
2025-12-04T11:29:55.4910048Z     return value(self)
2025-12-04T11:29:55.4910451Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 8265, in test_isinf
2025-12-04T11:29:55.4910720Z     self.common(fn, [torch.tensor(values, dtype=dtype)], check_lowp=False)
2025-12-04T11:29:55.4911006Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T11:29:55.4911139Z     return func(*args, **kwds)
2025-12-04T11:29:55.4911574Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 692, in check_model_gpu
2025-12-04T11:29:55.4911676Z     check_model(
2025-12-04T11:29:55.4912098Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 514, in check_model
2025-12-04T11:29:55.4912235Z     actual = run(*example_inputs, **kwargs)
2025-12-04T11:29:55.4912725Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T11:29:55.4912994Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T11:29:55.4913509Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T11:29:55.4913720Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T11:29:55.4914233Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T11:29:55.4914384Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T11:29:55.4914931Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T11:29:55.4915253Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T11:29:55.4915835Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 422, in codegen_and_compile
2025-12-04T11:29:55.4916048Z     output = self._send_to_child(inputs).deserialize(constants)
2025-12-04T11:29:55.4916559Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 596, in _send_to_child
2025-12-04T11:29:55.4916676Z     return f.result()
2025-12-04T11:29:55.4917068Z   File "/opt/conda/envs/py_3.10/lib/python3.10/concurrent/futures/_base.py", line 458, in result
2025-12-04T11:29:55.4917199Z     return self.__get_result()
2025-12-04T11:29:55.4917627Z   File "/opt/conda/envs/py_3.10/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
2025-12-04T11:29:55.4917740Z     raise self._exception
2025-12-04T11:29:55.4918135Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T11:29:55.4918140Z 
2025-12-04T11:29:55.4918239Z Name=<unknown>
2025-12-04T11:29:55.4918365Z Traceback (most recent call last):
2025-12-04T11:29:55.4918916Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T11:29:55.4919055Z     result = job()
2025-12-04T11:29:55.4919641Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_subproc.py", line 92, in _run_in_child_subprocess
2025-12-04T11:29:55.4919822Z     result = cls._run_in_child(pickled_input, extra_env)
2025-12-04T11:29:55.4920323Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 543, in _run_in_child
2025-12-04T11:29:55.4920546Z     output_graph = _InProcessFxCompile().codegen_and_compile(
2025-12-04T11:29:55.4921066Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T11:29:55.4921193Z     _check_triton_bf16_support(graph)
2025-12-04T11:29:55.4921752Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T11:29:55.4921876Z     warn_and_skip(node.get_device())
2025-12-04T11:29:55.4922371Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T11:29:55.4922511Z     raise SkipFrame("BF16 is not supported")
2025-12-04T11:29:55.4922685Z torch._dynamo.exc.SkipFrame: BF16 is not supported
2025-12-04T11:29:55.4922693Z 
2025-12-04T11:29:55.4922698Z 
2025-12-04T11:29:55.4923428Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T11:29:55.4923434Z 
2025-12-04T11:29:55.4923439Z 
2025-12-04T11:29:55.4923656Z To execute this test, run the following from the base repo dir:
2025-12-04T11:29:55.4924106Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_compile_subprocess.py GPUTests.test_isinf_cuda
2025-12-04T11:29:55.4924114Z 
2025-12-04T11:29:55.4924383Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:29:55.4924624Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:29:55.4924784Z stats [('calls_captured', 8), ('unique_graphs', 3)]
2025-12-04T11:29:55.4925329Z inductor [('triton_bundler_save_kernel', 24), ('fxgraph_cache_miss', 4), ('async_compile_cache_miss', 3), ('triton_bundler_save_static_autotuner', 3)]
2025-12-04T11:29:55.4925699Z aot_autograd [('total', 4), ('autograd_cache_miss', 4), ('autograd_cache_saved', 3), ('ok', 3), ('not_ok', 1)]
2025-12-04T11:29:55.4925923Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:29:55.4926656Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.4926982Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4927708Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.4928001Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4928724Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.4929031Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4929796Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.4930072Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4930810Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.4931132Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4931851Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.4932141Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4932867Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.4933154Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4933374Z ___________________________ GPUTests.test_isinf_cuda ___________________________
2025-12-04T11:29:55.4933500Z Traceback (most recent call last):
2025-12-04T11:29:55.4933914Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 14842, in new_test
2025-12-04T11:29:55.4934023Z     return value(self)
2025-12-04T11:29:55.4934439Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 8265, in test_isinf
2025-12-04T11:29:55.4934687Z     self.common(fn, [torch.tensor(values, dtype=dtype)], check_lowp=False)
2025-12-04T11:29:55.4934966Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T11:29:55.4935095Z     return func(*args, **kwds)
2025-12-04T11:29:55.4935533Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 692, in check_model_gpu
2025-12-04T11:29:55.4935634Z     check_model(
2025-12-04T11:29:55.4936047Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 514, in check_model
2025-12-04T11:29:55.4936181Z     actual = run(*example_inputs, **kwargs)
2025-12-04T11:29:55.4936765Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T11:29:55.4937019Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T11:29:55.4937537Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T11:29:55.4937748Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T11:29:55.4938264Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T11:29:55.4938415Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T11:29:55.4938969Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T11:29:55.4939293Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T11:29:55.4939906Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 422, in codegen_and_compile
2025-12-04T11:29:55.4940115Z     output = self._send_to_child(inputs).deserialize(constants)
2025-12-04T11:29:55.4940624Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 596, in _send_to_child
2025-12-04T11:29:55.4940740Z     return f.result()
2025-12-04T11:29:55.4941104Z   File "/opt/conda/envs/py_3.10/lib/python3.10/concurrent/futures/_base.py", line 458, in result
2025-12-04T11:29:55.4941266Z     return self.__get_result()
2025-12-04T11:29:55.4941656Z   File "/opt/conda/envs/py_3.10/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
2025-12-04T11:29:55.4941802Z     raise self._exception
2025-12-04T11:29:55.4942198Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T11:29:55.4942205Z 
2025-12-04T11:29:55.4942306Z Name=<unknown>
2025-12-04T11:29:55.4942431Z Traceback (most recent call last):
2025-12-04T11:29:55.4942983Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T11:29:55.4943114Z     result = job()
2025-12-04T11:29:55.4943703Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_subproc.py", line 92, in _run_in_child_subprocess
2025-12-04T11:29:55.4943879Z     result = cls._run_in_child(pickled_input, extra_env)
2025-12-04T11:29:55.4944384Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 543, in _run_in_child
2025-12-04T11:29:55.4944603Z     output_graph = _InProcessFxCompile().codegen_and_compile(
2025-12-04T11:29:55.4945125Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T11:29:55.4945265Z     _check_triton_bf16_support(graph)
2025-12-04T11:29:55.4945812Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T11:29:55.4945935Z     warn_and_skip(node.get_device())
2025-12-04T11:29:55.4946434Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T11:29:55.4946579Z     raise SkipFrame("BF16 is not supported")
2025-12-04T11:29:55.4946749Z torch._dynamo.exc.SkipFrame: BF16 is not supported
2025-12-04T11:29:55.4946755Z 
2025-12-04T11:29:55.4946774Z 
2025-12-04T11:29:55.4947488Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T11:29:55.4947496Z 
2025-12-04T11:29:55.4947501Z 
2025-12-04T11:29:55.4947717Z To execute this test, run the following from the base repo dir:
2025-12-04T11:29:55.4948173Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_compile_subprocess.py GPUTests.test_isinf_cuda
2025-12-04T11:29:55.4948178Z 
2025-12-04T11:29:55.4948450Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:29:55.4948684Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:29:55.4948846Z stats [('calls_captured', 8), ('unique_graphs', 3)]
2025-12-04T11:29:55.4949388Z inductor [('triton_bundler_save_kernel', 24), ('fxgraph_cache_miss', 4), ('async_compile_cache_miss', 3), ('triton_bundler_save_static_autotuner', 3)]
2025-12-04T11:29:55.4949758Z aot_autograd [('total', 4), ('autograd_cache_miss', 4), ('autograd_cache_saved', 3), ('ok', 3), ('not_ok', 1)]
2025-12-04T11:29:55.4949982Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:29:55.4950726Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.4951008Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4951770Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.4952064Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4952784Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.4953107Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4953859Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.4954135Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4954864Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.4955141Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4955910Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.4956181Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4956904Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.4957194Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4957409Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:29:55.4957568Z stats [('calls_captured', 8), ('unique_graphs', 3)]
2025-12-04T11:29:55.4957939Z aot_autograd [('total', 4), ('autograd_cache_miss', 4), ('autograd_cache_saved', 3), ('ok', 3), ('not_ok', 1)]
2025-12-04T11:29:55.4958481Z inductor [('triton_bundler_save_kernel', 24), ('fxgraph_cache_miss', 4), ('async_compile_cache_miss', 3), ('triton_bundler_save_static_autotuner', 3)]
2025-12-04T11:29:55.4958713Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:29:55.4959438Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.4959716Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4960451Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.4960727Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4961466Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.4961744Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4962465Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.4962754Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4963484Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.4963776Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4964498Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.4964806Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4965544Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.4965820Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4965983Z =================================== FAILURES ===================================
2025-12-04T11:29:55.4966234Z ___________________________ GPUTests.test_isinf_cuda ___________________________
2025-12-04T11:29:55.4966363Z Traceback (most recent call last):
2025-12-04T11:29:55.4966815Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 14842, in new_test
2025-12-04T11:29:55.4966923Z     return value(self)
2025-12-04T11:29:55.4967346Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 8265, in test_isinf
2025-12-04T11:29:55.4967599Z     self.common(fn, [torch.tensor(values, dtype=dtype)], check_lowp=False)
2025-12-04T11:29:55.4967879Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T11:29:55.4968038Z     return func(*args, **kwds)
2025-12-04T11:29:55.4968470Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 692, in check_model_gpu
2025-12-04T11:29:55.4968569Z     check_model(
2025-12-04T11:29:55.4968984Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 514, in check_model
2025-12-04T11:29:55.4969120Z     actual = run(*example_inputs, **kwargs)
2025-12-04T11:29:55.4969623Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T11:29:55.4969874Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T11:29:55.4970386Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T11:29:55.4970592Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T11:29:55.4971313Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T11:29:55.4971468Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T11:29:55.4972027Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T11:29:55.4972355Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T11:29:55.4972907Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 422, in codegen_and_compile
2025-12-04T11:29:55.4973118Z     output = self._send_to_child(inputs).deserialize(constants)
2025-12-04T11:29:55.4973626Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 596, in _send_to_child
2025-12-04T11:29:55.4973752Z     return f.result()
2025-12-04T11:29:55.4974114Z   File "/opt/conda/envs/py_3.10/lib/python3.10/concurrent/futures/_base.py", line 458, in result
2025-12-04T11:29:55.4974247Z     return self.__get_result()
2025-12-04T11:29:55.4974638Z   File "/opt/conda/envs/py_3.10/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
2025-12-04T11:29:55.4974752Z     raise self._exception
2025-12-04T11:29:55.4975151Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T11:29:55.4975159Z 
2025-12-04T11:29:55.4975261Z Name=<unknown>
2025-12-04T11:29:55.4975388Z Traceback (most recent call last):
2025-12-04T11:29:55.4975949Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T11:29:55.4976052Z     result = job()
2025-12-04T11:29:55.4976707Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_subproc.py", line 92, in _run_in_child_subprocess
2025-12-04T11:29:55.4976977Z     result = cls._run_in_child(pickled_input, extra_env)
2025-12-04T11:29:55.4977482Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 543, in _run_in_child
2025-12-04T11:29:55.4977703Z     output_graph = _InProcessFxCompile().codegen_and_compile(
2025-12-04T11:29:55.4978222Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T11:29:55.4978404Z     _check_triton_bf16_support(graph)
2025-12-04T11:29:55.4979959Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T11:29:55.4980100Z     warn_and_skip(node.get_device())
2025-12-04T11:29:55.4980601Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T11:29:55.4980747Z     raise SkipFrame("BF16 is not supported")
2025-12-04T11:29:55.4980923Z torch._dynamo.exc.SkipFrame: BF16 is not supported
2025-12-04T11:29:55.4980929Z 
2025-12-04T11:29:55.4981007Z 
2025-12-04T11:29:55.4981738Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T11:29:55.4981746Z 
2025-12-04T11:29:55.4981750Z 
2025-12-04T11:29:55.4981969Z To execute this test, run the following from the base repo dir:
2025-12-04T11:29:55.4982433Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_compile_subprocess.py GPUTests.test_isinf_cuda
2025-12-04T11:29:55.4982439Z 
2025-12-04T11:29:55.4982713Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:29:55.4982953Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:29:55.4983116Z stats [('calls_captured', 8), ('unique_graphs', 3)]
2025-12-04T11:29:55.4983664Z inductor [('triton_bundler_save_kernel', 24), ('fxgraph_cache_miss', 4), ('async_compile_cache_miss', 3), ('triton_bundler_save_static_autotuner', 3)]
2025-12-04T11:29:55.4984038Z aot_autograd [('total', 4), ('autograd_cache_miss', 4), ('autograd_cache_saved', 3), ('ok', 3), ('not_ok', 1)]
2025-12-04T11:29:55.4984262Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:29:55.4985014Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.4985298Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4986024Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.4986317Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4987051Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.4987345Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4988064Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.4988344Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4989084Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.4989359Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4990090Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.4990397Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4991118Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.4991404Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4991620Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:29:55.4991812Z stats [('calls_captured', 8), ('unique_graphs', 3)]
2025-12-04T11:29:55.4992214Z aot_autograd [('total', 4), ('autograd_cache_miss', 4), ('autograd_cache_saved', 3), ('ok', 3), ('not_ok', 1)]
2025-12-04T11:29:55.4992754Z inductor [('triton_bundler_save_kernel', 24), ('fxgraph_cache_miss', 4), ('async_compile_cache_miss', 3), ('triton_bundler_save_static_autotuner', 3)]
2025-12-04T11:29:55.4992986Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:29:55.4993714Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.4994022Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4994757Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.4995034Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4995771Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.4996050Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4996771Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.4997062Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4997786Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.4998071Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4998796Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.4999071Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.4999809Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.5000086Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5000320Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:29:55.5000482Z stats [('calls_captured', 8), ('unique_graphs', 3)]
2025-12-04T11:29:55.5000838Z aot_autograd [('total', 4), ('autograd_cache_miss', 4), ('autograd_cache_saved', 3), ('ok', 3), ('not_ok', 1)]
2025-12-04T11:29:55.5001389Z inductor [('triton_bundler_save_kernel', 24), ('fxgraph_cache_miss', 4), ('async_compile_cache_miss', 3), ('triton_bundler_save_static_autotuner', 3)]
2025-12-04T11:29:55.5001609Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:29:55.5002352Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.5002630Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5003385Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.5003678Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5004397Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.5004790Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5005542Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.5005816Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5006553Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.5006829Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5007603Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.5007876Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5008600Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.5008892Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5009708Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-97e49e1b6070e822.xml -
2025-12-04T11:29:55.5009895Z =========================== short test summary info ============================
2025-12-04T11:29:55.5010660Z FAILED [1.1873s] inductor/test_compile_subprocess.py::GPUTests::test_isinf_cuda - torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T11:29:55.5010669Z 
2025-12-04T11:29:55.5010769Z Name=<unknown>
2025-12-04T11:29:55.5010910Z Traceback (most recent call last):
2025-12-04T11:29:55.5011456Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T11:29:55.5011574Z     result = job()
2025-12-04T11:29:55.5012153Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_subproc.py", line 92, in _run_in_child_subprocess
2025-12-04T11:29:55.5012333Z     result = cls._run_in_child(pickled_input, extra_env)
2025-12-04T11:29:55.5012852Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 543, in _run_in_child
2025-12-04T11:29:55.5013058Z     output_graph = _InProcessFxCompile().codegen_and_compile(
2025-12-04T11:29:55.5013584Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T11:29:55.5013727Z     _check_triton_bf16_support(graph)
2025-12-04T11:29:55.5014278Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T11:29:55.5014417Z     warn_and_skip(node.get_device())
2025-12-04T11:29:55.5014900Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T11:29:55.5015045Z     raise SkipFrame("BF16 is not supported")
2025-12-04T11:29:55.5015231Z torch._dynamo.exc.SkipFrame: BF16 is not supported
2025-12-04T11:29:55.5015236Z 
2025-12-04T11:29:55.5015241Z 
2025-12-04T11:29:55.5015957Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T11:29:55.5016011Z 
2025-12-04T11:29:55.5016016Z 
2025-12-04T11:29:55.5016256Z To execute this test, run the following from the base repo dir:
2025-12-04T11:29:55.5016791Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_compile_subprocess.py GPUTests.test_isinf_cuda
2025-12-04T11:29:55.5016798Z 
2025-12-04T11:29:55.5017084Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:29:55.5017314Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:29:55.5017524Z ================= 1 failed, 287 deselected, 2 rerun in 21.10s ==================
2025-12-04T11:29:55.5017677Z Got exit code 1
2025-12-04T11:29:55.5017788Z Retrying single test...
2025-12-04T11:29:55.5018238Z W1204 11:24:29.449000 99294 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:29:55.5018896Z Test results will be stored in test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-aaac502093c587a7.xml
2025-12-04T11:29:55.5019096Z ============================= test session starts ==============================
2025-12-04T11:29:55.5019459Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:29:55.5019571Z cachedir: .pytest_cache
2025-12-04T11:29:55.5020092Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:29:55.5020234Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:29:55.5020345Z configfile: pytest.ini
2025-12-04T11:29:55.5020889Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:29:55.5021128Z collecting ... collected 879 items / 287 deselected / 592 selected
2025-12-04T11:29:55.5021664Z stepcurrent: skipping 142 already run items. Running only test/inductor/test_compile_subprocess.py::GPUTests::test_isinf_cuda
2025-12-04T11:29:55.5021798Z Running 1 items in this shard
2025-12-04T11:29:55.5021805Z 
2025-12-04T11:29:55.5022422Z inductor/test_compile_subprocess.py::GPUTests::test_isinf_cuda <- test/inductor/test_torchinductor.py ('RERUN', {'yellow': True}) [18.6246s] [100%]
2025-12-04T11:29:55.5023027Z inductor/test_compile_subprocess.py::GPUTests::test_isinf_cuda <- test/inductor/test_torchinductor.py ('RERUN', {'yellow': True}) [1.1992s] [100%]
2025-12-04T11:29:55.5023555Z inductor/test_compile_subprocess.py::GPUTests::test_isinf_cuda <- test/inductor/test_torchinductor.py FAILED [1.1829s] [100%]
2025-12-04T11:29:55.5023560Z 
2025-12-04T11:29:55.5023707Z ==================================== RERUNS ====================================
2025-12-04T11:29:55.5023938Z ___________________________ GPUTests.test_isinf_cuda ___________________________
2025-12-04T11:29:55.5024064Z Traceback (most recent call last):
2025-12-04T11:29:55.5024470Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 14842, in new_test
2025-12-04T11:29:55.5024589Z     return value(self)
2025-12-04T11:29:55.5024994Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 8265, in test_isinf
2025-12-04T11:29:55.5025256Z     self.common(fn, [torch.tensor(values, dtype=dtype)], check_lowp=False)
2025-12-04T11:29:55.5025535Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T11:29:55.5025654Z     return func(*args, **kwds)
2025-12-04T11:29:55.5026098Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 692, in check_model_gpu
2025-12-04T11:29:55.5026196Z     check_model(
2025-12-04T11:29:55.5026600Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 514, in check_model
2025-12-04T11:29:55.5026747Z     actual = run(*example_inputs, **kwargs)
2025-12-04T11:29:55.5027230Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T11:29:55.5027536Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T11:29:55.5028051Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T11:29:55.5028248Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T11:29:55.5028769Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T11:29:55.5028951Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T11:29:55.5029532Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T11:29:55.5029869Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T11:29:55.5030401Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 422, in codegen_and_compile
2025-12-04T11:29:55.5030623Z     output = self._send_to_child(inputs).deserialize(constants)
2025-12-04T11:29:55.5031159Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 596, in _send_to_child
2025-12-04T11:29:55.5031265Z     return f.result()
2025-12-04T11:29:55.5031638Z   File "/opt/conda/envs/py_3.10/lib/python3.10/concurrent/futures/_base.py", line 458, in result
2025-12-04T11:29:55.5031755Z     return self.__get_result()
2025-12-04T11:29:55.5032161Z   File "/opt/conda/envs/py_3.10/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
2025-12-04T11:29:55.5032273Z     raise self._exception
2025-12-04T11:29:55.5032655Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T11:29:55.5032661Z 
2025-12-04T11:29:55.5032776Z Name=<unknown>
2025-12-04T11:29:55.5032903Z Traceback (most recent call last):
2025-12-04T11:29:55.5033445Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T11:29:55.5033561Z     result = job()
2025-12-04T11:29:55.5034142Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_subproc.py", line 92, in _run_in_child_subprocess
2025-12-04T11:29:55.5034331Z     result = cls._run_in_child(pickled_input, extra_env)
2025-12-04T11:29:55.5034831Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 543, in _run_in_child
2025-12-04T11:29:55.5035038Z     output_graph = _InProcessFxCompile().codegen_and_compile(
2025-12-04T11:29:55.5035576Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T11:29:55.5035703Z     _check_triton_bf16_support(graph)
2025-12-04T11:29:55.5036261Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T11:29:55.5036384Z     warn_and_skip(node.get_device())
2025-12-04T11:29:55.5036868Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T11:29:55.5037024Z     raise SkipFrame("BF16 is not supported")
2025-12-04T11:29:55.5037195Z torch._dynamo.exc.SkipFrame: BF16 is not supported
2025-12-04T11:29:55.5037200Z 
2025-12-04T11:29:55.5037205Z 
2025-12-04T11:29:55.5037924Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T11:29:55.5037946Z 
2025-12-04T11:29:55.5037951Z 
2025-12-04T11:29:55.5038172Z To execute this test, run the following from the base repo dir:
2025-12-04T11:29:55.5038611Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_compile_subprocess.py GPUTests.test_isinf_cuda
2025-12-04T11:29:55.5038617Z 
2025-12-04T11:29:55.5038897Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:29:55.5039165Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:29:55.5039342Z stats [('calls_captured', 8), ('unique_graphs', 3)]
2025-12-04T11:29:55.5039885Z inductor [('triton_bundler_save_kernel', 24), ('fxgraph_cache_miss', 4), ('async_compile_cache_miss', 3), ('triton_bundler_save_static_autotuner', 3)]
2025-12-04T11:29:55.5040243Z aot_autograd [('total', 4), ('autograd_cache_miss', 4), ('autograd_cache_saved', 3), ('ok', 3), ('not_ok', 1)]
2025-12-04T11:29:55.5040509Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:29:55.5041289Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.5041584Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5042315Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.5042625Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5043361Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.5043638Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5044378Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.5044657Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5045379Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.5045673Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5046405Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.5046697Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5047418Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.5047700Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5047936Z ___________________________ GPUTests.test_isinf_cuda ___________________________
2025-12-04T11:29:55.5048065Z Traceback (most recent call last):
2025-12-04T11:29:55.5048467Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 14842, in new_test
2025-12-04T11:29:55.5048590Z     return value(self)
2025-12-04T11:29:55.5049002Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 8265, in test_isinf
2025-12-04T11:29:55.5049269Z     self.common(fn, [torch.tensor(values, dtype=dtype)], check_lowp=False)
2025-12-04T11:29:55.5049552Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T11:29:55.5049670Z     return func(*args, **kwds)
2025-12-04T11:29:55.5050123Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 692, in check_model_gpu
2025-12-04T11:29:55.5050227Z     check_model(
2025-12-04T11:29:55.5050633Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 514, in check_model
2025-12-04T11:29:55.5050785Z     actual = run(*example_inputs, **kwargs)
2025-12-04T11:29:55.5051275Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T11:29:55.5051542Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T11:29:55.5052095Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T11:29:55.5052294Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T11:29:55.5052824Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T11:29:55.5053014Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T11:29:55.5053563Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T11:29:55.5053918Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T11:29:55.5054455Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 422, in codegen_and_compile
2025-12-04T11:29:55.5054678Z     output = self._send_to_child(inputs).deserialize(constants)
2025-12-04T11:29:55.5055194Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 596, in _send_to_child
2025-12-04T11:29:55.5055333Z     return f.result()
2025-12-04T11:29:55.5055705Z   File "/opt/conda/envs/py_3.10/lib/python3.10/concurrent/futures/_base.py", line 458, in result
2025-12-04T11:29:55.5055825Z     return self.__get_result()
2025-12-04T11:29:55.5056228Z   File "/opt/conda/envs/py_3.10/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
2025-12-04T11:29:55.5056345Z     raise self._exception
2025-12-04T11:29:55.5056816Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T11:29:55.5056823Z 
2025-12-04T11:29:55.5056936Z Name=<unknown>
2025-12-04T11:29:55.5057064Z Traceback (most recent call last):
2025-12-04T11:29:55.5057606Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T11:29:55.5057723Z     result = job()
2025-12-04T11:29:55.5058301Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_subproc.py", line 92, in _run_in_child_subprocess
2025-12-04T11:29:55.5058499Z     result = cls._run_in_child(pickled_input, extra_env)
2025-12-04T11:29:55.5058999Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 543, in _run_in_child
2025-12-04T11:29:55.5059206Z     output_graph = _InProcessFxCompile().codegen_and_compile(
2025-12-04T11:29:55.5059743Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T11:29:55.5059871Z     _check_triton_bf16_support(graph)
2025-12-04T11:29:55.5060433Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T11:29:55.5060553Z     warn_and_skip(node.get_device())
2025-12-04T11:29:55.5061039Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T11:29:55.5061197Z     raise SkipFrame("BF16 is not supported")
2025-12-04T11:29:55.5061367Z torch._dynamo.exc.SkipFrame: BF16 is not supported
2025-12-04T11:29:55.5061372Z 
2025-12-04T11:29:55.5061378Z 
2025-12-04T11:29:55.5062110Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T11:29:55.5062119Z 
2025-12-04T11:29:55.5062124Z 
2025-12-04T11:29:55.5062343Z To execute this test, run the following from the base repo dir:
2025-12-04T11:29:55.5062782Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_compile_subprocess.py GPUTests.test_isinf_cuda
2025-12-04T11:29:55.5062787Z 
2025-12-04T11:29:55.5063068Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:29:55.5063292Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:29:55.5063511Z stats [('calls_captured', 8), ('unique_graphs', 3)]
2025-12-04T11:29:55.5064060Z inductor [('triton_bundler_save_kernel', 24), ('fxgraph_cache_miss', 4), ('async_compile_cache_miss', 3), ('triton_bundler_save_static_autotuner', 3)]
2025-12-04T11:29:55.5064416Z aot_autograd [('total', 4), ('autograd_cache_miss', 4), ('autograd_cache_saved', 3), ('ok', 3), ('not_ok', 1)]
2025-12-04T11:29:55.5064677Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:29:55.5065485Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.5065783Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5066510Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.5066792Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5067560Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.5067835Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5068569Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.5068847Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5069565Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.5069853Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5070573Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.5070864Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5071813Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.5072092Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5072327Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:29:55.5072486Z stats [('calls_captured', 8), ('unique_graphs', 3)]
2025-12-04T11:29:55.5072855Z aot_autograd [('total', 4), ('autograd_cache_miss', 4), ('autograd_cache_saved', 3), ('ok', 3), ('not_ok', 1)]
2025-12-04T11:29:55.5073397Z inductor [('triton_bundler_save_kernel', 24), ('fxgraph_cache_miss', 4), ('async_compile_cache_miss', 3), ('triton_bundler_save_static_autotuner', 3)]
2025-12-04T11:29:55.5073617Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:29:55.5074356Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.5074632Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5075370Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.5075647Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5076368Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.5076758Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5077482Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.5077776Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5078499Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.5078842Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5079627Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.5079904Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5080638Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.5080958Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5081105Z =================================== FAILURES ===================================
2025-12-04T11:29:55.5081338Z ___________________________ GPUTests.test_isinf_cuda ___________________________
2025-12-04T11:29:55.5081467Z Traceback (most recent call last):
2025-12-04T11:29:55.5081870Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 14842, in new_test
2025-12-04T11:29:55.5081993Z     return value(self)
2025-12-04T11:29:55.5082396Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 8265, in test_isinf
2025-12-04T11:29:55.5082656Z     self.common(fn, [torch.tensor(values, dtype=dtype)], check_lowp=False)
2025-12-04T11:29:55.5082935Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T11:29:55.5083054Z     return func(*args, **kwds)
2025-12-04T11:29:55.5083504Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 692, in check_model_gpu
2025-12-04T11:29:55.5083605Z     check_model(
2025-12-04T11:29:55.5084009Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 514, in check_model
2025-12-04T11:29:55.5084158Z     actual = run(*example_inputs, **kwargs)
2025-12-04T11:29:55.5084645Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T11:29:55.5084908Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T11:29:55.5085418Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T11:29:55.5085610Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T11:29:55.5086136Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T11:29:55.5086288Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T11:29:55.5086821Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T11:29:55.5087153Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T11:29:55.5087688Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 422, in codegen_and_compile
2025-12-04T11:29:55.5087909Z     output = self._send_to_child(inputs).deserialize(constants)
2025-12-04T11:29:55.5088416Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 596, in _send_to_child
2025-12-04T11:29:55.5088520Z     return f.result()
2025-12-04T11:29:55.5088888Z   File "/opt/conda/envs/py_3.10/lib/python3.10/concurrent/futures/_base.py", line 458, in result
2025-12-04T11:29:55.5089038Z     return self.__get_result()
2025-12-04T11:29:55.5089442Z   File "/opt/conda/envs/py_3.10/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
2025-12-04T11:29:55.5089556Z     raise self._exception
2025-12-04T11:29:55.5089936Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T11:29:55.5089942Z 
2025-12-04T11:29:55.5090054Z Name=<unknown>
2025-12-04T11:29:55.5090209Z Traceback (most recent call last):
2025-12-04T11:29:55.5090748Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T11:29:55.5090891Z     result = job()
2025-12-04T11:29:55.5091466Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_subproc.py", line 92, in _run_in_child_subprocess
2025-12-04T11:29:55.5091655Z     result = cls._run_in_child(pickled_input, extra_env)
2025-12-04T11:29:55.5092160Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 543, in _run_in_child
2025-12-04T11:29:55.5092393Z     output_graph = _InProcessFxCompile().codegen_and_compile(
2025-12-04T11:29:55.5092927Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T11:29:55.5093053Z     _check_triton_bf16_support(graph)
2025-12-04T11:29:55.5093611Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T11:29:55.5093734Z     warn_and_skip(node.get_device())
2025-12-04T11:29:55.5094218Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T11:29:55.5094371Z     raise SkipFrame("BF16 is not supported")
2025-12-04T11:29:55.5094542Z torch._dynamo.exc.SkipFrame: BF16 is not supported
2025-12-04T11:29:55.5094548Z 
2025-12-04T11:29:55.5094552Z 
2025-12-04T11:29:55.5095283Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T11:29:55.5095290Z 
2025-12-04T11:29:55.5095296Z 
2025-12-04T11:29:55.5095515Z To execute this test, run the following from the base repo dir:
2025-12-04T11:29:55.5095950Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_compile_subprocess.py GPUTests.test_isinf_cuda
2025-12-04T11:29:55.5095958Z 
2025-12-04T11:29:55.5096240Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:29:55.5096532Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:29:55.5096708Z stats [('calls_captured', 8), ('unique_graphs', 3)]
2025-12-04T11:29:55.5097250Z inductor [('triton_bundler_save_kernel', 24), ('fxgraph_cache_miss', 4), ('async_compile_cache_miss', 3), ('triton_bundler_save_static_autotuner', 3)]
2025-12-04T11:29:55.5097610Z aot_autograd [('total', 4), ('autograd_cache_miss', 4), ('autograd_cache_saved', 3), ('ok', 3), ('not_ok', 1)]
2025-12-04T11:29:55.5097845Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:29:55.5098578Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.5098875Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5099601Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.5099881Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5100621Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.5100940Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5101680Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.5101953Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5102671Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.5102995Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5103748Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.5104035Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5104756Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.5105062Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5105297Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:29:55.5105458Z stats [('calls_captured', 8), ('unique_graphs', 3)]
2025-12-04T11:29:55.5105819Z aot_autograd [('total', 4), ('autograd_cache_miss', 4), ('autograd_cache_saved', 3), ('ok', 3), ('not_ok', 1)]
2025-12-04T11:29:55.5106373Z inductor [('triton_bundler_save_kernel', 24), ('fxgraph_cache_miss', 4), ('async_compile_cache_miss', 3), ('triton_bundler_save_static_autotuner', 3)]
2025-12-04T11:29:55.5106589Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:29:55.5107326Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.5107602Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5108328Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.5108617Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5109343Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.5109635Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5110358Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.5110632Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5111366Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.5111641Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5112370Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.5112662Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5113391Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.5113680Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5113894Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:29:55.5114104Z stats [('calls_captured', 8), ('unique_graphs', 3)]
2025-12-04T11:29:55.5114468Z aot_autograd [('total', 4), ('autograd_cache_miss', 4), ('autograd_cache_saved', 3), ('ok', 3), ('not_ok', 1)]
2025-12-04T11:29:55.5115005Z inductor [('triton_bundler_save_kernel', 24), ('fxgraph_cache_miss', 4), ('async_compile_cache_miss', 3), ('triton_bundler_save_static_autotuner', 3)]
2025-12-04T11:29:55.5115274Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:29:55.5116033Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.5116326Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5117054Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.5117334Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5118104Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.5118380Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5119119Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.5119398Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5120122Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.5120412Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5121135Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.5121427Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5122150Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.5122429Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5123272Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-aaac502093c587a7.xml -
2025-12-04T11:29:55.5123447Z =========================== short test summary info ============================
2025-12-04T11:29:55.5124237Z FAILED [1.1829s] inductor/test_compile_subprocess.py::GPUTests::test_isinf_cuda - torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T11:29:55.5124243Z 
2025-12-04T11:29:55.5124347Z Name=<unknown>
2025-12-04T11:29:55.5124475Z Traceback (most recent call last):
2025-12-04T11:29:55.5125038Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T11:29:55.5125143Z     result = job()
2025-12-04T11:29:55.5125736Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_subproc.py", line 92, in _run_in_child_subprocess
2025-12-04T11:29:55.5125916Z     result = cls._run_in_child(pickled_input, extra_env)
2025-12-04T11:29:55.5126422Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 543, in _run_in_child
2025-12-04T11:29:55.5126646Z     output_graph = _InProcessFxCompile().codegen_and_compile(
2025-12-04T11:29:55.5127169Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T11:29:55.5127364Z     _check_triton_bf16_support(graph)
2025-12-04T11:29:55.5127916Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T11:29:55.5128042Z     warn_and_skip(node.get_device())
2025-12-04T11:29:55.5128542Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T11:29:55.5128718Z     raise SkipFrame("BF16 is not supported")
2025-12-04T11:29:55.5128890Z torch._dynamo.exc.SkipFrame: BF16 is not supported
2025-12-04T11:29:55.5128895Z 
2025-12-04T11:29:55.5128930Z 
2025-12-04T11:29:55.5129658Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T11:29:55.5129664Z 
2025-12-04T11:29:55.5129668Z 
2025-12-04T11:29:55.5129889Z To execute this test, run the following from the base repo dir:
2025-12-04T11:29:55.5130347Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_compile_subprocess.py GPUTests.test_isinf_cuda
2025-12-04T11:29:55.5130384Z 
2025-12-04T11:29:55.5130657Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:29:55.5130854Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:29:55.5131067Z ================= 1 failed, 287 deselected, 2 rerun in 21.09s ==================
2025-12-04T11:29:55.5131167Z Got exit code 1
2025-12-04T11:29:55.5131545Z FAILED CONSISTENTLY: test/inductor/test_compile_subprocess.py::GPUTests::test_isinf_cuda
2025-12-04T11:29:55.5131954Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T11:29:55.5132397Z W1204 11:25:08.579000 99744 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:29:55.5133051Z Test results will be stored in test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-decce829c4432557.xml
2025-12-04T11:29:55.5133219Z ============================= test session starts ==============================
2025-12-04T11:29:55.5133587Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:29:55.5133698Z cachedir: .pytest_cache
2025-12-04T11:29:55.5134218Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:29:55.5134356Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:29:55.5134468Z configfile: pytest.ini
2025-12-04T11:29:55.5135020Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:29:55.5135248Z collecting ... collected 879 items / 143 deselected / 736 selected
2025-12-04T11:29:55.5135396Z stepcurrent: skipping 143 already run items.
2025-12-04T11:29:55.5135528Z Running 145 items in this shard
2025-12-04T11:29:55.5135534Z 
2025-12-04T11:29:55.5136095Z inductor/test_compile_subprocess.py::GPUTests::test_issue102546_cuda <- test/inductor/test_torchinductor.py PASSED [18.2799s] [  0%]
2025-12-04T11:29:55.5136748Z inductor/test_compile_subprocess.py::GPUTests::test_kernel_names_cuda <- test/inductor/test_torchinductor.py PASSED [0.5008s] [  1%]
2025-12-04T11:29:55.5137289Z inductor/test_compile_subprocess.py::GPUTests::test_l1_loss_cuda <- test/inductor/test_torchinductor.py PASSED [0.6136s] [  2%]
2025-12-04T11:29:55.5137908Z inductor/test_compile_subprocess.py::GPUTests::test_large_grid_use_block_ptr_False_cuda <- test/inductor/test_torchinductor.py PASSED [0.8998s] [  2%]
2025-12-04T11:29:55.5138513Z inductor/test_compile_subprocess.py::GPUTests::test_layer_norm_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (Skipped!) [  3%]
2025-12-04T11:29:55.5139585Z inductor/test_compile_subprocess.py::GPUTests::test_leaky_relu_cuda <- test/inductor/test_torchinductor.py W1204 11:25:31.539000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.5140064Z W1204 11:25:31.539000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.5140954Z W1204 11:25:31.539000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.5141401Z W1204 11:25:31.539000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.5142255Z W1204 11:25:31.539000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.5142836Z W1204 11:25:31.539000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.5143680Z W1204 11:25:31.539000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.5144081Z W1204 11:25:31.539000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.5144930Z W1204 11:25:31.539000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.5145481Z W1204 11:25:31.539000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.5146307Z W1204 11:25:31.539000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.5146767Z W1204 11:25:31.539000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.5147568Z W1204 11:25:31.539000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.5148103Z W1204 11:25:31.539000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.5148902Z W1204 11:25:31.539000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.5149423Z W1204 11:25:31.539000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.5150231Z W1204 11:25:31.539000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.5150787Z W1204 11:25:31.539000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.5151593Z W1204 11:25:31.539000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.5152166Z W1204 11:25:31.539000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.5152985Z W1204 11:25:31.539000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.5153661Z W1204 11:25:31.539000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.5154573Z W1204 11:25:31.539000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default
2025-12-04T11:29:55.5154733Z PASSED [1.1081s] [  4%]
2025-12-04T11:29:55.5155261Z inductor/test_compile_subprocess.py::GPUTests::test_lgamma_cuda <- test/inductor/test_torchinductor.py PASSED [1.3722s] [  4%]
2025-12-04T11:29:55.5156343Z inductor/test_compile_subprocess.py::GPUTests::test_like_rands2_cuda <- test/inductor/test_torchinductor.py W1204 11:25:33.403000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.5156804Z W1204 11:25:33.403000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.5157703Z W1204 11:25:33.403000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.5158117Z W1204 11:25:33.403000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.5158958Z W1204 11:25:33.403000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.5159549Z W1204 11:25:33.403000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.5160335Z W1204 11:25:33.403000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.5160750Z W1204 11:25:33.403000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.5161583Z W1204 11:25:33.403000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.5162137Z W1204 11:25:33.403000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.5162977Z W1204 11:25:33.403000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.5163419Z W1204 11:25:33.403000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.5164229Z W1204 11:25:33.403000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.5164749Z W1204 11:25:33.403000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.5165558Z W1204 11:25:33.403000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.5166083Z W1204 11:25:33.403000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.5166874Z W1204 11:25:33.403000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.5167488Z W1204 11:25:33.403000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.5168277Z W1204 11:25:33.403000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.5168856Z W1204 11:25:33.403000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.5169721Z W1204 11:25:33.403000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.5170352Z W1204 11:25:33.403000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.5171440Z W1204 11:25:33.403000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.inductor_seeds.default
2025-12-04T11:29:55.5171659Z PASSED [0.6016s] [  5%]
2025-12-04T11:29:55.5172256Z inductor/test_compile_subprocess.py::GPUTests::test_linear_mixed_dtype_cuda <- test/inductor/test_torchinductor.py PASSED [0.6873s] [  6%]
2025-12-04T11:29:55.5172927Z inductor/test_compile_subprocess.py::GPUTests::test_lite_regional_compile_repeated_blocks_cuda <- test/inductor/test_torchinductor.py PASSED [0.3689s] [  6%]
2025-12-04T11:29:55.5173612Z inductor/test_compile_subprocess.py::GPUTests::test_lite_triton_kernel_wrapper_functional_cuda <- test/inductor/test_torchinductor.py PASSED [0.5808s] [  7%]
2025-12-04T11:29:55.5174118Z inductor/test_compile_subprocess.py::GPUTests::test_log2_cuda <- test/inductor/test_torchinductor.py PASSED [1.0496s] [  8%]
2025-12-04T11:29:55.5174639Z inductor/test_compile_subprocess.py::GPUTests::test_log_fp64_cuda <- test/inductor/test_torchinductor.py PASSED [0.8230s] [  8%]
2025-12-04T11:29:55.5175245Z inductor/test_compile_subprocess.py::GPUTests::test_logcumsumexp_zero_dim_cuda <- test/inductor/test_torchinductor.py PASSED [0.8425s] [  9%]
2025-12-04T11:29:55.5176456Z inductor/test_compile_subprocess.py::GPUTests::test_low_memory_max_pool_dilation_1_dim_2_cuda <- test/inductor/test_torchinductor.py W1204 11:25:38.416000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.5176938Z W1204 11:25:38.416000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.5177840Z W1204 11:25:38.416000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.5178231Z W1204 11:25:38.416000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.5179067Z W1204 11:25:38.416000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.5179642Z W1204 11:25:38.416000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.5180443Z W1204 11:25:38.416000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.5180846Z W1204 11:25:38.416000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.5181689Z W1204 11:25:38.416000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.5182310Z W1204 11:25:38.416000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.5183143Z W1204 11:25:38.416000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.5183626Z W1204 11:25:38.416000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.5184474Z W1204 11:25:38.416000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.5185004Z W1204 11:25:38.416000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.5185799Z W1204 11:25:38.416000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.5186363Z W1204 11:25:38.416000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.5187157Z W1204 11:25:38.416000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.5187726Z W1204 11:25:38.416000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.5188513Z W1204 11:25:38.416000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.5189085Z W1204 11:25:38.416000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.5189905Z W1204 11:25:38.416000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.5190524Z W1204 11:25:38.416000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.5191503Z W1204 11:25:38.416000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims._low_memory_max_pool_with_offsets.default
2025-12-04T11:29:55.5192008Z W1204 11:25:39.308000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.5192463Z W1204 11:25:39.308000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.5193358Z W1204 11:25:39.308000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.5193735Z W1204 11:25:39.308000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.5194583Z W1204 11:25:39.308000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.5195156Z W1204 11:25:39.308000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.5195986Z W1204 11:25:39.308000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.5196388Z W1204 11:25:39.308000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.5197221Z W1204 11:25:39.308000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.5197813Z W1204 11:25:39.308000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.5198672Z W1204 11:25:39.308000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.5199130Z W1204 11:25:39.308000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.5199926Z W1204 11:25:39.308000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.5200487Z W1204 11:25:39.308000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.5201289Z W1204 11:25:39.308000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.5201814Z W1204 11:25:39.308000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.5202622Z W1204 11:25:39.308000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.5203175Z W1204 11:25:39.308000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.5203979Z W1204 11:25:39.308000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.5204550Z W1204 11:25:39.308000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.5205371Z W1204 11:25:39.308000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.5205996Z W1204 11:25:39.308000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.5206963Z W1204 11:25:39.308000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims._low_memory_max_pool_with_offsets.default
2025-12-04T11:29:55.5207087Z PASSED [1.7546s] [ 10%]
2025-12-04T11:29:55.5208221Z inductor/test_compile_subprocess.py::GPUTests::test_low_memory_max_pool_dilation_2_dim_3_cuda <- test/inductor/test_torchinductor.py W1204 11:25:40.190000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.5208694Z W1204 11:25:40.190000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.5226636Z W1204 11:25:40.190000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.5227452Z W1204 11:25:40.190000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.5228337Z W1204 11:25:40.190000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.5228918Z W1204 11:25:40.190000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.5229797Z W1204 11:25:40.190000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.5230214Z W1204 11:25:40.190000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.5231056Z W1204 11:25:40.190000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.5231622Z W1204 11:25:40.190000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.5232494Z W1204 11:25:40.190000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.5232950Z W1204 11:25:40.190000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.5233752Z W1204 11:25:40.190000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.5234275Z W1204 11:25:40.190000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.5235084Z W1204 11:25:40.190000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.5235607Z W1204 11:25:40.190000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.5236418Z W1204 11:25:40.190000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.5236979Z W1204 11:25:40.190000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.5237782Z W1204 11:25:40.190000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.5238354Z W1204 11:25:40.190000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.5239166Z W1204 11:25:40.190000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.5239802Z W1204 11:25:40.190000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.5240762Z W1204 11:25:40.190000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims._low_memory_max_pool_with_offsets.default
2025-12-04T11:29:55.5241280Z W1204 11:25:43.101000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.5241769Z W1204 11:25:43.101000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.5242666Z W1204 11:25:43.101000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.5243044Z W1204 11:25:43.101000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.5243959Z W1204 11:25:43.101000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.5244551Z W1204 11:25:43.101000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.5245339Z W1204 11:25:43.101000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.5245789Z W1204 11:25:43.101000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.5246625Z W1204 11:25:43.101000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.5247177Z W1204 11:25:43.101000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.5248008Z W1204 11:25:43.101000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.5248449Z W1204 11:25:43.101000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.5249268Z W1204 11:25:43.101000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.5249790Z W1204 11:25:43.101000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.5250602Z W1204 11:25:43.101000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.5251123Z W1204 11:25:43.101000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.5251919Z W1204 11:25:43.101000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.5252483Z W1204 11:25:43.101000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.5253284Z W1204 11:25:43.101000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.5253867Z W1204 11:25:43.101000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.5254677Z W1204 11:25:43.101000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.5255313Z W1204 11:25:43.101000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.5256305Z W1204 11:25:43.101000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims._low_memory_max_pool_with_offsets.default
2025-12-04T11:29:55.5256528Z PASSED [5.8952s] [ 11%]
2025-12-04T11:29:55.5257517Z inductor/test_compile_subprocess.py::GPUTests::test_mark_dynamic_with_hint_override_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (Skipping triton backend only since not big GPU (not enough SM)) [ 11%]
2025-12-04T11:29:55.5258182Z inductor/test_compile_subprocess.py::GPUTests::test_masked_fill_promotion_cuda <- test/inductor/test_torchinductor.py PASSED [0.9916s] [ 12%]
2025-12-04T11:29:55.5259215Z inductor/test_compile_subprocess.py::GPUTests::test_matmul_layer_norm_cuda <- test/inductor/test_torchinductor.py W1204 11:25:48.239000 99929 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:29:55.5259326Z PASSED [1.9393s] [ 13%]
2025-12-04T11:29:55.5259852Z inductor/test_compile_subprocess.py::GPUTests::test_max_min_cuda <- test/inductor/test_torchinductor.py PASSED [0.9670s] [ 13%]
2025-12-04T11:29:55.5260932Z inductor/test_compile_subprocess.py::GPUTests::test_max_pool2d3_cuda <- test/inductor/test_torchinductor.py W1204 11:25:49.972000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.5261390Z W1204 11:25:49.972000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.5262294Z W1204 11:25:49.972000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.5262673Z W1204 11:25:49.972000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.5263532Z W1204 11:25:49.972000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.5264108Z W1204 11:25:49.972000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.5264900Z W1204 11:25:49.972000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.5265320Z W1204 11:25:49.972000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.5266153Z W1204 11:25:49.972000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.5266721Z W1204 11:25:49.972000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.5267546Z W1204 11:25:49.972000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.5268003Z W1204 11:25:49.972000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.5268807Z W1204 11:25:49.972000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.5269328Z W1204 11:25:49.972000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.5270166Z W1204 11:25:49.972000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.5270693Z W1204 11:25:49.972000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.5271723Z W1204 11:25:49.972000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.5272420Z W1204 11:25:49.972000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.5273223Z W1204 11:25:49.972000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.5273798Z W1204 11:25:49.972000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.5274650Z W1204 11:25:49.972000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.5275287Z W1204 11:25:49.972000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.5276251Z W1204 11:25:49.972000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims._low_memory_max_pool_with_offsets.default
2025-12-04T11:29:55.5276770Z W1204 11:25:51.073000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.5277230Z W1204 11:25:51.073000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.5278127Z W1204 11:25:51.073000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.5278509Z W1204 11:25:51.073000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.5279343Z W1204 11:25:51.073000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.5279935Z W1204 11:25:51.073000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.5280723Z W1204 11:25:51.073000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.5281144Z W1204 11:25:51.073000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.5281988Z W1204 11:25:51.073000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.5282541Z W1204 11:25:51.073000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.5283377Z W1204 11:25:51.073000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.5283818Z W1204 11:25:51.073000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.5284676Z W1204 11:25:51.073000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.5285198Z W1204 11:25:51.073000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.5286010Z W1204 11:25:51.073000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.5286592Z W1204 11:25:51.073000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.5287384Z W1204 11:25:51.073000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.5287953Z W1204 11:25:51.073000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.5288769Z W1204 11:25:51.073000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.5289349Z W1204 11:25:51.073000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.5290158Z W1204 11:25:51.073000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.5290791Z W1204 11:25:51.073000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.5291757Z W1204 11:25:51.073000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims._low_memory_max_pool_with_offsets.default
2025-12-04T11:29:55.5291866Z PASSED [2.2417s] [ 14%]
2025-12-04T11:29:55.5292534Z inductor/test_compile_subprocess.py::GPUTests::test_max_pool2d_with_indices_backward2_cuda <- test/inductor/test_torchinductor.py PASSED [8.1056s] [ 15%]
2025-12-04T11:29:55.5293186Z inductor/test_compile_subprocess.py::GPUTests::test_max_pool2d_with_indices_backward4_cuda <- test/inductor/test_torchinductor.py PASSED [14.3095s] [ 15%]
2025-12-04T11:29:55.5293840Z inductor/test_compile_subprocess.py::GPUTests::test_max_pool2d_with_indices_backward5_cuda <- test/inductor/test_torchinductor.py PASSED [0.3123s] [ 16%]
2025-12-04T11:29:55.5294470Z inductor/test_compile_subprocess.py::GPUTests::test_max_pool2d_with_indices_backward_cuda <- test/inductor/test_torchinductor.py PASSED [2.8654s] [ 17%]
2025-12-04T11:29:55.5294981Z inductor/test_compile_subprocess.py::GPUTests::test_mean_cuda <- test/inductor/test_torchinductor.py PASSED [0.7647s] [ 17%]
2025-12-04T11:29:55.5295565Z inductor/test_compile_subprocess.py::GPUTests::test_min_max_reduction_cuda <- test/inductor/test_torchinductor.py PASSED [0.7996s] [ 18%]
2025-12-04T11:29:55.5296177Z inductor/test_compile_subprocess.py::GPUTests::test_misaligned_address_issue1_cuda <- test/inductor/test_torchinductor.py PASSED [0.4851s] [ 19%]
2025-12-04T11:29:55.5296869Z inductor/test_compile_subprocess.py::GPUTests::test_mixed_mm2_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0034s] (Requires sm80) [ 20%]
2025-12-04T11:29:55.5297484Z inductor/test_compile_subprocess.py::GPUTests::test_mixed_mm3_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0033s] (Requires sm80) [ 20%]
2025-12-04T11:29:55.5298054Z inductor/test_compile_subprocess.py::GPUTests::test_mm_mixed_dtype_cuda <- test/inductor/test_torchinductor.py PASSED [0.1391s] [ 21%]
2025-12-04T11:29:55.5299140Z inductor/test_compile_subprocess.py::GPUTests::test_mul_index_expr_cuda <- test/inductor/test_torchinductor.py W1204 11:26:19.982000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.5299604Z W1204 11:26:19.982000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.5300510Z W1204 11:26:19.982000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.5300998Z W1204 11:26:19.982000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.5301847Z W1204 11:26:19.982000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.5302425Z W1204 11:26:19.982000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.5303261Z W1204 11:26:19.982000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.5303663Z W1204 11:26:19.982000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.5304504Z W1204 11:26:19.982000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.5305068Z W1204 11:26:19.982000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.5305891Z W1204 11:26:19.982000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.5306352Z W1204 11:26:19.982000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.5307152Z W1204 11:26:19.982000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.5307673Z W1204 11:26:19.982000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.5308484Z W1204 11:26:19.982000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.5309004Z W1204 11:26:19.982000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.5309817Z W1204 11:26:19.982000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.5310369Z W1204 11:26:19.982000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.5311169Z W1204 11:26:19.982000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.5311737Z W1204 11:26:19.982000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.5312580Z W1204 11:26:19.982000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.5313216Z W1204 11:26:19.982000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.5314040Z W1204 11:26:19.982000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.iota.default
2025-12-04T11:29:55.5314192Z PASSED [0.3643s] [ 22%]
2025-12-04T11:29:55.5314957Z inductor/test_compile_subprocess.py::GPUTests::test_multi_gpu_device_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (requires multiple cuda devices) [ 22%]
2025-12-04T11:29:55.5315529Z inductor/test_compile_subprocess.py::GPUTests::test_multi_threading_cuda <- test/inductor/test_torchinductor.py PASSED [0.2309s] [ 23%]
2025-12-04T11:29:55.5316101Z inductor/test_compile_subprocess.py::GPUTests::test_multilayer_any_cuda <- test/inductor/test_torchinductor.py PASSED [1.1302s] [ 24%]
2025-12-04T11:29:55.5316702Z inductor/test_compile_subprocess.py::GPUTests::test_multilayer_sum_low_prec_cuda <- test/inductor/test_torchinductor.py PASSED [0.4627s] [ 24%]
2025-12-04T11:29:55.5317325Z inductor/test_compile_subprocess.py::GPUTests::test_multilayer_var_lowp_cuda <- test/inductor/test_torchinductor.py PASSED [1.2566s] [ 25%]
2025-12-04T11:29:55.5318460Z inductor/test_compile_subprocess.py::GPUTests::test_mutable_custom_op_fixed_layout2_cuda <- test/inductor/test_torchinductor.py W1204 11:26:23.532000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.5318936Z W1204 11:26:23.532000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.5319819Z W1204 11:26:23.532000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.5320202Z W1204 11:26:23.532000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.5321053Z W1204 11:26:23.532000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.5321632Z W1204 11:26:23.532000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.5322437Z W1204 11:26:23.532000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.5322839Z W1204 11:26:23.532000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.5323698Z W1204 11:26:23.532000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.5324252Z W1204 11:26:23.532000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.5325078Z W1204 11:26:23.532000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.5325537Z W1204 11:26:23.532000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.5326339Z W1204 11:26:23.532000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.5326908Z W1204 11:26:23.532000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.5327711Z W1204 11:26:23.532000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.5328245Z W1204 11:26:23.532000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.5329109Z W1204 11:26:23.532000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.5329664Z W1204 11:26:23.532000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.5330473Z W1204 11:26:23.532000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.5331079Z W1204 11:26:23.532000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.5331904Z W1204 11:26:23.532000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.5332526Z W1204 11:26:23.532000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.5333360Z W1204 11:26:23.532000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.mylib.bar.default
2025-12-04T11:29:55.5333820Z W1204 11:26:23.555000 99744 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:29:55.5333930Z PASSED [0.5215s] [ 26%]
2025-12-04T11:29:55.5334622Z inductor/test_compile_subprocess.py::GPUTests::test_nan_sort_stable_False_descending_False_cuda <- test/inductor/test_torchinductor.py PASSED [0.7984s] [ 26%]
2025-12-04T11:29:55.5335156Z inductor/test_compile_subprocess.py::GPUTests::test_new_empty_cuda <- test/inductor/test_torchinductor.py PASSED [0.2547s] [ 27%]
2025-12-04T11:29:55.5336240Z inductor/test_compile_subprocess.py::GPUTests::test_nll_loss_backward_cuda <- test/inductor/test_torchinductor.py W1204 11:26:25.064000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.5336765Z W1204 11:26:25.064000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.5337663Z W1204 11:26:25.064000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.5338061Z W1204 11:26:25.064000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.5338894Z W1204 11:26:25.064000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.5339484Z W1204 11:26:25.064000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.5340281Z W1204 11:26:25.064000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.5340699Z W1204 11:26:25.064000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.5341583Z W1204 11:26:25.064000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.5342137Z W1204 11:26:25.064000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.5342971Z W1204 11:26:25.064000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.5343478Z W1204 11:26:25.064000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.5344287Z W1204 11:26:25.064000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.5344807Z W1204 11:26:25.064000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.5345639Z W1204 11:26:25.064000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.5346171Z W1204 11:26:25.064000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.5346970Z W1204 11:26:25.064000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.5347535Z W1204 11:26:25.064000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.5348326Z W1204 11:26:25.064000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.5348910Z W1204 11:26:25.064000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.5349715Z W1204 11:26:25.064000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.5350340Z W1204 11:26:25.064000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.5351183Z W1204 11:26:25.064000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.iota.default
2025-12-04T11:29:55.5351693Z W1204 11:26:25.364000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.5352158Z W1204 11:26:25.364000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.5353057Z W1204 11:26:25.364000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.5353441Z W1204 11:26:25.364000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.5354291Z W1204 11:26:25.364000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.5354867Z W1204 11:26:25.364000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.5355723Z W1204 11:26:25.364000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.5356129Z W1204 11:26:25.364000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.5356961Z W1204 11:26:25.364000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.5357591Z W1204 11:26:25.364000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.5358412Z W1204 11:26:25.364000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.5358865Z W1204 11:26:25.364000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.5359690Z W1204 11:26:25.364000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.5360208Z W1204 11:26:25.364000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.5361023Z W1204 11:26:25.364000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.5361543Z W1204 11:26:25.364000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.5362356Z W1204 11:26:25.364000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.5362909Z W1204 11:26:25.364000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.5363709Z W1204 11:26:25.364000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.5364282Z W1204 11:26:25.364000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.5365092Z W1204 11:26:25.364000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.5365733Z W1204 11:26:25.364000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.5366640Z W1204 11:26:25.364000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default
2025-12-04T11:29:55.5366762Z PASSED [0.7292s] [ 28%]
2025-12-04T11:29:55.5367827Z inductor/test_compile_subprocess.py::GPUTests::test_nll_loss_forward_cuda <- test/inductor/test_torchinductor.py W1204 11:26:25.770000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.5368292Z W1204 11:26:25.770000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.5369210Z W1204 11:26:25.770000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.5369594Z W1204 11:26:25.770000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.5370450Z W1204 11:26:25.770000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.5371301Z W1204 11:26:25.770000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.5372181Z W1204 11:26:25.770000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.5372587Z W1204 11:26:25.770000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.5373441Z W1204 11:26:25.770000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.5374037Z W1204 11:26:25.770000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.5374856Z W1204 11:26:25.770000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.5375319Z W1204 11:26:25.770000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.5376116Z W1204 11:26:25.770000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.5376716Z W1204 11:26:25.770000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.5377513Z W1204 11:26:25.770000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.5378033Z W1204 11:26:25.770000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.5378846Z W1204 11:26:25.770000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.5379393Z W1204 11:26:25.770000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.5380194Z W1204 11:26:25.770000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.5380763Z W1204 11:26:25.770000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.5381578Z W1204 11:26:25.770000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.5382204Z W1204 11:26:25.770000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.5383104Z W1204 11:26:25.770000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default
2025-12-04T11:29:55.5383681Z W1204 11:26:26.186000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.5384138Z W1204 11:26:26.186000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.5385028Z W1204 11:26:26.186000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.5385458Z W1204 11:26:26.186000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.5386332Z W1204 11:26:26.186000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.5386907Z W1204 11:26:26.186000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.5387695Z W1204 11:26:26.186000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.5388143Z W1204 11:26:26.186000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.5388982Z W1204 11:26:26.186000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.5389549Z W1204 11:26:26.186000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.5390366Z W1204 11:26:26.186000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.5390824Z W1204 11:26:26.186000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.5391620Z W1204 11:26:26.186000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.5392140Z W1204 11:26:26.186000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.5392950Z W1204 11:26:26.186000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.5393470Z W1204 11:26:26.186000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.5394282Z W1204 11:26:26.186000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.5394834Z W1204 11:26:26.186000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.5395620Z W1204 11:26:26.186000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.5396207Z W1204 11:26:26.186000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.5397015Z W1204 11:26:26.186000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.5397684Z W1204 11:26:26.186000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.5398588Z W1204 11:26:26.186000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default
2025-12-04T11:29:55.5398705Z PASSED [0.9424s] [ 28%]
2025-12-04T11:29:55.5399784Z inductor/test_compile_subprocess.py::GPUTests::test_one_hot_cuda <- test/inductor/test_torchinductor.py W1204 11:26:26.697000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.5400240Z W1204 11:26:26.697000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.5401135Z W1204 11:26:26.697000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.5401559Z W1204 11:26:26.697000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.5402408Z W1204 11:26:26.697000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.5402983Z W1204 11:26:26.697000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.5403777Z W1204 11:26:26.697000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.5404179Z W1204 11:26:26.697000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.5405010Z W1204 11:26:26.697000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.5405572Z W1204 11:26:26.697000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.5406390Z W1204 11:26:26.697000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.5406850Z W1204 11:26:26.697000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.5407646Z W1204 11:26:26.697000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.5408176Z W1204 11:26:26.697000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.5408972Z W1204 11:26:26.697000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.5409487Z W1204 11:26:26.697000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.5410297Z W1204 11:26:26.697000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.5410848Z W1204 11:26:26.697000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.5411690Z W1204 11:26:26.697000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.5412260Z W1204 11:26:26.697000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.5413078Z W1204 11:26:26.697000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.5413763Z W1204 11:26:26.697000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.5414588Z W1204 11:26:26.697000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.iota.default
2025-12-04T11:29:55.5414707Z PASSED [0.2554s] [ 29%]
2025-12-04T11:29:55.5415815Z inductor/test_compile_subprocess.py::GPUTests::test_pattern_matcher_unbacked_cuda <- test/inductor/test_torchinductor.py W1204 11:26:27.036000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.5416310Z W1204 11:26:27.036000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.5417273Z W1204 11:26:27.036000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.5417660Z W1204 11:26:27.036000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.5418508Z W1204 11:26:27.036000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.5419086Z W1204 11:26:27.036000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.5419883Z W1204 11:26:27.036000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.5420287Z W1204 11:26:27.036000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.5421138Z W1204 11:26:27.036000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.5421689Z W1204 11:26:27.036000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.5422514Z W1204 11:26:27.036000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.5422966Z W1204 11:26:27.036000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.5423765Z W1204 11:26:27.036000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.5424301Z W1204 11:26:27.036000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.5425093Z W1204 11:26:27.036000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.5425664Z W1204 11:26:27.036000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.5426469Z W1204 11:26:27.036000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.5427017Z W1204 11:26:27.036000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.5427881Z W1204 11:26:27.036000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.5428449Z W1204 11:26:27.036000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.5429263Z W1204 11:26:27.036000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.5429912Z W1204 11:26:27.036000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.5430829Z W1204 11:26:27.036000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default
2025-12-04T11:29:55.5430936Z PASSED [0.5867s] [ 30%]
2025-12-04T11:29:55.5431530Z inductor/test_compile_subprocess.py::GPUTests::test_pointwise_bessel_j0_cuda <- test/inductor/test_torchinductor.py PASSED [0.9510s] [ 31%]
2025-12-04T11:29:55.5432123Z inductor/test_compile_subprocess.py::GPUTests::test_pointwise_bessel_y0_cuda <- test/inductor/test_torchinductor.py PASSED [0.3430s] [ 31%]
2025-12-04T11:29:55.5432705Z inductor/test_compile_subprocess.py::GPUTests::test_pointwise_bessel_y1_cuda <- test/inductor/test_torchinductor.py PASSED [0.3513s] [ 32%]
2025-12-04T11:29:55.5433369Z inductor/test_compile_subprocess.py::GPUTests::test_pointwise_chebyshev_polynomial_t_cuda <- test/inductor/test_torchinductor.py PASSED [0.4259s] [ 33%]
2025-12-04T11:29:55.5434016Z inductor/test_compile_subprocess.py::GPUTests::test_pointwise_chebyshev_polynomial_w_cuda <- test/inductor/test_torchinductor.py PASSED [0.1360s] [ 33%]
2025-12-04T11:29:55.5435075Z inductor/test_compile_subprocess.py::GPUTests::test_pointwise_entr_cuda <- test/inductor/test_torchinductor.py W1204 11:26:30.262000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.5435541Z W1204 11:26:30.262000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.5436423Z W1204 11:26:30.262000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.5436814Z W1204 11:26:30.262000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.5437649Z W1204 11:26:30.262000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.5438224Z W1204 11:26:30.262000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.5439023Z W1204 11:26:30.262000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.5439421Z W1204 11:26:30.262000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.5440296Z W1204 11:26:30.262000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.5440845Z W1204 11:26:30.262000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.5441677Z W1204 11:26:30.262000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.5442181Z W1204 11:26:30.262000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.5442982Z W1204 11:26:30.262000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.5443514Z W1204 11:26:30.262000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.5444421Z W1204 11:26:30.262000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.5444951Z W1204 11:26:30.262000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.5445746Z W1204 11:26:30.262000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.5446309Z W1204 11:26:30.262000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.5447098Z W1204 11:26:30.262000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.5447670Z W1204 11:26:30.262000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.5448490Z W1204 11:26:30.262000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.5449113Z W1204 11:26:30.262000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.5450039Z W1204 11:26:30.262000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default
2025-12-04T11:29:55.5450150Z PASSED [0.7919s] [ 34%]
2025-12-04T11:29:55.5450739Z inductor/test_compile_subprocess.py::GPUTests::test_pointwise_erfcx_cuda <- test/inductor/test_torchinductor.py PASSED [6.1724s] [ 35%]
2025-12-04T11:29:55.5451311Z inductor/test_compile_subprocess.py::GPUTests::test_pointwise_expm1_cuda <- test/inductor/test_torchinductor.py PASSED [0.4386s] [ 35%]
2025-12-04T11:29:55.5451902Z inductor/test_compile_subprocess.py::GPUTests::test_pointwise_gammaincc_cuda <- test/inductor/test_torchinductor.py PASSED [0.1281s] [ 36%]
2025-12-04T11:29:55.5452494Z inductor/test_compile_subprocess.py::GPUTests::test_pointwise_gammaln_cuda <- test/inductor/test_torchinductor.py PASSED [0.5323s] [ 37%]
2025-12-04T11:29:55.5453143Z inductor/test_compile_subprocess.py::GPUTests::test_pointwise_laguerre_polynomial_l_cuda <- test/inductor/test_torchinductor.py PASSED [0.2462s] [ 37%]
2025-12-04T11:29:55.5454252Z inductor/test_compile_subprocess.py::GPUTests::test_pointwise_log_ndtr_cuda <- test/inductor/test_torchinductor.py W1204 11:26:38.651000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.5454721Z W1204 11:26:38.651000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.5455605Z W1204 11:26:38.651000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.5456038Z W1204 11:26:38.651000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.5456997Z W1204 11:26:38.651000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.5457589Z W1204 11:26:38.651000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.5458379Z W1204 11:26:38.651000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.5458822Z W1204 11:26:38.651000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.5459652Z W1204 11:26:38.651000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.5460206Z W1204 11:26:38.651000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.5461038Z W1204 11:26:38.651000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.5461481Z W1204 11:26:38.651000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.5462292Z W1204 11:26:38.651000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.5462811Z W1204 11:26:38.651000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.5463624Z W1204 11:26:38.651000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.5464141Z W1204 11:26:38.651000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.5464937Z W1204 11:26:38.651000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.5465501Z W1204 11:26:38.651000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.5466294Z W1204 11:26:38.651000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.5466886Z W1204 11:26:38.651000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.5467706Z W1204 11:26:38.651000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.5468362Z W1204 11:26:38.651000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.5469268Z W1204 11:26:38.651000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default
2025-12-04T11:29:55.5469389Z PASSED [0.8885s] [ 38%]
2025-12-04T11:29:55.5470515Z inductor/test_compile_subprocess.py::GPUTests::test_pointwise_logit_cuda <- test/inductor/test_torchinductor.py W1204 11:26:39.241000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.5471209Z W1204 11:26:39.241000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.5472114Z W1204 11:26:39.241000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.5472582Z W1204 11:26:39.241000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.5473417Z W1204 11:26:39.241000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.5473996Z W1204 11:26:39.241000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.5474798Z W1204 11:26:39.241000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.5475201Z W1204 11:26:39.241000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.5476050Z W1204 11:26:39.241000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.5476602Z W1204 11:26:39.241000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.5477426Z W1204 11:26:39.241000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.5477894Z W1204 11:26:39.241000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.5478694Z W1204 11:26:39.241000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.5479230Z W1204 11:26:39.241000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.5480027Z W1204 11:26:39.241000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.5480564Z W1204 11:26:39.241000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.5481368Z W1204 11:26:39.241000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.5481919Z W1204 11:26:39.241000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.5482770Z W1204 11:26:39.241000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.5483344Z W1204 11:26:39.241000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.5484167Z W1204 11:26:39.241000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.5484881Z W1204 11:26:39.241000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.5485802Z W1204 11:26:39.241000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default
2025-12-04T11:29:55.5485911Z PASSED [0.5333s] [ 39%]
2025-12-04T11:29:55.5487024Z inductor/test_compile_subprocess.py::GPUTests::test_pointwise_multigammaln_cuda <- test/inductor/test_torchinductor.py W1204 11:26:39.505000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.5487533Z W1204 11:26:39.505000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.5488425Z W1204 11:26:39.505000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.5488816Z W1204 11:26:39.505000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.5489652Z W1204 11:26:39.505000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.5490246Z W1204 11:26:39.505000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.5491035Z W1204 11:26:39.505000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.5491441Z W1204 11:26:39.505000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.5492285Z W1204 11:26:39.505000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.5492833Z W1204 11:26:39.505000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.5493667Z W1204 11:26:39.505000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.5494109Z W1204 11:26:39.505000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.5494899Z W1204 11:26:39.505000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.5495431Z W1204 11:26:39.505000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.5496221Z W1204 11:26:39.505000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.5496866Z W1204 11:26:39.505000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.5497662Z W1204 11:26:39.505000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.5498230Z W1204 11:26:39.505000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.5499093Z W1204 11:26:39.505000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.5499663Z W1204 11:26:39.505000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.5500482Z W1204 11:26:39.505000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.5501135Z W1204 11:26:39.505000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.5501982Z W1204 11:26:39.505000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.iota.default
2025-12-04T11:29:55.5502492Z W1204 11:26:39.839000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.5502962Z W1204 11:26:39.839000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.5503844Z W1204 11:26:39.839000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.5504222Z W1204 11:26:39.839000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.5505068Z W1204 11:26:39.839000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.5505645Z W1204 11:26:39.839000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.5506442Z W1204 11:26:39.839000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.5506841Z W1204 11:26:39.839000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.5507685Z W1204 11:26:39.839000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.5508236Z W1204 11:26:39.839000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.5509057Z W1204 11:26:39.839000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.5509518Z W1204 11:26:39.839000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.5510312Z W1204 11:26:39.839000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.5510890Z W1204 11:26:39.839000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.5511686Z W1204 11:26:39.839000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.5512203Z W1204 11:26:39.839000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.5513088Z W1204 11:26:39.839000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.5513642Z W1204 11:26:39.839000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.5514451Z W1204 11:26:39.839000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.5515048Z W1204 11:26:39.839000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.5515867Z W1204 11:26:39.839000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.5516494Z W1204 11:26:39.839000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.5517395Z W1204 11:26:39.839000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default
2025-12-04T11:29:55.5517517Z PASSED [0.6839s] [ 40%]
2025-12-04T11:29:55.5518090Z inductor/test_compile_subprocess.py::GPUTests::test_pointwise_ndtri_cuda <- test/inductor/test_torchinductor.py PASSED [0.1211s] [ 40%]
2025-12-04T11:29:55.5518655Z inductor/test_compile_subprocess.py::GPUTests::test_pointwise_psi_cuda <- test/inductor/test_torchinductor.py PASSED [0.2342s] [ 41%]
2025-12-04T11:29:55.5519316Z inductor/test_compile_subprocess.py::GPUTests::test_pointwise_scaled_modified_bessel_k1_cuda <- test/inductor/test_torchinductor.py PASSED [0.5711s] [ 42%]
2025-12-04T11:29:55.5519825Z inductor/test_compile_subprocess.py::GPUTests::test_pow3_cuda <- test/inductor/test_torchinductor.py PASSED [0.2314s] [ 42%]
2025-12-04T11:29:55.5520873Z inductor/test_compile_subprocess.py::GPUTests::test_pow_symfloat_cuda <- test/inductor/test_torchinductor.py W1204 11:26:41.382000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.5521332Z W1204 11:26:41.382000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.5522226Z W1204 11:26:41.382000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.5522605Z W1204 11:26:41.382000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.5523454Z W1204 11:26:41.382000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.5524029Z W1204 11:26:41.382000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.5524847Z W1204 11:26:41.382000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.5525264Z W1204 11:26:41.382000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.5526095Z W1204 11:26:41.382000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.5526694Z W1204 11:26:41.382000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.5527558Z W1204 11:26:41.382000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.5528014Z W1204 11:26:41.382000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.5528815Z W1204 11:26:41.382000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.5529359Z W1204 11:26:41.382000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.5530169Z W1204 11:26:41.382000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.5530695Z W1204 11:26:41.382000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.5531497Z W1204 11:26:41.382000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.5532047Z W1204 11:26:41.382000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.5532848Z W1204 11:26:41.382000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.5533422Z W1204 11:26:41.382000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.5534231Z W1204 11:26:41.382000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.5534864Z W1204 11:26:41.382000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.5535767Z W1204 11:26:41.382000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default
2025-12-04T11:29:55.5535890Z PASSED [0.7046s] [ 43%]
2025-12-04T11:29:55.5536480Z inductor/test_compile_subprocess.py::GPUTests::test_prod_cuda <- test/inductor/test_torchinductor.py PASSED [2.7160s] [ 44%]
2025-12-04T11:29:55.5537138Z inductor/test_compile_subprocess.py::GPUTests::test_progressive SKIPPED [0.0003s] (Skipping triton backend only since not big GPU (not enough SM)) [ 44%]
2025-12-04T11:29:55.5538261Z inductor/test_compile_subprocess.py::GPUTests::test_rand_like_deterministic_cuda <- test/inductor/test_torchinductor.py W1204 11:26:44.776000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.5538718Z W1204 11:26:44.776000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.5539663Z W1204 11:26:44.776000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.5540045Z W1204 11:26:44.776000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.5540895Z W1204 11:26:44.776000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.5541532Z W1204 11:26:44.776000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.5542316Z W1204 11:26:44.776000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.5542731Z W1204 11:26:44.776000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.5543591Z W1204 11:26:44.776000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.5544154Z W1204 11:26:44.776000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.5544980Z W1204 11:26:44.776000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.5545435Z W1204 11:26:44.776000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.5546231Z W1204 11:26:44.776000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.5546747Z W1204 11:26:44.776000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.5547554Z W1204 11:26:44.776000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.5548074Z W1204 11:26:44.776000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.5548884Z W1204 11:26:44.776000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.5549436Z W1204 11:26:44.776000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.5550239Z W1204 11:26:44.776000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.5550807Z W1204 11:26:44.776000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.5551617Z W1204 11:26:44.776000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.5552253Z W1204 11:26:44.776000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.5553172Z W1204 11:26:44.776000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.inductor_seeds.default
2025-12-04T11:29:55.5553300Z PASSED [0.4734s] [ 45%]
2025-12-04T11:29:55.5554388Z inductor/test_compile_subprocess.py::GPUTests::test_randint_distribution_cuda <- test/inductor/test_torchinductor.py W1204 11:26:45.240000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.5554878Z W1204 11:26:45.240000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.5555803Z W1204 11:26:45.240000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.5556185Z W1204 11:26:45.240000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.5557032Z W1204 11:26:45.240000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.5557651Z W1204 11:26:45.240000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.5558446Z W1204 11:26:45.240000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.5558849Z W1204 11:26:45.240000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.5559680Z W1204 11:26:45.240000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.5560249Z W1204 11:26:45.240000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.5561079Z W1204 11:26:45.240000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.5561536Z W1204 11:26:45.240000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.5562344Z W1204 11:26:45.240000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.5562877Z W1204 11:26:45.240000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.5563679Z W1204 11:26:45.240000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.5564207Z W1204 11:26:45.240000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.5565012Z W1204 11:26:45.240000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.5565565Z W1204 11:26:45.240000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.5566371Z W1204 11:26:45.240000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.5567011Z W1204 11:26:45.240000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.5567841Z W1204 11:26:45.240000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.5568463Z W1204 11:26:45.240000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.5569393Z W1204 11:26:45.240000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.inductor_seeds.default
2025-12-04T11:29:55.5569517Z PASSED [0.5054s] [ 46%]
2025-12-04T11:29:55.5570579Z inductor/test_compile_subprocess.py::GPUTests::test_randn_generator_cuda <- test/inductor/test_torchinductor.py W1204 11:26:45.753000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.5571321Z W1204 11:26:45.753000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.5572311Z W1204 11:26:45.753000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.5572695Z W1204 11:26:45.753000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.5573550Z W1204 11:26:45.753000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.5574131Z W1204 11:26:45.753000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.5574944Z W1204 11:26:45.753000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.5575350Z W1204 11:26:45.753000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.5576200Z W1204 11:26:45.753000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.5576841Z W1204 11:26:45.753000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.5577666Z W1204 11:26:45.753000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.5578124Z W1204 11:26:45.753000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.5578919Z W1204 11:26:45.753000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.5579453Z W1204 11:26:45.753000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.5580258Z W1204 11:26:45.753000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.5580792Z W1204 11:26:45.753000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.5581650Z W1204 11:26:45.753000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.5582207Z W1204 11:26:45.753000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.5583006Z W1204 11:26:45.753000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.5583618Z W1204 11:26:45.753000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.5584485Z W1204 11:26:45.753000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.5585115Z W1204 11:26:45.753000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.5585995Z W1204 11:26:45.753000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.inductor_seeds.default
2025-12-04T11:29:55.5586133Z PASSED [0.5853s] [ 46%]
2025-12-04T11:29:55.5587194Z inductor/test_compile_subprocess.py::GPUTests::test_randn_like_empty_cuda <- test/inductor/test_torchinductor.py W1204 11:26:46.353000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.5587673Z W1204 11:26:46.353000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.5588555Z W1204 11:26:46.353000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.5588950Z W1204 11:26:46.353000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.5589789Z W1204 11:26:46.353000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.5590364Z W1204 11:26:46.353000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.5591171Z W1204 11:26:46.353000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.5591578Z W1204 11:26:46.353000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.5592438Z W1204 11:26:46.353000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.5592993Z W1204 11:26:46.353000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.5593830Z W1204 11:26:46.353000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.5594276Z W1204 11:26:46.353000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.5595075Z W1204 11:26:46.353000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.5595606Z W1204 11:26:46.353000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.5596458Z W1204 11:26:46.353000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.5597006Z W1204 11:26:46.353000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.5597839Z W1204 11:26:46.353000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.5598431Z W1204 11:26:46.353000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.5599227Z W1204 11:26:46.353000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.5599797Z W1204 11:26:46.353000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.5600649Z W1204 11:26:46.353000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.5601274Z W1204 11:26:46.353000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.5602161Z W1204 11:26:46.353000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.inductor_seeds.default
2025-12-04T11:29:55.5602664Z W1204 11:26:46.468000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.5603134Z W1204 11:26:46.468000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.5604021Z W1204 11:26:46.468000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.5604403Z W1204 11:26:46.468000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.5605254Z W1204 11:26:46.468000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.5605829Z W1204 11:26:46.468000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.5606629Z W1204 11:26:46.468000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.5607031Z W1204 11:26:46.468000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.5607861Z W1204 11:26:46.468000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.5608430Z W1204 11:26:46.468000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.5609255Z W1204 11:26:46.468000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.5609753Z W1204 11:26:46.468000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.5610552Z W1204 11:26:46.468000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.5611087Z W1204 11:26:46.468000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.5611949Z W1204 11:26:46.468000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.5612470Z W1204 11:26:46.468000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.5613282Z W1204 11:26:46.468000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.5613883Z W1204 11:26:46.468000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.5614681Z W1204 11:26:46.468000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.5615249Z W1204 11:26:46.468000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.5616067Z W1204 11:26:46.468000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.5616762Z W1204 11:26:46.468000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.5617635Z W1204 11:26:46.468000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.inductor_seeds.default
2025-12-04T11:29:55.5617759Z PASSED [0.2413s] [ 47%]
2025-12-04T11:29:55.5618310Z inductor/test_compile_subprocess.py::GPUTests::test_reduction2_cuda <- test/inductor/test_torchinductor.py PASSED [0.5340s] [ 48%]
2025-12-04T11:29:55.5618868Z inductor/test_compile_subprocess.py::GPUTests::test_reduction5_cuda <- test/inductor/test_torchinductor.py PASSED [0.4880s] [ 48%]
2025-12-04T11:29:55.5619407Z inductor/test_compile_subprocess.py::GPUTests::test_remainder_cuda <- test/inductor/test_torchinductor.py PASSED [0.6404s] [ 49%]
2025-12-04T11:29:55.5620072Z inductor/test_compile_subprocess.py::GPUTests::test_remove_noop_slice_cuda <- test/inductor/test_torchinductor.py ('RERUN', {'yellow': True}) [0.4479s] [ 50%]
2025-12-04T11:29:55.5620749Z inductor/test_compile_subprocess.py::GPUTests::test_remove_noop_slice_cuda <- test/inductor/test_torchinductor.py ('RERUN', {'yellow': True}) [0.4878s] [ 50%]
2025-12-04T11:29:55.5621320Z inductor/test_compile_subprocess.py::GPUTests::test_remove_noop_slice_cuda <- test/inductor/test_torchinductor.py FAILED [0.2690s] [ 50%]
2025-12-04T11:29:55.5621330Z 
2025-12-04T11:29:55.5621490Z ==================================== RERUNS ====================================
2025-12-04T11:29:55.5621734Z _____________________ GPUTests.test_remove_noop_slice_cuda _____________________
2025-12-04T11:29:55.5621860Z Traceback (most recent call last):
2025-12-04T11:29:55.5622286Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 14842, in new_test
2025-12-04T11:29:55.5622392Z     return value(self)
2025-12-04T11:29:55.5622880Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 6708, in test_remove_noop_slice
2025-12-04T11:29:55.5623010Z     self.assertExpectedInline(
2025-12-04T11:29:55.5623662Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3272, in assertExpectedInline
2025-12-04T11:29:55.5624116Z     return super().assertExpectedInline(actual if isinstance(actual, str) else str(actual), expect, skip + 1)
2025-12-04T11:29:55.5624608Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/expecttest/__init__.py", line 413, in assertExpectedInline
2025-12-04T11:29:55.5624773Z     assert_expected_inline(
2025-12-04T11:29:55.5625269Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/expecttest/__init__.py", line 378, in assert_expected_inline
2025-12-04T11:29:55.5625436Z     assert_eq(expect, actual, msg=help_text)
2025-12-04T11:29:55.5626004Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/expecttest/__init__.py", line 450, in assertMultiLineEqualMaybeCppStack
2025-12-04T11:29:55.5626221Z     self.assertMultiLineEqual(expect, actual, *args, **kwargs)
2025-12-04T11:29:55.5626608Z   File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 1226, in assertMultiLineEqual
2025-12-04T11:29:55.5626821Z     self.fail(self._formatMessage(msg, standardMsg))
2025-12-04T11:29:55.5627119Z   File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 675, in fail
2025-12-04T11:29:55.5627267Z     raise self.failureException(msg)
2025-12-04T11:29:55.5627551Z AssertionError: 'def forward(self, arg0_1: "Sym(s77)", arg[333 chars]_9,)' != ''
2025-12-04T11:29:55.5627954Z - def forward(self, arg0_1: "Sym(s77)", arg1_1: "Sym(s27)", arg2_1: "Sym(s53)", arg3_1: "f32[s77, s27, s53][s27*s53, s53, 1]cuda:0"):
2025-12-04T11:29:55.5628289Z -         add: "f32[s77, s27, s53][s27*s53, s53, 1]cuda:0" = torch.ops.aten.add.Tensor(arg3_1, 1);  arg3_1 = None
2025-12-04T11:29:55.5628591Z -         add_9: "f32[s77, s27, s53][s27*s53, s53, 1]cuda:0" = torch.ops.aten.add.Tensor(add, 1);  add = None
2025-12-04T11:29:55.5629201Z -         return (add_9,) : To accept the new output, re-run test with envvar EXPECTTEST_ACCEPT=1 (we recommend staging/committing your changes before doing this)
2025-12-04T11:29:55.5629210Z 
2025-12-04T11:29:55.5629429Z To execute this test, run the following from the base repo dir:
2025-12-04T11:29:55.5629927Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_compile_subprocess.py GPUTests.test_remove_noop_slice_cuda
2025-12-04T11:29:55.5629934Z 
2025-12-04T11:29:55.5630218Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:29:55.5630444Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:29:55.5630571Z frames [('total', 1), ('ok', 1)]
2025-12-04T11:29:55.5630732Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T11:29:55.5631046Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:29:55.5631589Z inductor [('triton_bundler_save_kernel', 8), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:29:55.5631815Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:29:55.5632555Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.5632850Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5633575Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.5633869Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5634108Z _____________________ GPUTests.test_remove_noop_slice_cuda _____________________
2025-12-04T11:29:55.5634233Z Traceback (most recent call last):
2025-12-04T11:29:55.5634645Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 14842, in new_test
2025-12-04T11:29:55.5634794Z     return value(self)
2025-12-04T11:29:55.5635291Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 6708, in test_remove_noop_slice
2025-12-04T11:29:55.5635422Z     self.assertExpectedInline(
2025-12-04T11:29:55.5636012Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3272, in assertExpectedInline
2025-12-04T11:29:55.5636497Z     return super().assertExpectedInline(actual if isinstance(actual, str) else str(actual), expect, skip + 1)
2025-12-04T11:29:55.5637024Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/expecttest/__init__.py", line 413, in assertExpectedInline
2025-12-04T11:29:55.5637142Z     assert_expected_inline(
2025-12-04T11:29:55.5637651Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/expecttest/__init__.py", line 378, in assert_expected_inline
2025-12-04T11:29:55.5637792Z     assert_eq(expect, actual, msg=help_text)
2025-12-04T11:29:55.5638361Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/expecttest/__init__.py", line 450, in assertMultiLineEqualMaybeCppStack
2025-12-04T11:29:55.5638616Z     self.assertMultiLineEqual(expect, actual, *args, **kwargs)
2025-12-04T11:29:55.5639001Z   File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 1226, in assertMultiLineEqual
2025-12-04T11:29:55.5639185Z     self.fail(self._formatMessage(msg, standardMsg))
2025-12-04T11:29:55.5639480Z   File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 675, in fail
2025-12-04T11:29:55.5639622Z     raise self.failureException(msg)
2025-12-04T11:29:55.5639908Z AssertionError: 'def forward(self, arg0_1: "Sym(s77)", arg[333 chars]_9,)' != ''
2025-12-04T11:29:55.5640303Z - def forward(self, arg0_1: "Sym(s77)", arg1_1: "Sym(s27)", arg2_1: "Sym(s53)", arg3_1: "f32[s77, s27, s53][s27*s53, s53, 1]cuda:0"):
2025-12-04T11:29:55.5640636Z -         add: "f32[s77, s27, s53][s27*s53, s53, 1]cuda:0" = torch.ops.aten.add.Tensor(arg3_1, 1);  arg3_1 = None
2025-12-04T11:29:55.5640939Z -         add_9: "f32[s77, s27, s53][s27*s53, s53, 1]cuda:0" = torch.ops.aten.add.Tensor(add, 1);  add = None
2025-12-04T11:29:55.5641530Z -         return (add_9,) : To accept the new output, re-run test with envvar EXPECTTEST_ACCEPT=1 (we recommend staging/committing your changes before doing this)
2025-12-04T11:29:55.5641548Z 
2025-12-04T11:29:55.5641767Z To execute this test, run the following from the base repo dir:
2025-12-04T11:29:55.5642271Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_compile_subprocess.py GPUTests.test_remove_noop_slice_cuda
2025-12-04T11:29:55.5642276Z 
2025-12-04T11:29:55.5642558Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:29:55.5642781Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:29:55.5642893Z frames [('total', 1), ('ok', 1)]
2025-12-04T11:29:55.5643063Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T11:29:55.5643377Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:29:55.5643926Z inductor [('triton_bundler_save_kernel', 8), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:29:55.5644145Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:29:55.5644880Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.5645176Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5645902Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.5646195Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5646447Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:29:55.5646563Z frames [('total', 1), ('ok', 1)]
2025-12-04T11:29:55.5646732Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T11:29:55.5647043Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:29:55.5647588Z inductor [('triton_bundler_save_kernel', 8), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:29:55.5647836Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:29:55.5648592Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.5648881Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5649610Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.5649928Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5650073Z =================================== FAILURES ===================================
2025-12-04T11:29:55.5650310Z _____________________ GPUTests.test_remove_noop_slice_cuda _____________________
2025-12-04T11:29:55.5650448Z Traceback (most recent call last):
2025-12-04T11:29:55.5650850Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 14842, in new_test
2025-12-04T11:29:55.5650953Z     return value(self)
2025-12-04T11:29:55.5651448Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 6708, in test_remove_noop_slice
2025-12-04T11:29:55.5651574Z     self.assertExpectedInline(
2025-12-04T11:29:55.5652179Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3272, in assertExpectedInline
2025-12-04T11:29:55.5652617Z     return super().assertExpectedInline(actual if isinstance(actual, str) else str(actual), expect, skip + 1)
2025-12-04T11:29:55.5653111Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/expecttest/__init__.py", line 413, in assertExpectedInline
2025-12-04T11:29:55.5653237Z     assert_expected_inline(
2025-12-04T11:29:55.5653731Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/expecttest/__init__.py", line 378, in assert_expected_inline
2025-12-04T11:29:55.5653873Z     assert_eq(expect, actual, msg=help_text)
2025-12-04T11:29:55.5654440Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/expecttest/__init__.py", line 450, in assertMultiLineEqualMaybeCppStack
2025-12-04T11:29:55.5654660Z     self.assertMultiLineEqual(expect, actual, *args, **kwargs)
2025-12-04T11:29:55.5655057Z   File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 1226, in assertMultiLineEqual
2025-12-04T11:29:55.5655226Z     self.fail(self._formatMessage(msg, standardMsg))
2025-12-04T11:29:55.5655522Z   File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 675, in fail
2025-12-04T11:29:55.5655671Z     raise self.failureException(msg)
2025-12-04T11:29:55.5655958Z AssertionError: 'def forward(self, arg0_1: "Sym(s77)", arg[333 chars]_9,)' != ''
2025-12-04T11:29:55.5656460Z - def forward(self, arg0_1: "Sym(s77)", arg1_1: "Sym(s27)", arg2_1: "Sym(s53)", arg3_1: "f32[s77, s27, s53][s27*s53, s53, 1]cuda:0"):
2025-12-04T11:29:55.5656788Z -         add: "f32[s77, s27, s53][s27*s53, s53, 1]cuda:0" = torch.ops.aten.add.Tensor(arg3_1, 1);  arg3_1 = None
2025-12-04T11:29:55.5657095Z -         add_9: "f32[s77, s27, s53][s27*s53, s53, 1]cuda:0" = torch.ops.aten.add.Tensor(add, 1);  add = None
2025-12-04T11:29:55.5657702Z -         return (add_9,) : To accept the new output, re-run test with envvar EXPECTTEST_ACCEPT=1 (we recommend staging/committing your changes before doing this)
2025-12-04T11:29:55.5657709Z 
2025-12-04T11:29:55.5657977Z To execute this test, run the following from the base repo dir:
2025-12-04T11:29:55.5658494Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_compile_subprocess.py GPUTests.test_remove_noop_slice_cuda
2025-12-04T11:29:55.5658502Z 
2025-12-04T11:29:55.5658775Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:29:55.5658997Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:29:55.5659159Z frames [('total', 1), ('ok', 1)]
2025-12-04T11:29:55.5659320Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T11:29:55.5659770Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:29:55.5660319Z inductor [('triton_bundler_save_kernel', 8), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:29:55.5660538Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:29:55.5661289Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.5661608Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5662332Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.5662627Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5662846Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:29:55.5662976Z frames [('total', 1), ('ok', 1)]
2025-12-04T11:29:55.5663134Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T11:29:55.5663445Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:29:55.5663992Z inductor [('triton_bundler_save_kernel', 8), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:29:55.5664210Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:29:55.5664952Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.5665231Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5665954Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.5666244Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5666468Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:29:55.5666594Z frames [('total', 1), ('ok', 1)]
2025-12-04T11:29:55.5666756Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T11:29:55.5667069Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:29:55.5667618Z inductor [('triton_bundler_save_kernel', 8), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:29:55.5667839Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:29:55.5668571Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema
2025-12-04T11:29:55.5668865Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5669592Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema
2025-12-04T11:29:55.5669929Z   warnings.warn(f"undefined OpHandler.{name}, please add missing op schema")
2025-12-04T11:29:55.5670750Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-decce829c4432557.xml -
2025-12-04T11:29:55.5670928Z =========================== short test summary info ============================
2025-12-04T11:29:55.5671906Z FAILED [0.2690s] inductor/test_compile_subprocess.py::GPUTests::test_remove_noop_slice_cuda - AssertionError: 'def forward(self, arg0_1: "Sym(s77)", arg[333 chars]_9,)' != ''
2025-12-04T11:29:55.5672442Z - def forward(self, arg0_1: "Sym(s77)", arg1_1: "Sym(s27)", arg2_1: "Sym(s53)", arg3_1: "f32[s77, s27, s53][s27*s53, s53, 1]cuda:0"):
2025-12-04T11:29:55.5672780Z -         add: "f32[s77, s27, s53][s27*s53, s53, 1]cuda:0" = torch.ops.aten.add.Tensor(arg3_1, 1);  arg3_1 = None
2025-12-04T11:29:55.5673084Z -         add_9: "f32[s77, s27, s53][s27*s53, s53, 1]cuda:0" = torch.ops.aten.add.Tensor(add, 1);  add = None
2025-12-04T11:29:55.5673677Z -         return (add_9,) : To accept the new output, re-run test with envvar EXPECTTEST_ACCEPT=1 (we recommend staging/committing your changes before doing this)
2025-12-04T11:29:55.5673751Z 
2025-12-04T11:29:55.5673972Z To execute this test, run the following from the base repo dir:
2025-12-04T11:29:55.5674469Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_compile_subprocess.py GPUTests.test_remove_noop_slice_cuda
2025-12-04T11:29:55.5674477Z 
2025-12-04T11:29:55.5674766Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:29:55.5674951Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:29:55.5675223Z = 1 failed, 66 passed, 6 skipped, 143 deselected, 2 rerun in 98.92s (0:01:38) ==
2025-12-04T11:29:55.5675328Z Got exit code 1
2025-12-04T11:29:55.5675440Z Retrying single test...
2025-12-04T11:29:55.5675909Z W1204 11:27:03.841000 101781 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:29:55.5676559Z Test results will be stored in test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-491de48d6c983340.xml
2025-12-04T11:29:55.5676729Z ============================= test session starts ==============================
2025-12-04T11:29:55.5677096Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:29:55.5677212Z cachedir: .pytest_cache
2025-12-04T11:29:55.5677745Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:29:55.5677873Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:29:55.5677985Z configfile: pytest.ini
2025-12-04T11:29:55.5678543Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:29:55.5678773Z collecting ... collected 879 items / 287 deselected / 592 selected
2025-12-04T11:29:55.5679372Z stepcurrent: skipping 215 already run items. Running only test/inductor/test_compile_subprocess.py::GPUTests::test_remove_noop_slice_cuda
2025-12-04T11:29:55.5679509Z Running 1 items in this shard
2025-12-04T11:29:55.5679514Z 
2025-12-04T11:29:55.5680096Z inductor/test_compile_subprocess.py::GPUTests::test_remove_noop_slice_cuda <- test/inductor/test_torchinductor.py PASSED [18.5649s] [100%]
2025-12-04T11:29:55.5680105Z 
2025-12-04T11:29:55.5680937Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-491de48d6c983340.xml -
2025-12-04T11:29:55.5681134Z ====================== 1 passed, 287 deselected in 18.64s ======================
2025-12-04T11:29:55.5681239Z Got exit code 0
2025-12-04T11:29:55.5681501Z Test succeeded in new process, continuing with the rest of the tests
2025-12-04T11:29:55.5681950Z W1204 11:27:44.063000 102081 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:29:55.5682648Z Test results will be stored in test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-35b1cdd46f4129e6.xml
2025-12-04T11:29:55.5682822Z ============================= test session starts ==============================
2025-12-04T11:29:55.5683172Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:29:55.5683334Z cachedir: .pytest_cache
2025-12-04T11:29:55.5683855Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:29:55.5684025Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:29:55.5684139Z configfile: pytest.ini
2025-12-04T11:29:55.5684677Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:29:55.5684917Z collecting ... collected 879 items / 216 deselected / 663 selected
2025-12-04T11:29:55.5685070Z stepcurrent: skipping 216 already run items.
2025-12-04T11:29:55.5685220Z Running 72 items in this shard
2025-12-04T11:29:55.5685225Z 
2025-12-04T11:29:55.5687254Z inductor/test_compile_subprocess.py::GPUTests::test_remove_noop_slice_scatter_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0011s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/151378 for platform(s) linux, rocm, slow. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [  1%]
2025-12-04T11:29:55.5687789Z inductor/test_compile_subprocess.py::GPUTests::test_repeat_cuda <- test/inductor/test_torchinductor.py PASSED [19.3486s] [  2%]
2025-12-04T11:29:55.5689004Z inductor/test_compile_subprocess.py::GPUTests::test_repeat_interleave_decomposition_has_clamp_cuda <- test/inductor/test_torchinductor.py W1204 11:28:05.485000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.5689468Z W1204 11:28:05.485000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.5690375Z W1204 11:28:05.485000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.5690761Z W1204 11:28:05.485000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.5691613Z W1204 11:28:05.485000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.5692192Z W1204 11:28:05.485000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.5692987Z W1204 11:28:05.485000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.5693407Z W1204 11:28:05.485000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.5694245Z W1204 11:28:05.485000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.5694815Z W1204 11:28:05.485000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.5695636Z W1204 11:28:05.485000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.5696131Z W1204 11:28:05.485000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.5697003Z W1204 11:28:05.485000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.5697529Z W1204 11:28:05.485000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.5698412Z W1204 11:28:05.485000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.5698940Z W1204 11:28:05.485000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.5699750Z W1204 11:28:05.485000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.5700339Z W1204 11:28:05.485000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.5701130Z W1204 11:28:05.485000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.5701722Z W1204 11:28:05.485000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.5702529Z W1204 11:28:05.485000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.5703170Z W1204 11:28:05.485000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.5703997Z W1204 11:28:05.485000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.iota.default
2025-12-04T11:29:55.5704114Z PASSED [1.0688s] [  4%]
2025-12-04T11:29:55.5705177Z inductor/test_compile_subprocess.py::GPUTests::test_require_stride_expanded_cuda <- test/inductor/test_torchinductor.py W1204 11:28:07.992000 102266 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:29:55.5705282Z PASSED [2.3217s] [  5%]
2025-12-04T11:29:55.5705819Z inductor/test_compile_subprocess.py::GPUTests::test_resize_cuda <- test/inductor/test_torchinductor.py PASSED [7.1319s] [  6%]
2025-12-04T11:29:55.5706841Z inductor/test_compile_subprocess.py::GPUTests::test_roi_align_cuda <- test/inductor/test_torchinductor.py W1204 11:28:16.101000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.5707314Z W1204 11:28:16.101000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.5708202Z W1204 11:28:16.101000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.5708591Z W1204 11:28:16.101000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.5709444Z W1204 11:28:16.101000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.5710054Z W1204 11:28:16.101000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.5710858Z W1204 11:28:16.101000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.5711264Z W1204 11:28:16.101000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.5712195Z W1204 11:28:16.101000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.5712753Z W1204 11:28:16.101000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.5713576Z W1204 11:28:16.101000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.5714069Z W1204 11:28:16.101000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.5714870Z W1204 11:28:16.101000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.5715412Z W1204 11:28:16.101000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.5716213Z W1204 11:28:16.101000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.5716751Z W1204 11:28:16.101000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.5717552Z W1204 11:28:16.101000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.5718111Z W1204 11:28:16.101000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.5718920Z W1204 11:28:16.101000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.5719496Z W1204 11:28:16.101000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.5720318Z W1204 11:28:16.101000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.5720947Z W1204 11:28:16.101000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.5721841Z W1204 11:28:16.101000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.torchvision.roi_align.default
2025-12-04T11:29:55.5722352Z W1204 11:28:16.289000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.5722809Z W1204 11:28:16.289000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.5723705Z W1204 11:28:16.289000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.5724123Z W1204 11:28:16.289000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.5724975Z W1204 11:28:16.289000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.5725553Z W1204 11:28:16.289000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.5726409Z W1204 11:28:16.289000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.5726815Z W1204 11:28:16.289000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.5727649Z W1204 11:28:16.289000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.5728247Z W1204 11:28:16.289000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.5729065Z W1204 11:28:16.289000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.5729523Z W1204 11:28:16.289000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.5730323Z W1204 11:28:16.289000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.5730849Z W1204 11:28:16.289000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.5731660Z W1204 11:28:16.289000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.5732188Z W1204 11:28:16.289000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.5732999Z W1204 11:28:16.289000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.5733553Z W1204 11:28:16.289000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.5734357Z W1204 11:28:16.289000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.5734930Z W1204 11:28:16.289000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.5735748Z W1204 11:28:16.289000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.5736485Z W1204 11:28:16.289000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.5737375Z W1204 11:28:16.289000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.torchvision.roi_align.default
2025-12-04T11:29:55.5737497Z PASSED [0.5133s] [  8%]
2025-12-04T11:29:55.5738540Z inductor/test_compile_subprocess.py::GPUTests::test_roll_cuda <- test/inductor/test_torchinductor.py W1204 11:28:16.542000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.5739019Z W1204 11:28:16.542000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.5739902Z W1204 11:28:16.542000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.5740360Z W1204 11:28:16.542000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.5741210Z W1204 11:28:16.542000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.5741793Z W1204 11:28:16.542000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.5742618Z W1204 11:28:16.542000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.5743022Z W1204 11:28:16.542000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.5743878Z W1204 11:28:16.542000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.5744429Z W1204 11:28:16.542000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.5745252Z W1204 11:28:16.542000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.5745713Z W1204 11:28:16.542000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.5746519Z W1204 11:28:16.542000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.5747056Z W1204 11:28:16.542000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.5747855Z W1204 11:28:16.542000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.5748383Z W1204 11:28:16.542000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.5749195Z W1204 11:28:16.542000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.5749750Z W1204 11:28:16.542000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.5750557Z W1204 11:28:16.542000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.5751133Z W1204 11:28:16.542000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.5751987Z W1204 11:28:16.542000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.5752613Z W1204 11:28:16.542000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.5753442Z W1204 11:28:16.542000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.iota.default
2025-12-04T11:29:55.5754021Z W1204 11:28:17.331000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.5754506Z W1204 11:28:17.331000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.5755406Z W1204 11:28:17.331000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.5755791Z W1204 11:28:17.331000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.5756678Z W1204 11:28:17.331000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.5757254Z W1204 11:28:17.331000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.5758044Z W1204 11:28:17.331000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.5758469Z W1204 11:28:17.331000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.5759307Z W1204 11:28:17.331000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.5759877Z W1204 11:28:17.331000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.5760699Z W1204 11:28:17.331000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.5761159Z W1204 11:28:17.331000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.5761958Z W1204 11:28:17.331000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.5762488Z W1204 11:28:17.331000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.5763307Z W1204 11:28:17.331000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.5763832Z W1204 11:28:17.331000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.5764646Z W1204 11:28:17.331000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.5765201Z W1204 11:28:17.331000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.5766053Z W1204 11:28:17.331000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.5766628Z W1204 11:28:17.331000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.5767434Z W1204 11:28:17.331000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.5768132Z W1204 11:28:17.331000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.5768957Z W1204 11:28:17.331000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.iota.default
2025-12-04T11:29:55.5769078Z PASSED [1.2033s] [  9%]
2025-12-04T11:29:55.5769603Z inductor/test_compile_subprocess.py::GPUTests::test_round_cuda <- test/inductor/test_torchinductor.py PASSED [0.6584s] [ 11%]
2025-12-04T11:29:55.5770150Z inductor/test_compile_subprocess.py::GPUTests::test_rsqrt_cuda <- test/inductor/test_torchinductor.py PASSED [0.5388s] [ 12%]
2025-12-04T11:29:55.5771429Z inductor/test_compile_subprocess.py::GPUTests::test_rsqrt_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py W1204 11:28:19.025000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.5771896Z W1204 11:28:19.025000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.5772798Z W1204 11:28:19.025000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.5773185Z W1204 11:28:19.025000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.5774041Z W1204 11:28:19.025000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.5774625Z W1204 11:28:19.025000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.5775420Z W1204 11:28:19.025000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.5775844Z W1204 11:28:19.025000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.5776758Z W1204 11:28:19.025000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.5777335Z W1204 11:28:19.025000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.5778163Z W1204 11:28:19.025000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.5778614Z W1204 11:28:19.025000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.5779436Z W1204 11:28:19.025000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.5779963Z W1204 11:28:19.025000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.5780862Z W1204 11:28:19.025000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.5781395Z W1204 11:28:19.025000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.5782210Z W1204 11:28:19.025000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.5782863Z W1204 11:28:19.025000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.5783654Z W1204 11:28:19.025000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.5784244Z W1204 11:28:19.025000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.5785102Z W1204 11:28:19.025000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.5785739Z W1204 11:28:19.025000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.5786652Z W1204 11:28:19.025000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default
2025-12-04T11:29:55.5787129Z W1204 11:28:19.045000 102081 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:29:55.5787640Z W1204 11:28:20.207000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.5788102Z W1204 11:28:20.207000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.5789004Z W1204 11:28:20.207000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.5789395Z W1204 11:28:20.207000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.5790258Z W1204 11:28:20.207000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.5790842Z W1204 11:28:20.207000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.5791643Z W1204 11:28:20.207000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.5792053Z W1204 11:28:20.207000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.5792894Z W1204 11:28:20.207000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.5793470Z W1204 11:28:20.207000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.5794295Z W1204 11:28:20.207000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.5794788Z W1204 11:28:20.207000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.5795595Z W1204 11:28:20.207000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.5796161Z W1204 11:28:20.207000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.5796992Z W1204 11:28:20.207000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.5797515Z W1204 11:28:20.207000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.5798329Z W1204 11:28:20.207000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.5798916Z W1204 11:28:20.207000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.5799721Z W1204 11:28:20.207000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.5800295Z W1204 11:28:20.207000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.5801104Z W1204 11:28:20.207000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.5801754Z W1204 11:28:20.207000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.5802657Z W1204 11:28:20.207000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default
2025-12-04T11:29:55.5802778Z PASSED [2.0946s] [ 13%]
2025-12-04T11:29:55.5803616Z inductor/test_compile_subprocess.py::GPUTests::test_scaled_dot_product_attention_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0035s] (Can't run flash attention on this platform) [ 15%]
2025-12-04T11:29:55.5804161Z inductor/test_compile_subprocess.py::GPUTests::test_scatter3_cuda <- test/inductor/test_torchinductor.py PASSED [0.6554s] [ 16%]
2025-12-04T11:29:55.5804688Z inductor/test_compile_subprocess.py::GPUTests::test_scatter4_cuda <- test/inductor/test_torchinductor.py PASSED [1.2915s] [ 18%]
2025-12-04T11:29:55.5805239Z inductor/test_compile_subprocess.py::GPUTests::test_scatter_add3_cuda <- test/inductor/test_torchinductor.py PASSED [0.9734s] [ 19%]
2025-12-04T11:29:55.5805820Z inductor/test_compile_subprocess.py::GPUTests::test_scatter_reduce3_cuda <- test/inductor/test_torchinductor.py PASSED [1.0933s] [ 20%]
2025-12-04T11:29:55.5806724Z inductor/test_compile_subprocess.py::GPUTests::test_sdpa_prefer_nd_tiling_False_use_block_ptr_True_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (Does not support SDPA or pre-SM80 hardware) [ 22%]
2025-12-04T11:29:55.5807643Z inductor/test_compile_subprocess.py::GPUTests::test_sdpa_prefer_nd_tiling_True_use_block_ptr_False_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (Does not support SDPA or pre-SM80 hardware) [ 23%]
2025-12-04T11:29:55.5808795Z inductor/test_compile_subprocess.py::GPUTests::test_sdpa_unaligned_mask_freezing_cuda <- test/inductor/test_torchinductor.py W1204 11:28:25.124000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.5809272Z W1204 11:28:25.124000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.5810161Z W1204 11:28:25.124000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.5810574Z W1204 11:28:25.124000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.5811456Z W1204 11:28:25.124000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.5812033Z W1204 11:28:25.124000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.5812836Z W1204 11:28:25.124000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.5813309Z W1204 11:28:25.124000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.5814101Z W1204 11:28:25.124000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] AttributeError: Can't pickle local object 'CommonTemplate.test_sdpa_unaligned_mask_freezing.<locals>.Mod'
2025-12-04T11:29:55.5814226Z PASSED [0.2284s] [ 25%]
2025-12-04T11:29:55.5814786Z inductor/test_compile_subprocess.py::GPUTests::test_shape_padding_cuda <- test/inductor/test_torchinductor.py PASSED [2.8750s] [ 26%]
2025-12-04T11:29:55.5815328Z inductor/test_compile_subprocess.py::GPUTests::test_signbit_cuda <- test/inductor/test_torchinductor.py PASSED [0.5661s] [ 27%]
2025-12-04T11:29:55.5816339Z inductor/test_compile_subprocess.py::GPUTests::test_silu_cuda <- test/inductor/test_torchinductor.py W1204 11:28:28.909000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.5816896Z W1204 11:28:28.909000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.5817790Z W1204 11:28:28.909000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.5818177Z W1204 11:28:28.909000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.5819033Z W1204 11:28:28.909000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.5819613Z W1204 11:28:28.909000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.5820418Z W1204 11:28:28.909000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.5820823Z W1204 11:28:28.909000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.5821663Z W1204 11:28:28.909000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.5822230Z W1204 11:28:28.909000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.5823098Z W1204 11:28:28.909000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.5823562Z W1204 11:28:28.909000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.5824363Z W1204 11:28:28.909000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.5824929Z W1204 11:28:28.909000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.5825759Z W1204 11:28:28.909000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.5826285Z W1204 11:28:28.909000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.5827100Z W1204 11:28:28.909000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.5827689Z W1204 11:28:28.909000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.5828496Z W1204 11:28:28.909000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.5829069Z W1204 11:28:28.909000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.5829888Z W1204 11:28:28.909000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.5830511Z W1204 11:28:28.909000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.5831419Z W1204 11:28:28.909000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default
2025-12-04T11:29:55.5831539Z PASSED [0.4053s] [ 29%]
2025-12-04T11:29:55.5832048Z inductor/test_compile_subprocess.py::GPUTests::test_sin_cuda <- test/inductor/test_torchinductor.py PASSED [0.9362s] [ 30%]
2025-12-04T11:29:55.5832585Z inductor/test_compile_subprocess.py::GPUTests::test_slice1_cuda <- test/inductor/test_torchinductor.py PASSED [0.6578s] [ 31%]
2025-12-04T11:29:55.5833098Z inductor/test_compile_subprocess.py::GPUTests::test_slice2_cuda <- test/inductor/test_torchinductor.py PASSED [0.8658s] [ 33%]
2025-12-04T11:29:55.5833667Z inductor/test_compile_subprocess.py::GPUTests::test_slice_mutation1_cuda <- test/inductor/test_torchinductor.py PASSED [0.6682s] [ 34%]
2025-12-04T11:29:55.5834235Z inductor/test_compile_subprocess.py::GPUTests::test_slice_scatter5_cuda <- test/inductor/test_torchinductor.py PASSED [0.5721s] [ 36%]
2025-12-04T11:29:55.5834737Z inductor/test_compile_subprocess.py::GPUTests::test_sort_cuda <- test/inductor/test_torchinductor.py PASSED [1.9188s] [ 37%]
2025-12-04T11:29:55.5835579Z inductor/test_compile_subprocess.py::GPUTests::test_sort_stable_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0007s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 38%]
2025-12-04T11:29:55.5836140Z inductor/test_compile_subprocess.py::GPUTests::test_sort_transpose_cuda <- test/inductor/test_torchinductor.py PASSED [27.8336s] [ 40%]
2025-12-04T11:29:55.5836728Z inductor/test_compile_subprocess.py::GPUTests::test_special_polygamma_cuda <- test/inductor/test_torchinductor.py PASSED [3.6925s] [ 41%]
2025-12-04T11:29:55.5837434Z inductor/test_compile_subprocess.py::GPUTests::test_split_cumprod_low_prec_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0035s] (Requires sm80) [ 43%]
2025-12-04T11:29:55.5838101Z inductor/test_compile_subprocess.py::GPUTests::test_split_cumsum_low_prec_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0032s] (Requires sm80) [ 44%]
2025-12-04T11:29:55.5838883Z inductor/test_compile_subprocess.py::GPUTests::test_split_reduction_with_int64_size_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.2654s] (Insufficient cuda memory) [ 45%]
2025-12-04T11:29:55.5840060Z inductor/test_compile_subprocess.py::GPUTests::test_split_with_unbacked_symints_cuda <- test/inductor/test_torchinductor.py W1204 11:29:06.605000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.5840536Z W1204 11:29:06.605000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.5841429Z W1204 11:29:06.605000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.5841858Z W1204 11:29:06.605000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.5842697Z W1204 11:29:06.605000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.5843281Z W1204 11:29:06.605000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.5844088Z W1204 11:29:06.605000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.5844498Z W1204 11:29:06.605000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.5845345Z W1204 11:29:06.605000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.5845898Z W1204 11:29:06.605000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.5846723Z W1204 11:29:06.605000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.5847179Z W1204 11:29:06.605000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.5847980Z W1204 11:29:06.605000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.5848521Z W1204 11:29:06.605000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.5849320Z W1204 11:29:06.605000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.5849864Z W1204 11:29:06.605000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.5850663Z W1204 11:29:06.605000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.5851249Z W1204 11:29:06.605000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.5852050Z W1204 11:29:06.605000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.5852622Z W1204 11:29:06.605000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.5853503Z W1204 11:29:06.605000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.5854127Z W1204 11:29:06.605000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.5854966Z W1204 11:29:06.605000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.iota.default
2025-12-04T11:29:55.5855101Z PASSED [1.0720s] [ 47%]
2025-12-04T11:29:55.5855884Z inductor/test_compile_subprocess.py::GPUTests::test_sqrt_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0036s] (sqrt dynamic shapes only supports cpu) [ 48%]
2025-12-04T11:29:55.5856499Z inductor/test_compile_subprocess.py::GPUTests::test_squeeze1_cuda <- test/inductor/test_torchinductor.py PASSED [0.4750s] [ 50%]
2025-12-04T11:29:55.5857036Z inductor/test_compile_subprocess.py::GPUTests::test_squeeze2_cuda <- test/inductor/test_torchinductor.py PASSED [0.5063s] [ 51%]
2025-12-04T11:29:55.5857614Z inductor/test_compile_subprocess.py::GPUTests::test_squeeze_varargs_cuda <- test/inductor/test_torchinductor.py PASSED [0.7119s] [ 52%]
2025-12-04T11:29:55.5858122Z inductor/test_compile_subprocess.py::GPUTests::test_stack_cuda <- test/inductor/test_torchinductor.py PASSED [0.6632s] [ 54%]
2025-12-04T11:29:55.5859120Z inductor/test_compile_subprocess.py::GPUTests::test_std_cuda <- test/inductor/test_torchinductor.py W1204 11:29:10.952000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.5859596Z W1204 11:29:10.952000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.5860482Z W1204 11:29:10.952000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.5860884Z W1204 11:29:10.952000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.5861723Z W1204 11:29:10.952000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.5862316Z W1204 11:29:10.952000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.5863102Z W1204 11:29:10.952000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.5863506Z W1204 11:29:10.952000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.5864357Z W1204 11:29:10.952000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.5864908Z W1204 11:29:10.952000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.5865780Z W1204 11:29:10.952000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.5866230Z W1204 11:29:10.952000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.5867043Z W1204 11:29:10.952000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.5867648Z W1204 11:29:10.952000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.5868448Z W1204 11:29:10.952000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.5868988Z W1204 11:29:10.952000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.5869819Z W1204 11:29:10.952000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.5870387Z W1204 11:29:10.952000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.5871356Z W1204 11:29:10.952000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.5871930Z W1204 11:29:10.952000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.5872763Z W1204 11:29:10.952000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.5873389Z W1204 11:29:10.952000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.5874316Z W1204 11:29:10.952000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default
2025-12-04T11:29:55.5874424Z PASSED [1.8510s] [ 55%]
2025-12-04T11:29:55.5875001Z inductor/test_compile_subprocess.py::GPUTests::test_strided_inputs_cuda <- test/inductor/test_torchinductor.py PASSED [0.2326s] [ 56%]
2025-12-04T11:29:55.5875510Z inductor/test_compile_subprocess.py::GPUTests::test_sum2_cuda <- test/inductor/test_torchinductor.py PASSED [2.3694s] [ 58%]
2025-12-04T11:29:55.5876016Z inductor/test_compile_subprocess.py::GPUTests::test_sum3_cuda <- test/inductor/test_torchinductor.py PASSED [0.7126s] [ 59%]
2025-12-04T11:29:55.5876535Z inductor/test_compile_subprocess.py::GPUTests::test_sum4_cuda <- test/inductor/test_torchinductor.py PASSED [1.1273s] [ 61%]
2025-12-04T11:29:55.5877040Z inductor/test_compile_subprocess.py::GPUTests::test_sum5_cuda <- test/inductor/test_torchinductor.py PASSED [1.3352s] [ 62%]
2025-12-04T11:29:55.5877595Z inductor/test_compile_subprocess.py::GPUTests::test_sum_keepdims_cuda <- test/inductor/test_torchinductor.py PASSED [0.6257s] [ 63%]
2025-12-04T11:29:55.5878100Z inductor/test_compile_subprocess.py::GPUTests::test_tanh_cuda <- test/inductor/test_torchinductor.py PASSED [0.7866s] [ 65%]
2025-12-04T11:29:55.5879273Z inductor/test_compile_subprocess.py::GPUTests::test_tmp_not_defined_issue1_use_block_ptr_True_cuda <- test/inductor/test_torchinductor.py W1204 11:29:19.611000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.5879817Z W1204 11:29:19.611000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.5880712Z W1204 11:29:19.611000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.5881109Z W1204 11:29:19.611000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.5882107Z W1204 11:29:19.611000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.5882706Z W1204 11:29:19.611000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.5883496Z W1204 11:29:19.611000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.5883944Z W1204 11:29:19.611000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.5884791Z W1204 11:29:19.611000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.5885346Z W1204 11:29:19.611000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.5886188Z W1204 11:29:19.611000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.5886635Z W1204 11:29:19.611000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.5887452Z W1204 11:29:19.611000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.5887977Z W1204 11:29:19.611000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.5888775Z W1204 11:29:19.611000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.5889317Z W1204 11:29:19.611000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.5890110Z W1204 11:29:19.611000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.5890680Z W1204 11:29:19.611000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.5891471Z W1204 11:29:19.611000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.5892061Z W1204 11:29:19.611000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.5892883Z W1204 11:29:19.611000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.5893507Z W1204 11:29:19.611000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.5894471Z W1204 11:29:19.611000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default
2025-12-04T11:29:55.5894583Z PASSED [1.0038s] [ 66%]
2025-12-04T11:29:55.5895693Z inductor/test_compile_subprocess.py::GPUTests::test_tmp_not_defined_issue3_cuda <- test/inductor/test_torchinductor.py W1204 11:29:20.304000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.5896216Z W1204 11:29:20.304000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.5897176Z W1204 11:29:20.304000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.5897582Z W1204 11:29:20.304000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.5898464Z W1204 11:29:20.304000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.5899057Z W1204 11:29:20.304000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.5899847Z W1204 11:29:20.304000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.5900267Z W1204 11:29:20.304000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.5901104Z W1204 11:29:20.304000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.5901662Z W1204 11:29:20.304000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.5902505Z W1204 11:29:20.304000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.5902954Z W1204 11:29:20.304000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.5903771Z W1204 11:29:20.304000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.5904300Z W1204 11:29:20.304000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.5905118Z W1204 11:29:20.304000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.5905646Z W1204 11:29:20.304000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.5906448Z W1204 11:29:20.304000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.5907024Z W1204 11:29:20.304000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.5907820Z W1204 11:29:20.304000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.5908459Z W1204 11:29:20.304000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.5909272Z W1204 11:29:20.304000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.5909943Z W1204 11:29:20.304000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.5910800Z W1204 11:29:20.304000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.iota.default
2025-12-04T11:29:55.5910911Z PASSED [2.5343s] [ 68%]
2025-12-04T11:29:55.5911957Z inductor/test_compile_subprocess.py::GPUTests::test_to_dtype_cuda <- test/inductor/test_torchinductor.py W1204 11:29:22.617000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.5912447Z W1204 11:29:22.617000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.5913347Z W1204 11:29:22.617000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.5913734Z W1204 11:29:22.617000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.5914572Z W1204 11:29:22.617000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.5915169Z W1204 11:29:22.617000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.5915959Z W1204 11:29:22.617000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.5916379Z W1204 11:29:22.617000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.5917217Z W1204 11:29:22.617000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.5917788Z W1204 11:29:22.617000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.5918618Z W1204 11:29:22.617000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.5919063Z W1204 11:29:22.617000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.5919884Z W1204 11:29:22.617000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.5920407Z W1204 11:29:22.617000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.5921219Z W1204 11:29:22.617000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.5921742Z W1204 11:29:22.617000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.5922583Z W1204 11:29:22.617000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.5923141Z W1204 11:29:22.617000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.5923930Z W1204 11:29:22.617000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.5924595Z W1204 11:29:22.617000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.5925401Z W1204 11:29:22.617000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.5926042Z W1204 11:29:22.617000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.5926981Z W1204 11:29:22.617000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default
2025-12-04T11:29:55.5927503Z W1204 11:29:22.887000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.5927965Z W1204 11:29:22.887000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.5928858Z W1204 11:29:22.887000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.5929255Z W1204 11:29:22.887000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.5930090Z W1204 11:29:22.887000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.5930681Z W1204 11:29:22.887000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.5931472Z W1204 11:29:22.887000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.5931878Z W1204 11:29:22.887000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.5932730Z W1204 11:29:22.887000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.5933283Z W1204 11:29:22.887000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.5934124Z W1204 11:29:22.887000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.5934573Z W1204 11:29:22.887000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.5935388Z W1204 11:29:22.887000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.5935951Z W1204 11:29:22.887000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.5936816Z W1204 11:29:22.887000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.5937362Z W1204 11:29:22.887000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.5938205Z W1204 11:29:22.887000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.5938808Z W1204 11:29:22.887000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.5939602Z W1204 11:29:22.887000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.5940190Z W1204 11:29:22.887000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.5941033Z W1204 11:29:22.887000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.5941661Z W1204 11:29:22.887000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.5942586Z W1204 11:29:22.887000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default
2025-12-04T11:29:55.5942695Z PASSED [0.4930s] [ 69%]
2025-12-04T11:29:55.5943425Z inductor/test_compile_subprocess.py::GPUTests::test_triton_argmin_argmax_transpose_logical_index_cuda <- test/inductor/test_torchinductor.py PASSED [4.4349s] [ 70%]
2025-12-04T11:29:55.5944489Z inductor/test_compile_subprocess.py::GPUTests::test_uint4x2_mixed_mm_cuda <- test/inductor/test_torchinductor.py W1204 11:29:27.559000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.5944964Z W1204 11:29:27.559000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.5945854Z W1204 11:29:27.559000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.5946238Z W1204 11:29:27.559000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.5947091Z W1204 11:29:27.559000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.5947670Z W1204 11:29:27.559000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.5948469Z W1204 11:29:27.559000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.5948879Z W1204 11:29:27.559000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.5949709Z W1204 11:29:27.559000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.5950307Z W1204 11:29:27.559000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.5951133Z W1204 11:29:27.559000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.5951590Z W1204 11:29:27.559000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.5952456Z W1204 11:29:27.559000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.5952993Z W1204 11:29:27.559000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.5953794Z W1204 11:29:27.559000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.5954350Z W1204 11:29:27.559000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.5955160Z W1204 11:29:27.559000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.5955714Z W1204 11:29:27.559000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.5956522Z W1204 11:29:27.559000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.5957093Z W1204 11:29:27.559000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.5957918Z W1204 11:29:27.559000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.5958543Z W1204 11:29:27.559000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.5959456Z W1204 11:29:27.559000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default
2025-12-04T11:29:55.5959979Z W1204 11:29:27.994000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.5960437Z W1204 11:29:27.994000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last):
2025-12-04T11:29:55.5961343Z W1204 11:29:27.994000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.5961727Z W1204 11:29:27.994000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     ).serialize()
2025-12-04T11:29:55.5962574Z W1204 11:29:27.994000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.5963159Z W1204 11:29:27.994000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.5963943Z W1204 11:29:27.994000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.5964406Z W1204 11:29:27.994000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     pickler.dump(obj)
2025-12-04T11:29:55.5965241Z W1204 11:29:27.994000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.5965806Z W1204 11:29:27.994000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.5966718Z W1204 11:29:27.994000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.5967183Z W1204 11:29:27.994000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     cls(obj, pickler.options),
2025-12-04T11:29:55.5967983Z W1204 11:29:27.994000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.5968535Z W1204 11:29:27.994000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.5969340Z W1204 11:29:27.994000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.5969871Z W1204 11:29:27.994000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.5970682Z W1204 11:29:27.994000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.5971485Z W1204 11:29:27.994000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.5972281Z W1204 11:29:27.994000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.5972869Z W1204 11:29:27.994000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.5973685Z W1204 11:29:27.994000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.5974330Z W1204 11:29:27.994000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.5975241Z W1204 11:29:27.994000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default
2025-12-04T11:29:55.5975365Z PASSED [0.8081s] [ 72%]
2025-12-04T11:29:55.5975993Z inductor/test_compile_subprocess.py::GPUTests::test_unbacked_floordiv_simplify_cuda <- test/inductor/test_torchinductor.py PASSED [1.0403s] [ 73%]
2025-12-04T11:29:55.5976720Z inductor/test_compile_subprocess.py::GPUTests::test_unbacked_floordiv_simplify_errors_cuda <- test/inductor/test_torchinductor.py PASSED [0.0238s] [ 75%]
2025-12-04T11:29:55.5977331Z inductor/test_compile_subprocess.py::GPUTests::test_unroll_small_reduction_cuda <- test/inductor/test_torchinductor.py PASSED [2.3556s] [ 76%]
2025-12-04T11:29:55.5977918Z inductor/test_compile_subprocess.py::GPUTests::test_unspec_inputs_float16_cuda <- test/inductor/test_torchinductor.py PASSED [0.7524s] [ 77%]
2025-12-04T11:29:55.5978508Z inductor/test_compile_subprocess.py::GPUTests::test_unspec_inputs_int32_cuda <- test/inductor/test_torchinductor.py PASSED [1.1009s] [ 79%]
2025-12-04T11:29:55.5979183Z inductor/test_compile_subprocess.py::GPUTests::test_unspec_inputs_int8_cuda <- test/inductor/test_torchinductor.py PASSED [0.5682s] [ 80%]
2025-12-04T11:29:55.5979811Z inductor/test_compile_subprocess.py::GPUTests::test_upsample_nearest2d_backward_cuda <- test/inductor/test_torchinductor.py PASSED [2.8734s] [ 81%]
2025-12-04T11:29:55.5980389Z inductor/test_compile_subprocess.py::GPUTests::test_var_correction_cuda <- test/inductor/test_torchinductor.py PASSED [1.3559s] [ 83%]
2025-12-04T11:29:55.5981026Z inductor/test_compile_subprocess.py::GPUTests::test_var_mean_div_by_cuda <- test/inductor/test_torchinductor.py PASSED [0.7491s] [ 84%]
2025-12-04T11:29:55.5981657Z inductor/test_compile_subprocess.py::GPUTests::test_var_mean_tile_reduction_False_cuda <- test/inductor/test_torchinductor.py PASSED [0.8185s] [ 86%]
2025-12-04T11:29:55.5982268Z inductor/test_compile_subprocess.py::GPUTests::test_var_mean_tile_reduction_True_cuda <- test/inductor/test_torchinductor.py PASSED [0.7944s] [ 87%]
2025-12-04T11:29:55.5982851Z inductor/test_compile_subprocess.py::GPUTests::test_vertical_fusion1_cuda <- test/inductor/test_torchinductor.py PASSED [0.8372s] [ 88%]
2025-12-04T11:29:55.5983443Z inductor/test_compile_subprocess.py::GPUTests::test_view_as_complex_cuda <- test/inductor/test_torchinductor.py PASSED [0.2878s] [ 90%]
2025-12-04T11:29:55.5983961Z inductor/test_compile_subprocess.py::GPUTests::test_views2_cuda <- test/inductor/test_torchinductor.py PASSED [2.6466s] [ 91%]
2025-12-04T11:29:55.5985038Z inductor/test_compile_subprocess.py::GPUTests::test_weight_norm_bwd_cuda <- test/inductor/test_torchinductor.py W1204 11:29:45.121000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.5985504Z W1204 11:29:45.121000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1] Traceback (most recent call last):
2025-12-04T11:29:55.5986414Z W1204 11:29:45.121000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.5986810Z W1204 11:29:45.121000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1]     ).serialize()
2025-12-04T11:29:55.5987665Z W1204 11:29:45.121000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.5988265Z W1204 11:29:45.121000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.5989055Z W1204 11:29:45.121000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.5989492Z W1204 11:29:45.121000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1]     pickler.dump(obj)
2025-12-04T11:29:55.5990334Z W1204 11:29:45.121000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.5990906Z W1204 11:29:45.121000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.5991738Z W1204 11:29:45.121000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.5992192Z W1204 11:29:45.121000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1]     cls(obj, pickler.options),
2025-12-04T11:29:55.5993054Z W1204 11:29:45.121000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.5993589Z W1204 11:29:45.121000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.5994403Z W1204 11:29:45.121000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.5994967Z W1204 11:29:45.121000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.5995814Z W1204 11:29:45.121000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.5996385Z W1204 11:29:45.121000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.5997181Z W1204 11:29:45.121000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.5997804Z W1204 11:29:45.121000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.5998624Z W1204 11:29:45.121000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.5999270Z W1204 11:29:45.121000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.6000185Z W1204 11:29:45.121000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default
2025-12-04T11:29:55.6000709Z W1204 11:29:45.510000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.6001174Z W1204 11:29:45.510000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1] Traceback (most recent call last):
2025-12-04T11:29:55.6002070Z W1204 11:29:45.510000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.6002481Z W1204 11:29:45.510000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1]     ).serialize()
2025-12-04T11:29:55.6003321Z W1204 11:29:45.510000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.6003919Z W1204 11:29:45.510000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.6004707Z W1204 11:29:45.510000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.6005137Z W1204 11:29:45.510000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1]     pickler.dump(obj)
2025-12-04T11:29:55.6005985Z W1204 11:29:45.510000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.6006544Z W1204 11:29:45.510000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.6007423Z W1204 11:29:45.510000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.6007876Z W1204 11:29:45.510000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1]     cls(obj, pickler.options),
2025-12-04T11:29:55.6008692Z W1204 11:29:45.510000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.6009283Z W1204 11:29:45.510000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.6010086Z W1204 11:29:45.510000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.6010629Z W1204 11:29:45.510000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.6011462Z W1204 11:29:45.510000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.6012035Z W1204 11:29:45.510000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.6012833Z W1204 11:29:45.510000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.6013430Z W1204 11:29:45.510000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.6014251Z W1204 11:29:45.510000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.6014882Z W1204 11:29:45.510000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.6015814Z W1204 11:29:45.510000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default
2025-12-04T11:29:55.6015926Z PASSED [1.3275s] [ 93%]
2025-12-04T11:29:55.6016648Z inductor/test_compile_subprocess.py::GPUTests::test_weight_norm_conv2d_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (Skipped!) [ 94%]
2025-12-04T11:29:55.6017707Z inductor/test_compile_subprocess.py::GPUTests::test_where_broadcast_cuda <- test/inductor/test_torchinductor.py W1204 11:29:46.419000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] Unable to pickle input graph or example inputs
2025-12-04T11:29:55.6018172Z W1204 11:29:46.419000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] Traceback (most recent call last):
2025-12-04T11:29:55.6019049Z W1204 11:29:46.419000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile
2025-12-04T11:29:55.6019409Z W1204 11:29:46.419000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493]     ).serialize()
2025-12-04T11:29:55.6020255Z W1204 11:29:46.419000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize
2025-12-04T11:29:55.6020816Z W1204 11:29:46.419000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493]     return _WireProtocolPickledInput(GraphPickler.dumps(self))
2025-12-04T11:29:55.6021657Z W1204 11:29:46.419000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps
2025-12-04T11:29:55.6022044Z W1204 11:29:46.419000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493]     pickler.dump(obj)
2025-12-04T11:29:55.6022883Z W1204 11:29:46.419000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override
2025-12-04T11:29:55.6023493Z W1204 11:29:46.419000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493]     return _GraphModulePickleData.reduce_helper(self, obj)
2025-12-04T11:29:55.6024307Z W1204 11:29:46.419000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper
2025-12-04T11:29:55.6024751Z W1204 11:29:46.419000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493]     cls(obj, pickler.options),
2025-12-04T11:29:55.6025596Z W1204 11:29:46.419000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__
2025-12-04T11:29:55.6026126Z W1204 11:29:46.419000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493]     self.graph = _GraphPickleData(gm._graph, options)
2025-12-04T11:29:55.6026919Z W1204 11:29:46.419000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__
2025-12-04T11:29:55.6027430Z W1204 11:29:46.419000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493]     nodes[node] = _NodePickleData(node, nodes, options)
2025-12-04T11:29:55.6028237Z W1204 11:29:46.419000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__
2025-12-04T11:29:55.6028784Z W1204 11:29:46.419000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493]     self.target = _OpPickleData.pickle(node.target, options)
2025-12-04T11:29:55.6029580Z W1204 11:29:46.419000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle
2025-12-04T11:29:55.6030145Z W1204 11:29:46.419000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493]     return cls._pickle_op(name, _OpOverloadPickleData, options)
2025-12-04T11:29:55.6030968Z W1204 11:29:46.419000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op
2025-12-04T11:29:55.6031585Z W1204 11:29:46.419000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493]     raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}")
2025-12-04T11:29:55.6032484Z W1204 11:29:46.419000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default
2025-12-04T11:29:55.6032606Z PASSED [1.3723s] [ 95%]
2025-12-04T11:29:55.6033196Z inductor/test_compile_subprocess.py::GPUTests::test_where_with_logical_op_cuda <- test/inductor/test_torchinductor.py PASSED [0.8848s] [ 97%]
2025-12-04T11:29:55.6033807Z inductor/test_compile_subprocess.py::GPUTests::test_xblock_divides_xnumel_cuda <- test/inductor/test_torchinductor.py PASSED [1.0013s] [ 98%]
2025-12-04T11:29:55.6034382Z inductor/test_compile_subprocess.py::GPUTests::test_zero_dim_reductions_cuda <- test/inductor/test_torchinductor.py PASSED [0.3938s] [100%]
2025-12-04T11:29:55.6034390Z 
2025-12-04T11:29:55.6035222Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-35b1cdd46f4129e6.xml -
2025-12-04T11:29:55.6035517Z ========== 62 passed, 10 skipped, 216 deselected in 123.52s (0:02:03) ==========
2025-12-04T11:29:55.6036166Z The following tests failed and then succeeded when run in a new process['test/inductor/test_compile_subprocess.py::GPUTests::test_remove_noop_slice_cuda']
2025-12-04T11:29:55.6036642Z The following tests failed consistently: ['test/inductor/test_compile_subprocess.py::GPUTests::test_isinf_cuda']
2025-12-04T11:29:55.6036683Z 
2025-12-04T11:29:55.6037304Z FINISHED PRINTING LOG FILE of inductor/test_compile_subprocess 3/3 (test/test-reports/inductor.test_compile_subprocess_3.3_92ce494afd455b37_.log)
2025-12-04T11:29:55.6037339Z 
2025-12-04T11:29:55.6037743Z Finished inductor/test_compile_subprocess 3/3 ... [2025-12-04 11:29:55.314123][8223.697013523], took 9.14min
2025-12-04T11:29:55.6038616Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-84a2c5e5cdda7bdd.xml
2025-12-04T11:29:55.6039526Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-97e49e1b6070e822.xml
2025-12-04T11:29:55.6040451Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-aaac502093c587a7.xml
2025-12-04T11:29:55.6041322Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-decce829c4432557.xml
2025-12-04T11:29:55.6042203Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-491de48d6c983340.xml
2025-12-04T11:29:55.6043072Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-35b1cdd46f4129e6.xml
2025-12-04T11:29:55.8960730Z Uploading logs for 57119749248 to S3
2025-12-04T11:29:56.0366644Z Uploading artifacts took 0.45 seconds
2025-12-04T11:29:56.0367082Z inductor/test_compile_subprocess 3/3 failed!
2025-12-04T11:29:56.0371230Z Running inductor/test_flex_decoding 1/1 ... [2025-12-04 11:29:56.036923][8224.419818157]
2025-12-04T11:29:56.0371819Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T11:29:56.0376473Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_flex_decoding.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:29:56.037405]
2025-12-04T11:30:01.3805262Z 
2025-12-04T11:30:01.3806325Z inductor/test_flex_decoding 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_flex_decoding_1.1_a47e1c88f2ff3c9a_.log
2025-12-04T11:30:01.3807310Z Running 0 items in this shard:
2025-12-04T11:30:01.3807529Z 
2025-12-04T11:30:01.3807941Z Finished inductor/test_flex_decoding 1/1 ... [2025-12-04 11:30:01.380338][8229.763233694], took 0.09min
2025-12-04T11:30:01.3899925Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_flex_decoding/inductor.test_flex_decoding-4523fe803428b665.xml
2025-12-04T11:30:01.4186275Z Running inductor/test_deterministic 5/8 ... [2025-12-04 11:30:01.418353][8229.801249028]
2025-12-04T11:30:01.4186883Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T11:30:01.4190077Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_deterministic.py', '--shard-id=5', '--num-shards=8', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:30:01.418771]
2025-12-04T11:42:56.1686884Z 
2025-12-04T11:42:56.1688140Z PRINTING LOG FILE of inductor/test_deterministic 5/8 (test/test-reports/inductor.test_deterministic_5.8_04041ff7a6ce6208_.log)
2025-12-04T11:42:56.1691756Z W1204 11:30:10.639000 105004 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:42:56.1693266Z Test results will be stored in test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-ccc55353a2e77d8f.xml
2025-12-04T11:42:56.1694168Z ============================= test session starts ==============================
2025-12-04T11:42:56.1695016Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:42:56.1695628Z cachedir: .pytest_cache
2025-12-04T11:42:56.1696535Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:42:56.1697350Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:42:56.1697716Z configfile: pytest.ini
2025-12-04T11:42:56.1698469Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:42:56.1699264Z collecting ... collected 32 items
2025-12-04T11:42:56.1699770Z stepcurrent: Cannot find last run test, not skipping
2025-12-04T11:42:56.1702507Z Running 3 items in this shard: test/inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16, test/inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_GoogleFnet_training_or_inference_inference_precision_amp, test/inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_GoogleFnet_training_or_inference_inference_precision_float16
2025-12-04T11:42:56.1705082Z 
2025-12-04T11:42:56.1705953Z inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 ('RERUN', {'yellow': True}) [70.9025s] [ 33%]
2025-12-04T11:42:56.1707848Z inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 ('RERUN', {'yellow': True}) [46.2098s] [ 33%]
2025-12-04T11:42:56.1709639Z inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 FAILED [46.7201s] [ 33%]
2025-12-04T11:42:56.1710546Z 
2025-12-04T11:42:56.1710708Z ==================================== RERUNS ====================================
2025-12-04T11:42:56.1711514Z _ DeterministicTest.test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 _
2025-12-04T11:42:56.1712310Z Traceback (most recent call last):
2025-12-04T11:42:56.1713036Z   File "/var/lib/jenkins/workspace/test/inductor/test_deterministic.py", line 166, in test_run2run_determinism
2025-12-04T11:42:56.1713760Z     self.assertTrue(
2025-12-04T11:42:56.1714272Z   File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 687, in assertTrue
2025-12-04T11:42:56.1714874Z     raise self.failureException(msg)
2025-12-04T11:42:56.1715451Z AssertionError: False is not true : stdout: cuda eval  DistillGPT2                        
2025-12-04T11:42:56.1716165Z TorchDynamo optimized model failed to run because of following error
2025-12-04T11:42:56.1716953Z fail_to_run
2025-12-04T11:42:56.1717201Z , stderr: 
2025-12-04T11:42:56.1717818Z loading model: 0it [00:00, ?it/s]`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`.
2025-12-04T11:42:56.1719029Z WARNING:transformers.modeling_utils:`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`.
2025-12-04T11:42:56.1719747Z 
2025-12-04T11:42:56.1719867Z loading model: 0it [00:03, ?it/s]
2025-12-04T11:42:56.1720574Z W1204 11:31:16.465000 105261 site-packages/torch/_inductor/utils.py:1703] [1/0_1] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:42:56.1721789Z W1204 11:31:19.116000 105261 site-packages/torch/_inductor/utils.py:1361] [1/0_1] on error, temporary cache dir kept at /tmp/tmp1ko3ckfr/tmplc0o_fjw
2025-12-04T11:42:56.1722608Z ERROR:common:
2025-12-04T11:42:56.1722892Z Traceback (most recent call last):
2025-12-04T11:42:56.1723528Z   File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2333, in check_accuracy
2025-12-04T11:42:56.1724183Z     new_result = self.run_n_iterations(
2025-12-04T11:42:56.1724879Z   File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2043, in run_n_iterations
2025-12-04T11:42:56.1725589Z     model_iter_fn(mod, inputs, collect_outputs=False)
2025-12-04T11:42:56.1726403Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T11:42:56.1727285Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T11:42:56.1728185Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T11:42:56.1729034Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T11:42:56.1729897Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T11:42:56.1730692Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T11:42:56.1731510Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T11:42:56.1732514Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T11:42:56.1733495Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T11:42:56.1734291Z     _check_triton_bf16_support(graph)
2025-12-04T11:42:56.1735094Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T11:42:56.1735892Z     warn_and_skip(node.get_device())
2025-12-04T11:42:56.1736722Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T11:42:56.1737496Z     raise SkipFrame("BF16 is not supported")
2025-12-04T11:42:56.1738021Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T11:42:56.1738404Z 
2025-12-04T11:42:56.1739117Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T11:42:56.1739979Z 
2025-12-04T11:42:56.1739983Z 
2025-12-04T11:42:56.1739988Z 
2025-12-04T11:42:56.1740213Z To execute this test, run the following from the base repo dir:
2025-12-04T11:42:56.1741461Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_deterministic.py DeterministicTest.test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16
2025-12-04T11:42:56.1742480Z 
2025-12-04T11:42:56.1742765Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:42:56.1743415Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:42:56.1745003Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --save-model-outputs-to=/tmp/tmpgw838ry8/saved.pkl
2025-12-04T11:42:56.1747601Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --compare-model-outputs-with=/tmp/tmpgw838ry8/saved.pkl
2025-12-04T11:42:56.1749497Z _ DeterministicTest.test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 _
2025-12-04T11:42:56.1750293Z Traceback (most recent call last):
2025-12-04T11:42:56.1751074Z   File "/var/lib/jenkins/workspace/test/inductor/test_deterministic.py", line 166, in test_run2run_determinism
2025-12-04T11:42:56.1751806Z     self.assertTrue(
2025-12-04T11:42:56.1752306Z   File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 687, in assertTrue
2025-12-04T11:42:56.1752901Z     raise self.failureException(msg)
2025-12-04T11:42:56.1753459Z AssertionError: False is not true : stdout: cuda eval  DistillGPT2                        
2025-12-04T11:42:56.1754231Z TorchDynamo optimized model failed to run because of following error
2025-12-04T11:42:56.1754730Z fail_to_run
2025-12-04T11:42:56.1754958Z , stderr: 
2025-12-04T11:42:56.1755621Z loading model: 0it [00:00, ?it/s]`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`.
2025-12-04T11:42:56.1756827Z WARNING:transformers.modeling_utils:`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`.
2025-12-04T11:42:56.1757530Z 
2025-12-04T11:42:56.1757666Z loading model: 0it [00:03, ?it/s]
2025-12-04T11:42:56.1758362Z W1204 11:32:02.694000 105463 site-packages/torch/_inductor/utils.py:1703] [1/0_1] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:42:56.1759586Z W1204 11:32:05.334000 105463 site-packages/torch/_inductor/utils.py:1361] [1/0_1] on error, temporary cache dir kept at /tmp/tmppgs9zs5u/tmp5vkay3a5
2025-12-04T11:42:56.1760402Z ERROR:common:
2025-12-04T11:42:56.1760688Z Traceback (most recent call last):
2025-12-04T11:42:56.1761309Z   File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2333, in check_accuracy
2025-12-04T11:42:56.1761982Z     new_result = self.run_n_iterations(
2025-12-04T11:42:56.1762639Z   File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2043, in run_n_iterations
2025-12-04T11:42:56.1763339Z     model_iter_fn(mod, inputs, collect_outputs=False)
2025-12-04T11:42:56.1764139Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T11:42:56.1765021Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T11:42:56.1765927Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T11:42:56.1766766Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T11:42:56.1767606Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T11:42:56.1768406Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T11:42:56.1769228Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T11:42:56.1770211Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T11:42:56.1771673Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T11:42:56.1772479Z     _check_triton_bf16_support(graph)
2025-12-04T11:42:56.1773287Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T11:42:56.1774092Z     warn_and_skip(node.get_device())
2025-12-04T11:42:56.1774832Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T11:42:56.1775610Z     raise SkipFrame("BF16 is not supported")
2025-12-04T11:42:56.1776126Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T11:42:56.1776608Z 
2025-12-04T11:42:56.1777324Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T11:42:56.1778182Z 
2025-12-04T11:42:56.1778187Z 
2025-12-04T11:42:56.1778193Z 
2025-12-04T11:42:56.1778410Z To execute this test, run the following from the base repo dir:
2025-12-04T11:42:56.1779773Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_deterministic.py DeterministicTest.test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16
2025-12-04T11:42:56.1780804Z 
2025-12-04T11:42:56.1781090Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:42:56.1781711Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:42:56.1783438Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --save-model-outputs-to=/tmp/tmpgw838ry8/saved.pkl
2025-12-04T11:42:56.1786034Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --compare-model-outputs-with=/tmp/tmpgw838ry8/saved.pkl
2025-12-04T11:42:56.1787644Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:42:56.1789286Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --save-model-outputs-to=/tmp/tmpzbb58pzn/saved.pkl
2025-12-04T11:42:56.1791870Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --compare-model-outputs-with=/tmp/tmpzbb58pzn/saved.pkl
2025-12-04T11:42:56.1793412Z =================================== FAILURES ===================================
2025-12-04T11:42:56.1794236Z _ DeterministicTest.test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 _
2025-12-04T11:42:56.1795029Z Traceback (most recent call last):
2025-12-04T11:42:56.1795754Z   File "/var/lib/jenkins/workspace/test/inductor/test_deterministic.py", line 166, in test_run2run_determinism
2025-12-04T11:42:56.1796498Z     self.assertTrue(
2025-12-04T11:42:56.1797006Z   File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 687, in assertTrue
2025-12-04T11:42:56.1797589Z     raise self.failureException(msg)
2025-12-04T11:42:56.1798167Z AssertionError: False is not true : stdout: cuda eval  DistillGPT2                        
2025-12-04T11:42:56.1798958Z TorchDynamo optimized model failed to run because of following error
2025-12-04T11:42:56.1799475Z fail_to_run
2025-12-04T11:42:56.1799706Z , stderr: 
2025-12-04T11:42:56.1800337Z loading model: 0it [00:00, ?it/s]`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`.
2025-12-04T11:42:56.1801538Z WARNING:transformers.modeling_utils:`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`.
2025-12-04T11:42:56.1802250Z 
2025-12-04T11:42:56.1802381Z loading model: 0it [00:03, ?it/s]
2025-12-04T11:42:56.1803080Z W1204 11:32:49.445000 105661 site-packages/torch/_inductor/utils.py:1703] [1/0_1] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:42:56.1804267Z W1204 11:32:52.081000 105661 site-packages/torch/_inductor/utils.py:1361] [1/0_1] on error, temporary cache dir kept at /tmp/tmp3t6l4de6/tmp4psly7t8
2025-12-04T11:42:56.1805079Z ERROR:common:
2025-12-04T11:42:56.1805362Z Traceback (most recent call last):
2025-12-04T11:42:56.1805982Z   File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2333, in check_accuracy
2025-12-04T11:42:56.1806654Z     new_result = self.run_n_iterations(
2025-12-04T11:42:56.1807312Z   File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2043, in run_n_iterations
2025-12-04T11:42:56.1808015Z     model_iter_fn(mod, inputs, collect_outputs=False)
2025-12-04T11:42:56.1808813Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T11:42:56.1809741Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T11:42:56.1810651Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T11:42:56.1811485Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T11:42:56.1812331Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T11:42:56.1813173Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T11:42:56.1814014Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T11:42:56.1815032Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T11:42:56.1816022Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T11:42:56.1816917Z     _check_triton_bf16_support(graph)
2025-12-04T11:42:56.1817761Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T11:42:56.1818583Z     warn_and_skip(node.get_device())
2025-12-04T11:42:56.1819319Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T11:42:56.1820095Z     raise SkipFrame("BF16 is not supported")
2025-12-04T11:42:56.1820613Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T11:42:56.1821012Z 
2025-12-04T11:42:56.1821738Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T11:42:56.1822600Z 
2025-12-04T11:42:56.1822605Z 
2025-12-04T11:42:56.1822610Z 
2025-12-04T11:42:56.1822829Z To execute this test, run the following from the base repo dir:
2025-12-04T11:42:56.1824082Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_deterministic.py DeterministicTest.test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16
2025-12-04T11:42:56.1825116Z 
2025-12-04T11:42:56.1825397Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:42:56.1826022Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:42:56.1827625Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --save-model-outputs-to=/tmp/tmpgw838ry8/saved.pkl
2025-12-04T11:42:56.1830247Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --compare-model-outputs-with=/tmp/tmpgw838ry8/saved.pkl
2025-12-04T11:42:56.1831848Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:42:56.1833425Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --save-model-outputs-to=/tmp/tmpzbb58pzn/saved.pkl
2025-12-04T11:42:56.1836030Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --compare-model-outputs-with=/tmp/tmpzbb58pzn/saved.pkl
2025-12-04T11:42:56.1837649Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:42:56.1839276Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --save-model-outputs-to=/tmp/tmpvb4nn8d7/saved.pkl
2025-12-04T11:42:56.1841894Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --compare-model-outputs-with=/tmp/tmpvb4nn8d7/saved.pkl
2025-12-04T11:42:56.1844040Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-ccc55353a2e77d8f.xml -
2025-12-04T11:42:56.1845159Z =========================== short test summary info ============================
2025-12-04T11:42:56.1846703Z FAILED [46.7201s] inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 - AssertionError: False is not true : stdout: cuda eval  DistillGPT2                        
2025-12-04T11:42:56.1848284Z TorchDynamo optimized model failed to run because of following error
2025-12-04T11:42:56.1848774Z fail_to_run
2025-12-04T11:42:56.1849021Z , stderr: 
2025-12-04T11:42:56.1849692Z loading model: 0it [00:00, ?it/s]`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`.
2025-12-04T11:42:56.1850894Z WARNING:transformers.modeling_utils:`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`.
2025-12-04T11:42:56.1851604Z 
2025-12-04T11:42:56.1851725Z loading model: 0it [00:03, ?it/s]
2025-12-04T11:42:56.1852436Z W1204 11:32:49.445000 105661 site-packages/torch/_inductor/utils.py:1703] [1/0_1] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:42:56.1853629Z W1204 11:32:52.081000 105661 site-packages/torch/_inductor/utils.py:1361] [1/0_1] on error, temporary cache dir kept at /tmp/tmp3t6l4de6/tmp4psly7t8
2025-12-04T11:42:56.1854446Z ERROR:common:
2025-12-04T11:42:56.1854716Z Traceback (most recent call last):
2025-12-04T11:42:56.1855362Z   File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2333, in check_accuracy
2025-12-04T11:42:56.1856030Z     new_result = self.run_n_iterations(
2025-12-04T11:42:56.1856763Z   File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2043, in run_n_iterations
2025-12-04T11:42:56.1857485Z     model_iter_fn(mod, inputs, collect_outputs=False)
2025-12-04T11:42:56.1858286Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T11:42:56.1859173Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T11:42:56.1860067Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T11:42:56.1860921Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T11:42:56.1861761Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T11:42:56.1862545Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T11:42:56.1863373Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T11:42:56.1864386Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T11:42:56.1865381Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T11:42:56.1866171Z     _check_triton_bf16_support(graph)
2025-12-04T11:42:56.1866981Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T11:42:56.1867802Z     warn_and_skip(node.get_device())
2025-12-04T11:42:56.1868537Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T11:42:56.1869296Z     raise SkipFrame("BF16 is not supported")
2025-12-04T11:42:56.1869879Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T11:42:56.1870270Z 
2025-12-04T11:42:56.1871178Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T11:42:56.1872037Z 
2025-12-04T11:42:56.1872042Z 
2025-12-04T11:42:56.1872046Z 
2025-12-04T11:42:56.1872282Z To execute this test, run the following from the base repo dir:
2025-12-04T11:42:56.1873666Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_deterministic.py DeterministicTest.test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16
2025-12-04T11:42:56.1874709Z 
2025-12-04T11:42:56.1874980Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:42:56.1875593Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:42:56.1876110Z ==================== 1 failed, 2 rerun in 163.86s (0:02:43) ====================
2025-12-04T11:42:56.1876529Z Got exit code 1
2025-12-04T11:42:56.1876859Z Retrying single test...
2025-12-04T11:42:56.1877508Z W1204 11:33:05.027000 105760 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:42:56.1878688Z Test results will be stored in test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-cbc1aeff512c7b0d.xml
2025-12-04T11:42:56.1879604Z ============================= test session starts ==============================
2025-12-04T11:42:56.1880276Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:42:56.1880884Z cachedir: .pytest_cache
2025-12-04T11:42:56.1881588Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:42:56.1882381Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:42:56.1882746Z configfile: pytest.ini
2025-12-04T11:42:56.1883529Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T11:42:56.1884471Z collecting ... collected 32 items / 2 deselected / 30 selected
2025-12-04T11:42:56.1885813Z stepcurrent: skipping 0 already run items. Running only test/inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16
2025-12-04T11:42:56.1887034Z Running 1 items in this shard
2025-12-04T11:42:56.1887247Z 
2025-12-04T11:42:56.1888140Z inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 ('RERUN', {'yellow': True}) [77.2762s] [100%]
2025-12-04T11:42:56.1890015Z inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 ('RERUN', {'yellow': True}) [77.9137s] [100%]
2025-12-04T11:42:56.1891811Z inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 FAILED [77.9025s] [100%]
2025-12-04T11:42:56.1892734Z 
2025-12-04T11:42:56.1892881Z ==================================== RERUNS ====================================
2025-12-04T11:42:56.1893686Z _ DeterministicTest.test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 _
2025-12-04T11:42:56.1894471Z Traceback (most recent call last):
2025-12-04T11:42:56.1895205Z   File "/var/lib/jenkins/workspace/test/inductor/test_deterministic.py", line 166, in test_run2run_determinism
2025-12-04T11:42:56.1895941Z     self.assertTrue(
2025-12-04T11:42:56.1896520Z   File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 687, in assertTrue
2025-12-04T11:42:56.1897115Z     raise self.failureException(msg)
2025-12-04T11:42:56.1897780Z AssertionError: False is not true : stdout: cuda eval  DistillGPT2                        
2025-12-04T11:42:56.1898511Z TorchDynamo optimized model failed to run because of following error
2025-12-04T11:42:56.1899000Z fail_to_run
2025-12-04T11:42:56.1899248Z , stderr: 
2025-12-04T11:42:56.1899891Z loading model: 0it [00:00, ?it/s]`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`.
2025-12-04T11:42:56.1901097Z WARNING:transformers.modeling_utils:`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`.
2025-12-04T11:42:56.1901880Z 
2025-12-04T11:42:56.1901998Z loading model: 0it [00:03, ?it/s]
2025-12-04T11:42:56.1902792Z [W1204 11:34:00.134044832 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.1903446Z 
2025-12-04T11:42:56.1903977Z [W1204 11:34:16.418483514 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.1904632Z 
2025-12-04T11:42:56.1905156Z [W1204 11:34:16.422208999 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.1905855Z 
2025-12-04T11:42:56.1906368Z [W1204 11:34:16.422477222 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.1907030Z 
2025-12-04T11:42:56.1907539Z [W1204 11:34:16.423316135 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.1908195Z 
2025-12-04T11:42:56.1908705Z [W1204 11:34:16.423543629 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.1909359Z 
2025-12-04T11:42:56.1909886Z [W1204 11:34:16.424286866 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.1910530Z 
2025-12-04T11:42:56.1911060Z [W1204 11:34:16.424487043 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.1911713Z 
2025-12-04T11:42:56.1912224Z [W1204 11:34:16.425526362 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.1912884Z 
2025-12-04T11:42:56.1913401Z [W1204 11:34:16.425722561 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.1914061Z 
2025-12-04T11:42:56.1914573Z [W1204 11:34:16.426208761 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.1915231Z 
2025-12-04T11:42:56.1915748Z [W1204 11:34:16.426404658 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.1916394Z 
2025-12-04T11:42:56.1916920Z [W1204 11:34:16.427075644 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.1917570Z 
2025-12-04T11:42:56.1918098Z [W1204 11:34:16.427260432 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.1918744Z 
2025-12-04T11:42:56.1919259Z [W1204 11:34:16.428194822 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.1919923Z 
2025-12-04T11:42:56.1920442Z [W1204 11:34:16.428378324 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.1921106Z 
2025-12-04T11:42:56.1921618Z [W1204 11:34:16.428854921 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.1922266Z 
2025-12-04T11:42:56.1922848Z [W1204 11:34:16.429038075 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.1923506Z 
2025-12-04T11:42:56.1924033Z [W1204 11:34:16.429660189 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.1924686Z 
2025-12-04T11:42:56.1925200Z [W1204 11:34:16.429841597 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.1925904Z 
2025-12-04T11:42:56.1926449Z [W1204 11:34:16.430767030 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.1927110Z 
2025-12-04T11:42:56.1927622Z [W1204 11:34:16.430951198 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.1928269Z 
2025-12-04T11:42:56.1928798Z [W1204 11:34:16.431392291 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.1929477Z 
2025-12-04T11:42:56.1930001Z [W1204 11:34:16.431576952 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.1930689Z 
2025-12-04T11:42:56.1931198Z [W1204 11:34:16.432198753 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.1931863Z 
2025-12-04T11:42:56.1932377Z [W1204 11:34:16.432379672 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.1933037Z 
2025-12-04T11:42:56.1933555Z [W1204 11:34:16.433283657 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.1934201Z 
2025-12-04T11:42:56.1934731Z [W1204 11:34:16.433466400 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.1935378Z 
2025-12-04T11:42:56.1935909Z [W1204 11:34:16.433906459 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.1936659Z 
2025-12-04T11:42:56.1937173Z [W1204 11:34:16.434093470 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.1937842Z 
2025-12-04T11:42:56.1938356Z [W1204 11:34:16.434701643 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.1939021Z 
2025-12-04T11:42:56.1939536Z [W1204 11:34:16.434883296 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.1940188Z 
2025-12-04T11:42:56.1940714Z [W1204 11:34:16.435766411 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.1941364Z 
2025-12-04T11:42:56.1941892Z [W1204 11:34:16.435950648 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.1942544Z 
2025-12-04T11:42:56.1943055Z [W1204 11:34:16.436394102 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.1943726Z 
2025-12-04T11:42:56.1944239Z [W1204 11:34:16.436584023 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.1944904Z 
2025-12-04T11:42:56.1945417Z [W1204 11:34:16.437205111 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.1946078Z 
2025-12-04T11:42:56.1946589Z [W1204 11:34:16.437386770 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.1947241Z 
2025-12-04T11:42:56.1947780Z W1204 11:34:17.235000 105974 site-packages/torch/_inductor/utils.py:1703] [1/0_1] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:42:56.1948961Z W1204 11:34:19.893000 105974 site-packages/torch/_inductor/utils.py:1361] [1/0_1] on error, temporary cache dir kept at /tmp/tmpvdzzse9j/tmp2ba4p3re
2025-12-04T11:42:56.1949777Z ERROR:common:
2025-12-04T11:42:56.1950093Z Traceback (most recent call last):
2025-12-04T11:42:56.1950733Z   File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2333, in check_accuracy
2025-12-04T11:42:56.1951382Z     new_result = self.run_n_iterations(
2025-12-04T11:42:56.1952071Z   File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2043, in run_n_iterations
2025-12-04T11:42:56.1952783Z     model_iter_fn(mod, inputs, collect_outputs=False)
2025-12-04T11:42:56.1953562Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T11:42:56.1954442Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T11:42:56.1955379Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T11:42:56.1956222Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T11:42:56.1957046Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T11:42:56.1957847Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T11:42:56.1958667Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T11:42:56.1959662Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T11:42:56.1960634Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T11:42:56.1961425Z     _check_triton_bf16_support(graph)
2025-12-04T11:42:56.1962225Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T11:42:56.1963035Z     warn_and_skip(node.get_device())
2025-12-04T11:42:56.1963763Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T11:42:56.1964532Z     raise SkipFrame("BF16 is not supported")
2025-12-04T11:42:56.1965058Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T11:42:56.1965443Z 
2025-12-04T11:42:56.1966157Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T11:42:56.1967015Z 
2025-12-04T11:42:56.1967020Z 
2025-12-04T11:42:56.1967024Z 
2025-12-04T11:42:56.1967242Z To execute this test, run the following from the base repo dir:
2025-12-04T11:42:56.1968486Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_deterministic.py DeterministicTest.test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16
2025-12-04T11:42:56.1969508Z 
2025-12-04T11:42:56.1969791Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:42:56.1970431Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:42:56.1972204Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --save-model-outputs-to=/tmp/tmp2kpkd29m/saved.pkl
2025-12-04T11:42:56.1974809Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --compare-model-outputs-with=/tmp/tmp2kpkd29m/saved.pkl
2025-12-04T11:42:56.1976872Z _ DeterministicTest.test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 _
2025-12-04T11:42:56.1977669Z Traceback (most recent call last):
2025-12-04T11:42:56.1978386Z   File "/var/lib/jenkins/workspace/test/inductor/test_deterministic.py", line 166, in test_run2run_determinism
2025-12-04T11:42:56.1979123Z     self.assertTrue(
2025-12-04T11:42:56.1979678Z   File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 687, in assertTrue
2025-12-04T11:42:56.1980275Z     raise self.failureException(msg)
2025-12-04T11:42:56.1980887Z AssertionError: False is not true : stdout: cuda eval  DistillGPT2                        
2025-12-04T11:42:56.1981619Z TorchDynamo optimized model failed to run because of following error
2025-12-04T11:42:56.1982120Z fail_to_run
2025-12-04T11:42:56.1982351Z , stderr: 
2025-12-04T11:42:56.1982986Z loading model: 0it [00:00, ?it/s]`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`.
2025-12-04T11:42:56.1984185Z WARNING:transformers.modeling_utils:`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`.
2025-12-04T11:42:56.1984938Z 
2025-12-04T11:42:56.1985068Z loading model: 0it [00:03, ?it/s]
2025-12-04T11:42:56.1985804Z [W1204 11:35:18.805346653 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.1986476Z 
2025-12-04T11:42:56.1986991Z [W1204 11:35:33.242724749 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.1987653Z 
2025-12-04T11:42:56.1988167Z [W1204 11:35:33.246424966 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.1988813Z 
2025-12-04T11:42:56.1989343Z [W1204 11:35:33.246679430 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.1989997Z 
2025-12-04T11:42:56.1990528Z [W1204 11:35:33.247503571 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.1991176Z 
2025-12-04T11:42:56.1991689Z [W1204 11:35:33.247721804 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.1992350Z 
2025-12-04T11:42:56.1992860Z [W1204 11:35:33.248461285 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.1993521Z 
2025-12-04T11:42:56.1994034Z [W1204 11:35:33.248674728 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.1994678Z 
2025-12-04T11:42:56.1995207Z [W1204 11:35:33.249712715 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.1995860Z 
2025-12-04T11:42:56.1996384Z [W1204 11:35:33.249910494 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.1997038Z 
2025-12-04T11:42:56.1997548Z [W1204 11:35:33.250409881 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.1998210Z 
2025-12-04T11:42:56.1998722Z [W1204 11:35:33.250610267 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.1999384Z 
2025-12-04T11:42:56.1999894Z [W1204 11:35:33.251295545 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2000540Z 
2025-12-04T11:42:56.2001062Z [W1204 11:35:33.251487298 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2001708Z 
2025-12-04T11:42:56.2002305Z [W1204 11:35:33.252407313 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2002962Z 
2025-12-04T11:42:56.2003468Z [W1204 11:35:33.252599799 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2004130Z 
2025-12-04T11:42:56.2004673Z [W1204 11:35:33.253053871 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2005335Z 
2025-12-04T11:42:56.2005885Z [W1204 11:35:33.253233353 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2006533Z 
2025-12-04T11:42:56.2007058Z [W1204 11:35:33.253857047 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2007703Z 
2025-12-04T11:42:56.2008228Z [W1204 11:35:33.254041636 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2008909Z 
2025-12-04T11:42:56.2009422Z [W1204 11:35:33.254956806 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2010083Z 
2025-12-04T11:42:56.2010592Z [W1204 11:35:33.255141144 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2011257Z 
2025-12-04T11:42:56.2011771Z [W1204 11:35:33.255593279 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2012432Z 
2025-12-04T11:42:56.2012949Z [W1204 11:35:33.255774354 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2013595Z 
2025-12-04T11:42:56.2014121Z [W1204 11:35:33.256390753 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2014771Z 
2025-12-04T11:42:56.2015284Z [W1204 11:35:33.256582111 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2015944Z 
2025-12-04T11:42:56.2016523Z [W1204 11:35:33.257483215 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2017191Z 
2025-12-04T11:42:56.2017704Z [W1204 11:35:33.257666071 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2018367Z 
2025-12-04T11:42:56.2018873Z [W1204 11:35:33.258101242 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2019523Z 
2025-12-04T11:42:56.2020049Z [W1204 11:35:33.258282531 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2020696Z 
2025-12-04T11:42:56.2021221Z [W1204 11:35:33.258894598 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2021870Z 
2025-12-04T11:42:56.2022383Z [W1204 11:35:33.259074825 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2023045Z 
2025-12-04T11:42:56.2023561Z [W1204 11:35:33.259971216 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2024218Z 
2025-12-04T11:42:56.2024730Z [W1204 11:35:33.260174176 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2025375Z 
2025-12-04T11:42:56.2025946Z [W1204 11:35:33.260644428 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2026604Z 
2025-12-04T11:42:56.2027129Z [W1204 11:35:33.260823852 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2027774Z 
2025-12-04T11:42:56.2028284Z [W1204 11:35:33.261435926 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2028980Z 
2025-12-04T11:42:56.2029492Z [W1204 11:35:33.261616600 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2030198Z 
2025-12-04T11:42:56.2030676Z W1204 11:35:35.067000 106186 site-packages/torch/_inductor/utils.py:1703] [1/0_1] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:42:56.2031861Z W1204 11:35:37.713000 106186 site-packages/torch/_inductor/utils.py:1361] [1/0_1] on error, temporary cache dir kept at /tmp/tmpfvtsyn16/tmpc1et893t
2025-12-04T11:42:56.2032663Z ERROR:common:
2025-12-04T11:42:56.2032951Z Traceback (most recent call last):
2025-12-04T11:42:56.2033624Z   File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2333, in check_accuracy
2025-12-04T11:42:56.2034291Z     new_result = self.run_n_iterations(
2025-12-04T11:42:56.2034934Z   File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2043, in run_n_iterations
2025-12-04T11:42:56.2035657Z     model_iter_fn(mod, inputs, collect_outputs=False)
2025-12-04T11:42:56.2036459Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T11:42:56.2037329Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T11:42:56.2038236Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T11:42:56.2039091Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T11:42:56.2039933Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T11:42:56.2040722Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T11:42:56.2041547Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T11:42:56.2042547Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T11:42:56.2043543Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T11:42:56.2044325Z     _check_triton_bf16_support(graph)
2025-12-04T11:42:56.2045129Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T11:42:56.2045944Z     warn_and_skip(node.get_device())
2025-12-04T11:42:56.2046669Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T11:42:56.2047440Z     raise SkipFrame("BF16 is not supported")
2025-12-04T11:42:56.2047965Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T11:42:56.2048352Z 
2025-12-04T11:42:56.2049074Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T11:42:56.2049916Z 
2025-12-04T11:42:56.2049921Z 
2025-12-04T11:42:56.2049926Z 
2025-12-04T11:42:56.2050153Z To execute this test, run the following from the base repo dir:
2025-12-04T11:42:56.2051392Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_deterministic.py DeterministicTest.test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16
2025-12-04T11:42:56.2052421Z 
2025-12-04T11:42:56.2052691Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:42:56.2053370Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:42:56.2054966Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --save-model-outputs-to=/tmp/tmp2kpkd29m/saved.pkl
2025-12-04T11:42:56.2057650Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --compare-model-outputs-with=/tmp/tmp2kpkd29m/saved.pkl
2025-12-04T11:42:56.2059325Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:42:56.2060908Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --save-model-outputs-to=/tmp/tmpaak6li6p/saved.pkl
2025-12-04T11:42:56.2063494Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --compare-model-outputs-with=/tmp/tmpaak6li6p/saved.pkl
2025-12-04T11:42:56.2065060Z =================================== FAILURES ===================================
2025-12-04T11:42:56.2065867Z _ DeterministicTest.test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 _
2025-12-04T11:42:56.2066654Z Traceback (most recent call last):
2025-12-04T11:42:56.2067385Z   File "/var/lib/jenkins/workspace/test/inductor/test_deterministic.py", line 166, in test_run2run_determinism
2025-12-04T11:42:56.2068128Z     self.assertTrue(
2025-12-04T11:42:56.2068618Z   File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 687, in assertTrue
2025-12-04T11:42:56.2069212Z     raise self.failureException(msg)
2025-12-04T11:42:56.2069788Z AssertionError: False is not true : stdout: cuda eval  DistillGPT2                        
2025-12-04T11:42:56.2070503Z TorchDynamo optimized model failed to run because of following error
2025-12-04T11:42:56.2071192Z fail_to_run
2025-12-04T11:42:56.2071496Z , stderr: 
2025-12-04T11:42:56.2072114Z loading model: 0it [00:00, ?it/s]`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`.
2025-12-04T11:42:56.2073324Z WARNING:transformers.modeling_utils:`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`.
2025-12-04T11:42:56.2074048Z 
2025-12-04T11:42:56.2074168Z loading model: 0it [00:03, ?it/s]
2025-12-04T11:42:56.2074933Z [W1204 11:36:36.734921876 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2075588Z 
2025-12-04T11:42:56.2076121Z [W1204 11:36:51.183827512 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2076768Z 
2025-12-04T11:42:56.2077281Z [W1204 11:36:51.187535330 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2077942Z 
2025-12-04T11:42:56.2078449Z [W1204 11:36:51.187801493 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2079114Z 
2025-12-04T11:42:56.2079627Z [W1204 11:36:51.188644431 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2080278Z 
2025-12-04T11:42:56.2080804Z [W1204 11:36:51.188885600 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2081453Z 
2025-12-04T11:42:56.2081977Z [W1204 11:36:51.189641292 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2082712Z 
2025-12-04T11:42:56.2083229Z [W1204 11:36:51.189852199 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2083895Z 
2025-12-04T11:42:56.2084409Z [W1204 11:36:51.190907768 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2085125Z 
2025-12-04T11:42:56.2085637Z [W1204 11:36:51.191099871 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2086284Z 
2025-12-04T11:42:56.2086851Z [W1204 11:36:51.191568086 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2087501Z 
2025-12-04T11:42:56.2088025Z [W1204 11:36:51.191751770 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2088672Z 
2025-12-04T11:42:56.2089186Z [W1204 11:36:51.192419226 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2089912Z 
2025-12-04T11:42:56.2090422Z [W1204 11:36:51.192612590 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2091085Z 
2025-12-04T11:42:56.2091598Z [W1204 11:36:51.193563867 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2092244Z 
2025-12-04T11:42:56.2092774Z [W1204 11:36:51.193748141 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2093422Z 
2025-12-04T11:42:56.2093948Z [W1204 11:36:51.194196765 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2094596Z 
2025-12-04T11:42:56.2095111Z [W1204 11:36:51.194375316 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2095776Z 
2025-12-04T11:42:56.2096358Z [W1204 11:36:51.195003082 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2097022Z 
2025-12-04T11:42:56.2097533Z [W1204 11:36:51.195183188 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2098197Z 
2025-12-04T11:42:56.2098717Z [W1204 11:36:51.196083443 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2099365Z 
2025-12-04T11:42:56.2099887Z [W1204 11:36:51.196265674 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2100533Z 
2025-12-04T11:42:56.2101061Z [W1204 11:36:51.196718880 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2101708Z 
2025-12-04T11:42:56.2102220Z [W1204 11:36:51.196896697 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2102877Z 
2025-12-04T11:42:56.2103387Z [W1204 11:36:51.197511368 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2104046Z 
2025-12-04T11:42:56.2104556Z [W1204 11:36:51.197694399 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2105200Z 
2025-12-04T11:42:56.2105723Z [W1204 11:36:51.198589728 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2106368Z 
2025-12-04T11:42:56.2106934Z [W1204 11:36:51.198769355 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2107590Z 
2025-12-04T11:42:56.2108101Z [W1204 11:36:51.199209908 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2108763Z 
2025-12-04T11:42:56.2109275Z [W1204 11:36:51.199391433 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2109972Z 
2025-12-04T11:42:56.2110519Z [W1204 11:36:51.200031627 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2111171Z 
2025-12-04T11:42:56.2111702Z [W1204 11:36:51.200212638 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2112350Z 
2025-12-04T11:42:56.2112878Z [W1204 11:36:51.201109535 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2113635Z 
2025-12-04T11:42:56.2114146Z [W1204 11:36:51.201288822 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2114809Z 
2025-12-04T11:42:56.2115318Z [W1204 11:36:51.201715185 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2115977Z 
2025-12-04T11:42:56.2116488Z [W1204 11:36:51.201891767 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2117136Z 
2025-12-04T11:42:56.2117662Z [W1204 11:36:51.202485114 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2118310Z 
2025-12-04T11:42:56.2118835Z [W1204 11:36:51.202663644 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2119483Z 
2025-12-04T11:42:56.2119957Z W1204 11:36:52.996000 106394 site-packages/torch/_inductor/utils.py:1703] [1/0_1] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:42:56.2121141Z W1204 11:36:55.640000 106394 site-packages/torch/_inductor/utils.py:1361] [1/0_1] on error, temporary cache dir kept at /tmp/tmpewfowa37/tmpai252t84
2025-12-04T11:42:56.2121954Z ERROR:common:
2025-12-04T11:42:56.2122235Z Traceback (most recent call last):
2025-12-04T11:42:56.2122860Z   File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2333, in check_accuracy
2025-12-04T11:42:56.2123529Z     new_result = self.run_n_iterations(
2025-12-04T11:42:56.2124181Z   File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2043, in run_n_iterations
2025-12-04T11:42:56.2124879Z     model_iter_fn(mod, inputs, collect_outputs=False)
2025-12-04T11:42:56.2125674Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T11:42:56.2126555Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T11:42:56.2127459Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T11:42:56.2128294Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T11:42:56.2129134Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T11:42:56.2129935Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T11:42:56.2130761Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T11:42:56.2131751Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T11:42:56.2132744Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T11:42:56.2133591Z     _check_triton_bf16_support(graph)
2025-12-04T11:42:56.2134388Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T11:42:56.2135208Z     warn_and_skip(node.get_device())
2025-12-04T11:42:56.2135939Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T11:42:56.2136823Z     raise SkipFrame("BF16 is not supported")
2025-12-04T11:42:56.2137337Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T11:42:56.2137741Z 
2025-12-04T11:42:56.2138502Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T11:42:56.2139362Z 
2025-12-04T11:42:56.2139367Z 
2025-12-04T11:42:56.2139371Z 
2025-12-04T11:42:56.2139590Z To execute this test, run the following from the base repo dir:
2025-12-04T11:42:56.2140839Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_deterministic.py DeterministicTest.test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16
2025-12-04T11:42:56.2141890Z 
2025-12-04T11:42:56.2142172Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:42:56.2142796Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:42:56.2144380Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --save-model-outputs-to=/tmp/tmp2kpkd29m/saved.pkl
2025-12-04T11:42:56.2146978Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --compare-model-outputs-with=/tmp/tmp2kpkd29m/saved.pkl
2025-12-04T11:42:56.2148594Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:42:56.2150156Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --save-model-outputs-to=/tmp/tmpaak6li6p/saved.pkl
2025-12-04T11:42:56.2152750Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --compare-model-outputs-with=/tmp/tmpaak6li6p/saved.pkl
2025-12-04T11:42:56.2154355Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:42:56.2155931Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --save-model-outputs-to=/tmp/tmpi6g9v71o/saved.pkl
2025-12-04T11:42:56.2158519Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --compare-model-outputs-with=/tmp/tmpi6g9v71o/saved.pkl
2025-12-04T11:42:56.2160670Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-cbc1aeff512c7b0d.xml -
2025-12-04T11:42:56.2161747Z =========================== short test summary info ============================
2025-12-04T11:42:56.2163248Z FAILED [77.9025s] inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 - AssertionError: False is not true : stdout: cuda eval  DistillGPT2                        
2025-12-04T11:42:56.2164804Z TorchDynamo optimized model failed to run because of following error
2025-12-04T11:42:56.2165370Z fail_to_run
2025-12-04T11:42:56.2165604Z , stderr: 
2025-12-04T11:42:56.2166241Z loading model: 0it [00:00, ?it/s]`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`.
2025-12-04T11:42:56.2167442Z WARNING:transformers.modeling_utils:`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`.
2025-12-04T11:42:56.2168176Z 
2025-12-04T11:42:56.2168304Z loading model: 0it [00:03, ?it/s]
2025-12-04T11:42:56.2169082Z [W1204 11:36:36.734921876 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2169746Z 
2025-12-04T11:42:56.2170263Z [W1204 11:36:51.183827512 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2170909Z 
2025-12-04T11:42:56.2171617Z [W1204 11:36:51.187535330 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2172345Z 
2025-12-04T11:42:56.2172866Z [W1204 11:36:51.187801493 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2173510Z 
2025-12-04T11:42:56.2174017Z [W1204 11:36:51.188644431 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2174677Z 
2025-12-04T11:42:56.2175192Z [W1204 11:36:51.188885600 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2175857Z 
2025-12-04T11:42:56.2176440Z [W1204 11:36:51.189641292 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2177106Z 
2025-12-04T11:42:56.2177619Z [W1204 11:36:51.189852199 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2178268Z 
2025-12-04T11:42:56.2178799Z [W1204 11:36:51.190907768 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2179448Z 
2025-12-04T11:42:56.2179961Z [W1204 11:36:51.191099871 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2180628Z 
2025-12-04T11:42:56.2181143Z [W1204 11:36:51.191568086 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2181807Z 
2025-12-04T11:42:56.2182321Z [W1204 11:36:51.191751770 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2182989Z 
2025-12-04T11:42:56.2183502Z [W1204 11:36:51.192419226 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2184151Z 
2025-12-04T11:42:56.2184675Z [W1204 11:36:51.192612590 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2185321Z 
2025-12-04T11:42:56.2185846Z [W1204 11:36:51.193563867 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2186495Z 
2025-12-04T11:42:56.2187003Z [W1204 11:36:51.193748141 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2187661Z 
2025-12-04T11:42:56.2188173Z [W1204 11:36:51.194196765 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2188831Z 
2025-12-04T11:42:56.2189346Z [W1204 11:36:51.194375316 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2189992Z 
2025-12-04T11:42:56.2190579Z [W1204 11:36:51.195003082 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2191226Z 
2025-12-04T11:42:56.2191747Z [W1204 11:36:51.195183188 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2192396Z 
2025-12-04T11:42:56.2192953Z [W1204 11:36:51.196083443 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2193616Z 
2025-12-04T11:42:56.2194194Z [W1204 11:36:51.196265674 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2194857Z 
2025-12-04T11:42:56.2195367Z [W1204 11:36:51.196718880 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2196016Z 
2025-12-04T11:42:56.2196544Z [W1204 11:36:51.196896697 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2197222Z 
2025-12-04T11:42:56.2197746Z [W1204 11:36:51.197511368 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2198395Z 
2025-12-04T11:42:56.2198909Z [W1204 11:36:51.197694399 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2199573Z 
2025-12-04T11:42:56.2200088Z [W1204 11:36:51.198589728 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2200745Z 
2025-12-04T11:42:56.2201259Z [W1204 11:36:51.198769355 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2201903Z 
2025-12-04T11:42:56.2202431Z [W1204 11:36:51.199209908 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2203081Z 
2025-12-04T11:42:56.2203604Z [W1204 11:36:51.199391433 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2204252Z 
2025-12-04T11:42:56.2204760Z [W1204 11:36:51.200031627 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2205423Z 
2025-12-04T11:42:56.2205932Z [W1204 11:36:51.200212638 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2206595Z 
2025-12-04T11:42:56.2207105Z [W1204 11:36:51.201109535 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2207748Z 
2025-12-04T11:42:56.2208271Z [W1204 11:36:51.201288822 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2208920Z 
2025-12-04T11:42:56.2209444Z [W1204 11:36:51.201715185 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2210093Z 
2025-12-04T11:42:56.2210608Z [W1204 11:36:51.201891767 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2211271Z 
2025-12-04T11:42:56.2211785Z [W1204 11:36:51.202485114 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2212449Z 
2025-12-04T11:42:56.2212959Z [W1204 11:36:51.202663644 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2213618Z 
2025-12-04T11:42:56.2214135Z W1204 11:36:52.996000 106394 site-packages/torch/_inductor/utils.py:1703] [1/0_1] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:42:56.2215325Z W1204 11:36:55.640000 106394 site-packages/torch/_inductor/utils.py:1361] [1/0_1] on error, temporary cache dir kept at /tmp/tmpewfowa37/tmpai252t84
2025-12-04T11:42:56.2216124Z ERROR:common:
2025-12-04T11:42:56.2216475Z Traceback (most recent call last):
2025-12-04T11:42:56.2217114Z   File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2333, in check_accuracy
2025-12-04T11:42:56.2217805Z     new_result = self.run_n_iterations(
2025-12-04T11:42:56.2218466Z   File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2043, in run_n_iterations
2025-12-04T11:42:56.2219205Z     model_iter_fn(mod, inputs, collect_outputs=False)
2025-12-04T11:42:56.2220000Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T11:42:56.2220863Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T11:42:56.2221762Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T11:42:56.2222637Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T11:42:56.2223472Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T11:42:56.2224256Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T11:42:56.2225078Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T11:42:56.2226080Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T11:42:56.2227078Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T11:42:56.2227872Z     _check_triton_bf16_support(graph)
2025-12-04T11:42:56.2228679Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T11:42:56.2229504Z     warn_and_skip(node.get_device())
2025-12-04T11:42:56.2230223Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T11:42:56.2230997Z     raise SkipFrame("BF16 is not supported")
2025-12-04T11:42:56.2231526Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T11:42:56.2231921Z 
2025-12-04T11:42:56.2232652Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T11:42:56.2233497Z 
2025-12-04T11:42:56.2233501Z 
2025-12-04T11:42:56.2233506Z 
2025-12-04T11:42:56.2233726Z To execute this test, run the following from the base repo dir:
2025-12-04T11:42:56.2234977Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_deterministic.py DeterministicTest.test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16
2025-12-04T11:42:56.2236020Z 
2025-12-04T11:42:56.2236292Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:42:56.2236891Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:42:56.2237435Z ============= 1 failed, 2 deselected, 2 rerun in 233.12s (0:03:53) =============
2025-12-04T11:42:56.2237889Z Got exit code 1
2025-12-04T11:42:56.2238169Z Retrying single test...
2025-12-04T11:42:56.2238819Z W1204 11:37:08.548000 106498 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:42:56.2239987Z Test results will be stored in test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-b35d65d1a2e42e4e.xml
2025-12-04T11:42:56.2240894Z ============================= test session starts ==============================
2025-12-04T11:42:56.2241617Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:42:56.2242231Z cachedir: .pytest_cache
2025-12-04T11:42:56.2242936Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:42:56.2243726Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:42:56.2244088Z configfile: pytest.ini
2025-12-04T11:42:56.2244852Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T11:42:56.2245836Z collecting ... collected 32 items / 2 deselected / 30 selected
2025-12-04T11:42:56.2247200Z stepcurrent: skipping 0 already run items. Running only test/inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16
2025-12-04T11:42:56.2248421Z Running 1 items in this shard
2025-12-04T11:42:56.2248632Z 
2025-12-04T11:42:56.2249515Z inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 ('RERUN', {'yellow': True}) [78.6372s] [100%]
2025-12-04T11:42:56.2251428Z inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 ('RERUN', {'yellow': True}) [79.2654s] [100%]
2025-12-04T11:42:56.2253211Z inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 FAILED [78.1277s] [100%]
2025-12-04T11:42:56.2254117Z 
2025-12-04T11:42:56.2254276Z ==================================== RERUNS ====================================
2025-12-04T11:42:56.2255090Z _ DeterministicTest.test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 _
2025-12-04T11:42:56.2255865Z Traceback (most recent call last):
2025-12-04T11:42:56.2256660Z   File "/var/lib/jenkins/workspace/test/inductor/test_deterministic.py", line 166, in test_run2run_determinism
2025-12-04T11:42:56.2257399Z     self.assertTrue(
2025-12-04T11:42:56.2257896Z   File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 687, in assertTrue
2025-12-04T11:42:56.2258494Z     raise self.failureException(msg)
2025-12-04T11:42:56.2259069Z AssertionError: False is not true : stdout: cuda eval  DistillGPT2                        
2025-12-04T11:42:56.2259794Z TorchDynamo optimized model failed to run because of following error
2025-12-04T11:42:56.2260278Z fail_to_run
2025-12-04T11:42:56.2260519Z , stderr: 
2025-12-04T11:42:56.2261149Z loading model: 0it [00:00, ?it/s]`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`.
2025-12-04T11:42:56.2262339Z WARNING:transformers.modeling_utils:`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`.
2025-12-04T11:42:56.2263057Z 
2025-12-04T11:42:56.2263177Z loading model: 0it [00:03, ?it/s]
2025-12-04T11:42:56.2263932Z [W1204 11:38:04.269755840 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2264582Z 
2025-12-04T11:42:56.2265106Z [W1204 11:38:20.296686305 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2265759Z 
2025-12-04T11:42:56.2266282Z [W1204 11:38:20.300418821 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2266933Z 
2025-12-04T11:42:56.2267448Z [W1204 11:38:20.300693788 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2268109Z 
2025-12-04T11:42:56.2268620Z [W1204 11:38:20.301508541 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2269281Z 
2025-12-04T11:42:56.2269841Z [W1204 11:38:20.301733923 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2270491Z 
2025-12-04T11:42:56.2271228Z [W1204 11:38:20.302461176 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2271875Z 
2025-12-04T11:42:56.2272468Z [W1204 11:38:20.302671949 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2273118Z 
2025-12-04T11:42:56.2273678Z [W1204 11:38:20.303695684 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2274342Z 
2025-12-04T11:42:56.2274853Z [W1204 11:38:20.303885420 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2275513Z 
2025-12-04T11:42:56.2276029Z [W1204 11:38:20.304348242 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2276721Z 
2025-12-04T11:42:56.2277249Z [W1204 11:38:20.304529859 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2277896Z 
2025-12-04T11:42:56.2278425Z [W1204 11:38:20.305192665 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2279078Z 
2025-12-04T11:42:56.2279594Z [W1204 11:38:20.305377277 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2280253Z 
2025-12-04T11:42:56.2280766Z [W1204 11:38:20.306297294 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2281422Z 
2025-12-04T11:42:56.2281939Z [W1204 11:38:20.306478871 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2282588Z 
2025-12-04T11:42:56.2283116Z [W1204 11:38:20.306917597 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2283764Z 
2025-12-04T11:42:56.2284289Z [W1204 11:38:20.307100173 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2284942Z 
2025-12-04T11:42:56.2285454Z [W1204 11:38:20.307723531 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2286113Z 
2025-12-04T11:42:56.2286624Z [W1204 11:38:20.307906637 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2287285Z 
2025-12-04T11:42:56.2287802Z [W1204 11:38:20.308826418 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2288446Z 
2025-12-04T11:42:56.2288967Z [W1204 11:38:20.309007889 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2289612Z 
2025-12-04T11:42:56.2290136Z [W1204 11:38:20.309444992 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2290787Z 
2025-12-04T11:42:56.2291303Z [W1204 11:38:20.309627573 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2291961Z 
2025-12-04T11:42:56.2292478Z [W1204 11:38:20.310273616 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2293140Z 
2025-12-04T11:42:56.2293702Z [W1204 11:38:20.310452885 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2294355Z 
2025-12-04T11:42:56.2294881Z [W1204 11:38:20.311350837 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2295527Z 
2025-12-04T11:42:56.2296049Z [W1204 11:38:20.311534762 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2296827Z 
2025-12-04T11:42:56.2297336Z [W1204 11:38:20.311967356 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2298029Z 
2025-12-04T11:42:56.2298539Z [W1204 11:38:20.312145343 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2299197Z 
2025-12-04T11:42:56.2299709Z [W1204 11:38:20.312766683 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2300368Z 
2025-12-04T11:42:56.2300918Z [W1204 11:38:20.312945908 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2301564Z 
2025-12-04T11:42:56.2302089Z [W1204 11:38:20.313825878 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2302735Z 
2025-12-04T11:42:56.2303260Z [W1204 11:38:20.314005577 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2303905Z 
2025-12-04T11:42:56.2304418Z [W1204 11:38:20.314434292 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2305078Z 
2025-12-04T11:42:56.2305586Z [W1204 11:38:20.314610913 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2306250Z 
2025-12-04T11:42:56.2306764Z [W1204 11:38:20.315212887 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2307413Z 
2025-12-04T11:42:56.2307940Z [W1204 11:38:20.315392202 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2308591Z 
2025-12-04T11:42:56.2309083Z W1204 11:38:22.125000 106712 site-packages/torch/_inductor/utils.py:1703] [1/0_1] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:42:56.2310258Z W1204 11:38:24.764000 106712 site-packages/torch/_inductor/utils.py:1361] [1/0_1] on error, temporary cache dir kept at /tmp/tmpuhhbdw1k/tmpq_n70fjv
2025-12-04T11:42:56.2311072Z ERROR:common:
2025-12-04T11:42:56.2311352Z Traceback (most recent call last):
2025-12-04T11:42:56.2311974Z   File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2333, in check_accuracy
2025-12-04T11:42:56.2324416Z     new_result = self.run_n_iterations(
2025-12-04T11:42:56.2325135Z   File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2043, in run_n_iterations
2025-12-04T11:42:56.2325859Z     model_iter_fn(mod, inputs, collect_outputs=False)
2025-12-04T11:42:56.2326673Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T11:42:56.2327562Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T11:42:56.2328458Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T11:42:56.2329317Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T11:42:56.2330161Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T11:42:56.2330959Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T11:42:56.2331873Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T11:42:56.2332880Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T11:42:56.2333879Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T11:42:56.2334680Z     _check_triton_bf16_support(graph)
2025-12-04T11:42:56.2335519Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T11:42:56.2336478Z     warn_and_skip(node.get_device())
2025-12-04T11:42:56.2337224Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T11:42:56.2337982Z     raise SkipFrame("BF16 is not supported")
2025-12-04T11:42:56.2338512Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T11:42:56.2338916Z 
2025-12-04T11:42:56.2339636Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T11:42:56.2340525Z 
2025-12-04T11:42:56.2340530Z 
2025-12-04T11:42:56.2340535Z 
2025-12-04T11:42:56.2340764Z To execute this test, run the following from the base repo dir:
2025-12-04T11:42:56.2342013Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_deterministic.py DeterministicTest.test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16
2025-12-04T11:42:56.2343039Z 
2025-12-04T11:42:56.2343312Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:42:56.2343953Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:42:56.2345553Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --save-model-outputs-to=/tmp/tmpezjp3dnh/saved.pkl
2025-12-04T11:42:56.2348176Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --compare-model-outputs-with=/tmp/tmpezjp3dnh/saved.pkl
2025-12-04T11:42:56.2350075Z _ DeterministicTest.test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 _
2025-12-04T11:42:56.2350852Z Traceback (most recent call last):
2025-12-04T11:42:56.2351587Z   File "/var/lib/jenkins/workspace/test/inductor/test_deterministic.py", line 166, in test_run2run_determinism
2025-12-04T11:42:56.2352336Z     self.assertTrue(
2025-12-04T11:42:56.2352832Z   File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 687, in assertTrue
2025-12-04T11:42:56.2353434Z     raise self.failureException(msg)
2025-12-04T11:42:56.2354019Z AssertionError: False is not true : stdout: cuda eval  DistillGPT2                        
2025-12-04T11:42:56.2354755Z TorchDynamo optimized model failed to run because of following error
2025-12-04T11:42:56.2355244Z fail_to_run
2025-12-04T11:42:56.2355491Z , stderr: 
2025-12-04T11:42:56.2356130Z loading model: 0it [00:00, ?it/s]`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`.
2025-12-04T11:42:56.2357319Z WARNING:transformers.modeling_utils:`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`.
2025-12-04T11:42:56.2358039Z 
2025-12-04T11:42:56.2358159Z loading model: 0it [00:03, ?it/s]
2025-12-04T11:42:56.2358915Z [W1204 11:39:23.314064910 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2359566Z 
2025-12-04T11:42:56.2360132Z [W1204 11:39:40.565949462 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2360782Z 
2025-12-04T11:42:56.2361307Z [W1204 11:39:40.569705105 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2361954Z 
2025-12-04T11:42:56.2362471Z [W1204 11:39:40.569975724 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2363217Z 
2025-12-04T11:42:56.2363729Z [W1204 11:39:40.570857629 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2364401Z 
2025-12-04T11:42:56.2364944Z [W1204 11:39:40.571082099 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2365595Z 
2025-12-04T11:42:56.2366125Z [W1204 11:39:40.571821794 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2366772Z 
2025-12-04T11:42:56.2367301Z [W1204 11:39:40.572026357 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2367987Z 
2025-12-04T11:42:56.2368503Z [W1204 11:39:40.573101463 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2369172Z 
2025-12-04T11:42:56.2369682Z [W1204 11:39:40.573299754 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2370342Z 
2025-12-04T11:42:56.2370860Z [W1204 11:39:40.573773047 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2371712Z 
2025-12-04T11:42:56.2372244Z [W1204 11:39:40.573972126 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2372898Z 
2025-12-04T11:42:56.2373427Z [W1204 11:39:40.574648553 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2374082Z 
2025-12-04T11:42:56.2374595Z [W1204 11:39:40.574851069 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2375263Z 
2025-12-04T11:42:56.2375780Z [W1204 11:39:40.575786385 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2376512Z 
2025-12-04T11:42:56.2377028Z [W1204 11:39:40.575971820 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2377673Z 
2025-12-04T11:42:56.2378206Z [W1204 11:39:40.576428810 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2378862Z 
2025-12-04T11:42:56.2379394Z [W1204 11:39:40.576624024 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2380047Z 
2025-12-04T11:42:56.2380563Z [W1204 11:39:40.577277314 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2381229Z 
2025-12-04T11:42:56.2381745Z [W1204 11:39:40.577459192 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2382411Z 
2025-12-04T11:42:56.2382926Z [W1204 11:39:40.578371322 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2383577Z 
2025-12-04T11:42:56.2384100Z [W1204 11:39:40.578555530 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2384750Z 
2025-12-04T11:42:56.2385386Z [W1204 11:39:40.579011618 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2386039Z 
2025-12-04T11:42:56.2386555Z [W1204 11:39:40.579192613 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2387218Z 
2025-12-04T11:42:56.2387731Z [W1204 11:39:40.579821653 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2388455Z 
2025-12-04T11:42:56.2389006Z [W1204 11:39:40.580031664 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2389659Z 
2025-12-04T11:42:56.2390184Z [W1204 11:39:40.580952719 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2390833Z 
2025-12-04T11:42:56.2391364Z [W1204 11:39:40.581135288 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2392054Z 
2025-12-04T11:42:56.2392566Z [W1204 11:39:40.581586691 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2393227Z 
2025-12-04T11:42:56.2393741Z [W1204 11:39:40.581769034 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2394405Z 
2025-12-04T11:42:56.2394921Z [W1204 11:39:40.582412826 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2395586Z 
2025-12-04T11:42:56.2396099Z [W1204 11:39:40.582595397 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2396745Z 
2025-12-04T11:42:56.2397272Z [W1204 11:39:40.583484478 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2397918Z 
2025-12-04T11:42:56.2398446Z [W1204 11:39:40.583664781 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2399097Z 
2025-12-04T11:42:56.2399611Z [W1204 11:39:40.584090230 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2400278Z 
2025-12-04T11:42:56.2400792Z [W1204 11:39:40.584269122 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2401453Z 
2025-12-04T11:42:56.2401966Z [W1204 11:39:40.584894333 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2402616Z 
2025-12-04T11:42:56.2403148Z [W1204 11:39:40.585078921 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2403797Z 
2025-12-04T11:42:56.2404295Z W1204 11:39:41.394000 106924 site-packages/torch/_inductor/utils.py:1703] [1/0_1] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:42:56.2405476Z W1204 11:39:44.038000 106924 site-packages/torch/_inductor/utils.py:1361] [1/0_1] on error, temporary cache dir kept at /tmp/tmp3ldklfp5/tmpcq8m_h56
2025-12-04T11:42:56.2406294Z ERROR:common:
2025-12-04T11:42:56.2406579Z Traceback (most recent call last):
2025-12-04T11:42:56.2407204Z   File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2333, in check_accuracy
2025-12-04T11:42:56.2407870Z     new_result = self.run_n_iterations(
2025-12-04T11:42:56.2408533Z   File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2043, in run_n_iterations
2025-12-04T11:42:56.2409250Z     model_iter_fn(mod, inputs, collect_outputs=False)
2025-12-04T11:42:56.2410071Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T11:42:56.2410948Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T11:42:56.2411854Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T11:42:56.2412684Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T11:42:56.2413523Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T11:42:56.2414357Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T11:42:56.2415204Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T11:42:56.2416194Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T11:42:56.2417270Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T11:42:56.2418070Z     _check_triton_bf16_support(graph)
2025-12-04T11:42:56.2418913Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T11:42:56.2419721Z     warn_and_skip(node.get_device())
2025-12-04T11:42:56.2420472Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T11:42:56.2421239Z     raise SkipFrame("BF16 is not supported")
2025-12-04T11:42:56.2421771Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T11:42:56.2422172Z 
2025-12-04T11:42:56.2422890Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T11:42:56.2423733Z 
2025-12-04T11:42:56.2423738Z 
2025-12-04T11:42:56.2423743Z 
2025-12-04T11:42:56.2423982Z To execute this test, run the following from the base repo dir:
2025-12-04T11:42:56.2425233Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_deterministic.py DeterministicTest.test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16
2025-12-04T11:42:56.2426263Z 
2025-12-04T11:42:56.2426533Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:42:56.2427173Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:42:56.2428779Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --save-model-outputs-to=/tmp/tmpezjp3dnh/saved.pkl
2025-12-04T11:42:56.2431404Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --compare-model-outputs-with=/tmp/tmpezjp3dnh/saved.pkl
2025-12-04T11:42:56.2433020Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:42:56.2434581Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --save-model-outputs-to=/tmp/tmpr80swaia/saved.pkl
2025-12-04T11:42:56.2437165Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --compare-model-outputs-with=/tmp/tmpr80swaia/saved.pkl
2025-12-04T11:42:56.2438695Z =================================== FAILURES ===================================
2025-12-04T11:42:56.2439506Z _ DeterministicTest.test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 _
2025-12-04T11:42:56.2440287Z Traceback (most recent call last):
2025-12-04T11:42:56.2441064Z   File "/var/lib/jenkins/workspace/test/inductor/test_deterministic.py", line 166, in test_run2run_determinism
2025-12-04T11:42:56.2441808Z     self.assertTrue(
2025-12-04T11:42:56.2442316Z   File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 687, in assertTrue
2025-12-04T11:42:56.2442895Z     raise self.failureException(msg)
2025-12-04T11:42:56.2443468Z AssertionError: False is not true : stdout: cuda eval  DistillGPT2                        
2025-12-04T11:42:56.2444228Z TorchDynamo optimized model failed to run because of following error
2025-12-04T11:42:56.2444713Z fail_to_run
2025-12-04T11:42:56.2444983Z , stderr: 
2025-12-04T11:42:56.2445618Z loading model: 0it [00:00, ?it/s]`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`.
2025-12-04T11:42:56.2446815Z WARNING:transformers.modeling_utils:`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`.
2025-12-04T11:42:56.2447519Z 
2025-12-04T11:42:56.2447638Z loading model: 0it [00:03, ?it/s]
2025-12-04T11:42:56.2448427Z [W1204 11:40:42.245142406 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2449079Z 
2025-12-04T11:42:56.2449609Z [W1204 11:40:58.664357169 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2450259Z 
2025-12-04T11:42:56.2450782Z [W1204 11:40:58.668049808 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2451439Z 
2025-12-04T11:42:56.2451954Z [W1204 11:40:58.668307211 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2452613Z 
2025-12-04T11:42:56.2453124Z [W1204 11:40:58.669145931 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2453785Z 
2025-12-04T11:42:56.2454299Z [W1204 11:40:58.669359432 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2454953Z 
2025-12-04T11:42:56.2455481Z [W1204 11:40:58.670139203 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2456131Z 
2025-12-04T11:42:56.2456742Z [W1204 11:40:58.670333024 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2457392Z 
2025-12-04T11:42:56.2457907Z [W1204 11:40:58.671387134 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2458565Z 
2025-12-04T11:42:56.2459075Z [W1204 11:40:58.671584322 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2459731Z 
2025-12-04T11:42:56.2460244Z [W1204 11:40:58.672047813 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2460906Z 
2025-12-04T11:42:56.2461420Z [W1204 11:40:58.672230908 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2462067Z 
2025-12-04T11:42:56.2462595Z [W1204 11:40:58.672907023 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2463245Z 
2025-12-04T11:42:56.2463777Z [W1204 11:40:58.673093850 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2464427Z 
2025-12-04T11:42:56.2464939Z [W1204 11:40:58.674023154 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2465601Z 
2025-12-04T11:42:56.2466190Z [W1204 11:40:58.674207499 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2466854Z 
2025-12-04T11:42:56.2467371Z [W1204 11:40:58.674664175 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2468020Z 
2025-12-04T11:42:56.2468547Z [W1204 11:40:58.674845109 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2469231Z 
2025-12-04T11:42:56.2469791Z [W1204 11:40:58.675475067 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2470444Z 
2025-12-04T11:42:56.2471128Z [W1204 11:40:58.675657274 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2471792Z 
2025-12-04T11:42:56.2472309Z [W1204 11:40:58.676592500 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2473049Z 
2025-12-04T11:42:56.2473565Z [W1204 11:40:58.676774803 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2474217Z 
2025-12-04T11:42:56.2474746Z [W1204 11:40:58.677222049 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2475399Z 
2025-12-04T11:42:56.2475931Z [W1204 11:40:58.677400417 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2476582Z 
2025-12-04T11:42:56.2477095Z [W1204 11:40:58.678021426 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2477761Z 
2025-12-04T11:42:56.2478275Z [W1204 11:40:58.678200770 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2478937Z 
2025-12-04T11:42:56.2479449Z [W1204 11:40:58.679078870 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2480099Z 
2025-12-04T11:42:56.2480630Z [W1204 11:40:58.679258360 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2481281Z 
2025-12-04T11:42:56.2481809Z [W1204 11:40:58.679700447 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2482459Z 
2025-12-04T11:42:56.2482971Z [W1204 11:40:58.679878086 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2483632Z 
2025-12-04T11:42:56.2484147Z [W1204 11:40:58.680514917 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2484810Z 
2025-12-04T11:42:56.2485319Z [W1204 11:40:58.680704722 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2485970Z 
2025-12-04T11:42:56.2486495Z [W1204 11:40:58.681592525 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2487150Z 
2025-12-04T11:42:56.2487676Z [W1204 11:40:58.681775925 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2488326Z 
2025-12-04T11:42:56.2488837Z [W1204 11:40:58.682209951 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2489496Z 
2025-12-04T11:42:56.2490073Z [W1204 11:40:58.682388769 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2490738Z 
2025-12-04T11:42:56.2491258Z [W1204 11:40:58.683003251 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2491905Z 
2025-12-04T11:42:56.2492428Z [W1204 11:40:58.683184131 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2493125Z 
2025-12-04T11:42:56.2493614Z W1204 11:40:59.501000 107132 site-packages/torch/_inductor/utils.py:1703] [1/0_1] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:42:56.2494834Z W1204 11:41:02.162000 107132 site-packages/torch/_inductor/utils.py:1361] [1/0_1] on error, temporary cache dir kept at /tmp/tmpdoi89lc7/tmptfkayx7_
2025-12-04T11:42:56.2495651Z ERROR:common:
2025-12-04T11:42:56.2495929Z Traceback (most recent call last):
2025-12-04T11:42:56.2496636Z   File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2333, in check_accuracy
2025-12-04T11:42:56.2497286Z     new_result = self.run_n_iterations(
2025-12-04T11:42:56.2498062Z   File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2043, in run_n_iterations
2025-12-04T11:42:56.2498772Z     model_iter_fn(mod, inputs, collect_outputs=False)
2025-12-04T11:42:56.2499556Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T11:42:56.2500445Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T11:42:56.2501347Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T11:42:56.2502192Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T11:42:56.2503013Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T11:42:56.2503809Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T11:42:56.2504625Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T11:42:56.2505626Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T11:42:56.2506604Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T11:42:56.2507397Z     _check_triton_bf16_support(graph)
2025-12-04T11:42:56.2508199Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T11:42:56.2509001Z     warn_and_skip(node.get_device())
2025-12-04T11:42:56.2509731Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T11:42:56.2510499Z     raise SkipFrame("BF16 is not supported")
2025-12-04T11:42:56.2511034Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T11:42:56.2511434Z 
2025-12-04T11:42:56.2512146Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T11:42:56.2512994Z 
2025-12-04T11:42:56.2512998Z 
2025-12-04T11:42:56.2513003Z 
2025-12-04T11:42:56.2513241Z To execute this test, run the following from the base repo dir:
2025-12-04T11:42:56.2514492Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_deterministic.py DeterministicTest.test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16
2025-12-04T11:42:56.2515519Z 
2025-12-04T11:42:56.2515789Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:42:56.2516434Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:42:56.2518074Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --save-model-outputs-to=/tmp/tmpezjp3dnh/saved.pkl
2025-12-04T11:42:56.2520687Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --compare-model-outputs-with=/tmp/tmpezjp3dnh/saved.pkl
2025-12-04T11:42:56.2522314Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:42:56.2523933Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --save-model-outputs-to=/tmp/tmpr80swaia/saved.pkl
2025-12-04T11:42:56.2526530Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --compare-model-outputs-with=/tmp/tmpr80swaia/saved.pkl
2025-12-04T11:42:56.2528168Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:42:56.2529741Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --save-model-outputs-to=/tmp/tmp4n6zeiot/saved.pkl
2025-12-04T11:42:56.2532335Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --compare-model-outputs-with=/tmp/tmp4n6zeiot/saved.pkl
2025-12-04T11:42:56.2534487Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-b35d65d1a2e42e4e.xml -
2025-12-04T11:42:56.2535576Z =========================== short test summary info ============================
2025-12-04T11:42:56.2537188Z FAILED [78.1277s] inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 - AssertionError: False is not true : stdout: cuda eval  DistillGPT2                        
2025-12-04T11:42:56.2538754Z TorchDynamo optimized model failed to run because of following error
2025-12-04T11:42:56.2539246Z fail_to_run
2025-12-04T11:42:56.2539492Z , stderr: 
2025-12-04T11:42:56.2540135Z loading model: 0it [00:00, ?it/s]`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`.
2025-12-04T11:42:56.2541324Z WARNING:transformers.modeling_utils:`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`.
2025-12-04T11:42:56.2542038Z 
2025-12-04T11:42:56.2542158Z loading model: 0it [00:03, ?it/s]
2025-12-04T11:42:56.2542914Z [W1204 11:40:42.245142406 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2543569Z 
2025-12-04T11:42:56.2544099Z [W1204 11:40:58.664357169 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2544750Z 
2025-12-04T11:42:56.2545275Z [W1204 11:40:58.668049808 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2545925Z 
2025-12-04T11:42:56.2546437Z [W1204 11:40:58.668307211 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2547097Z 
2025-12-04T11:42:56.2547612Z [W1204 11:40:58.669145931 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2548270Z 
2025-12-04T11:42:56.2548829Z [W1204 11:40:58.669359432 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2549481Z 
2025-12-04T11:42:56.2550005Z [W1204 11:40:58.670139203 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2550649Z 
2025-12-04T11:42:56.2551178Z [W1204 11:40:58.670333024 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2551861Z 
2025-12-04T11:42:56.2552370Z [W1204 11:40:58.671387134 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2553063Z 
2025-12-04T11:42:56.2553579Z [W1204 11:40:58.671584322 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2554240Z 
2025-12-04T11:42:56.2554754Z [W1204 11:40:58.672047813 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2555406Z 
2025-12-04T11:42:56.2555929Z [W1204 11:40:58.672230908 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2556637Z 
2025-12-04T11:42:56.2557163Z [W1204 11:40:58.672907023 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2557815Z 
2025-12-04T11:42:56.2558326Z [W1204 11:40:58.673093850 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2558985Z 
2025-12-04T11:42:56.2559501Z [W1204 11:40:58.674023154 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2560161Z 
2025-12-04T11:42:56.2560673Z [W1204 11:40:58.674207499 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2561334Z 
2025-12-04T11:42:56.2561847Z [W1204 11:40:58.674664175 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2562497Z 
2025-12-04T11:42:56.2563018Z [W1204 11:40:58.674845109 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2563669Z 
2025-12-04T11:42:56.2564192Z [W1204 11:40:58.675475067 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2564842Z 
2025-12-04T11:42:56.2565355Z [W1204 11:40:58.675657274 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2566012Z 
2025-12-04T11:42:56.2566523Z [W1204 11:40:58.676592500 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2567181Z 
2025-12-04T11:42:56.2567696Z [W1204 11:40:58.676774803 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2568351Z 
2025-12-04T11:42:56.2568873Z [W1204 11:40:58.677222049 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2569522Z 
2025-12-04T11:42:56.2570044Z [W1204 11:40:58.677400417 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2570696Z 
2025-12-04T11:42:56.2571384Z [W1204 11:40:58.678021426 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2572052Z 
2025-12-04T11:42:56.2572563Z [W1204 11:40:58.678200770 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2573227Z 
2025-12-04T11:42:56.2573817Z [W1204 11:40:58.679078870 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2574473Z 
2025-12-04T11:42:56.2575001Z [W1204 11:40:58.679258360 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2575649Z 
2025-12-04T11:42:56.2576179Z [W1204 11:40:58.679700447 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2576949Z 
2025-12-04T11:42:56.2577508Z [W1204 11:40:58.679878086 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2578169Z 
2025-12-04T11:42:56.2578682Z [W1204 11:40:58.680514917 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2579339Z 
2025-12-04T11:42:56.2579853Z [W1204 11:40:58.680704722 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2580543Z 
2025-12-04T11:42:56.2581069Z [W1204 11:40:58.681592525 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2581728Z 
2025-12-04T11:42:56.2582249Z [W1204 11:40:58.681775925 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2582899Z 
2025-12-04T11:42:56.2583410Z [W1204 11:40:58.682209951 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2584076Z 
2025-12-04T11:42:56.2584589Z [W1204 11:40:58.682388769 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2585251Z 
2025-12-04T11:42:56.2585764Z [W1204 11:40:58.683003251 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2586411Z 
2025-12-04T11:42:56.2586934Z [W1204 11:40:58.683184131 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:42:56.2587588Z 
2025-12-04T11:42:56.2588077Z W1204 11:40:59.501000 107132 site-packages/torch/_inductor/utils.py:1703] [1/0_1] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:42:56.2589253Z W1204 11:41:02.162000 107132 site-packages/torch/_inductor/utils.py:1361] [1/0_1] on error, temporary cache dir kept at /tmp/tmpdoi89lc7/tmptfkayx7_
2025-12-04T11:42:56.2590070Z ERROR:common:
2025-12-04T11:42:56.2590352Z Traceback (most recent call last):
2025-12-04T11:42:56.2590989Z   File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2333, in check_accuracy
2025-12-04T11:42:56.2591638Z     new_result = self.run_n_iterations(
2025-12-04T11:42:56.2592296Z   File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2043, in run_n_iterations
2025-12-04T11:42:56.2593009Z     model_iter_fn(mod, inputs, collect_outputs=False)
2025-12-04T11:42:56.2593791Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T11:42:56.2594669Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T11:42:56.2595570Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T11:42:56.2596423Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T11:42:56.2597247Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T11:42:56.2598180Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T11:42:56.2599002Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T11:42:56.2600060Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T11:42:56.2601046Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T11:42:56.2601841Z     _check_triton_bf16_support(graph)
2025-12-04T11:42:56.2602641Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T11:42:56.2603495Z     warn_and_skip(node.get_device())
2025-12-04T11:42:56.2604270Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T11:42:56.2605045Z     raise SkipFrame("BF16 is not supported")
2025-12-04T11:42:56.2605573Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T11:42:56.2605958Z 
2025-12-04T11:42:56.2606675Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T11:42:56.2607567Z 
2025-12-04T11:42:56.2607572Z 
2025-12-04T11:42:56.2607576Z 
2025-12-04T11:42:56.2607794Z To execute this test, run the following from the base repo dir:
2025-12-04T11:42:56.2609041Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_deterministic.py DeterministicTest.test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16
2025-12-04T11:42:56.2610064Z 
2025-12-04T11:42:56.2610345Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:42:56.2610939Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:42:56.2611470Z ============= 1 failed, 2 deselected, 2 rerun in 236.06s (0:03:56) =============
2025-12-04T11:42:56.2611927Z Got exit code 1
2025-12-04T11:42:56.2612924Z FAILED CONSISTENTLY: test/inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16
2025-12-04T11:42:56.2614265Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T11:42:56.2615277Z W1204 11:41:14.982000 107236 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:42:56.2616536Z Test results will be stored in test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-feba5ff46dbc30dd.xml
2025-12-04T11:42:56.2617449Z ============================= test session starts ==============================
2025-12-04T11:42:56.2618112Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:42:56.2618724Z cachedir: .pytest_cache
2025-12-04T11:42:56.2619444Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:42:56.2620234Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:42:56.2620586Z configfile: pytest.ini
2025-12-04T11:42:56.2621375Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T11:42:56.2622326Z collecting ... collected 32 items / 1 deselected / 31 selected
2025-12-04T11:42:56.2622814Z stepcurrent: skipping 1 already run items.
2025-12-04T11:42:56.2623210Z Running 2 items in this shard
2025-12-04T11:42:56.2623440Z 
2025-12-04T11:42:56.2624205Z inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_GoogleFnet_training_or_inference_inference_precision_amp PASSED [50.2427s] [ 50%]
2025-12-04T11:42:56.2625864Z inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_GoogleFnet_training_or_inference_inference_precision_float16 PASSED [49.8039s] [100%]
2025-12-04T11:42:56.2626764Z 
2025-12-04T11:42:56.2627606Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-feba5ff46dbc30dd.xml -
2025-12-04T11:42:56.2628720Z ================= 2 passed, 1 deselected in 100.07s (0:01:40) ==================
2025-12-04T11:42:56.2629989Z The following tests failed consistently: ['test/inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16']
2025-12-04T11:42:56.2631070Z 
2025-12-04T11:42:56.2631656Z FINISHED PRINTING LOG FILE of inductor/test_deterministic 5/8 (test/test-reports/inductor.test_deterministic_5.8_04041ff7a6ce6208_.log)
2025-12-04T11:42:56.2632357Z 
2025-12-04T11:42:56.2632780Z Finished inductor/test_deterministic 5/8 ... [2025-12-04 11:42:56.169381][9004.552272517], took 12.91min
2025-12-04T11:42:56.2634141Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-ccc55353a2e77d8f.xml
2025-12-04T11:42:56.2659912Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-cbc1aeff512c7b0d.xml
2025-12-04T11:42:56.3114942Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-b35d65d1a2e42e4e.xml
2025-12-04T11:42:56.3489784Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-feba5ff46dbc30dd.xml
2025-12-04T11:42:56.6646203Z Uploading logs for 57119749248 to S3
2025-12-04T11:42:56.7516770Z Uploading artifacts took 0.37 seconds
2025-12-04T11:42:56.7517200Z inductor/test_deterministic 5/8 failed!
2025-12-04T11:42:56.7521647Z Running inductor/test_fp8 1/1 ... [2025-12-04 11:42:56.751998][9005.134892596]
2025-12-04T11:42:56.7522176Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T11:42:56.7527059Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_fp8.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:42:56.752479]
2025-12-04T12:15:04.7378265Z 
2025-12-04T12:15:04.7379343Z PRINTING LOG FILE of inductor/test_fp8 1/1 (test/test-reports/inductor.test_fp8_1.1_5b24deb545871ee8_.log)
2025-12-04T12:15:04.7380950Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-dff864e79f1bf91b.xml
2025-12-04T12:15:04.7382116Z ============================= test session starts ==============================
2025-12-04T12:15:04.7383054Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:04.7384090Z cachedir: .pytest_cache
2025-12-04T12:15:04.7385223Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:04.7386404Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:04.7387022Z configfile: pytest.ini
2025-12-04T12:15:04.7388358Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:04.7389480Z collecting ... collected 188 items
2025-12-04T12:15:04.7390144Z stepcurrent: Cannot find last run test, not skipping
2025-12-04T12:15:04.7564729Z Running 188 items in this shard: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,10,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,10,4096_cuda, 
test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_bad_cast_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e4m3fn_shape_4,2048,4096_keepdim_False_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e4m3fn_shape_4,2048,4096_keepdim_True_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e5m2_shape_4,2048,4096_keepdim_False_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e5m2_shape_4,2048,4096_keepdim_True_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda, 
test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e5m2_shape_16,16,16_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e5m2_shape_4,2048,4096_cuda, 
test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e5m2_shape_16,16,16_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e5m2_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e5m2_shape_16,16,16_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e5m2_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e5m2_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_False_scaling_block_sizes0_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_False_scaling_block_sizes1_cuda, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_True_scaling_block_sizes0_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_True_scaling_block_sizes1_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_False_scaling_block_sizes0_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_False_scaling_block_sizes1_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_True_scaling_block_sizes0_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_True_scaling_block_sizes1_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_mx_fp8_max_autotune_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_mx_fusion_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_1024,1024,512_use_fast_accum_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_1024,1024,512_use_fast_accum_True_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_16,32,32_use_fast_accum_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_16,32,32_use_fast_accum_True_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_scaled_mm_preserves_strides_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_float32, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_1024,1024,512_use_fast_accum_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_1024,1024,512_use_fast_accum_True_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_16,32,32_use_fast_accum_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_16,32,32_use_fast_accum_True_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_1024,1024,512_use_fast_accum_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_1024,1024,512_use_fast_accum_True_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_16,32,32_use_fast_accum_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_16,32,32_use_fast_accum_True_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_unacceptable_input_dims_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_unacceptable_scale_dims_rowwise_scaling_cuda
2025-12-04T12:15:04.7747065Z 
2025-12-04T12:15:04.7749352Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T12:15:04.7753374Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.7756134Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:04.7758125Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T12:15:04.7759792Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T12:15:04.7761827Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:04.7763900Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:04.7766341Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:04.7768664Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:04.7770936Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:04.7772774Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:04.7774585Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:04.7776257Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:04.7777731Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:04.7779359Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:04.7780865Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:04.7782677Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:04.7784749Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:04.7786484Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T12:15:04.7788009Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:04.7789706Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:04.7791595Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T12:15:04.7793690Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T12:15:04.7795557Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp0.to(tl.float32)
2025-12-04T12:15:04.7797277Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T12:15:04.7799095Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T12:15:04.7800963Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T12:15:04.7802771Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T12:15:04.7804741Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T12:15:04.7806595Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T12:15:04.7808684Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask)
2025-12-04T12:15:04.7811141Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None)
2025-12-04T12:15:04.7813138Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:04.7816829Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:04.7821099Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:04.7823705Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.7826266Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.7828340Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.7831422Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.7834116Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.7836762Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.7838687Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:04.7840760Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.7842935Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:04.7844965Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.7846666Z ('RERUN', {'yellow': True}) [3.2912s] [  0%]
2025-12-04T12:15:04.7848952Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T12:15:04.7852510Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.7854613Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:04.7855875Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T12:15:04.7857624Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T12:15:04.7859009Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:04.7860624Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:04.7862341Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:04.7863923Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:04.7865865Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:04.7867838Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:04.7869444Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:04.7871132Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:04.7872734Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:04.7874407Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:04.7875969Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:04.7877668Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:04.7879605Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:04.7881215Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T12:15:04.7882818Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:04.7884821Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:04.7886893Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T12:15:04.7888993Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T12:15:04.7890878Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp0.to(tl.float32)
2025-12-04T12:15:04.7892586Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T12:15:04.7894340Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T12:15:04.7896141Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T12:15:04.7897913Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T12:15:04.7899823Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T12:15:04.7901976Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T12:15:04.7904221Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask)
2025-12-04T12:15:04.7906219Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None)
2025-12-04T12:15:04.7907901Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:04.7911527Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:04.7915455Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:04.7917576Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.7920473Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.7922629Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.7924964Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.7927642Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.7930470Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.7932598Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:04.7935446Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.7937841Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:04.7941197Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.7943253Z ('RERUN', {'yellow': True}) [0.3483s] [  0%]
2025-12-04T12:15:04.7945355Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T12:15:04.7948782Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.7950321Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:04.7951463Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T12:15:04.7952567Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T12:15:04.7953672Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:04.7954811Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:04.7956027Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:04.7957301Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:04.7958604Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:04.7959891Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:04.7961036Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:04.7962151Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:04.7963274Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:04.7964354Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:04.7965416Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:04.7966664Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:04.7968038Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:04.7969250Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T12:15:04.7970445Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:04.7971908Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:04.7973348Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T12:15:04.7974682Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T12:15:04.7975970Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp0.to(tl.float32)
2025-12-04T12:15:04.7977163Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T12:15:04.7978279Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T12:15:04.7979429Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T12:15:04.7980583Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T12:15:04.7981745Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T12:15:04.7982990Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T12:15:04.7984355Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask)
2025-12-04T12:15:04.7985900Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None)
2025-12-04T12:15:04.7987113Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:04.7989712Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:04.7992447Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:04.7994166Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.7995970Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.7997630Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.7999391Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.8001105Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.8002930Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.8004471Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:04.8006183Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.8007695Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:04.8009099Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.8010238Z FAILED [0.3498s] [  0%]
2025-12-04T12:15:04.8010420Z 
2025-12-04T12:15:04.8010565Z ==================================== RERUNS ====================================
2025-12-04T12:15:04.8011181Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _
2025-12-04T12:15:04.8011769Z Traceback (most recent call last):
2025-12-04T12:15:04.8012453Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T12:15:04.8013296Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:04.8014166Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:04.8015052Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:04.8015945Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:04.8016876Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:04.8017725Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:04.8018538Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:04.8019349Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:04.8020354Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:04.8021352Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:04.8022170Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:04.8022921Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:04.8023671Z     return self._compile_to_module()
2025-12-04T12:15:04.8024406Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:04.8025188Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:04.8026008Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:04.8026798Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:04.8027600Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:04.8028463Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:04.8029430Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:04.8030293Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:04.8031059Z   File "/tmp/tmp7wdl8vg8/ha/chatiivoxdb5gtbqpamfs2lmbuetlnhbvebwzol5gw2sywjeo333.py", line 62, in <module>
2025-12-04T12:15:04.8032219Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:04.8032938Z     kernel.precompile(
2025-12-04T12:15:04.8033688Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:04.8034491Z     self._precompile_worker()
2025-12-04T12:15:04.8035315Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:04.8036264Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:04.8037171Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.8038101Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.8038898Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.8039739Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.8040572Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.8041487Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.8042196Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:04.8043080Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.8043820Z ^
2025-12-04T12:15:04.8044410Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.8045012Z 
2025-12-04T12:15:04.8045723Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:04.8046574Z 
2025-12-04T12:15:04.8046579Z 
2025-12-04T12:15:04.8046811Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:04.8047799Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T12:15:04.8048561Z 
2025-12-04T12:15:04.8048832Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:04.8049475Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.8049957Z frames [('total', 1)]
2025-12-04T12:15:04.8050244Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.8050699Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:04.8051299Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.8051769Z graph_break []
2025-12-04T12:15:04.8052250Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _
2025-12-04T12:15:04.8052840Z Traceback (most recent call last):
2025-12-04T12:15:04.8053539Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T12:15:04.8054369Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:04.8055293Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:04.8056180Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:04.8057289Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:04.8058202Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:04.8059051Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:04.8059919Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:04.8060770Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:04.8061779Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:04.8062780Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:04.8063601Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:04.8064418Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:04.8065173Z     return self._compile_to_module()
2025-12-04T12:15:04.8065915Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:04.8066718Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:04.8067532Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:04.8068335Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:04.8069102Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:04.8069979Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:04.8070928Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:04.8071991Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:04.8072759Z   File "/tmp/tmpzyu2a4d5/uh/cuhgwijot7lhtmot4esqh5jijysnud4eeu6s4bqkh2ficdnykgeq.py", line 62, in <module>
2025-12-04T12:15:04.8073871Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:04.8074594Z     kernel.precompile(
2025-12-04T12:15:04.8075347Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:04.8076162Z     self._precompile_worker()
2025-12-04T12:15:04.8076970Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:04.8077893Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:04.8078807Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.8079751Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.8080532Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.8081377Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.8082219Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.8083127Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.8083889Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:04.8084863Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.8085623Z ^
2025-12-04T12:15:04.8086306Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.8086912Z 
2025-12-04T12:15:04.8087625Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:04.8088535Z 
2025-12-04T12:15:04.8088541Z 
2025-12-04T12:15:04.8088761Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:04.8089791Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T12:15:04.8090557Z 
2025-12-04T12:15:04.8090838Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:04.8091468Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.8091947Z frames [('total', 1)]
2025-12-04T12:15:04.8092247Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.8092737Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:04.8093341Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.8093807Z graph_break []
2025-12-04T12:15:04.8094176Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.8094650Z frames [('total', 1)]
2025-12-04T12:15:04.8094957Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.8095387Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.8095980Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:04.8096524Z graph_break []
2025-12-04T12:15:04.8096832Z =================================== FAILURES ===================================
2025-12-04T12:15:04.8097433Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _
2025-12-04T12:15:04.8098020Z Traceback (most recent call last):
2025-12-04T12:15:04.8098720Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T12:15:04.8099545Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:04.8100416Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:04.8101296Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:04.8102203Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:04.8103037Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:04.8103880Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:04.8104678Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:04.8105495Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:04.8106486Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:04.8107477Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:04.8108293Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:04.8109060Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:04.8109795Z     return self._compile_to_module()
2025-12-04T12:15:04.8110523Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:04.8112014Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:04.8112921Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:04.8113721Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:04.8114482Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:04.8115357Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:04.8116350Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:04.8117213Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:04.8118021Z   File "/tmp/tmpym30s4rg/f7/cf73uwgybamxghyudgisrciw3ukevb3eyyij2dnyozakiw2bi4a7.py", line 62, in <module>
2025-12-04T12:15:04.8119138Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:04.8119846Z     kernel.precompile(
2025-12-04T12:15:04.8120600Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:04.8121458Z     self._precompile_worker()
2025-12-04T12:15:04.8122264Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:04.8219729Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:04.8220735Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.8221671Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.8222455Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.8223285Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.8224115Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.8225025Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.8225725Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:04.8226587Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.8227319Z ^
2025-12-04T12:15:04.8227898Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.8229088Z 
2025-12-04T12:15:04.8229847Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:04.8230684Z 
2025-12-04T12:15:04.8230689Z 
2025-12-04T12:15:04.8230914Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:04.8232207Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T12:15:04.8233369Z 
2025-12-04T12:15:04.8233644Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:04.8234274Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.8234748Z frames [('total', 1)]
2025-12-04T12:15:04.8235026Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.8235479Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:04.8236077Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.8236536Z graph_break []
2025-12-04T12:15:04.8236895Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.8237352Z frames [('total', 1)]
2025-12-04T12:15:04.8237639Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.8238280Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.8238870Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:04.8239332Z graph_break []
2025-12-04T12:15:04.8239685Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.8240140Z frames [('total', 1)]
2025-12-04T12:15:04.8240425Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.8240936Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.8241507Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:04.8242043Z graph_break []
2025-12-04T12:15:04.8242845Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-dff864e79f1bf91b.xml -
2025-12-04T12:15:04.8243782Z =========================== short test summary info ============================
2025-12-04T12:15:04.8244880Z FAILED [0.3498s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:04.8246342Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.8247085Z ^
2025-12-04T12:15:04.8247646Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.8248244Z 
2025-12-04T12:15:04.8248952Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:04.8249792Z 
2025-12-04T12:15:04.8249797Z 
2025-12-04T12:15:04.8250008Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:04.8250974Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T12:15:04.8251740Z 
2025-12-04T12:15:04.8252013Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:04.8252581Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:04.8253053Z ========================== 1 failed, 2 rerun in 4.03s ==========================
2025-12-04T12:15:04.8253771Z Got exit code 1
2025-12-04T12:15:04.8254032Z Retrying single test...
2025-12-04T12:15:04.8254683Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-053a0e10a178eff6.xml
2025-12-04T12:15:04.8255639Z ============================= test session starts ==============================
2025-12-04T12:15:04.8256383Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:04.8256972Z cachedir: .pytest_cache
2025-12-04T12:15:04.8257675Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:04.8258467Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:04.8258809Z configfile: pytest.ini
2025-12-04T12:15:04.8259573Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:04.8260518Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:04.8261585Z stepcurrent: skipping 0 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T12:15:04.8262526Z Running 1 items in this shard
2025-12-04T12:15:04.8262745Z 
2025-12-04T12:15:04.8264043Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T12:15:04.8266372Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.8267887Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:04.8268938Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T12:15:04.8270044Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T12:15:04.8271412Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:04.8272765Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:04.8273974Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:04.8275315Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:04.8276618Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:04.8278057Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:04.8279231Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:04.8280316Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:04.8281428Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:04.8282497Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:04.8283534Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:04.8284751Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:04.8286046Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:04.8287237Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T12:15:04.8288417Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:04.8289619Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:04.8290902Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T12:15:04.8292227Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T12:15:04.8293485Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp0.to(tl.float32)
2025-12-04T12:15:04.8294600Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T12:15:04.8295719Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T12:15:04.8296939Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T12:15:04.8298090Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T12:15:04.8299290Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T12:15:04.8300558Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T12:15:04.8301932Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask)
2025-12-04T12:15:04.8303484Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None)
2025-12-04T12:15:04.8304719Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:04.8307316Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:04.8310071Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:04.8311781Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.8313583Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.8315249Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.8316962Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.8318665Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.8320456Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.8321975Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:04.8323689Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.8325151Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:04.8326592Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.8327746Z ('RERUN', {'yellow': True}) [3.3089s] [100%]
2025-12-04T12:15:04.8329261Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T12:15:04.8331630Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.8333152Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:04.8334159Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T12:15:04.8335287Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T12:15:04.8336473Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:04.8337620Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:04.8338818Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:04.8340083Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:04.8341401Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:04.8342688Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:04.8343827Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:04.8344930Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:04.8346066Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:04.8347140Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:04.8348197Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:04.8349423Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:04.8350727Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:04.8351937Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T12:15:04.8353121Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:04.8354326Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:04.8355682Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T12:15:04.8357024Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T12:15:04.8358292Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp0.to(tl.float32)
2025-12-04T12:15:04.8359429Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T12:15:04.8360515Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T12:15:04.8361674Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T12:15:04.8362823Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T12:15:04.8363981Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T12:15:04.8365244Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T12:15:04.8366616Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask)
2025-12-04T12:15:04.8368165Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None)
2025-12-04T12:15:04.8369368Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:04.8372147Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:04.8374906Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:04.8376683Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.8378494Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.8380155Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.8381865Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.8383558Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.8385352Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.8386956Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:04.8388663Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.8390166Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:04.8391597Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.8392762Z ('RERUN', {'yellow': True}) [0.3470s] [100%]
2025-12-04T12:15:04.8394263Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T12:15:04.8396613Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.8398140Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:04.8399145Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T12:15:04.8400239Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T12:15:04.8401362Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:04.8402501Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:04.8403707Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:04.8404967Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:04.8406274Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:04.8407550Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:04.8408693Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:04.8409782Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:04.8410919Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:04.8411995Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:04.8413041Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:04.8414263Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:04.8415566Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:04.8416879Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T12:15:04.8418074Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:04.8419284Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:04.8420665Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T12:15:04.8422009Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T12:15:04.8423279Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp0.to(tl.float32)
2025-12-04T12:15:04.8424389Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T12:15:04.8425477Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T12:15:04.8426630Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T12:15:04.8427781Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T12:15:04.8428922Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T12:15:04.8430173Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T12:15:04.8431557Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask)
2025-12-04T12:15:04.8433105Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None)
2025-12-04T12:15:04.8434315Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:04.8436901Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:04.8439628Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:04.8441351Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.8443163Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.8444811Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.8446560Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.8448245Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.8450032Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.8451599Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:04.8453295Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.8454747Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:04.8456182Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.8457407Z FAILED [0.3458s] [100%]
2025-12-04T12:15:04.8457599Z 
2025-12-04T12:15:04.8457747Z ==================================== RERUNS ====================================
2025-12-04T12:15:04.8458358Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _
2025-12-04T12:15:04.8458933Z Traceback (most recent call last):
2025-12-04T12:15:04.8459635Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T12:15:04.8460478Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:04.8461353Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:04.8462221Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:04.8463128Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:04.8463981Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:04.8464808Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:04.8465606Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:04.8466420Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:04.8467410Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:04.8468387Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:04.8469197Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:04.8469960Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:04.8470702Z     return self._compile_to_module()
2025-12-04T12:15:04.8471619Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:04.8472406Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:04.8473222Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:04.8473993Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:04.8474748Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:04.8475707Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:04.8476674Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:04.8477515Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:04.8478261Z   File "/tmp/tmpu_c4fj5y/nq/cnqwkjdxubfnokpzrjleqkzb7cjglpvbgvld3zxp2eqfsu3gtp6g.py", line 62, in <module>
2025-12-04T12:15:04.8479393Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:04.8480142Z     kernel.precompile(
2025-12-04T12:15:04.8480881Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:04.8481693Z     self._precompile_worker()
2025-12-04T12:15:04.8482511Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:04.8483412Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:04.8484364Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.8485306Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.8486094Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.8486918Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.8487747Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.8488673Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.8489379Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:04.8490242Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.8490988Z ^
2025-12-04T12:15:04.8491574Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.8492163Z 
2025-12-04T12:15:04.8492888Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:04.8493729Z 
2025-12-04T12:15:04.8493734Z 
2025-12-04T12:15:04.8493951Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:04.8494937Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T12:15:04.8495704Z 
2025-12-04T12:15:04.8495971Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:04.8496685Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.8497152Z frames [('total', 1)]
2025-12-04T12:15:04.8497453Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.8497911Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:04.8498497Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.8498963Z graph_break []
2025-12-04T12:15:04.8499437Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _
2025-12-04T12:15:04.8500019Z Traceback (most recent call last):
2025-12-04T12:15:04.8500697Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T12:15:04.8501530Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:04.8502403Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:04.8503309Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:04.8504211Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:04.8505056Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:04.8505893Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:04.8506715Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:04.8507565Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:04.8508555Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:04.8509540Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:04.8510338Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:04.8511130Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:04.8511874Z     return self._compile_to_module()
2025-12-04T12:15:04.8512589Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:04.8513379Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:04.8514195Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:04.8514986Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:04.8515728Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:04.8516597Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:04.8517556Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:04.8518411Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:04.8519151Z   File "/tmp/tmpv1k5hwll/zf/czfxhseglpjumrbiwe4oh6fpaq24w2prrcngy25ie3d6cjz2k4tw.py", line 62, in <module>
2025-12-04T12:15:04.8520254Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:04.8520970Z     kernel.precompile(
2025-12-04T12:15:04.8521705Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:04.8522515Z     self._precompile_worker()
2025-12-04T12:15:04.8523330Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:04.8524242Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:04.8525141Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.8526081Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.8526870Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.8527713Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.8528535Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.8529468Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.8530168Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:04.8531033Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.8531788Z ^
2025-12-04T12:15:04.8532422Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.8533022Z 
2025-12-04T12:15:04.8533746Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:04.8534623Z 
2025-12-04T12:15:04.8534628Z 
2025-12-04T12:15:04.8534854Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:04.8535853Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T12:15:04.8536698Z 
2025-12-04T12:15:04.8536968Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:04.8537606Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.8538079Z frames [('total', 1)]
2025-12-04T12:15:04.8538369Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.8538857Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:04.8539456Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.8539904Z graph_break []
2025-12-04T12:15:04.8540278Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.8540747Z frames [('total', 1)]
2025-12-04T12:15:04.8541032Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.8541471Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.8542070Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:04.8542540Z graph_break []
2025-12-04T12:15:04.8542840Z =================================== FAILURES ===================================
2025-12-04T12:15:04.8543453Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _
2025-12-04T12:15:04.8544030Z Traceback (most recent call last):
2025-12-04T12:15:04.8544712Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T12:15:04.8545548Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:04.8546416Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:04.8547291Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:04.8548184Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:04.8549032Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:04.8549869Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:04.8550656Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:04.8551467Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:04.8552460Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:04.8553444Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:04.8554250Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:04.8555019Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:04.8555762Z     return self._compile_to_module()
2025-12-04T12:15:04.8556491Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:04.8557271Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:04.8558118Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:04.8558913Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:04.8559656Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:04.8560524Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:04.8561483Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:04.8562376Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:04.8563170Z   File "/tmp/tmpohvbi7jj/i2/ci22fz46s6ajnyspd3wh56hubwynourlnmmhtqsrjubrvo46svo5.py", line 62, in <module>
2025-12-04T12:15:04.8564269Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:04.8564987Z     kernel.precompile(
2025-12-04T12:15:04.8565736Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:04.8566570Z     self._precompile_worker()
2025-12-04T12:15:04.8567398Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:04.8568317Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:04.8569219Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.8570152Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.8571174Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.8572050Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.8572873Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.8573797Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.8574508Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:04.8575384Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.8576115Z ^
2025-12-04T12:15:04.8576760Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.8577352Z 
2025-12-04T12:15:04.8578080Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:04.8578924Z 
2025-12-04T12:15:04.8578929Z 
2025-12-04T12:15:04.8579158Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:04.8580131Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T12:15:04.8580904Z 
2025-12-04T12:15:04.8581171Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:04.8581804Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.8582272Z frames [('total', 1)]
2025-12-04T12:15:04.8582561Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.8583014Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:04.8583621Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.8584070Z graph_break []
2025-12-04T12:15:04.8584447Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.8584912Z frames [('total', 1)]
2025-12-04T12:15:04.8585193Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.8585718Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.8586319Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:04.8586799Z graph_break []
2025-12-04T12:15:04.8587161Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.8587628Z frames [('total', 1)]
2025-12-04T12:15:04.8587924Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.8588348Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.8588987Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:04.8589461Z graph_break []
2025-12-04T12:15:04.8590305Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-053a0e10a178eff6.xml -
2025-12-04T12:15:04.8591270Z =========================== short test summary info ============================
2025-12-04T12:15:04.8592367Z FAILED [0.3458s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:04.8593834Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.8594567Z ^
2025-12-04T12:15:04.8595159Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.8595768Z 
2025-12-04T12:15:04.8596481Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:04.8597318Z 
2025-12-04T12:15:04.8597322Z 
2025-12-04T12:15:04.8597558Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:04.8598543Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T12:15:04.8599299Z 
2025-12-04T12:15:04.8599570Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:04.8600172Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:04.8600708Z ================== 1 failed, 187 deselected, 2 rerun in 4.05s ==================
2025-12-04T12:15:04.8601165Z Got exit code 1
2025-12-04T12:15:04.8601434Z Retrying single test...
2025-12-04T12:15:04.8602101Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-966288eeb3fe785e.xml
2025-12-04T12:15:04.8602885Z ============================= test session starts ==============================
2025-12-04T12:15:04.8603540Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:04.8604148Z cachedir: .pytest_cache
2025-12-04T12:15:04.8604867Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:04.8605657Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:04.8606005Z configfile: pytest.ini
2025-12-04T12:15:04.8606787Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:04.8607749Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:04.8608803Z stepcurrent: skipping 0 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T12:15:04.8609757Z Running 1 items in this shard
2025-12-04T12:15:04.8609987Z 
2025-12-04T12:15:04.8611225Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T12:15:04.8613582Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.8615106Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:04.8616140Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T12:15:04.8617342Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T12:15:04.8618464Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:04.8619608Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:04.8620811Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:04.8622102Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:04.8623416Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:04.8624704Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:04.8625827Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:04.8626934Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:04.8628065Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:04.8629147Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:04.8630179Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:04.8631412Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:04.8632722Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:04.8633930Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T12:15:04.8635113Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:04.8636326Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:04.8637620Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T12:15:04.8638958Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T12:15:04.8640232Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp0.to(tl.float32)
2025-12-04T12:15:04.8641337Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T12:15:04.8642431Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T12:15:04.8643594Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T12:15:04.8644743Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T12:15:04.8645915Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T12:15:04.8647194Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T12:15:04.8648577Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask)
2025-12-04T12:15:04.8650125Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None)
2025-12-04T12:15:04.8651359Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:04.8653925Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:04.8656734Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:04.8658462Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.8660265Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.8661928Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.8663633Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.8665316Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.8667119Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.8668641Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:04.8670336Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.8671985Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:04.8673458Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.8674627Z ('RERUN', {'yellow': True}) [3.3138s] [100%]
2025-12-04T12:15:04.8676121Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T12:15:04.8678532Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.8680053Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:04.8681057Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T12:15:04.8682199Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T12:15:04.8683312Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:04.8684455Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:04.8685661Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:04.8686921Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:04.8688235Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:04.8689523Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:04.8690650Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:04.8691753Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:04.8692887Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:04.8693967Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:04.8695005Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:04.8696948Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:04.8698262Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:04.8699470Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T12:15:04.8700645Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:04.8701868Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:04.8703418Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T12:15:04.8704887Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T12:15:04.8706166Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp0.to(tl.float32)
2025-12-04T12:15:04.8707273Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T12:15:04.8708403Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T12:15:04.8709571Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T12:15:04.8710730Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T12:15:04.8711881Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T12:15:04.8713167Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T12:15:04.8714626Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask)
2025-12-04T12:15:04.8716298Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None)
2025-12-04T12:15:04.8717806Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:04.8720543Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:04.8723418Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:04.8725288Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.8727216Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.8728955Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.8730821Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.8732640Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.8734527Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.8736269Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:04.8738126Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.8739718Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:04.8741330Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.8742624Z ('RERUN', {'yellow': True}) [0.3597s] [100%]
2025-12-04T12:15:04.8744256Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T12:15:04.8746689Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.8748331Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:04.8749494Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T12:15:04.8750736Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T12:15:04.8751967Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:04.8753195Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:04.8754556Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:04.8755934Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:04.8757403Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:04.8758749Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:04.8759997Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:04.8761256Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:04.8762499Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:04.8763636Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:04.8764839Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:04.8766181Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:04.8767588Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:04.8768997Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T12:15:04.8770249Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:04.8771770Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:04.8773330Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T12:15:04.8774851Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T12:15:04.8776176Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp0.to(tl.float32)
2025-12-04T12:15:04.8777535Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T12:15:04.8778774Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T12:15:04.8780068Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T12:15:04.8781368Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T12:15:04.8782594Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T12:15:04.8783990Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T12:15:04.8785509Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask)
2025-12-04T12:15:04.8787178Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None)
2025-12-04T12:15:04.8788463Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:04.8791208Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:04.8794055Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:04.8795941Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.8797858Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.8799606Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.8801534Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.8803297Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.8805182Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.8806946Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:04.8808751Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.8810354Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:04.8811867Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.8813113Z FAILED [0.3631s] [100%]
2025-12-04T12:15:04.8813358Z 
2025-12-04T12:15:04.8813604Z ==================================== RERUNS ====================================
2025-12-04T12:15:04.8814347Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _
2025-12-04T12:15:04.8815009Z Traceback (most recent call last):
2025-12-04T12:15:04.8815828Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T12:15:04.8816921Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:04.8817928Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:04.8818900Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:04.8819938Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:04.8820909Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:04.8821902Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:04.8822780Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:04.8823706Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:04.8824843Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:04.8825952Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:04.8826824Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:04.8827732Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:04.8828596Z     return self._compile_to_module()
2025-12-04T12:15:04.8829386Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:04.8830321Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:04.8831255Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:04.8832137Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:04.8833019Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:04.8834046Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:04.8835073Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:04.8836167Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:04.8836990Z   File "/tmp/tmpxkncwm92/wf/cwfemw2uybep42tuf6ibn5fijwno3ejhyf2getnhpigbh6jbhmmr.py", line 62, in <module>
2025-12-04T12:15:04.8838227Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:04.8839134Z     kernel.precompile(
2025-12-04T12:15:04.8840022Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:04.8840891Z     self._precompile_worker()
2025-12-04T12:15:04.8841885Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:04.8851169Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:04.8852363Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.8853391Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.8854307Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.8855270Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.8856241Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.8857340Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.8858184Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:04.8859204Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.8860062Z ^
2025-12-04T12:15:04.8860730Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.8861441Z 
2025-12-04T12:15:04.8862192Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:04.8863120Z 
2025-12-04T12:15:04.8863126Z 
2025-12-04T12:15:04.8863381Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:04.8864510Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T12:15:04.8865311Z 
2025-12-04T12:15:04.8865667Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:04.8866370Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.8867002Z frames [('total', 1)]
2025-12-04T12:15:04.8867413Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.8867931Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:04.8868685Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.8869255Z graph_break []
2025-12-04T12:15:04.8869820Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _
2025-12-04T12:15:04.8870543Z Traceback (most recent call last):
2025-12-04T12:15:04.8871543Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T12:15:04.8872478Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:04.8873494Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:04.8874590Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:04.8875598Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:04.8876622Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:04.8877529Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:04.8878489Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:04.8879530Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:04.8880649Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:04.8881682Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:04.8882685Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:04.8883573Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:04.8884515Z     return self._compile_to_module()
2025-12-04T12:15:04.8885327Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:04.8886249Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:04.8887191Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:04.8888120Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:04.8888937Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:04.8889933Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:04.8891035Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:04.8892007Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:04.8892837Z   File "/tmp/tmpwz2g9nlg/hn/chnls7qs2snlkm5mkd36mzn7rstdzdwu3uhl2dj7efuafbxas3am.py", line 62, in <module>
2025-12-04T12:15:04.8894088Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:04.8894920Z     kernel.precompile(
2025-12-04T12:15:04.8895742Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:04.8896798Z     self._precompile_worker()
2025-12-04T12:15:04.8897726Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:04.8898807Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:04.8899786Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.8900842Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.8901794Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.8902743Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.8903645Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.8904738Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.8905556Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:04.8906522Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.8907407Z ^
2025-12-04T12:15:04.8908156Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.8908790Z 
2025-12-04T12:15:04.8909578Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:04.8910498Z 
2025-12-04T12:15:04.8910502Z 
2025-12-04T12:15:04.8910868Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:04.8911949Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T12:15:04.8912812Z 
2025-12-04T12:15:04.8913097Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:04.8913899Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.8914495Z frames [('total', 1)]
2025-12-04T12:15:04.8914832Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.8915450Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:04.8916204Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.8916790Z graph_break []
2025-12-04T12:15:04.8917248Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.8917833Z frames [('total', 1)]
2025-12-04T12:15:04.8918266Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.8918793Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.8919504Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:04.8920112Z graph_break []
2025-12-04T12:15:04.8920554Z =================================== FAILURES ===================================
2025-12-04T12:15:04.8921230Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _
2025-12-04T12:15:04.8921956Z Traceback (most recent call last):
2025-12-04T12:15:04.8922793Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T12:15:04.8923694Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:04.8924693Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:04.8925716Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:04.8926729Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:04.8927654Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:04.8928716Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:04.8929625Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:04.8930627Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:04.8931695Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:04.8932796Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:04.8933774Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:04.8934658Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:04.8935470Z     return self._compile_to_module()
2025-12-04T12:15:04.8936473Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:04.8937430Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:04.8938350Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:04.8939328Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:04.8940271Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:04.8941247Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:04.8942330Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:04.8943352Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:04.8944236Z   File "/tmp/tmpqsmwxf2t/cr/ccrgz6yksh52d4pljsiu454p36bnmquvjmz2guh5wajrhj4ezynv.py", line 62, in <module>
2025-12-04T12:15:04.8945517Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:04.8946312Z     kernel.precompile(
2025-12-04T12:15:04.8947154Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:04.8948200Z     self._precompile_worker()
2025-12-04T12:15:04.8949139Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:04.8950105Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:04.8951253Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.8952312Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.8953237Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.8954179Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.8955129Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.8956186Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.8957037Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:04.8957977Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.8958852Z ^
2025-12-04T12:15:04.8959587Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.8960220Z 
2025-12-04T12:15:04.8961022Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:04.8961881Z 
2025-12-04T12:15:04.8961887Z 
2025-12-04T12:15:04.8962273Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:04.8963394Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T12:15:04.8964245Z 
2025-12-04T12:15:04.8964554Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:04.8965363Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.8965900Z frames [('total', 1)]
2025-12-04T12:15:04.8966316Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.8966931Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:04.8967646Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.8968185Z graph_break []
2025-12-04T12:15:04.8969390Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.8969980Z frames [('total', 1)]
2025-12-04T12:15:04.8970355Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.8971191Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.8972037Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:04.8972632Z graph_break []
2025-12-04T12:15:04.8973141Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.8973729Z frames [('total', 1)]
2025-12-04T12:15:04.8974117Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.8974679Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.8975457Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:04.8976030Z graph_break []
2025-12-04T12:15:04.8977105Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-966288eeb3fe785e.xml -
2025-12-04T12:15:04.8978209Z =========================== short test summary info ============================
2025-12-04T12:15:04.8979470Z FAILED [0.3631s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:04.8981121Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.8982185Z ^
2025-12-04T12:15:04.8982872Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.8983692Z 
2025-12-04T12:15:04.8984446Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:04.8985378Z 
2025-12-04T12:15:04.8985383Z 
2025-12-04T12:15:04.8985645Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:04.8986818Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T12:15:04.8987624Z 
2025-12-04T12:15:04.8987939Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:04.8988696Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:04.8989417Z ================== 1 failed, 187 deselected, 2 rerun in 4.08s ==================
2025-12-04T12:15:04.8989982Z Got exit code 1
2025-12-04T12:15:04.8990825Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T12:15:04.8992084Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:15:04.8993232Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-47dd8058babbbd0d.xml
2025-12-04T12:15:04.8994107Z ============================= test session starts ==============================
2025-12-04T12:15:04.8994911Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:04.8995632Z cachedir: .pytest_cache
2025-12-04T12:15:04.8996444Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:04.8997442Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:04.8997865Z configfile: pytest.ini
2025-12-04T12:15:04.8998742Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:04.8999872Z collecting ... collected 188 items / 1 deselected / 187 selected
2025-12-04T12:15:04.9000520Z stepcurrent: skipping 1 already run items.
2025-12-04T12:15:04.9001021Z Running 187 items in this shard
2025-12-04T12:15:04.9001380Z 
2025-12-04T12:15:04.9002746Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T12:15:04.9005236Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9006902Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:04.9008046Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T12:15:04.9008700Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T12:15:04.9009202Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:04.9009780Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:04.9010604Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:04.9011249Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:04.9011936Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:04.9012534Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:04.9013068Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:04.9013605Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:04.9014239Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:04.9014826Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:04.9015314Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:04.9016061Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:04.9016731Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:04.9017392Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T12:15:04.9017957Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:04.9018598Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:04.9019260Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T12:15:04.9019931Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T12:15:04.9020507Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp0.to(tl.float32)
2025-12-04T12:15:04.9021459Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T12:15:04.9022087Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T12:15:04.9022701Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T12:15:04.9023182Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T12:15:04.9023886Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T12:15:04.9024462Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T12:15:04.9025335Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask)
2025-12-04T12:15:04.9026081Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None)
2025-12-04T12:15:04.9026515Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:04.9028698Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:04.9029316Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:04.9030517Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.9031211Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.9032193Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.9032914Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.9033898Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.9034688Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.9035449Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:04.9036441Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9036880Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:04.9037873Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.9038049Z ('RERUN', {'yellow': True}) [3.3269s] [  0%]
2025-12-04T12:15:04.9039396Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T12:15:04.9040465Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9041001Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:04.9041517Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T12:15:04.9042079Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T12:15:04.9042609Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:04.9043234Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:04.9043893Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:04.9044521Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:04.9045267Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:04.9045865Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:04.9046322Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:04.9047004Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:04.9047532Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:04.9048079Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:04.9048567Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:04.9049249Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:04.9049879Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:04.9050488Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T12:15:04.9051079Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:04.9051701Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:04.9052355Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T12:15:04.9053047Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T12:15:04.9053640Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp0.to(tl.float32)
2025-12-04T12:15:04.9054260Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T12:15:04.9054769Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T12:15:04.9055427Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T12:15:04.9055909Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T12:15:04.9056645Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T12:15:04.9057352Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T12:15:04.9058168Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask)
2025-12-04T12:15:04.9058963Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None)
2025-12-04T12:15:04.9059368Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:04.9061539Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:04.9062162Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:04.9063308Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.9063975Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.9064902Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.9065684Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.9066577Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.9067580Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.9068230Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:04.9069280Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9069716Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:04.9070672Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.9071123Z ('RERUN', {'yellow': True}) [0.3543s] [  0%]
2025-12-04T12:15:04.9072439Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T12:15:04.9073594Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9074069Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:04.9074606Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T12:15:04.9075143Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T12:15:04.9075776Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:04.9076439Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:04.9077021Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:04.9077701Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:04.9078323Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:04.9078993Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:04.9079506Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:04.9080064Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:04.9080625Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:04.9081129Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:04.9081646Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:04.9082390Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:04.9083037Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:04.9083672Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T12:15:04.9084217Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:04.9084893Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:04.9085575Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T12:15:04.9086360Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T12:15:04.9086914Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp0.to(tl.float32)
2025-12-04T12:15:04.9087418Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T12:15:04.9088062Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T12:15:04.9088686Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T12:15:04.9089242Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T12:15:04.9089876Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T12:15:04.9090451Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T12:15:04.9091251Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask)
2025-12-04T12:15:04.9092005Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None)
2025-12-04T12:15:04.9092439Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:04.9094645Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:04.9095290Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:04.9096455Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.9097200Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.9098135Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.9098970Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.9099916Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.9100762Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.9101503Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:04.9102502Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9102968Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:04.9103946Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.9104175Z FAILED [0.3581s] [  0%]
2025-12-04T12:15:04.9104182Z 
2025-12-04T12:15:04.9104432Z ==================================== RERUNS ====================================
2025-12-04T12:15:04.9104792Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _
2025-12-04T12:15:04.9105475Z Traceback (most recent call last):
2025-12-04T12:15:04.9105960Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T12:15:04.9106303Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:04.9106923Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:04.9107214Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:04.9107816Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:04.9108057Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:04.9108586Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:04.9108902Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:04.9109479Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:04.9109887Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:04.9110451Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:04.9110643Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:04.9111228Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:04.9111426Z     return self._compile_to_module()
2025-12-04T12:15:04.9112002Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:04.9112208Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:04.9112761Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:04.9112959Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:04.9113603Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:04.9114010Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:04.9114642Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:04.9114809Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:04.9115405Z   File "/tmp/tmpynpbzzr4/vs/cvscihiarpeopxwvjhysbzu5j4jpryak7kmli74mev3svog7a2j3.py", line 62, in <module>
2025-12-04T12:15:04.9115964Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:04.9116168Z     kernel.precompile(
2025-12-04T12:15:04.9117064Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:04.9117227Z     self._precompile_worker()
2025-12-04T12:15:04.9117938Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:04.9118209Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:04.9118820Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.9119189Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.9119734Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.9120082Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.9120565Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.9120943Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.9217240Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:04.9218080Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9218188Z ^
2025-12-04T12:15:04.9218663Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.9218671Z 
2025-12-04T12:15:04.9219415Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:04.9219429Z 
2025-12-04T12:15:04.9219434Z 
2025-12-04T12:15:04.9219666Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:04.9220339Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T12:15:04.9220345Z 
2025-12-04T12:15:04.9220631Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:04.9220882Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.9221000Z frames [('total', 1)]
2025-12-04T12:15:04.9221127Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.9221388Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:04.9221618Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.9221729Z graph_break []
2025-12-04T12:15:04.9222073Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _
2025-12-04T12:15:04.9222210Z Traceback (most recent call last):
2025-12-04T12:15:04.9222679Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T12:15:04.9222948Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:04.9223684Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:04.9223958Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:04.9224484Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:04.9224688Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:04.9225225Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:04.9225442Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:04.9226060Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:04.9226389Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:04.9226923Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:04.9227095Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:04.9227720Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:04.9227868Z     return self._compile_to_module()
2025-12-04T12:15:04.9228364Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:04.9228538Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:04.9229082Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:04.9229221Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:04.9229721Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:04.9229981Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:04.9230577Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:04.9230731Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:04.9231220Z   File "/tmp/tmpna5tog_g/33/c33whvclvjiosqjm2uamnjzooutjccj2sxcdh66y6lxcjtjcdm4e.py", line 62, in <module>
2025-12-04T12:15:04.9231686Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:04.9231820Z     kernel.precompile(
2025-12-04T12:15:04.9232382Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:04.9232525Z     self._precompile_worker()
2025-12-04T12:15:04.9233133Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:04.9233321Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:04.9233940Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.9234151Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.9234629Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.9234895Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.9235347Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.9235723Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.9235957Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:04.9236508Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9236639Z ^
2025-12-04T12:15:04.9237107Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.9237113Z 
2025-12-04T12:15:04.9237882Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:04.9237978Z 
2025-12-04T12:15:04.9237984Z 
2025-12-04T12:15:04.9238227Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:04.9238894Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T12:15:04.9238917Z 
2025-12-04T12:15:04.9239192Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:04.9239422Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.9239555Z frames [('total', 1)]
2025-12-04T12:15:04.9239707Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.9239955Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:04.9240231Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.9240344Z graph_break []
2025-12-04T12:15:04.9240610Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.9240816Z frames [('total', 1)]
2025-12-04T12:15:04.9240966Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.9241302Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.9246121Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:04.9246268Z graph_break []
2025-12-04T12:15:04.9246440Z =================================== FAILURES ===================================
2025-12-04T12:15:04.9246763Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _
2025-12-04T12:15:04.9246899Z Traceback (most recent call last):
2025-12-04T12:15:04.9247381Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T12:15:04.9247627Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:04.9248115Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:04.9248365Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:04.9248888Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:04.9249081Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:04.9249597Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:04.9249742Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:04.9250272Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:04.9250605Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:04.9251172Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:04.9251329Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:04.9251811Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:04.9251951Z     return self._compile_to_module()
2025-12-04T12:15:04.9252455Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:04.9252614Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:04.9253177Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:04.9253322Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:04.9253815Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:04.9254050Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:04.9254664Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:04.9254823Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:04.9255332Z   File "/tmp/tmpw4uqtrom/ln/clnj7ur7ftz7hq4rsfhr7vbecbpxqqlzgyyhrmyesywx5pk4ngsu.py", line 62, in <module>
2025-12-04T12:15:04.9255795Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:04.9276557Z     kernel.precompile(
2025-12-04T12:15:04.9277215Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:04.9277462Z     self._precompile_worker()
2025-12-04T12:15:04.9278111Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:04.9278295Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:04.9278892Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.9279141Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.9279595Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.9279843Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.9280281Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.9280615Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.9280848Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:04.9281358Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9281455Z ^
2025-12-04T12:15:04.9281908Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.9281915Z 
2025-12-04T12:15:04.9282631Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:04.9282638Z 
2025-12-04T12:15:04.9282653Z 
2025-12-04T12:15:04.9282870Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:04.9283504Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T12:15:04.9283512Z 
2025-12-04T12:15:04.9283788Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:04.9284011Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.9284120Z frames [('total', 1)]
2025-12-04T12:15:04.9284244Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.9284479Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:04.9284710Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.9284808Z graph_break []
2025-12-04T12:15:04.9285025Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.9285135Z frames [('total', 1)]
2025-12-04T12:15:04.9285250Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.9285596Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.9285845Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:04.9285942Z graph_break []
2025-12-04T12:15:04.9286159Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.9286276Z frames [('total', 1)]
2025-12-04T12:15:04.9286392Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.9286674Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.9286905Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:04.9287053Z graph_break []
2025-12-04T12:15:04.9287725Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-47dd8058babbbd0d.xml -
2025-12-04T12:15:04.9287898Z =========================== short test summary info ============================
2025-12-04T12:15:04.9288684Z FAILED [0.3581s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:04.9289246Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9289335Z ^
2025-12-04T12:15:04.9289808Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.9289817Z 
2025-12-04T12:15:04.9290527Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:04.9290533Z 
2025-12-04T12:15:04.9290538Z 
2025-12-04T12:15:04.9290756Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:04.9291396Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T12:15:04.9291415Z 
2025-12-04T12:15:04.9291688Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:04.9291871Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:04.9292083Z =================== 1 failed, 1 deselected, 2 rerun in 4.08s ===================
2025-12-04T12:15:04.9292188Z Got exit code 1
2025-12-04T12:15:04.9292297Z Retrying single test...
2025-12-04T12:15:04.9292782Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e92e228ccdafe934.xml
2025-12-04T12:15:04.9292950Z ============================= test session starts ==============================
2025-12-04T12:15:04.9293317Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:04.9293430Z cachedir: .pytest_cache
2025-12-04T12:15:04.9293952Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:04.9294092Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:04.9294204Z configfile: pytest.ini
2025-12-04T12:15:04.9294794Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:04.9295026Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:04.9295737Z stepcurrent: skipping 1 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T12:15:04.9295870Z Running 1 items in this shard
2025-12-04T12:15:04.9295875Z 
2025-12-04T12:15:04.9297221Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T12:15:04.9298200Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9298639Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:04.9299115Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T12:15:04.9299688Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T12:15:04.9300151Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:04.9300701Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:04.9301238Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:04.9301854Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:04.9302452Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:04.9303007Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:04.9303463Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:04.9303983Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:04.9304455Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:04.9304931Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:04.9305380Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:04.9306556Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:04.9307082Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:04.9307627Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T12:15:04.9308136Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:04.9308721Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:04.9309304Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T12:15:04.9309934Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T12:15:04.9310452Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp0.to(tl.float32)
2025-12-04T12:15:04.9310916Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T12:15:04.9311435Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T12:15:04.9312020Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T12:15:04.9312457Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T12:15:04.9313077Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T12:15:04.9313642Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T12:15:04.9314353Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask)
2025-12-04T12:15:04.9315077Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None)
2025-12-04T12:15:04.9315474Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:04.9317572Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:04.9318109Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:04.9319162Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.9319791Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.9320704Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.9321386Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.9322274Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.9323060Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.9323668Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:04.9324640Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9325005Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:04.9325955Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.9326096Z ('RERUN', {'yellow': True}) [3.3507s] [100%]
2025-12-04T12:15:04.9327332Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T12:15:04.9328356Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9328792Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:04.9329248Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T12:15:04.9329800Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T12:15:04.9330271Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:04.9330804Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:04.9331346Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:04.9331941Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:04.9332530Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:04.9333096Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:04.9333536Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:04.9334053Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:04.9334538Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:04.9334996Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:04.9335459Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:04.9336107Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:04.9336699Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:04.9337263Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T12:15:04.9337769Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:04.9338362Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:04.9338969Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T12:15:04.9339596Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T12:15:04.9340120Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp0.to(tl.float32)
2025-12-04T12:15:04.9340624Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T12:15:04.9341110Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T12:15:04.9341687Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T12:15:04.9342130Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T12:15:04.9342720Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T12:15:04.9343286Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T12:15:04.9344001Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask)
2025-12-04T12:15:04.9344708Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None)
2025-12-04T12:15:04.9345085Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:04.9347179Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:04.9347730Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:04.9348776Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.9349404Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.9350312Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.9350990Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.9351891Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.9352690Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.9353312Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:04.9354266Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9354676Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:04.9355599Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.9355739Z ('RERUN', {'yellow': True}) [0.3559s] [100%]
2025-12-04T12:15:04.9356990Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T12:15:04.9357985Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9358436Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:04.9358884Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T12:15:04.9359402Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T12:15:04.9359883Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:04.9360419Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:04.9360973Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:04.9361563Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:04.9362148Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:04.9362723Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:04.9363165Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:04.9363703Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:04.9364178Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:04.9364652Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:04.9365102Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:04.9365747Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:04.9366287Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:04.9366867Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T12:15:04.9367386Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:04.9367969Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:04.9368598Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T12:15:04.9369240Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T12:15:04.9369746Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp0.to(tl.float32)
2025-12-04T12:15:04.9370226Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T12:15:04.9370705Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T12:15:04.9371468Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T12:15:04.9371924Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T12:15:04.9372503Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T12:15:04.9373053Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T12:15:04.9373763Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask)
2025-12-04T12:15:04.9374485Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None)
2025-12-04T12:15:04.9374853Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:04.9377009Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:04.9377567Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:04.9378609Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.9379261Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.9380154Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.9380934Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.9381820Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.9382653Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.9383311Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:04.9384266Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9384689Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:04.9385580Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.9385704Z FAILED [0.3694s] [100%]
2025-12-04T12:15:04.9385711Z 
2025-12-04T12:15:04.9385860Z ==================================== RERUNS ====================================
2025-12-04T12:15:04.9386186Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _
2025-12-04T12:15:04.9386326Z Traceback (most recent call last):
2025-12-04T12:15:04.9386782Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T12:15:04.9387042Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:04.9387533Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:04.9387786Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:04.9388315Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:04.9388512Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:04.9389035Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:04.9389193Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:04.9389726Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:04.9390063Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:04.9390586Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:04.9390737Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:04.9391229Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:04.9391353Z     return self._compile_to_module()
2025-12-04T12:15:04.9391850Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:04.9392020Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:04.9392540Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:04.9392686Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:04.9393219Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:04.9393473Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:04.9394063Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:04.9394191Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:04.9395072Z   File "/tmp/tmpd5orxukv/z7/cz7lutm3es2lyz6khdiqs5qmbvwebokimmwds4jh3wrg7aysnpl2.py", line 62, in <module>
2025-12-04T12:15:04.9395588Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:04.9395732Z     kernel.precompile(
2025-12-04T12:15:04.9396301Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:04.9396420Z     self._precompile_worker()
2025-12-04T12:15:04.9397030Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:04.9397242Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:04.9397835Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.9398047Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.9398500Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.9398760Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.9399203Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.9399538Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.9399777Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:04.9400293Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9400543Z ^
2025-12-04T12:15:04.9401025Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.9401031Z 
2025-12-04T12:15:04.9401748Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:04.9401758Z 
2025-12-04T12:15:04.9401762Z 
2025-12-04T12:15:04.9402001Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:04.9402639Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T12:15:04.9402644Z 
2025-12-04T12:15:04.9402929Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:04.9403159Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.9403269Z frames [('total', 1)]
2025-12-04T12:15:04.9403405Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.9403644Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:04.9403867Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.9403992Z graph_break []
2025-12-04T12:15:04.9404315Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _
2025-12-04T12:15:04.9404453Z Traceback (most recent call last):
2025-12-04T12:15:04.9404912Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T12:15:04.9405156Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:04.9405661Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:04.9405961Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:04.9406488Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:04.9406681Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:04.9407190Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:04.9407389Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:04.9407983Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:04.9408306Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:04.9408838Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:04.9408989Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:04.9409514Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:04.9409636Z     return self._compile_to_module()
2025-12-04T12:15:04.9410117Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:04.9410298Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:04.9410819Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:04.9410963Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:04.9411464Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:04.9411696Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:04.9412306Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:04.9412436Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:04.9412952Z   File "/tmp/tmpabpepxim/mc/cmcnqgrmruxpwu2wl7ubmrpoprbrm5zwt5kdzvxqhasamksxzecz.py", line 62, in <module>
2025-12-04T12:15:04.9413425Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:04.9413539Z     kernel.precompile(
2025-12-04T12:15:04.9414107Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:04.9414225Z     self._precompile_worker()
2025-12-04T12:15:04.9414820Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:04.9415011Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:04.9415608Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.9415822Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.9416274Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.9416627Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.9417082Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.9417417Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.9417645Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:04.9418175Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9418308Z ^
2025-12-04T12:15:04.9418782Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.9418790Z 
2025-12-04T12:15:04.9419501Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:04.9419541Z 
2025-12-04T12:15:04.9419545Z 
2025-12-04T12:15:04.9419775Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:04.9420438Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T12:15:04.9420445Z 
2025-12-04T12:15:04.9420712Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:04.9420950Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.9421057Z frames [('total', 1)]
2025-12-04T12:15:04.9421177Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.9421463Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:04.9421685Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.9421799Z graph_break []
2025-12-04T12:15:04.9422022Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.9422133Z frames [('total', 1)]
2025-12-04T12:15:04.9422267Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.9422485Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.9422721Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:04.9422836Z graph_break []
2025-12-04T12:15:04.9422985Z =================================== FAILURES ===================================
2025-12-04T12:15:04.9423320Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _
2025-12-04T12:15:04.9423450Z Traceback (most recent call last):
2025-12-04T12:15:04.9423912Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T12:15:04.9424172Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:04.9424664Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:04.9424916Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:04.9425445Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:04.9425641Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:04.9426166Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:04.9426313Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:04.9426853Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:04.9427188Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:04.9427708Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:04.9427870Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:04.9428347Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:04.9428473Z     return self._compile_to_module()
2025-12-04T12:15:04.9428970Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:04.9429135Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:04.9429689Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:04.9429837Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:04.9430334Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:04.9430576Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:04.9431194Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:04.9431321Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:04.9431854Z   File "/tmp/tmpv_nezct4/ru/cruzlhbig75v7zez2elvcwr3kmbw4uczxlz5wfag4aohz4fymkq6.py", line 62, in <module>
2025-12-04T12:15:04.9432319Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:04.9432446Z     kernel.precompile(
2025-12-04T12:15:04.9433007Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:04.9433156Z     self._precompile_worker()
2025-12-04T12:15:04.9433767Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:04.9433947Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:04.9434546Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.9434762Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.9435217Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.9435478Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.9435923Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.9436257Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.9436503Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:04.9437021Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9437132Z ^
2025-12-04T12:15:04.9437592Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.9437598Z 
2025-12-04T12:15:04.9438310Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:04.9438317Z 
2025-12-04T12:15:04.9438336Z 
2025-12-04T12:15:04.9438557Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:04.9439190Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T12:15:04.9439198Z 
2025-12-04T12:15:04.9439485Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:04.9439709Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.9439818Z frames [('total', 1)]
2025-12-04T12:15:04.9439953Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.9440194Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:04.9440432Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.9440536Z graph_break []
2025-12-04T12:15:04.9440758Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.9440883Z frames [('total', 1)]
2025-12-04T12:15:04.9441002Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.9441258Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.9441513Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:04.9441616Z graph_break []
2025-12-04T12:15:04.9441833Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.9441953Z frames [('total', 1)]
2025-12-04T12:15:04.9442070Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.9442335Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.9442570Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:04.9442671Z graph_break []
2025-12-04T12:15:04.9443374Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e92e228ccdafe934.xml -
2025-12-04T12:15:04.9443552Z =========================== short test summary info ============================
2025-12-04T12:15:04.9444331Z FAILED [0.3694s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:04.9444889Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9444981Z ^
2025-12-04T12:15:04.9445449Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.9445458Z 
2025-12-04T12:15:04.9446168Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:04.9446174Z 
2025-12-04T12:15:04.9446179Z 
2025-12-04T12:15:04.9446411Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:04.9447040Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T12:15:04.9447048Z 
2025-12-04T12:15:04.9447314Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:04.9447511Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:04.9447714Z ================== 1 failed, 187 deselected, 2 rerun in 4.12s ==================
2025-12-04T12:15:04.9447833Z Got exit code 1
2025-12-04T12:15:04.9447943Z Retrying single test...
2025-12-04T12:15:04.9448418Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0328fb4bc2fb022d.xml
2025-12-04T12:15:04.9448594Z ============================= test session starts ==============================
2025-12-04T12:15:04.9448947Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:04.9449056Z cachedir: .pytest_cache
2025-12-04T12:15:04.9449588Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:04.9449714Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:04.9449840Z configfile: pytest.ini
2025-12-04T12:15:04.9450429Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:04.9450651Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:04.9451378Z stepcurrent: skipping 1 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T12:15:04.9451497Z Running 1 items in this shard
2025-12-04T12:15:04.9451503Z 
2025-12-04T12:15:04.9452750Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T12:15:04.9453749Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9454187Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:04.9454698Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T12:15:04.9455289Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T12:15:04.9455769Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:04.9456397Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:04.9456943Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:04.9457582Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:04.9458166Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:04.9458737Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:04.9459182Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:04.9459709Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:04.9460186Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:04.9460644Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:04.9461106Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:04.9461756Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:04.9462288Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:04.9462836Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T12:15:04.9463338Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:04.9463931Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:04.9464501Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T12:15:04.9465141Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T12:15:04.9465646Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp0.to(tl.float32)
2025-12-04T12:15:04.9466115Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T12:15:04.9466597Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T12:15:04.9467172Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T12:15:04.9467621Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T12:15:04.9468230Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T12:15:04.9468790Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T12:15:04.9469508Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask)
2025-12-04T12:15:04.9470215Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None)
2025-12-04T12:15:04.9470618Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:04.9472901Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:04.9473460Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:04.9474507Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.9475152Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.9476051Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.9476747Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.9477632Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.9478400Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.9479020Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:04.9479971Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9480348Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:04.9481324Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.9481463Z ('RERUN', {'yellow': True}) [3.3342s] [100%]
2025-12-04T12:15:04.9482719Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T12:15:04.9483747Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9484194Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:04.9484643Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T12:15:04.9485223Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T12:15:04.9485683Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:04.9486217Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:04.9486770Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:04.9487349Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:04.9487947Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:04.9488506Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:04.9488944Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:04.9489479Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:04.9489953Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:04.9490430Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:04.9490879Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:04.9491522Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:04.9492059Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:04.9492605Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T12:15:04.9493116Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:04.9493700Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:04.9494325Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T12:15:04.9494952Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T12:15:04.9495459Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp0.to(tl.float32)
2025-12-04T12:15:04.9495971Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T12:15:04.9496510Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T12:15:04.9497095Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T12:15:04.9497533Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T12:15:04.9498109Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T12:15:04.9498684Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T12:15:04.9499388Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask)
2025-12-04T12:15:04.9500103Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None)
2025-12-04T12:15:04.9500467Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:04.9502576Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:04.9503116Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:04.9504167Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.9504799Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.9505692Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.9506383Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.9507269Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.9508052Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.9508712Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:04.9509688Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9510084Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:04.9511003Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.9511160Z ('RERUN', {'yellow': True}) [0.3472s] [100%]
2025-12-04T12:15:04.9512393Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T12:15:04.9513385Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9513821Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:04.9514282Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T12:15:04.9514805Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T12:15:04.9515266Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:04.9515813Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:04.9516356Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:04.9516956Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:04.9517542Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:04.9518097Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:04.9518553Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:04.9519068Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:04.9519552Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:04.9520010Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:04.9520455Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:04.9521113Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:04.9521629Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:04.9522212Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T12:15:04.9522716Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:04.9523295Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:04.9523906Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T12:15:04.9524562Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T12:15:04.9525079Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp0.to(tl.float32)
2025-12-04T12:15:04.9525549Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T12:15:04.9526033Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T12:15:04.9526599Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T12:15:04.9527040Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T12:15:04.9527630Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T12:15:04.9528164Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T12:15:04.9528884Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask)
2025-12-04T12:15:04.9529590Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None)
2025-12-04T12:15:04.9529953Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:04.9532047Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:04.9532590Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:04.9533655Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.9534290Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.9535197Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.9535914Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.9536874Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.9537646Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.9538354Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:04.9539311Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9539678Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:04.9540615Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.9540727Z FAILED [0.3454s] [100%]
2025-12-04T12:15:04.9540734Z 
2025-12-04T12:15:04.9540899Z ==================================== RERUNS ====================================
2025-12-04T12:15:04.9541223Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _
2025-12-04T12:15:04.9541351Z Traceback (most recent call last):
2025-12-04T12:15:04.9541824Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T12:15:04.9542069Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:04.9542575Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:04.9542831Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:04.9543345Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:04.9543559Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:04.9544069Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:04.9544221Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:04.9544773Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:04.9545095Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:04.9545634Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:04.9545789Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:04.9546275Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:04.9546419Z     return self._compile_to_module()
2025-12-04T12:15:04.9546910Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:04.9547090Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:04.9547610Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:04.9547744Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:04.9548250Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:04.9548523Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:04.9549114Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:04.9549259Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:04.9549755Z   File "/tmp/tmpu3owmxje/b4/cb4w36z67ouxaq5j3vsw7lsoxkldnbjlxjhks5bmuhdl7hhmegzf.py", line 62, in <module>
2025-12-04T12:15:04.9550264Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:04.9550406Z     kernel.precompile(
2025-12-04T12:15:04.9550966Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:04.9551095Z     self._precompile_worker()
2025-12-04T12:15:04.9551695Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:04.9551892Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:04.9552532Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.9552731Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.9553193Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.9553440Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.9553886Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.9554231Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.9554457Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:04.9554984Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9555076Z ^
2025-12-04T12:15:04.9555530Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.9555536Z 
2025-12-04T12:15:04.9556254Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:04.9556263Z 
2025-12-04T12:15:04.9556268Z 
2025-12-04T12:15:04.9556486Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:04.9557131Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T12:15:04.9557137Z 
2025-12-04T12:15:04.9557404Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:04.9557647Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.9557756Z frames [('total', 1)]
2025-12-04T12:15:04.9557874Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.9558125Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:04.9558349Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.9558454Z graph_break []
2025-12-04T12:15:04.9558790Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _
2025-12-04T12:15:04.9558913Z Traceback (most recent call last):
2025-12-04T12:15:04.9559373Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T12:15:04.9559630Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:04.9560124Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:04.9560422Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:04.9560940Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:04.9561132Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:04.9561658Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:04.9561838Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:04.9562416Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:04.9562744Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:04.9563264Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:04.9563429Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:04.9563937Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:04.9564072Z     return self._compile_to_module()
2025-12-04T12:15:04.9564555Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:04.9564721Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:04.9565251Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:04.9565385Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:04.9565884Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:04.9566130Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:04.9566713Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:04.9566857Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:04.9567358Z   File "/tmp/tmp9650aj7v/xq/cxqp5ybp2bbis5lrrx2of4wejt4azewfckjsqz7oyanqdwexzxrv.py", line 62, in <module>
2025-12-04T12:15:04.9567820Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:04.9567951Z     kernel.precompile(
2025-12-04T12:15:04.9568506Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:04.9568627Z     self._precompile_worker()
2025-12-04T12:15:04.9569236Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:04.9569417Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:04.9570027Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.9570227Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.9570676Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.9571124Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.9571578Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.9571930Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.9572159Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:04.9572676Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9572782Z ^
2025-12-04T12:15:04.9573332Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.9573341Z 
2025-12-04T12:15:04.9574065Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:04.9574112Z 
2025-12-04T12:15:04.9574117Z 
2025-12-04T12:15:04.9574337Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:04.9575012Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T12:15:04.9575034Z 
2025-12-04T12:15:04.9575309Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:04.9575533Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.9575653Z frames [('total', 1)]
2025-12-04T12:15:04.9575773Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.9576055Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:04.9576350Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.9576455Z graph_break []
2025-12-04T12:15:04.9576676Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.9576800Z frames [('total', 1)]
2025-12-04T12:15:04.9576917Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.9577151Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.9577390Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:04.9577490Z graph_break []
2025-12-04T12:15:04.9577652Z =================================== FAILURES ===================================
2025-12-04T12:15:04.9577977Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _
2025-12-04T12:15:04.9578103Z Traceback (most recent call last):
2025-12-04T12:15:04.9578572Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T12:15:04.9578817Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:04.9579319Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:04.9579571Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:04.9580087Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:04.9580290Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:04.9580801Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:04.9580950Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:04.9581498Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:04.9581822Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:04.9582350Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:04.9582501Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:04.9582983Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:04.9583121Z     return self._compile_to_module()
2025-12-04T12:15:04.9583604Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:04.9583780Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:04.9584335Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:04.9584471Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:04.9584985Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:04.9585214Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:04.9585812Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:04.9585969Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:04.9586499Z   File "/tmp/tmpzgybdau9/rz/crzmctqfkn4vp54aytyfgk3qgfe4ha3prv52bltqg5w73znfs6gd.py", line 62, in <module>
2025-12-04T12:15:04.9586980Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:04.9587092Z     kernel.precompile(
2025-12-04T12:15:04.9587651Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:04.9587857Z     self._precompile_worker()
2025-12-04T12:15:04.9588452Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:04.9588647Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:04.9589248Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.9589446Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.9589914Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.9590160Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.9590618Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.9590954Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.9591182Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:04.9592219Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9592315Z ^
2025-12-04T12:15:04.9592775Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.9592797Z 
2025-12-04T12:15:04.9593509Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:04.9593516Z 
2025-12-04T12:15:04.9593521Z 
2025-12-04T12:15:04.9593738Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:04.9594383Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T12:15:04.9594391Z 
2025-12-04T12:15:04.9594660Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:04.9594894Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.9595002Z frames [('total', 1)]
2025-12-04T12:15:04.9595123Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.9595375Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:04.9595598Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.9595699Z graph_break []
2025-12-04T12:15:04.9595930Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.9596049Z frames [('total', 1)]
2025-12-04T12:15:04.9596180Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.9596452Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.9596691Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:04.9596811Z graph_break []
2025-12-04T12:15:04.9597027Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.9597133Z frames [('total', 1)]
2025-12-04T12:15:04.9597263Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.9597482Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.9597748Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:04.9597866Z graph_break []
2025-12-04T12:15:04.9598565Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0328fb4bc2fb022d.xml -
2025-12-04T12:15:04.9598759Z =========================== short test summary info ============================
2025-12-04T12:15:04.9599533Z FAILED [0.3454s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:04.9600081Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9600190Z ^
2025-12-04T12:15:04.9600650Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.9600659Z 
2025-12-04T12:15:04.9601385Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:04.9601391Z 
2025-12-04T12:15:04.9601396Z 
2025-12-04T12:15:04.9601617Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:04.9602247Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T12:15:04.9602269Z 
2025-12-04T12:15:04.9602542Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:04.9602727Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:04.9602947Z ================== 1 failed, 187 deselected, 2 rerun in 4.07s ==================
2025-12-04T12:15:04.9603051Z Got exit code 1
2025-12-04T12:15:04.9603615Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T12:15:04.9604152Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:15:04.9604628Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4ecceae3d20d3515.xml
2025-12-04T12:15:04.9604812Z ============================= test session starts ==============================
2025-12-04T12:15:04.9605171Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:04.9605283Z cachedir: .pytest_cache
2025-12-04T12:15:04.9605822Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:04.9605952Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:04.9606063Z configfile: pytest.ini
2025-12-04T12:15:04.9606665Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:04.9606894Z collecting ... collected 188 items / 2 deselected / 186 selected
2025-12-04T12:15:04.9607055Z stepcurrent: skipping 2 already run items.
2025-12-04T12:15:04.9607172Z Running 186 items in this shard
2025-12-04T12:15:04.9607178Z 
2025-12-04T12:15:04.9608384Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2
2025-12-04T12:15:04.9609202Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9609652Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 40960
2025-12-04T12:15:04.9610243Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:04.9610837Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:04.9611415Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T12:15:04.9611853Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:04.9612472Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32)
2025-12-04T12:15:04.9613009Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:04.9613560Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.broadcast_to(tmp2, [XBLOCK])
2025-12-04T12:15:04.9614083Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:04.9614550Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp1 * tmp3
2025-12-04T12:15:04.9614992Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = -448.0
2025-12-04T12:15:04.9615570Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = triton_helpers.maximum(tmp4, tmp5)
2025-12-04T12:15:04.9616006Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 448.0
2025-12-04T12:15:04.9616651Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = triton_helpers.minimum(tmp6, tmp7)
2025-12-04T12:15:04.9617184Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp8.to(tl.float8e4nv)
2025-12-04T12:15:04.9617725Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp9, None)
2025-12-04T12:15:04.9618100Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:04.9620009Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:04.9620562Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:04.9621606Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.9622298Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.9623189Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.9623910Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.9624829Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.9625600Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.9626223Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:04.9627057Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9627437Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:04.9628327Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.9631617Z ('RERUN', {'yellow': True}) [3.7007s] [  0%]
2025-12-04T12:15:04.9632817Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2
2025-12-04T12:15:04.9633963Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9634433Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 40960
2025-12-04T12:15:04.9634981Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:04.9635558Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:04.9636164Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T12:15:04.9636595Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:04.9637201Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32)
2025-12-04T12:15:04.9637720Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:04.9638272Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.broadcast_to(tmp2, [XBLOCK])
2025-12-04T12:15:04.9638794Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:04.9639260Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp1 * tmp3
2025-12-04T12:15:04.9639790Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = -448.0
2025-12-04T12:15:04.9640361Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = triton_helpers.maximum(tmp4, tmp5)
2025-12-04T12:15:04.9640798Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 448.0
2025-12-04T12:15:04.9641416Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = triton_helpers.minimum(tmp6, tmp7)
2025-12-04T12:15:04.9641973Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp8.to(tl.float8e4nv)
2025-12-04T12:15:04.9642531Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp9, None)
2025-12-04T12:15:04.9642897Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:04.9644808Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:04.9645353Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:04.9646415Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.9647114Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.9648008Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.9648706Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.9649591Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.9650384Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.9650996Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:04.9651805Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9652176Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:04.9653075Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.9653231Z ('RERUN', {'yellow': True}) [0.5517s] [  0%]
2025-12-04T12:15:04.9654429Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2
2025-12-04T12:15:04.9655242Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9655717Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 40960
2025-12-04T12:15:04.9656369Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:04.9656949Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:04.9657519Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T12:15:04.9657969Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:04.9658556Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32)
2025-12-04T12:15:04.9659096Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:04.9659645Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.broadcast_to(tmp2, [XBLOCK])
2025-12-04T12:15:04.9660149Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:04.9660687Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp1 * tmp3
2025-12-04T12:15:04.9661128Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = -448.0
2025-12-04T12:15:04.9661701Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = triton_helpers.maximum(tmp4, tmp5)
2025-12-04T12:15:04.9662138Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 448.0
2025-12-04T12:15:04.9662707Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = triton_helpers.minimum(tmp6, tmp7)
2025-12-04T12:15:04.9663245Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp8.to(tl.float8e4nv)
2025-12-04T12:15:04.9663789Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp9, None)
2025-12-04T12:15:04.9664165Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:04.9666070Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:04.9666621Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:04.9667778Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.9668409Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.9669313Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.9670051Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.9671122Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.9671898Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.9672524Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:04.9673327Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9673713Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:04.9674608Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.9674789Z FAILED [0.5510s] [  0%]
2025-12-04T12:15:04.9674799Z 
2025-12-04T12:15:04.9674960Z ==================================== RERUNS ====================================
2025-12-04T12:15:04.9675284Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda _
2025-12-04T12:15:04.9675411Z Traceback (most recent call last):
2025-12-04T12:15:04.9675883Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T12:15:04.9676132Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:04.9676638Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:04.9676888Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:04.9677404Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:04.9677613Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:04.9678125Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:04.9678288Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:04.9678821Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:04.9679143Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:04.9679678Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:04.9679828Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:04.9680322Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:04.9680449Z     return self._compile_to_module()
2025-12-04T12:15:04.9680994Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:04.9681178Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:04.9681694Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:04.9681824Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:04.9682379Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:04.9682654Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:04.9683255Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:04.9683386Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:04.9683900Z   File "/tmp/tmp1plybcp3/ky/cky7khmdr5lyfsuub4j6geabdqllkm256nn3louctma6b7lduzvd.py", line 163, in <module>
2025-12-04T12:15:04.9684376Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:04.9684488Z     kernel.precompile(
2025-12-04T12:15:04.9685059Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:04.9685181Z     self._precompile_worker()
2025-12-04T12:15:04.9685779Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:04.9685977Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:04.9686571Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.9686813Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.9687274Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.9687522Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.9687977Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.9688313Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.9688540Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:04.9688913Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9689003Z ^
2025-12-04T12:15:04.9689462Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.9689482Z 
2025-12-04T12:15:04.9690194Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:04.9690200Z 
2025-12-04T12:15:04.9690205Z 
2025-12-04T12:15:04.9690422Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:04.9691078Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T12:15:04.9691086Z 
2025-12-04T12:15:04.9691354Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:04.9691593Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.9691701Z frames [('total', 1)]
2025-12-04T12:15:04.9691821Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.9692074Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:04.9692299Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.9692402Z graph_break []
2025-12-04T12:15:04.9692773Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda _
2025-12-04T12:15:04.9692902Z Traceback (most recent call last):
2025-12-04T12:15:04.9693372Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T12:15:04.9693615Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:04.9694104Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:04.9694402Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:04.9694943Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:04.9695152Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:04.9695667Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:04.9695821Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:04.9696440Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:04.9696765Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:04.9697287Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:04.9697458Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:04.9697939Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:04.9698077Z     return self._compile_to_module()
2025-12-04T12:15:04.9698566Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:04.9698791Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:04.9699328Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:04.9699463Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:04.9699975Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:04.9700211Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:04.9700798Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:04.9700945Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:04.9701454Z   File "/tmp/tmpod28e3ow/hk/chkrruefcnej6yo2sl3hrqyh426dvgtouxusocl6uumvojl6nidp.py", line 163, in <module>
2025-12-04T12:15:04.9701921Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:04.9702049Z     kernel.precompile(
2025-12-04T12:15:04.9702605Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:04.9702740Z     self._precompile_worker()
2025-12-04T12:15:04.9703332Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:04.9703514Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:04.9704129Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.9704325Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.9704787Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.9705037Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.9705515Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.9705868Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.9706098Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:04.9706458Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9706590Z ^
2025-12-04T12:15:04.9707051Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.9707084Z 
2025-12-04T12:15:04.9707810Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:04.9707819Z 
2025-12-04T12:15:04.9707823Z 
2025-12-04T12:15:04.9708043Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:04.9708691Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T12:15:04.9708697Z 
2025-12-04T12:15:04.9708968Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:04.9709192Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.9709314Z frames [('total', 1)]
2025-12-04T12:15:04.9709431Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.9709668Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:04.9709905Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.9710005Z graph_break []
2025-12-04T12:15:04.9710237Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.9710379Z frames [('total', 1)]
2025-12-04T12:15:04.9710494Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.9710730Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.9710963Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:04.9711063Z graph_break []
2025-12-04T12:15:04.9711222Z =================================== FAILURES ===================================
2025-12-04T12:15:04.9711543Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda _
2025-12-04T12:15:04.9711683Z Traceback (most recent call last):
2025-12-04T12:15:04.9712143Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T12:15:04.9712385Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:04.9712889Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:04.9713138Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:04.9713656Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:04.9713866Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:04.9714376Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:04.9714543Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:04.9715074Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:04.9715394Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:04.9715927Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:04.9716078Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:04.9716603Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:04.9716728Z     return self._compile_to_module()
2025-12-04T12:15:04.9717213Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:04.9717387Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:04.9717934Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:04.9718063Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:04.9718604Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:04.9718835Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:04.9719437Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:04.9719565Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:04.9720075Z   File "/tmp/tmp241kvq85/w5/cw56acwuuf44v55zghnxv2hmw7hpmbeclykkqwvt47zyidmyzlxl.py", line 163, in <module>
2025-12-04T12:15:04.9720546Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:04.9720661Z     kernel.precompile(
2025-12-04T12:15:04.9721227Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:04.9721346Z     self._precompile_worker()
2025-12-04T12:15:04.9721942Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:04.9722171Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:04.9722765Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.9722969Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.9723435Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.9723679Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.9724137Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.9724471Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.9724701Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:04.9725076Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9725169Z ^
2025-12-04T12:15:04.9725643Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.9725649Z 
2025-12-04T12:15:04.9726357Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:04.9726363Z 
2025-12-04T12:15:04.9726368Z 
2025-12-04T12:15:04.9726586Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:04.9727238Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T12:15:04.9727244Z 
2025-12-04T12:15:04.9727519Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:04.9727752Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.9727862Z frames [('total', 1)]
2025-12-04T12:15:04.9727978Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.9728262Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:04.9728486Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.9728587Z graph_break []
2025-12-04T12:15:04.9728821Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.9728927Z frames [('total', 1)]
2025-12-04T12:15:04.9729055Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.9729303Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.9729535Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:04.9729648Z graph_break []
2025-12-04T12:15:04.9729906Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.9730011Z frames [('total', 1)]
2025-12-04T12:15:04.9730142Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.9730360Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.9730605Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:04.9730706Z graph_break []
2025-12-04T12:15:04.9731360Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4ecceae3d20d3515.xml -
2025-12-04T12:15:04.9731547Z =========================== short test summary info ============================
2025-12-04T12:15:04.9732326Z FAILED [0.5510s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:04.9732690Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9732795Z ^
2025-12-04T12:15:04.9733252Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.9733418Z 
2025-12-04T12:15:04.9734143Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:04.9734149Z 
2025-12-04T12:15:04.9734154Z 
2025-12-04T12:15:04.9734371Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:04.9735027Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T12:15:04.9735035Z 
2025-12-04T12:15:04.9735305Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:04.9735490Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:04.9735704Z =================== 1 failed, 2 deselected, 2 rerun in 4.85s ===================
2025-12-04T12:15:04.9735805Z Got exit code 1
2025-12-04T12:15:04.9735917Z Retrying single test...
2025-12-04T12:15:04.9736490Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-af3f0411f43ffff1.xml
2025-12-04T12:15:04.9736658Z ============================= test session starts ==============================
2025-12-04T12:15:04.9737024Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:04.9737137Z cachedir: .pytest_cache
2025-12-04T12:15:04.9737657Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:04.9737800Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:04.9737910Z configfile: pytest.ini
2025-12-04T12:15:04.9738504Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:04.9738741Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:04.9739460Z stepcurrent: skipping 2 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T12:15:04.9739630Z Running 1 items in this shard
2025-12-04T12:15:04.9739637Z 
2025-12-04T12:15:04.9740800Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2
2025-12-04T12:15:04.9741643Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9742143Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 40960
2025-12-04T12:15:04.9742684Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:04.9743258Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:04.9743817Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T12:15:04.9744262Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:04.9744857Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32)
2025-12-04T12:15:04.9745378Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:04.9745940Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.broadcast_to(tmp2, [XBLOCK])
2025-12-04T12:15:04.9746484Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:04.9746961Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp1 * tmp3
2025-12-04T12:15:04.9747401Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = -448.0
2025-12-04T12:15:04.9747959Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = triton_helpers.maximum(tmp4, tmp5)
2025-12-04T12:15:04.9748413Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 448.0
2025-12-04T12:15:04.9748977Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = triton_helpers.minimum(tmp6, tmp7)
2025-12-04T12:15:04.9749515Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp8.to(tl.float8e4nv)
2025-12-04T12:15:04.9750059Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp9, None)
2025-12-04T12:15:04.9750432Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:04.9752342Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:04.9753358Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:04.9754411Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.9755043Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.9756008Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.9756690Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.9757589Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.9758503Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.9759123Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:04.9759926Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9760290Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:04.9761248Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.9761387Z ('RERUN', {'yellow': True}) [3.6849s] [100%]
2025-12-04T12:15:04.9762563Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2
2025-12-04T12:15:04.9763362Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9763823Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 40960
2025-12-04T12:15:04.9764366Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:04.9764931Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:04.9765513Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T12:15:04.9765946Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:04.9766558Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32)
2025-12-04T12:15:04.9767081Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:04.9767630Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.broadcast_to(tmp2, [XBLOCK])
2025-12-04T12:15:04.9768187Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:04.9768655Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp1 * tmp3
2025-12-04T12:15:04.9769107Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = -448.0
2025-12-04T12:15:04.9769706Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = triton_helpers.maximum(tmp4, tmp5)
2025-12-04T12:15:04.9770180Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 448.0
2025-12-04T12:15:04.9770761Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = triton_helpers.minimum(tmp6, tmp7)
2025-12-04T12:15:04.9771880Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp8.to(tl.float8e4nv)
2025-12-04T12:15:04.9772439Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp9, None)
2025-12-04T12:15:04.9772804Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:04.9774725Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:04.9775357Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:04.9776457Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.9777102Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.9778007Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.9778702Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.9779589Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.9780375Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.9780984Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:04.9781803Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9782176Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:04.9783129Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.9783279Z ('RERUN', {'yellow': True}) [0.5445s] [100%]
2025-12-04T12:15:04.9784436Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2
2025-12-04T12:15:04.9785326Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9785777Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 40960
2025-12-04T12:15:04.9786323Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:04.9786898Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:04.9787458Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T12:15:04.9787910Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:04.9788503Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32)
2025-12-04T12:15:04.9789041Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:04.9789633Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.broadcast_to(tmp2, [XBLOCK])
2025-12-04T12:15:04.9790142Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:04.9790622Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp1 * tmp3
2025-12-04T12:15:04.9791063Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = -448.0
2025-12-04T12:15:04.9791644Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = triton_helpers.maximum(tmp4, tmp5)
2025-12-04T12:15:04.9792079Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 448.0
2025-12-04T12:15:04.9792646Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = triton_helpers.minimum(tmp6, tmp7)
2025-12-04T12:15:04.9793182Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp8.to(tl.float8e4nv)
2025-12-04T12:15:04.9793722Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp9, None)
2025-12-04T12:15:04.9794093Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:04.9796031Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:04.9796584Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:04.9797625Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.9798299Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.9799234Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.9799920Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.9800812Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.9801580Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.9802204Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:04.9802997Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9803400Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:04.9804307Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.9804414Z FAILED [0.5454s] [100%]
2025-12-04T12:15:04.9804420Z 
2025-12-04T12:15:04.9804580Z ==================================== RERUNS ====================================
2025-12-04T12:15:04.9804908Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda _
2025-12-04T12:15:04.9805035Z Traceback (most recent call last):
2025-12-04T12:15:04.9805508Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T12:15:04.9805751Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:04.9806259Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:04.9806512Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:04.9807028Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:04.9807237Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:04.9807744Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:04.9807911Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:04.9808445Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:04.9808764Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:04.9809297Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:04.9809480Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:04.9809962Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:04.9810100Z     return self._compile_to_module()
2025-12-04T12:15:04.9810585Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:04.9810793Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:04.9811340Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:04.9811472Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:04.9811980Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:04.9812214Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:04.9812814Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:04.9812940Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:04.9813448Z   File "/tmp/tmpzaf0e86o/x7/cx7uih5uzm2uphxtckruuoybmwgam52us2bcsodffye5bwzycvje.py", line 163, in <module>
2025-12-04T12:15:04.9813924Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:04.9814041Z     kernel.precompile(
2025-12-04T12:15:04.9814593Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:04.9814726Z     self._precompile_worker()
2025-12-04T12:15:04.9815323Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:04.9815561Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:04.9816157Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.9816418Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.9816892Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.9817146Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.9817605Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.9817942Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.9818172Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:04.9818546Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9818639Z ^
2025-12-04T12:15:04.9819100Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.9819122Z 
2025-12-04T12:15:04.9819837Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:04.9819844Z 
2025-12-04T12:15:04.9819851Z 
2025-12-04T12:15:04.9820070Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:04.9820725Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T12:15:04.9820731Z 
2025-12-04T12:15:04.9820999Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:04.9821236Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.9821342Z frames [('total', 1)]
2025-12-04T12:15:04.9821507Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.9821759Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:04.9821980Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.9822080Z graph_break []
2025-12-04T12:15:04.9822414Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda _
2025-12-04T12:15:04.9822569Z Traceback (most recent call last):
2025-12-04T12:15:04.9823037Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T12:15:04.9823312Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:04.9823803Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:04.9824068Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:04.9824587Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:04.9824782Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:04.9825305Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:04.9825452Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:04.9825997Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:04.9826318Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:04.9826835Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:04.9827038Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:04.9827521Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:04.9827657Z     return self._compile_to_module()
2025-12-04T12:15:04.9828140Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:04.9828302Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:04.9828831Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:04.9828965Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:04.9829462Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:04.9829705Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:04.9830294Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:04.9830435Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:04.9830915Z   File "/tmp/tmpfax_frvz/57/c572mxpmqtrjjs5gfj4qmykfskskgjkosm4aanfpybly7so6743j.py", line 163, in <module>
2025-12-04T12:15:04.9831378Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:04.9831502Z     kernel.precompile(
2025-12-04T12:15:04.9832054Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:04.9832187Z     self._precompile_worker()
2025-12-04T12:15:04.9832780Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:04.9832960Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:04.9833565Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.9833799Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.9834249Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.9834508Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.9834952Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.9835337Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.9835599Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:04.9835958Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9836060Z ^
2025-12-04T12:15:04.9836524Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.9836530Z 
2025-12-04T12:15:04.9837263Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:04.9837268Z 
2025-12-04T12:15:04.9837273Z 
2025-12-04T12:15:04.9837490Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:04.9838139Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T12:15:04.9838159Z 
2025-12-04T12:15:04.9838431Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:04.9838654Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.9838771Z frames [('total', 1)]
2025-12-04T12:15:04.9838926Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.9839162Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:04.9839398Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.9839499Z graph_break []
2025-12-04T12:15:04.9839731Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.9839835Z frames [('total', 1)]
2025-12-04T12:15:04.9839949Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.9840178Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.9840413Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:04.9840511Z graph_break []
2025-12-04T12:15:04.9840679Z =================================== FAILURES ===================================
2025-12-04T12:15:04.9841001Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda _
2025-12-04T12:15:04.9841125Z Traceback (most recent call last):
2025-12-04T12:15:04.9841596Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T12:15:04.9841841Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:04.9842340Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:04.9842590Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:04.9843105Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:04.9843314Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:04.9843830Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:04.9843992Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:04.9844525Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:04.9844902Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:04.9845439Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:04.9845592Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:04.9846070Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:04.9846237Z     return self._compile_to_module()
2025-12-04T12:15:04.9846722Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:04.9846934Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:04.9847451Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:04.9847586Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:04.9848103Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:04.9848336Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:04.9848938Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:04.9849071Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:04.9849583Z   File "/tmp/tmpw53ectv7/7b/c7b5sfbk6rqmzdkpw5iowyahvgrhvy7avmmk2he2l6xosmewtmzx.py", line 163, in <module>
2025-12-04T12:15:04.9850063Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:04.9850178Z     kernel.precompile(
2025-12-04T12:15:04.9850732Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:04.9850929Z     self._precompile_worker()
2025-12-04T12:15:04.9851530Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:04.9851724Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:04.9852322Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.9852521Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.9852990Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.9853239Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.9853699Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.9854039Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.9854270Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:04.9854648Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9854740Z ^
2025-12-04T12:15:04.9855196Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.9855216Z 
2025-12-04T12:15:04.9855936Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:04.9855944Z 
2025-12-04T12:15:04.9855948Z 
2025-12-04T12:15:04.9856166Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:04.9856889Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T12:15:04.9856899Z 
2025-12-04T12:15:04.9857214Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:04.9857450Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.9857559Z frames [('total', 1)]
2025-12-04T12:15:04.9857678Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.9857926Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:04.9858150Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.9858282Z graph_break []
2025-12-04T12:15:04.9858517Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.9858653Z frames [('total', 1)]
2025-12-04T12:15:04.9858785Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.9859006Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.9859242Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:04.9859354Z graph_break []
2025-12-04T12:15:04.9859575Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.9859679Z frames [('total', 1)]
2025-12-04T12:15:04.9859807Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.9860026Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.9860257Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:04.9860367Z graph_break []
2025-12-04T12:15:04.9861023Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-af3f0411f43ffff1.xml -
2025-12-04T12:15:04.9861210Z =========================== short test summary info ============================
2025-12-04T12:15:04.9861986Z FAILED [0.5454s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:04.9862459Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9862570Z ^
2025-12-04T12:15:04.9863028Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.9863033Z 
2025-12-04T12:15:04.9863760Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:04.9863768Z 
2025-12-04T12:15:04.9863773Z 
2025-12-04T12:15:04.9863990Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:04.9864632Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T12:15:04.9864652Z 
2025-12-04T12:15:04.9864921Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:04.9865106Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:04.9865326Z ================== 1 failed, 187 deselected, 2 rerun in 4.82s ==================
2025-12-04T12:15:04.9865428Z Got exit code 1
2025-12-04T12:15:04.9865537Z Retrying single test...
2025-12-04T12:15:04.9866025Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5f646abfecfc34db.xml
2025-12-04T12:15:04.9866191Z ============================= test session starts ==============================
2025-12-04T12:15:04.9866557Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:04.9866668Z cachedir: .pytest_cache
2025-12-04T12:15:04.9867188Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:04.9867332Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:04.9867444Z configfile: pytest.ini
2025-12-04T12:15:04.9868071Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:04.9868308Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:04.9869028Z stepcurrent: skipping 2 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T12:15:04.9869158Z Running 1 items in this shard
2025-12-04T12:15:04.9869193Z 
2025-12-04T12:15:04.9870383Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2
2025-12-04T12:15:04.9871397Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9871854Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 40960
2025-12-04T12:15:04.9872400Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:04.9872978Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:04.9873544Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T12:15:04.9873992Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:04.9874583Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32)
2025-12-04T12:15:04.9875187Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:04.9875752Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.broadcast_to(tmp2, [XBLOCK])
2025-12-04T12:15:04.9876256Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:04.9876741Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp1 * tmp3
2025-12-04T12:15:04.9877188Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = -448.0
2025-12-04T12:15:04.9877755Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = triton_helpers.maximum(tmp4, tmp5)
2025-12-04T12:15:04.9878209Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 448.0
2025-12-04T12:15:04.9878779Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = triton_helpers.minimum(tmp6, tmp7)
2025-12-04T12:15:04.9879320Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp8.to(tl.float8e4nv)
2025-12-04T12:15:04.9879861Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp9, None)
2025-12-04T12:15:04.9880223Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:04.9882205Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:04.9882748Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:04.9883806Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.9884528Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.9885437Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.9886119Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.9887012Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.9887789Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.9888409Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:04.9889254Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9889618Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:04.9890521Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.9890658Z ('RERUN', {'yellow': True}) [3.7187s] [100%]
2025-12-04T12:15:04.9891815Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2
2025-12-04T12:15:04.9892607Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9893058Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 40960
2025-12-04T12:15:04.9893609Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:04.9894164Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:04.9894743Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T12:15:04.9895178Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:04.9895781Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32)
2025-12-04T12:15:04.9896404Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:04.9896958Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.broadcast_to(tmp2, [XBLOCK])
2025-12-04T12:15:04.9897481Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:04.9897982Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp1 * tmp3
2025-12-04T12:15:04.9898474Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = -448.0
2025-12-04T12:15:04.9899046Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = triton_helpers.maximum(tmp4, tmp5)
2025-12-04T12:15:04.9899489Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 448.0
2025-12-04T12:15:04.9900067Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = triton_helpers.minimum(tmp6, tmp7)
2025-12-04T12:15:04.9900591Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp8.to(tl.float8e4nv)
2025-12-04T12:15:04.9901148Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp9, None)
2025-12-04T12:15:04.9901509Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:04.9903428Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:04.9904012Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:04.9905051Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.9905698Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.9906592Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.9907285Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.9908162Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.9908947Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.9909552Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:04.9910385Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9910767Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:04.9911653Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.9911832Z ('RERUN', {'yellow': True}) [0.5593s] [100%]
2025-12-04T12:15:04.9913011Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2
2025-12-04T12:15:04.9913824Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9914271Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 40960
2025-12-04T12:15:04.9914812Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:04.9915387Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:04.9915950Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T12:15:04.9916394Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:04.9917018Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32)
2025-12-04T12:15:04.9917540Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:04.9918100Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.broadcast_to(tmp2, [XBLOCK])
2025-12-04T12:15:04.9918611Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:04.9919097Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp1 * tmp3
2025-12-04T12:15:04.9919537Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = -448.0
2025-12-04T12:15:04.9920104Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = triton_helpers.maximum(tmp4, tmp5)
2025-12-04T12:15:04.9920554Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 448.0
2025-12-04T12:15:04.9921115Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = triton_helpers.minimum(tmp6, tmp7)
2025-12-04T12:15:04.9921654Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp8.to(tl.float8e4nv)
2025-12-04T12:15:04.9922194Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp9, None)
2025-12-04T12:15:04.9922556Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:04.9924519Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:04.9925073Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:04.9926171Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.9926802Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.9927709Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.9928384Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.9929280Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.9930058Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.9930716Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:04.9931511Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9931875Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:04.9932776Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.9932886Z FAILED [0.5497s] [100%]
2025-12-04T12:15:04.9932895Z 
2025-12-04T12:15:04.9933052Z ==================================== RERUNS ====================================
2025-12-04T12:15:04.9933376Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda _
2025-12-04T12:15:04.9933504Z Traceback (most recent call last):
2025-12-04T12:15:04.9933975Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T12:15:04.9934219Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:04.9934736Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:04.9934985Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:04.9935499Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:04.9935711Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:04.9936223Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:04.9936438Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:04.9937036Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:04.9937361Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:04.9937897Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:04.9938052Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:04.9938563Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:04.9938703Z     return self._compile_to_module()
2025-12-04T12:15:04.9939249Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:04.9939430Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:04.9939951Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:04.9940084Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:04.9940598Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:04.9940835Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:04.9941424Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:04.9941570Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:04.9942084Z   File "/tmp/tmp1exnzehg/if/cifkbiminc6b7mmuphmf7suncm7zmsuhxz37qefngp7frv7ocxzz.py", line 163, in <module>
2025-12-04T12:15:04.9942566Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:04.9942718Z     kernel.precompile(
2025-12-04T12:15:04.9943275Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:04.9943417Z     self._precompile_worker()
2025-12-04T12:15:04.9944017Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:04.9944213Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:04.9944810Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.9945013Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.9945484Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.9945733Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.9946177Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.9946532Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.9946763Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:04.9947139Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9947232Z ^
2025-12-04T12:15:04.9947690Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.9947698Z 
2025-12-04T12:15:04.9948429Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:04.9948436Z 
2025-12-04T12:15:04.9948441Z 
2025-12-04T12:15:04.9948659Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:04.9949317Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T12:15:04.9949364Z 
2025-12-04T12:15:04.9949635Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:04.9949875Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.9949979Z frames [('total', 1)]
2025-12-04T12:15:04.9950099Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.9950351Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:04.9950603Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.9950703Z graph_break []
2025-12-04T12:15:04.9951071Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda _
2025-12-04T12:15:04.9951198Z Traceback (most recent call last):
2025-12-04T12:15:04.9951650Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T12:15:04.9951907Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:04.9952398Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:04.9952660Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:04.9953173Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:04.9953371Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:04.9953892Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:04.9954039Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:04.9954583Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:04.9954934Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:04.9955456Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:04.9955616Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:04.9956096Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:04.9956233Z     return self._compile_to_module()
2025-12-04T12:15:04.9956721Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:04.9956883Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:04.9957410Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:04.9957543Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:04.9958040Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:04.9958281Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:04.9958864Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:04.9959004Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:04.9959514Z   File "/tmp/tmpwkf37tjm/md/cmdhorstx6yqakfmcbbzckz3wgrv7s3rff4a73j3f7w4g3lyk6ev.py", line 163, in <module>
2025-12-04T12:15:04.9959978Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:04.9960103Z     kernel.precompile(
2025-12-04T12:15:04.9960658Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:04.9960791Z     self._precompile_worker()
2025-12-04T12:15:04.9961423Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:04.9961605Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:04.9962213Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.9962410Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.9962892Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.9963151Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.9963627Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.9963975Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.9964202Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:04.9964566Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9964669Z ^
2025-12-04T12:15:04.9965123Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.9965128Z 
2025-12-04T12:15:04.9965853Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:04.9965861Z 
2025-12-04T12:15:04.9965866Z 
2025-12-04T12:15:04.9966086Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:04.9966724Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T12:15:04.9966786Z 
2025-12-04T12:15:04.9967058Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:04.9967285Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.9967404Z frames [('total', 1)]
2025-12-04T12:15:04.9967523Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.9967759Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:04.9967995Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.9968099Z graph_break []
2025-12-04T12:15:04.9968318Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.9968440Z frames [('total', 1)]
2025-12-04T12:15:04.9968556Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.9968789Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.9969023Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:04.9969125Z graph_break []
2025-12-04T12:15:04.9975367Z =================================== FAILURES ===================================
2025-12-04T12:15:04.9975775Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda _
2025-12-04T12:15:04.9975903Z Traceback (most recent call last):
2025-12-04T12:15:04.9976447Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T12:15:04.9976696Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:04.9977212Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:04.9977466Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:04.9977990Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:04.9978194Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:04.9978849Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:04.9979001Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:04.9979544Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:04.9979867Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:04.9980396Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:04.9980599Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:04.9981126Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:04.9981265Z     return self._compile_to_module()
2025-12-04T12:15:04.9981754Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:04.9981929Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:04.9982450Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:04.9982580Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:04.9983092Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:04.9983327Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:04.9983913Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:04.9984056Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:04.9984565Z   File "/tmp/tmp8witzdji/m6/cm64mukunvqnv4ogmdbzdplkr5f2rm4tlozbcmimeunhijmt44ty.py", line 163, in <module>
2025-12-04T12:15:04.9985098Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:04.9985218Z     kernel.precompile(
2025-12-04T12:15:04.9985769Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:04.9985908Z     self._precompile_worker()
2025-12-04T12:15:04.9986502Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:04.9986696Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:04.9987293Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:04.9987493Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:04.9987955Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:04.9988205Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:04.9988649Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:04.9988996Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:04.9989224Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:04.9989593Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9989687Z ^
2025-12-04T12:15:04.9990145Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.9990152Z 
2025-12-04T12:15:04.9990873Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:04.9990882Z 
2025-12-04T12:15:04.9990887Z 
2025-12-04T12:15:04.9991137Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:04.9991788Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T12:15:04.9991794Z 
2025-12-04T12:15:04.9992064Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:04.9992296Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.9992455Z frames [('total', 1)]
2025-12-04T12:15:04.9992572Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.9992849Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:04.9993073Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.9993172Z graph_break []
2025-12-04T12:15:04.9993405Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.9993510Z frames [('total', 1)]
2025-12-04T12:15:04.9993629Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.9993859Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.9994088Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:04.9994205Z graph_break []
2025-12-04T12:15:04.9994420Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:04.9994520Z frames [('total', 1)]
2025-12-04T12:15:04.9994649Z stats [('calls_captured', 7)]
2025-12-04T12:15:04.9994866Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:04.9995096Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:04.9995205Z graph_break []
2025-12-04T12:15:04.9995867Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5f646abfecfc34db.xml -
2025-12-04T12:15:04.9996082Z =========================== short test summary info ============================
2025-12-04T12:15:04.9996861Z FAILED [0.5497s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:04.9997219Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:04.9997318Z ^
2025-12-04T12:15:04.9997774Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:04.9997782Z 
2025-12-04T12:15:04.9998500Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:04.9998506Z 
2025-12-04T12:15:04.9998511Z 
2025-12-04T12:15:04.9998728Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:04.9999372Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T12:15:04.9999378Z 
2025-12-04T12:15:04.9999656Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:04.9999835Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.0000051Z ================== 1 failed, 187 deselected, 2 rerun in 4.88s ==================
2025-12-04T12:15:05.0000154Z Got exit code 1
2025-12-04T12:15:05.0000707Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T12:15:05.0001130Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:15:05.0001593Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2f71fa45f6063b14.xml
2025-12-04T12:15:05.0001770Z ============================= test session starts ==============================
2025-12-04T12:15:05.0002158Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.0002271Z cachedir: .pytest_cache
2025-12-04T12:15:05.0002798Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.0002921Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.0003031Z configfile: pytest.ini
2025-12-04T12:15:05.0003664Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.0003914Z collecting ... collected 188 items / 3 deselected / 185 selected
2025-12-04T12:15:05.0004068Z stepcurrent: skipping 3 already run items.
2025-12-04T12:15:05.0004183Z Running 185 items in this shard
2025-12-04T12:15:05.0004192Z 
2025-12-04T12:15:05.0005435Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T12:15:05.0006504Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.0006942Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.0007408Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T12:15:05.0007865Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.0008436Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.0008989Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.0009572Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.0010179Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.0010733Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.0011190Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.0011819Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.0012338Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.0012889Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T12:15:05.0013470Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.0014012Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.0014535Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.0015062Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.0015554Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.0016021Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_0 = r0_index
2025-12-04T12:15:05.0016919Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.0017505Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:05.0018096Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.0018689Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = triton_helpers.maximum(_tmp3, tmp2)
2025-12-04T12:15:05.0019245Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp3 = tl.where(r0_mask, tmp4, _tmp3)
2025-12-04T12:15:05.0019806Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = tmp0.to(tl.float32)
2025-12-04T12:15:05.0020295Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp5 * tmp7
2025-12-04T12:15:05.0020778Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = -448.0
2025-12-04T12:15:05.0021361Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T12:15:05.0021851Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = 448.0
2025-12-04T12:15:05.0022439Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T12:15:05.0022979Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T12:15:05.0023698Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp13, r0_mask)
2025-12-04T12:15:05.0024274Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = triton_helpers.max2(_tmp3, 1)[:, None]
2025-12-04T12:15:05.0024982Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp3, None)
2025-12-04T12:15:05.0025362Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.0027699Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.0028255Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.0029365Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.0030008Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.0030945Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.0031687Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.0032577Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.0033357Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.0033961Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.0035028Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.0035406Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.0036345Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.0036491Z ('RERUN', {'yellow': True}) [3.3187s] [  0%]
2025-12-04T12:15:05.0037732Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T12:15:05.0038802Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.0039237Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.0039690Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T12:15:05.0040169Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.0040706Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.0041257Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.0041848Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.0042435Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.0043082Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.0043534Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.0044175Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.0044741Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.0045330Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T12:15:05.0045913Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.0046447Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.0046989Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.0047477Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.0047978Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.0048454Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_0 = r0_index
2025-12-04T12:15:05.0049218Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.0049793Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:05.0050381Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.0050967Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = triton_helpers.maximum(_tmp3, tmp2)
2025-12-04T12:15:05.0051522Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp3 = tl.where(r0_mask, tmp4, _tmp3)
2025-12-04T12:15:05.0052052Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = tmp0.to(tl.float32)
2025-12-04T12:15:05.0052550Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp5 * tmp7
2025-12-04T12:15:05.0053017Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = -448.0
2025-12-04T12:15:05.0053602Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T12:15:05.0054063Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = 448.0
2025-12-04T12:15:05.0054640Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T12:15:05.0055196Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T12:15:05.0055902Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp13, r0_mask)
2025-12-04T12:15:05.0056611Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = triton_helpers.max2(_tmp3, 1)[:, None]
2025-12-04T12:15:05.0057320Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp3, None)
2025-12-04T12:15:05.0057697Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.0060104Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.0060666Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.0061710Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.0062363Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.0063255Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.0063992Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.0064885Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.0065667Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.0066290Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.0067348Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.0067731Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.0068625Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.0068764Z ('RERUN', {'yellow': True}) [0.3581s] [  0%]
2025-12-04T12:15:05.0070013Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T12:15:05.0071383Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.0071838Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.0072295Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T12:15:05.0072813Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.0073483Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.0074029Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.0074634Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.0075219Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.0075789Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.0076244Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.0076876Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.0077412Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.0078018Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T12:15:05.0078612Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.0079143Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.0079672Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.0080177Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.0080654Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.0081135Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_0 = r0_index
2025-12-04T12:15:05.0081903Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.0082428Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:05.0083019Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.0083593Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = triton_helpers.maximum(_tmp3, tmp2)
2025-12-04T12:15:05.0084162Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp3 = tl.where(r0_mask, tmp4, _tmp3)
2025-12-04T12:15:05.0084724Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = tmp0.to(tl.float32)
2025-12-04T12:15:05.0085223Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp5 * tmp7
2025-12-04T12:15:05.0085684Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = -448.0
2025-12-04T12:15:05.0086292Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T12:15:05.0086787Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = 448.0
2025-12-04T12:15:05.0087368Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T12:15:05.0087929Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T12:15:05.0088636Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp13, r0_mask)
2025-12-04T12:15:05.0089208Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = triton_helpers.max2(_tmp3, 1)[:, None]
2025-12-04T12:15:05.0089926Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp3, None)
2025-12-04T12:15:05.0090292Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.0092635Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.0093224Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.0094278Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.0094908Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.0095810Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.0096553Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.0097449Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.0098218Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.0098863Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.0099935Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.0100337Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.0101273Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.0101379Z FAILED [0.3591s] [  0%]
2025-12-04T12:15:05.0101388Z 
2025-12-04T12:15:05.0101550Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.0101879Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda _
2025-12-04T12:15:05.0102004Z Traceback (most recent call last):
2025-12-04T12:15:05.0102472Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T12:15:05.0102717Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.0103208Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.0103470Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.0103990Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.0104195Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.0104737Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.0104891Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.0105446Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.0105766Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.0106295Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.0106447Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.0106929Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.0107066Z     return self._compile_to_module()
2025-12-04T12:15:05.0107565Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.0107736Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.0108264Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.0108394Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.0108899Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.0109131Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.0109716Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.0109855Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.0110352Z   File "/tmp/tmpotzsm7xh/qu/cqufc5x6lwjjws43ojgjstkx4n6wmjh3epgcpo7hc3eqwk666dtt.py", line 62, in <module>
2025-12-04T12:15:05.0110826Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.0110971Z     kernel.precompile(
2025-12-04T12:15:05.0111525Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.0111656Z     self._precompile_worker()
2025-12-04T12:15:05.0112252Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.0112465Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.0113099Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.0113299Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.0113761Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.0114012Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.0114457Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.0114808Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.0115036Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.0115662Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.0115756Z ^
2025-12-04T12:15:05.0116215Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.0116221Z 
2025-12-04T12:15:05.0116940Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.0116978Z 
2025-12-04T12:15:05.0116983Z 
2025-12-04T12:15:05.0117204Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.0117851Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T12:15:05.0117857Z 
2025-12-04T12:15:05.0118126Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.0118355Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.0118473Z frames [('total', 1)]
2025-12-04T12:15:05.0118592Z stats [('calls_captured', 7)]
2025-12-04T12:15:05.0118845Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.0119064Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.0119168Z graph_break []
2025-12-04T12:15:05.0119500Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda _
2025-12-04T12:15:05.0119626Z Traceback (most recent call last):
2025-12-04T12:15:05.0120076Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T12:15:05.0120332Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.0120820Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.0121085Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.0121598Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.0121791Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.0122315Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.0122464Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.0123033Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.0123368Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.0123883Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.0124077Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.0124557Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.0124710Z     return self._compile_to_module()
2025-12-04T12:15:05.0125207Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.0125375Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.0125903Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.0126032Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.0126526Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.0126770Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.0127357Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.0127504Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.0128005Z   File "/tmp/tmpp8qefu7m/dx/cdxobz35is7vmn2nhahqefcnttool33pgq6zbnkhhcb2ssgtygfe.py", line 62, in <module>
2025-12-04T12:15:05.0128466Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.0128625Z     kernel.precompile(
2025-12-04T12:15:05.0129182Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.0129298Z     self._precompile_worker()
2025-12-04T12:15:05.0129903Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.0130083Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.0130691Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.0130889Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.0131343Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.0131599Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.0132047Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.0132393Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.0132619Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.0133228Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.0133333Z ^
2025-12-04T12:15:05.0133789Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.0133795Z 
2025-12-04T12:15:05.0134521Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.0134529Z 
2025-12-04T12:15:05.0134534Z 
2025-12-04T12:15:05.0134751Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.0135426Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T12:15:05.0135433Z 
2025-12-04T12:15:05.0135714Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.0135935Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.0136086Z frames [('total', 1)]
2025-12-04T12:15:05.0136207Z stats [('calls_captured', 7)]
2025-12-04T12:15:05.0136523Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.0136828Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.0136932Z graph_break []
2025-12-04T12:15:05.0137153Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.0137275Z frames [('total', 1)]
2025-12-04T12:15:05.0137391Z stats [('calls_captured', 7)]
2025-12-04T12:15:05.0137612Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.0137861Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.0137960Z graph_break []
2025-12-04T12:15:05.0138123Z =================================== FAILURES ===================================
2025-12-04T12:15:05.0138447Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda _
2025-12-04T12:15:05.0138575Z Traceback (most recent call last):
2025-12-04T12:15:05.0139040Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T12:15:05.0139286Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.0139777Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.0140081Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.0140597Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.0140800Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.0141308Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.0141454Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.0142003Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.0142326Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.0142858Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.0143010Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.0143493Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.0143628Z     return self._compile_to_module()
2025-12-04T12:15:05.0144132Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.0144310Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.0144827Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.0144962Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.0145482Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.0145716Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.0146313Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.0146500Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.0146978Z   File "/tmp/tmpluiakr_1/oh/coh7xscqvd6toy45gje54ujlzvbfh7fibt64kkk6wcyq4btftkku.py", line 62, in <module>
2025-12-04T12:15:05.0147455Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.0147568Z     kernel.precompile(
2025-12-04T12:15:05.0148123Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.0148287Z     self._precompile_worker()
2025-12-04T12:15:05.0148915Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.0149114Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.0149714Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.0149916Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.0150385Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.0150635Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.0151079Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.0151433Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.0151666Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.0152295Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.0152422Z ^
2025-12-04T12:15:05.0152884Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.0152890Z 
2025-12-04T12:15:05.0153618Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.0153624Z 
2025-12-04T12:15:05.0153629Z 
2025-12-04T12:15:05.0153847Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.0154503Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T12:15:05.0154509Z 
2025-12-04T12:15:05.0154783Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.0155019Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.0155127Z frames [('total', 1)]
2025-12-04T12:15:05.0155247Z stats [('calls_captured', 7)]
2025-12-04T12:15:05.0155504Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.0155726Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.0155830Z graph_break []
2025-12-04T12:15:05.0156063Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.0156169Z frames [('total', 1)]
2025-12-04T12:15:05.0156287Z stats [('calls_captured', 7)]
2025-12-04T12:15:05.0156521Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.0156756Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.0156871Z graph_break []
2025-12-04T12:15:05.0157086Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.0157190Z frames [('total', 1)]
2025-12-04T12:15:05.0157316Z stats [('calls_captured', 7)]
2025-12-04T12:15:05.0157535Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.0157809Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.0157928Z graph_break []
2025-12-04T12:15:05.0158583Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2f71fa45f6063b14.xml -
2025-12-04T12:15:05.0158769Z =========================== short test summary info ============================
2025-12-04T12:15:05.0159542Z FAILED [0.3591s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.0160213Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.0160316Z ^
2025-12-04T12:15:05.0160771Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.0160779Z 
2025-12-04T12:15:05.0161500Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.0161506Z 
2025-12-04T12:15:05.0161511Z 
2025-12-04T12:15:05.0161728Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.0162362Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T12:15:05.0162383Z 
2025-12-04T12:15:05.0162651Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.0162834Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.0163046Z =================== 1 failed, 3 deselected, 2 rerun in 4.08s ===================
2025-12-04T12:15:05.0163179Z Got exit code 1
2025-12-04T12:15:05.0163288Z Retrying single test...
2025-12-04T12:15:05.0163773Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3d881319a967678f.xml
2025-12-04T12:15:05.0163937Z ============================= test session starts ==============================
2025-12-04T12:15:05.0164302Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.0164415Z cachedir: .pytest_cache
2025-12-04T12:15:05.0164934Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.0165074Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.0165184Z configfile: pytest.ini
2025-12-04T12:15:05.0165779Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.0166018Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:05.0166732Z stepcurrent: skipping 3 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T12:15:05.0166864Z Running 1 items in this shard
2025-12-04T12:15:05.0166869Z 
2025-12-04T12:15:05.0168110Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T12:15:05.0169181Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.0169612Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.0170096Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T12:15:05.0170573Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.0171315Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.0171872Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.0172581Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.0173168Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.0173742Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.0174193Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.0174832Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.0175356Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.0175898Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T12:15:05.0176553Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.0177159Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.0177699Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.0178188Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.0178671Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.0179159Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_0 = r0_index
2025-12-04T12:15:05.0179922Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.0180458Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:05.0181048Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.0181634Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = triton_helpers.maximum(_tmp3, tmp2)
2025-12-04T12:15:05.0182187Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp3 = tl.where(r0_mask, tmp4, _tmp3)
2025-12-04T12:15:05.0182709Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = tmp0.to(tl.float32)
2025-12-04T12:15:05.0183206Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp5 * tmp7
2025-12-04T12:15:05.0183709Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = -448.0
2025-12-04T12:15:05.0184302Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T12:15:05.0184758Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = 448.0
2025-12-04T12:15:05.0185335Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T12:15:05.0185962Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T12:15:05.0186671Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp13, r0_mask)
2025-12-04T12:15:05.0187262Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = triton_helpers.max2(_tmp3, 1)[:, None]
2025-12-04T12:15:05.0187971Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp3, None)
2025-12-04T12:15:05.0188353Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.0190695Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.0191276Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.0192318Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.0192951Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.0193855Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.0194539Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.0195436Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.0196210Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.0196829Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.0197930Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.0198357Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.0199298Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.0199507Z ('RERUN', {'yellow': True}) [3.3108s] [100%]
2025-12-04T12:15:05.0200793Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T12:15:05.0201885Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.0202346Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.0202796Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T12:15:05.0203276Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.0203814Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.0204364Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.0204986Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.0205571Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.0206147Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.0206597Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.0207248Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.0207771Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.0208339Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T12:15:05.0208920Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.0209453Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.0209998Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.0210489Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.0210981Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.0211451Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_0 = r0_index
2025-12-04T12:15:05.0212249Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.0212778Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:05.0213396Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.0214016Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = triton_helpers.maximum(_tmp3, tmp2)
2025-12-04T12:15:05.0214574Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp3 = tl.where(r0_mask, tmp4, _tmp3)
2025-12-04T12:15:05.0215106Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = tmp0.to(tl.float32)
2025-12-04T12:15:05.0215609Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp5 * tmp7
2025-12-04T12:15:05.0216074Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = -448.0
2025-12-04T12:15:05.0216736Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T12:15:05.0217208Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = 448.0
2025-12-04T12:15:05.0217800Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T12:15:05.0218385Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T12:15:05.0219098Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp13, r0_mask)
2025-12-04T12:15:05.0219686Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = triton_helpers.max2(_tmp3, 1)[:, None]
2025-12-04T12:15:05.0220397Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp3, None)
2025-12-04T12:15:05.0220780Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.0223155Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.0223711Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.0224775Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.0225460Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.0226363Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.0227047Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.0228014Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.0228787Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.0229426Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.0230487Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.0230870Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.0231768Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.0231903Z ('RERUN', {'yellow': True}) [0.3543s] [100%]
2025-12-04T12:15:05.0233206Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T12:15:05.0234262Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.0234712Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.0235163Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T12:15:05.0235636Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.0236176Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.0236721Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.0237319Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.0237908Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.0238486Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.0238944Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.0239632Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.0240175Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.0240717Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T12:15:05.0241342Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.0241903Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.0242437Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.0242948Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.0243431Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.0243916Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_0 = r0_index
2025-12-04T12:15:05.0244683Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.0245215Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:05.0245802Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.0246415Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = triton_helpers.maximum(_tmp3, tmp2)
2025-12-04T12:15:05.0246985Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp3 = tl.where(r0_mask, tmp4, _tmp3)
2025-12-04T12:15:05.0247510Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = tmp0.to(tl.float32)
2025-12-04T12:15:05.0248014Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp5 * tmp7
2025-12-04T12:15:05.0248480Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = -448.0
2025-12-04T12:15:05.0249056Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T12:15:05.0249538Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = 448.0
2025-12-04T12:15:05.0250118Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T12:15:05.0250677Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T12:15:05.0251382Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp13, r0_mask)
2025-12-04T12:15:05.0251956Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = triton_helpers.max2(_tmp3, 1)[:, None]
2025-12-04T12:15:05.0252681Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp3, None)
2025-12-04T12:15:05.0253091Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.0255492Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.0256063Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.0257220Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.0257859Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.0258775Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.0259458Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.0260402Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.0261168Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.0261785Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.0262857Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.0263226Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.0264139Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.0264247Z FAILED [0.3555s] [100%]
2025-12-04T12:15:05.0264255Z 
2025-12-04T12:15:05.0264415Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.0264744Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda _
2025-12-04T12:15:05.0264872Z Traceback (most recent call last):
2025-12-04T12:15:05.0265344Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T12:15:05.0265591Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.0266080Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.0266345Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.0266890Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.0267100Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.0267612Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.0267766Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.0268349Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.0268709Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.0269245Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.0269402Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.0269890Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.0270031Z     return self._compile_to_module()
2025-12-04T12:15:05.0270521Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.0270687Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.0271541Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.0271679Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.0272198Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.0272432Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.0273122Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.0273270Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.0273780Z   File "/tmp/tmp61d8e04k/ff/cffrfjwasuxxgzokqulnluti23iuaacrarvz4fycblwg3su4uyv3.py", line 62, in <module>
2025-12-04T12:15:05.0274260Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.0274376Z     kernel.precompile(
2025-12-04T12:15:05.0274932Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.0275072Z     self._precompile_worker()
2025-12-04T12:15:05.0275670Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.0275853Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.0276467Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.0276671Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.0277137Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.0277385Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.0277834Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.0278190Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.0278420Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.0279049Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.0279143Z ^
2025-12-04T12:15:05.0279658Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.0279665Z 
2025-12-04T12:15:05.0280398Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.0280405Z 
2025-12-04T12:15:05.0280409Z 
2025-12-04T12:15:05.0280632Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.0281402Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T12:15:05.0281408Z 
2025-12-04T12:15:05.0281727Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.0281956Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.0282085Z frames [('total', 1)]
2025-12-04T12:15:05.0282203Z stats [('calls_captured', 7)]
2025-12-04T12:15:05.0282461Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.0282684Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.0282788Z graph_break []
2025-12-04T12:15:05.0283125Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda _
2025-12-04T12:15:05.0283250Z Traceback (most recent call last):
2025-12-04T12:15:05.0283705Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T12:15:05.0283963Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.0284460Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.0284721Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.0285285Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.0285482Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.0286006Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.0286156Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.0286703Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.0287028Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.0287548Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.0287711Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.0288190Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.0288318Z     return self._compile_to_module()
2025-12-04T12:15:05.0288817Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.0288983Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.0289511Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.0289645Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.0290139Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.0290386Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.0290972Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.0291115Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.0291632Z   File "/tmp/tmp_di73wtl/4p/c4pjicw4ad5n4qi5pbu5cdwblqv7hbjmlhfzpd6uualp4khnauvg.py", line 62, in <module>
2025-12-04T12:15:05.0292098Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.0292225Z     kernel.precompile(
2025-12-04T12:15:05.0292783Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.0292933Z     self._precompile_worker()
2025-12-04T12:15:05.0293548Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.0293765Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.0294377Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.0294578Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.0295037Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.0295300Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.0295750Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.0296101Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.0296397Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.0297023Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.0297130Z ^
2025-12-04T12:15:05.0297587Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.0297633Z 
2025-12-04T12:15:05.0298360Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.0298366Z 
2025-12-04T12:15:05.0298371Z 
2025-12-04T12:15:05.0298589Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.0299225Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T12:15:05.0299246Z 
2025-12-04T12:15:05.0299515Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.0299741Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.0299860Z frames [('total', 1)]
2025-12-04T12:15:05.0299978Z stats [('calls_captured', 7)]
2025-12-04T12:15:05.0300219Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.0300454Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.0300557Z graph_break []
2025-12-04T12:15:05.0300776Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.0300893Z frames [('total', 1)]
2025-12-04T12:15:05.0301011Z stats [('calls_captured', 7)]
2025-12-04T12:15:05.0301242Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.0301476Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.0301582Z graph_break []
2025-12-04T12:15:05.0301744Z =================================== FAILURES ===================================
2025-12-04T12:15:05.0302066Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda _
2025-12-04T12:15:05.0302190Z Traceback (most recent call last):
2025-12-04T12:15:05.0302663Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T12:15:05.0302948Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.0303491Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.0303744Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.0304260Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.0304504Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.0305051Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.0305203Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.0305760Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.0306087Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.0306626Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.0306775Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.0307254Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.0307393Z     return self._compile_to_module()
2025-12-04T12:15:05.0307877Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.0308056Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.0308577Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.0308742Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.0309255Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.0309488Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.0310075Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.0310218Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.0310724Z   File "/tmp/tmpph97hlyp/ky/ckyit2fbc7htzmeglpvxlpbe7vwe3hsuabb5q3bvcqlfgssfpniq.py", line 62, in <module>
2025-12-04T12:15:05.0311203Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.0311321Z     kernel.precompile(
2025-12-04T12:15:05.0311877Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.0312028Z     self._precompile_worker()
2025-12-04T12:15:05.0312628Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.0312821Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.0313419Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.0313621Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.0314093Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.0314342Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.0314803Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.0315140Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.0315372Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.0316050Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.0316142Z ^
2025-12-04T12:15:05.0316602Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.0316622Z 
2025-12-04T12:15:05.0317366Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.0317372Z 
2025-12-04T12:15:05.0317405Z 
2025-12-04T12:15:05.0317627Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.0318275Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T12:15:05.0318283Z 
2025-12-04T12:15:05.0318555Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.0318795Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.0318903Z frames [('total', 1)]
2025-12-04T12:15:05.0319022Z stats [('calls_captured', 7)]
2025-12-04T12:15:05.0319277Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.0319499Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.0319604Z graph_break []
2025-12-04T12:15:05.0319841Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.0319952Z frames [('total', 1)]
2025-12-04T12:15:05.0320088Z stats [('calls_captured', 7)]
2025-12-04T12:15:05.0320311Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.0320579Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.0320692Z graph_break []
2025-12-04T12:15:05.0320913Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.0321016Z frames [('total', 1)]
2025-12-04T12:15:05.0321149Z stats [('calls_captured', 7)]
2025-12-04T12:15:05.0321367Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.0321602Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.0321717Z graph_break []
2025-12-04T12:15:05.0322380Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3d881319a967678f.xml -
2025-12-04T12:15:05.0322579Z =========================== short test summary info ============================
2025-12-04T12:15:05.0323362Z FAILED [0.3555s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.0323977Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.0324084Z ^
2025-12-04T12:15:05.0324544Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.0324550Z 
2025-12-04T12:15:05.0325273Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.0325281Z 
2025-12-04T12:15:05.0325285Z 
2025-12-04T12:15:05.0325504Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.0326141Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T12:15:05.0326161Z 
2025-12-04T12:15:05.0326431Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.0326649Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.0326868Z ================== 1 failed, 187 deselected, 2 rerun in 4.06s ==================
2025-12-04T12:15:05.0326970Z Got exit code 1
2025-12-04T12:15:05.0327080Z Retrying single test...
2025-12-04T12:15:05.0327567Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7b45d70025cf6016.xml
2025-12-04T12:15:05.0327767Z ============================= test session starts ==============================
2025-12-04T12:15:05.0328133Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.0328294Z cachedir: .pytest_cache
2025-12-04T12:15:05.0328816Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.0328960Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.0329069Z configfile: pytest.ini
2025-12-04T12:15:05.0329666Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.0329903Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:05.0330618Z stepcurrent: skipping 3 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T12:15:05.0330750Z Running 1 items in this shard
2025-12-04T12:15:05.0330755Z 
2025-12-04T12:15:05.0332030Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T12:15:05.0333136Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.0333570Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.0334022Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T12:15:05.0334499Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.0335041Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.0335598Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.0336187Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.0336844Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.0337416Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.0337865Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.0338518Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.0339043Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.0339632Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T12:15:05.0340228Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.0340760Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.0341303Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.0341857Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.0342351Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.0342823Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_0 = r0_index
2025-12-04T12:15:05.0343593Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.0344122Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:05.0344709Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.0345304Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = triton_helpers.maximum(_tmp3, tmp2)
2025-12-04T12:15:05.0345861Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp3 = tl.where(r0_mask, tmp4, _tmp3)
2025-12-04T12:15:05.0346427Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = tmp0.to(tl.float32)
2025-12-04T12:15:05.0346928Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp5 * tmp7
2025-12-04T12:15:05.0347393Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = -448.0
2025-12-04T12:15:05.0347982Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T12:15:05.0348442Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = 448.0
2025-12-04T12:15:05.0349025Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T12:15:05.0349583Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T12:15:05.0350298Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp13, r0_mask)
2025-12-04T12:15:05.0350883Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = triton_helpers.max2(_tmp3, 1)[:, None]
2025-12-04T12:15:05.0351596Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp3, None)
2025-12-04T12:15:05.0351974Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.0354343Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.0354931Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.0356003Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.0356654Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.0357558Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.0358236Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.0359136Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.0359908Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.0360564Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.0361621Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.0362004Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.0362906Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.0363043Z ('RERUN', {'yellow': True}) [3.3088s] [100%]
2025-12-04T12:15:05.0364306Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T12:15:05.0365368Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.0365820Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.0366281Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T12:15:05.0366753Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.0367296Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.0367874Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.0368480Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.0369066Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.0369713Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.0370164Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.0370800Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.0371540Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.0372087Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T12:15:05.0372684Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.0373221Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.0373753Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.0374355Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.0374835Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.0375320Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_0 = r0_index
2025-12-04T12:15:05.0376082Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.0376680Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:05.0377272Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.0377853Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = triton_helpers.maximum(_tmp3, tmp2)
2025-12-04T12:15:05.0378417Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp3 = tl.where(r0_mask, tmp4, _tmp3)
2025-12-04T12:15:05.0378937Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = tmp0.to(tl.float32)
2025-12-04T12:15:05.0379434Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp5 * tmp7
2025-12-04T12:15:05.0379901Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = -448.0
2025-12-04T12:15:05.0380478Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T12:15:05.0380950Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = 448.0
2025-12-04T12:15:05.0381577Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T12:15:05.0382134Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T12:15:05.0382849Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp13, r0_mask)
2025-12-04T12:15:05.0383544Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = triton_helpers.max2(_tmp3, 1)[:, None]
2025-12-04T12:15:05.0384255Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp3, None)
2025-12-04T12:15:05.0384626Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.0386964Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.0387502Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.0388603Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.0389229Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.0390142Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.0390832Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.0391732Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.0392530Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.0393142Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.0394217Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.0394597Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.0395548Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.0395688Z ('RERUN', {'yellow': True}) [0.3580s] [100%]
2025-12-04T12:15:05.0396976Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T12:15:05.0398084Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.0398536Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.0398992Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T12:15:05.0399453Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.0400005Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.0400550Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.0401157Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.0401745Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.0402349Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.0402815Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.0403446Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.0403984Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.0404534Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T12:15:05.0405130Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.0405683Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.0406205Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.0406714Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.0407194Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.0407665Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_0 = r0_index
2025-12-04T12:15:05.0408444Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.0408998Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:05.0409607Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.0410181Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = triton_helpers.maximum(_tmp3, tmp2)
2025-12-04T12:15:05.0410785Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp3 = tl.where(r0_mask, tmp4, _tmp3)
2025-12-04T12:15:05.0411337Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = tmp0.to(tl.float32)
2025-12-04T12:15:05.0411829Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp5 * tmp7
2025-12-04T12:15:05.0412310Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = -448.0
2025-12-04T12:15:05.0412886Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T12:15:05.0413355Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = 448.0
2025-12-04T12:15:05.0413931Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T12:15:05.0414480Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T12:15:05.0415202Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp13, r0_mask)
2025-12-04T12:15:05.0415809Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = triton_helpers.max2(_tmp3, 1)[:, None]
2025-12-04T12:15:05.0416599Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp3, None)
2025-12-04T12:15:05.0416962Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.0419320Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.0419858Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.0420908Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.0421543Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.0422453Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.0423174Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.0424056Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.0424874Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.0425516Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.0426598Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.0426975Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.0427885Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.0427997Z FAILED [0.3581s] [100%]
2025-12-04T12:15:05.0428004Z 
2025-12-04T12:15:05.0428149Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.0428492Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda _
2025-12-04T12:15:05.0428625Z Traceback (most recent call last):
2025-12-04T12:15:05.0429079Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T12:15:05.0429374Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.0429866Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.0430128Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.0430643Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.0430842Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.0431368Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.0431517Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.0432062Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.0432394Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.0432945Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.0433109Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.0433591Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.0433731Z     return self._compile_to_module()
2025-12-04T12:15:05.0434219Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.0434385Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.0434912Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.0435048Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.0435592Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.0435841Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.0436426Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.0436566Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.0437037Z   File "/tmp/tmp2jd4mi3_/42/c425kzmprx75eue6yyh3uob3fhnoxdrjot44i67kcgmfh4b6uoj3.py", line 62, in <module>
2025-12-04T12:15:05.0437535Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.0437695Z     kernel.precompile(
2025-12-04T12:15:05.0438254Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.0438391Z     self._precompile_worker()
2025-12-04T12:15:05.0438989Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.0439168Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.0439783Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.0439982Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.0440437Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.0440697Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.0441145Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.0441502Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.0441763Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.0442378Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.0442482Z ^
2025-12-04T12:15:05.0442944Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.0442950Z 
2025-12-04T12:15:05.0443681Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.0443688Z 
2025-12-04T12:15:05.0443693Z 
2025-12-04T12:15:05.0443915Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.0444573Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T12:15:05.0444582Z 
2025-12-04T12:15:05.0444853Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.0445078Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.0445197Z frames [('total', 1)]
2025-12-04T12:15:05.0445314Z stats [('calls_captured', 7)]
2025-12-04T12:15:05.0445550Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.0445790Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.0445894Z graph_break []
2025-12-04T12:15:05.0446230Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda _
2025-12-04T12:15:05.0446358Z Traceback (most recent call last):
2025-12-04T12:15:05.0446817Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T12:15:05.0447076Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.0447599Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.0447850Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.0448375Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.0448569Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.0449130Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.0449280Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.0449844Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.0450178Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.0450726Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.0450889Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.0451373Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.0451495Z     return self._compile_to_module()
2025-12-04T12:15:05.0451999Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.0452168Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.0452682Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.0452828Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.0453323Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.0453600Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.0454197Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.0454325Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.0454839Z   File "/tmp/tmp2l7zvco7/4h/c4hqymdsulobuwkkx7xt2md63hatst6mngltzptiw2dmrehhia7u.py", line 62, in <module>
2025-12-04T12:15:05.0455305Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.0455433Z     kernel.precompile(
2025-12-04T12:15:05.0455996Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.0456117Z     self._precompile_worker()
2025-12-04T12:15:05.0456802Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.0456993Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.0457591Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.0457807Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.0458257Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.0458528Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.0458979Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.0459315Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.0459562Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.0460229Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.0460336Z ^
2025-12-04T12:15:05.0460797Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.0460802Z 
2025-12-04T12:15:05.0461521Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.0461572Z 
2025-12-04T12:15:05.0461578Z 
2025-12-04T12:15:05.0461800Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.0462478Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T12:15:05.0462487Z 
2025-12-04T12:15:05.0462776Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.0463002Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.0463109Z frames [('total', 1)]
2025-12-04T12:15:05.0463240Z stats [('calls_captured', 7)]
2025-12-04T12:15:05.0463479Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.0463718Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.0463819Z graph_break []
2025-12-04T12:15:05.0464039Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.0464158Z frames [('total', 1)]
2025-12-04T12:15:05.0464273Z stats [('calls_captured', 7)]
2025-12-04T12:15:05.0464500Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.0464748Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.0464850Z graph_break []
2025-12-04T12:15:05.0465049Z =================================== FAILURES ===================================
2025-12-04T12:15:05.0465375Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda _
2025-12-04T12:15:05.0465502Z Traceback (most recent call last):
2025-12-04T12:15:05.0465974Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T12:15:05.0466216Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.0466710Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.0466975Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.0467505Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.0467715Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.0468228Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.0468392Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.0468930Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.0469252Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.0469793Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.0469950Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.0470435Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.0470573Z     return self._compile_to_module()
2025-12-04T12:15:05.0471278Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.0471467Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.0472080Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.0472217Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.0472737Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.0472973Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.0473632Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.0473813Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.0474300Z   File "/tmp/tmp1i_pvvjs/xj/cxj2cj7kqa5zrggtpbepqwgwh3wguwq6jerzpkiytiia4z2auv6d.py", line 62, in <module>
2025-12-04T12:15:05.0474791Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.0474908Z     kernel.precompile(
2025-12-04T12:15:05.0475470Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.0475605Z     self._precompile_worker()
2025-12-04T12:15:05.0476205Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.0476407Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.0477006Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.0477207Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.0477681Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.0478040Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.0478506Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.0478844Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.0479072Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.0479704Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.0479799Z ^
2025-12-04T12:15:05.0480278Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.0480284Z 
2025-12-04T12:15:05.0480999Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.0481007Z 
2025-12-04T12:15:05.0481013Z 
2025-12-04T12:15:05.0481234Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.0481889Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T12:15:05.0481895Z 
2025-12-04T12:15:05.0482167Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.0482410Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.0482519Z frames [('total', 1)]
2025-12-04T12:15:05.0482633Z stats [('calls_captured', 7)]
2025-12-04T12:15:05.0482889Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.0483112Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.0483226Z graph_break []
2025-12-04T12:15:05.0483449Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.0483551Z frames [('total', 1)]
2025-12-04T12:15:05.0483794Z stats [('calls_captured', 7)]
2025-12-04T12:15:05.0484018Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.0484253Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.0484368Z graph_break []
2025-12-04T12:15:05.0484582Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.0484722Z frames [('total', 1)]
2025-12-04T12:15:05.0484854Z stats [('calls_captured', 7)]
2025-12-04T12:15:05.0485072Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.0485351Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.0485466Z graph_break []
2025-12-04T12:15:05.0486126Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7b45d70025cf6016.xml -
2025-12-04T12:15:05.0486320Z =========================== short test summary info ============================
2025-12-04T12:15:05.0487093Z FAILED [0.3581s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.0487721Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.0487815Z ^
2025-12-04T12:15:05.0488271Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.0488277Z 
2025-12-04T12:15:05.0489004Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.0489041Z 
2025-12-04T12:15:05.0489046Z 
2025-12-04T12:15:05.0489264Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.0489918Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T12:15:05.0489924Z 
2025-12-04T12:15:05.0490196Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.0490381Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.0490602Z ================== 1 failed, 187 deselected, 2 rerun in 4.07s ==================
2025-12-04T12:15:05.0490705Z Got exit code 1
2025-12-04T12:15:05.0491275Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T12:15:05.0491688Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:15:05.0492163Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b4a285d41fdad5fc.xml
2025-12-04T12:15:05.0492347Z ============================= test session starts ==============================
2025-12-04T12:15:05.0492701Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.0492825Z cachedir: .pytest_cache
2025-12-04T12:15:05.0493343Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.0493471Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.0493596Z configfile: pytest.ini
2025-12-04T12:15:05.0494188Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.0494414Z collecting ... collected 188 items / 4 deselected / 184 selected
2025-12-04T12:15:05.0494570Z stepcurrent: skipping 4 already run items.
2025-12-04T12:15:05.0494690Z Running 184 items in this shard
2025-12-04T12:15:05.0494694Z 
2025-12-04T12:15:05.0495927Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2
2025-12-04T12:15:05.0496819Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.0497327Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 33554432
2025-12-04T12:15:05.0497921Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.0498487Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:05.0499076Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T12:15:05.0499512Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:05.0500116Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32)
2025-12-04T12:15:05.0500642Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.0501199Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.broadcast_to(tmp2, [XBLOCK])
2025-12-04T12:15:05.0501728Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.0502238Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp1 * tmp3
2025-12-04T12:15:05.0502694Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = -448.0
2025-12-04T12:15:05.0503262Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = triton_helpers.maximum(tmp4, tmp5)
2025-12-04T12:15:05.0503701Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 448.0
2025-12-04T12:15:05.0504289Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = triton_helpers.minimum(tmp6, tmp7)
2025-12-04T12:15:05.0504838Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp8.to(tl.float8e4nv)
2025-12-04T12:15:05.0505403Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp9, None)
2025-12-04T12:15:05.0505773Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.0507706Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.0508243Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.0509342Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.0509993Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.0510894Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.0511655Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.0512539Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.0513336Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.0513944Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.0514756Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.0515128Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.0516025Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.0516209Z ('RERUN', {'yellow': True}) [3.7607s] [  0%]
2025-12-04T12:15:05.0517383Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2
2025-12-04T12:15:05.0518193Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.0518662Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 33554432
2025-12-04T12:15:05.0519217Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.0519780Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:05.0520343Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T12:15:05.0520792Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:05.0521384Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32)
2025-12-04T12:15:05.0522476Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.0523036Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.broadcast_to(tmp2, [XBLOCK])
2025-12-04T12:15:05.0523545Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.0524082Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp1 * tmp3
2025-12-04T12:15:05.0524526Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = -448.0
2025-12-04T12:15:05.0525103Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = triton_helpers.maximum(tmp4, tmp5)
2025-12-04T12:15:05.0525581Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 448.0
2025-12-04T12:15:05.0526192Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = triton_helpers.minimum(tmp6, tmp7)
2025-12-04T12:15:05.0526735Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp8.to(tl.float8e4nv)
2025-12-04T12:15:05.0527291Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp9, None)
2025-12-04T12:15:05.0527669Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.0529588Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.0530165Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.0531252Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.0531903Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.0532800Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.0533485Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.0534387Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.0535255Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.0535914Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.0536791Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.0537181Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.0538123Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.0538265Z ('RERUN', {'yellow': True}) [0.5906s] [  0%]
2025-12-04T12:15:05.0539446Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2
2025-12-04T12:15:05.0540274Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.0540781Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 33554432
2025-12-04T12:15:05.0541329Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.0541892Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:05.0542469Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T12:15:05.0542905Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:05.0543512Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32)
2025-12-04T12:15:05.0544037Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.0544600Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.broadcast_to(tmp2, [XBLOCK])
2025-12-04T12:15:05.0545151Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.0545619Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp1 * tmp3
2025-12-04T12:15:05.0546080Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = -448.0
2025-12-04T12:15:05.0546644Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = triton_helpers.maximum(tmp4, tmp5)
2025-12-04T12:15:05.0547100Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 448.0
2025-12-04T12:15:05.0547667Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = triton_helpers.minimum(tmp6, tmp7)
2025-12-04T12:15:05.0548192Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp8.to(tl.float8e4nv)
2025-12-04T12:15:05.0548754Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp9, None)
2025-12-04T12:15:05.0549118Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.0551030Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.0551613Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.0552675Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.0553307Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.0554289Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.0554974Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.0555864Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.0556654Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.0557262Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.0558083Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.0558451Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.0559402Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.0559509Z FAILED [0.5733s] [  0%]
2025-12-04T12:15:05.0559516Z 
2025-12-04T12:15:05.0559666Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.0560014Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T12:15:05.0560147Z Traceback (most recent call last):
2025-12-04T12:15:05.0560608Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T12:15:05.0560874Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.0561367Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.0561634Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.0562160Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.0562353Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.0562877Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.0563025Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.0563575Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.0563904Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.0564421Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.0564587Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.0565111Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.0565251Z     return self._compile_to_module()
2025-12-04T12:15:05.0565740Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.0565929Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.0566457Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.0566624Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.0567158Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.0567410Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.0568351Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.0568496Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.0568986Z   File "/tmp/tmpcz80_m4d/w2/cw2w2nzlrsvlalmiaenkppzrll7ug7oywppgchyrtmwm6jkw3x5w.py", line 168, in <module>
2025-12-04T12:15:05.0569453Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.0569587Z     kernel.precompile(
2025-12-04T12:15:05.0570144Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.0570280Z     self._precompile_worker()
2025-12-04T12:15:05.0570885Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.0571246Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.0571966Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.0572171Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.0572626Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.0572889Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.0573338Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.0573698Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.0573929Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.0574292Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.0574404Z ^
2025-12-04T12:15:05.0574863Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.0574869Z 
2025-12-04T12:15:05.0575605Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.0575610Z 
2025-12-04T12:15:05.0575615Z 
2025-12-04T12:15:05.0575840Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.0576555Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:05.0576580Z 
2025-12-04T12:15:05.0576858Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.0577085Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.0577209Z frames [('total', 1)]
2025-12-04T12:15:05.0577334Z stats [('calls_captured', 7)]
2025-12-04T12:15:05.0577579Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.0577884Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.0577992Z graph_break []
2025-12-04T12:15:05.0578324Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T12:15:05.0578463Z Traceback (most recent call last):
2025-12-04T12:15:05.0578916Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T12:15:05.0579236Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.0579774Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.0580027Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.0580558Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.0580761Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.0581283Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.0581438Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.0581977Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.0582318Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.0582845Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.0582999Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.0583496Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.0583662Z     return self._compile_to_module()
2025-12-04T12:15:05.0584161Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.0584324Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.0584845Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.0584990Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.0585492Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.0585743Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.0586332Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.0586462Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.0586994Z   File "/tmp/tmpvwxewveo/ej/cejkkbtgn6ov6zv4kojr6w2b6wy2tgudjeczdjnlv4bmxk6az4ml.py", line 168, in <module>
2025-12-04T12:15:05.0587455Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.0587568Z     kernel.precompile(
2025-12-04T12:15:05.0588136Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.0588256Z     self._precompile_worker()
2025-12-04T12:15:05.0588870Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.0589053Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.0589648Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.0589867Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.0590353Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.0590619Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.0591067Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.0591406Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.0591689Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.0592054Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.0592174Z ^
2025-12-04T12:15:05.0592652Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.0592660Z 
2025-12-04T12:15:05.0593376Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.0593384Z 
2025-12-04T12:15:05.0593388Z 
2025-12-04T12:15:05.0593617Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.0594265Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:05.0594271Z 
2025-12-04T12:15:05.0594555Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.0594778Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.0594890Z frames [('total', 1)]
2025-12-04T12:15:05.0595022Z stats [('calls_captured', 7)]
2025-12-04T12:15:05.0595259Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.0595518Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.0595633Z graph_break []
2025-12-04T12:15:05.0595856Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.0595978Z frames [('total', 1)]
2025-12-04T12:15:05.0596095Z stats [('calls_captured', 7)]
2025-12-04T12:15:05.0596314Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.0596560Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.0596661Z graph_break []
2025-12-04T12:15:05.0596811Z =================================== FAILURES ===================================
2025-12-04T12:15:05.0597158Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T12:15:05.0597286Z Traceback (most recent call last):
2025-12-04T12:15:05.0597753Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T12:15:05.0597998Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.0598493Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.0598754Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.0599271Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.0599494Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.0600030Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.0600180Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.0600739Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.0601065Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.0601624Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.0601788Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.0602280Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.0602417Z     return self._compile_to_module()
2025-12-04T12:15:05.0602909Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.0603111Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.0603680Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.0603814Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.0604313Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.0604568Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.0605160Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.0605301Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.0605818Z   File "/tmp/tmpvzwhzjxd/ki/ckioq6mwobl3fchhfvif7vc7iyaqwzminfkey3j5mnmxsnsqy3bm.py", line 168, in <module>
2025-12-04T12:15:05.0606284Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.0606412Z     kernel.precompile(
2025-12-04T12:15:05.0606973Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.0607105Z     self._precompile_worker()
2025-12-04T12:15:05.0607738Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.0607919Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.0608525Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.0608746Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.0609211Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.0609462Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.0609912Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.0610263Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.0610494Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.0610870Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.0610962Z ^
2025-12-04T12:15:05.0611421Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.0611426Z 
2025-12-04T12:15:05.0612158Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.0612166Z 
2025-12-04T12:15:05.0612171Z 
2025-12-04T12:15:05.0612393Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.0613057Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:05.0613063Z 
2025-12-04T12:15:05.0613333Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.0613561Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.0613734Z frames [('total', 1)]
2025-12-04T12:15:05.0613855Z stats [('calls_captured', 7)]
2025-12-04T12:15:05.0614106Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.0614332Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.0614437Z graph_break []
2025-12-04T12:15:05.0614674Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.0614820Z frames [('total', 1)]
2025-12-04T12:15:05.0614942Z stats [('calls_captured', 7)]
2025-12-04T12:15:05.0615212Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.0615452Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.0615555Z graph_break []
2025-12-04T12:15:05.0615796Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.0615904Z frames [('total', 1)]
2025-12-04T12:15:05.0616038Z stats [('calls_captured', 7)]
2025-12-04T12:15:05.0616262Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.0616607Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.0616726Z graph_break []
2025-12-04T12:15:05.0617383Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b4a285d41fdad5fc.xml -
2025-12-04T12:15:05.0617565Z =========================== short test summary info ============================
2025-12-04T12:15:05.0618380Z FAILED [0.5733s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.0618744Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.0618902Z ^
2025-12-04T12:15:05.0619360Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.0619368Z 
2025-12-04T12:15:05.0620075Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.0620094Z 
2025-12-04T12:15:05.0620098Z 
2025-12-04T12:15:05.0620322Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.0620969Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:05.0620975Z 
2025-12-04T12:15:05.0621262Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.0621447Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.0621662Z =================== 1 failed, 4 deselected, 2 rerun in 4.97s ===================
2025-12-04T12:15:05.0621762Z Got exit code 1
2025-12-04T12:15:05.0621877Z Retrying single test...
2025-12-04T12:15:05.0622363Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-9b24822b6f23300e.xml
2025-12-04T12:15:05.0622527Z ============================= test session starts ==============================
2025-12-04T12:15:05.0622875Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.0623003Z cachedir: .pytest_cache
2025-12-04T12:15:05.0623526Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.0623667Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.0623778Z configfile: pytest.ini
2025-12-04T12:15:05.0624371Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.0624611Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:05.0625386Z stepcurrent: skipping 4 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:05.0625505Z Running 1 items in this shard
2025-12-04T12:15:05.0625511Z 
2025-12-04T12:15:05.0626693Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2
2025-12-04T12:15:05.0627556Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.0628041Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 33554432
2025-12-04T12:15:05.0628587Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.0629166Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:05.0629735Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T12:15:05.0630172Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:05.0630785Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32)
2025-12-04T12:15:05.0631310Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.0631914Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.broadcast_to(tmp2, [XBLOCK])
2025-12-04T12:15:05.0632425Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.0632893Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp1 * tmp3
2025-12-04T12:15:05.0633353Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = -448.0
2025-12-04T12:15:05.0633927Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = triton_helpers.maximum(tmp4, tmp5)
2025-12-04T12:15:05.0634373Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 448.0
2025-12-04T12:15:05.0634944Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = triton_helpers.minimum(tmp6, tmp7)
2025-12-04T12:15:05.0635467Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp8.to(tl.float8e4nv)
2025-12-04T12:15:05.0636021Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp9, None)
2025-12-04T12:15:05.0636387Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.0638362Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.0638905Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.0639969Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.0640670Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.0641589Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.0642283Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.0643168Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.0643952Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.0644570Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.0645381Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.0645784Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.0646688Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.0646824Z ('RERUN', {'yellow': True}) [3.7463s] [100%]
2025-12-04T12:15:05.0647991Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2
2025-12-04T12:15:05.0648806Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.0649274Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 33554432
2025-12-04T12:15:05.0649826Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.0650391Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:05.0650970Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T12:15:05.0651407Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:05.0651997Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32)
2025-12-04T12:15:05.0652576Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.0653130Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.broadcast_to(tmp2, [XBLOCK])
2025-12-04T12:15:05.0653653Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.0654120Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp1 * tmp3
2025-12-04T12:15:05.0654596Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = -448.0
2025-12-04T12:15:05.0655204Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = triton_helpers.maximum(tmp4, tmp5)
2025-12-04T12:15:05.0655644Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 448.0
2025-12-04T12:15:05.0656227Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = triton_helpers.minimum(tmp6, tmp7)
2025-12-04T12:15:05.0656830Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp8.to(tl.float8e4nv)
2025-12-04T12:15:05.0657405Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp9, None)
2025-12-04T12:15:05.0657786Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.0660129Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.0660740Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.0661783Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.0662428Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.0663328Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.0664033Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.0665067Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.0665846Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.0666479Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.0667331Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.0667716Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.0668612Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.0668793Z ('RERUN', {'yellow': True}) [0.5893s] [100%]
2025-12-04T12:15:05.0670002Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2
2025-12-04T12:15:05.0670795Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.0671480Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 33554432
2025-12-04T12:15:05.0672022Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.0672602Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:05.0673173Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T12:15:05.0673611Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:05.0674215Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32)
2025-12-04T12:15:05.0674855Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.0675420Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.broadcast_to(tmp2, [XBLOCK])
2025-12-04T12:15:05.0675929Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.0676411Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp1 * tmp3
2025-12-04T12:15:05.0676903Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = -448.0
2025-12-04T12:15:05.0677473Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = triton_helpers.maximum(tmp4, tmp5)
2025-12-04T12:15:05.0677936Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 448.0
2025-12-04T12:15:05.0678496Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = triton_helpers.minimum(tmp6, tmp7)
2025-12-04T12:15:05.0679029Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp8.to(tl.float8e4nv)
2025-12-04T12:15:05.0679573Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp9, None)
2025-12-04T12:15:05.0679939Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.0681920Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.0682463Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.0683522Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.0684234Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.0685147Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.0685830Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.0686733Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.0687508Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.0688115Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.0689048Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.0689415Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.0690322Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.0690433Z FAILED [0.5778s] [100%]
2025-12-04T12:15:05.0690440Z 
2025-12-04T12:15:05.0690588Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.0690933Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T12:15:05.0691061Z Traceback (most recent call last):
2025-12-04T12:15:05.0691534Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T12:15:05.0691783Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.0692274Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.0692537Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.0693050Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.0693257Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.0693770Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.0693920Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.0702083Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.0702637Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.0703194Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.0703367Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.0703860Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.0704043Z     return self._compile_to_module()
2025-12-04T12:15:05.0704533Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.0704748Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.0705284Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.0705421Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.0705925Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.0706174Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.0706765Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.0706909Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.0707421Z   File "/tmp/tmpog1bh0c0/6n/c6ncjlgssni77o6mwasj6isbalfab5frgpzcdycrnp5j7dka4ylh.py", line 168, in <module>
2025-12-04T12:15:05.0707889Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.0708022Z     kernel.precompile(
2025-12-04T12:15:05.0708579Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.0708753Z     self._precompile_worker()
2025-12-04T12:15:05.0709355Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.0709539Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.0710153Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.0710356Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.0710812Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.0711082Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.0711527Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.0711882Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.0712114Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.0712476Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.0712585Z ^
2025-12-04T12:15:05.0713048Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.0713056Z 
2025-12-04T12:15:05.0713789Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.0713798Z 
2025-12-04T12:15:05.0713803Z 
2025-12-04T12:15:05.0714028Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.0714675Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:05.0714697Z 
2025-12-04T12:15:05.0715000Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.0715234Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.0715358Z frames [('total', 1)]
2025-12-04T12:15:05.0715478Z stats [('calls_captured', 7)]
2025-12-04T12:15:05.0715716Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.0715956Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.0716088Z graph_break []
2025-12-04T12:15:05.0716428Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T12:15:05.0716599Z Traceback (most recent call last):
2025-12-04T12:15:05.0717056Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T12:15:05.0717313Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.0717812Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.0718061Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.0718587Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.0718780Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.0719304Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.0719453Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.0719986Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.0720321Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.0720874Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.0721036Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.0721516Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.0721637Z     return self._compile_to_module()
2025-12-04T12:15:05.0722140Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.0722308Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.0722826Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.0722974Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.0723472Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.0723717Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.0724304Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.0724431Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.0724930Z   File "/tmp/tmpf2_jk7i4/ed/cedg4cetw7lgrzulgrg6uysbph33aagyrxijhaajbsbz3dowra6h.py", line 168, in <module>
2025-12-04T12:15:05.0725390Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.0725519Z     kernel.precompile(
2025-12-04T12:15:05.0726075Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.0726197Z     self._precompile_worker()
2025-12-04T12:15:05.0726806Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.0727019Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.0727616Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.0727829Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.0728279Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.0728571Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.0729064Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.0729403Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.0729643Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.0730006Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.0730101Z ^
2025-12-04T12:15:05.0730569Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.0730575Z 
2025-12-04T12:15:05.0731284Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.0731294Z 
2025-12-04T12:15:05.0731299Z 
2025-12-04T12:15:05.0731529Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.0732177Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:05.0732183Z 
2025-12-04T12:15:05.0732464Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.0732721Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.0732829Z frames [('total', 1)]
2025-12-04T12:15:05.0732963Z stats [('calls_captured', 7)]
2025-12-04T12:15:05.0733200Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.0733421Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.0733537Z graph_break []
2025-12-04T12:15:05.0733757Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.0733880Z frames [('total', 1)]
2025-12-04T12:15:05.0733995Z stats [('calls_captured', 7)]
2025-12-04T12:15:05.0734213Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.0734463Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.0734564Z graph_break []
2025-12-04T12:15:05.0734711Z =================================== FAILURES ===================================
2025-12-04T12:15:05.0735060Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T12:15:05.0735187Z Traceback (most recent call last):
2025-12-04T12:15:05.0735652Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T12:15:05.0735898Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.0736475Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.0736744Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.0737259Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.0737455Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.0737982Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.0738133Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.0738719Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.0739041Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.0739562Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.0739760Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.0740243Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.0740412Z     return self._compile_to_module()
2025-12-04T12:15:05.0740903Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.0741072Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.0741605Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.0741738Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.0742238Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.0742489Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.0743076Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.0743215Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.0743732Z   File "/tmp/tmp4nx58gwi/zz/czzv6nwerbcir24skctpdetkn3iuakkso7dhks7wsbelyelggatr.py", line 168, in <module>
2025-12-04T12:15:05.0744193Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.0744351Z     kernel.precompile(
2025-12-04T12:15:05.0744909Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.0745039Z     self._precompile_worker()
2025-12-04T12:15:05.0745632Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.0745812Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.0746421Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.0746622Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.0747086Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.0747333Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.0747776Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.0748126Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.0748353Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.0748716Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.0748821Z ^
2025-12-04T12:15:05.0749275Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.0749280Z 
2025-12-04T12:15:05.0750005Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.0750011Z 
2025-12-04T12:15:05.0750018Z 
2025-12-04T12:15:05.0750238Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.0750923Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:05.0750929Z 
2025-12-04T12:15:05.0751198Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.0751421Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.0751542Z frames [('total', 1)]
2025-12-04T12:15:05.0752291Z stats [('calls_captured', 7)]
2025-12-04T12:15:05.0752528Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.0752794Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.0752898Z graph_break []
2025-12-04T12:15:05.0753130Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.0753238Z frames [('total', 1)]
2025-12-04T12:15:05.0753355Z stats [('calls_captured', 7)]
2025-12-04T12:15:05.0753584Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.0753818Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.0753920Z graph_break []
2025-12-04T12:15:05.0754148Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.0754254Z frames [('total', 1)]
2025-12-04T12:15:05.0754375Z stats [('calls_captured', 7)]
2025-12-04T12:15:05.0754606Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.0754837Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.0754954Z graph_break []
2025-12-04T12:15:05.0755605Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-9b24822b6f23300e.xml -
2025-12-04T12:15:05.0755780Z =========================== short test summary info ============================
2025-12-04T12:15:05.0756627Z FAILED [0.5778s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.0756988Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.0757079Z ^
2025-12-04T12:15:05.0757548Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.0757556Z 
2025-12-04T12:15:05.0758262Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.0758267Z 
2025-12-04T12:15:05.0758274Z 
2025-12-04T12:15:05.0758503Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.0759146Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:05.0759154Z 
2025-12-04T12:15:05.0759437Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.0759620Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.0759820Z ================== 1 failed, 187 deselected, 2 rerun in 4.96s ==================
2025-12-04T12:15:05.0759935Z Got exit code 1
2025-12-04T12:15:05.0760045Z Retrying single test...
2025-12-04T12:15:05.0760517Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-642548938a706c13.xml
2025-12-04T12:15:05.0760697Z ============================= test session starts ==============================
2025-12-04T12:15:05.0761050Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.0761173Z cachedir: .pytest_cache
2025-12-04T12:15:05.0761698Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.0761857Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.0761980Z configfile: pytest.ini
2025-12-04T12:15:05.0762567Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.0762806Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:05.0763531Z stepcurrent: skipping 4 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:05.0763680Z Running 1 items in this shard
2025-12-04T12:15:05.0763685Z 
2025-12-04T12:15:05.0764902Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2
2025-12-04T12:15:05.0765745Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.0766230Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 33554432
2025-12-04T12:15:05.0766771Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.0767336Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:05.0767917Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T12:15:05.0768382Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:05.0768993Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32)
2025-12-04T12:15:05.0769513Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.0770072Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.broadcast_to(tmp2, [XBLOCK])
2025-12-04T12:15:05.0770585Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.0771247Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp1 * tmp3
2025-12-04T12:15:05.0771705Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = -448.0
2025-12-04T12:15:05.0772281Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = triton_helpers.maximum(tmp4, tmp5)
2025-12-04T12:15:05.0772733Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 448.0
2025-12-04T12:15:05.0773300Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = triton_helpers.minimum(tmp6, tmp7)
2025-12-04T12:15:05.0773825Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp8.to(tl.float8e4nv)
2025-12-04T12:15:05.0774386Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp9, None)
2025-12-04T12:15:05.0774752Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.0776841Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.0777441Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.0778540Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.0779176Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.0780069Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.0780762Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.0781654Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.0782440Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.0783096Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.0783899Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.0784266Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.0785165Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.0785317Z ('RERUN', {'yellow': True}) [3.7459s] [100%]
2025-12-04T12:15:05.0786484Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2
2025-12-04T12:15:05.0787302Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.0787785Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 33554432
2025-12-04T12:15:05.0788349Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.0788916Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:05.0789479Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T12:15:05.0789957Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:05.0790549Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32)
2025-12-04T12:15:05.0791082Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.0791631Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.broadcast_to(tmp2, [XBLOCK])
2025-12-04T12:15:05.0792196Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.0792684Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp1 * tmp3
2025-12-04T12:15:05.0793130Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = -448.0
2025-12-04T12:15:05.0793707Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = triton_helpers.maximum(tmp4, tmp5)
2025-12-04T12:15:05.0794142Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 448.0
2025-12-04T12:15:05.0794703Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = triton_helpers.minimum(tmp6, tmp7)
2025-12-04T12:15:05.0795242Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp8.to(tl.float8e4nv)
2025-12-04T12:15:05.0795787Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp9, None)
2025-12-04T12:15:05.0796172Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.0798113Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.0798662Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.0799700Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.0800349Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.0801238Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.0801918Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.0802816Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.0803584Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.0804236Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.0805034Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.0805410Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.0806366Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.0806504Z ('RERUN', {'yellow': True}) [0.5813s] [100%]
2025-12-04T12:15:05.0807690Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2
2025-12-04T12:15:05.0808485Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.0808962Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 33554432
2025-12-04T12:15:05.0809505Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.0810076Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:05.0810672Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T12:15:05.0811107Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:05.0811713Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32)
2025-12-04T12:15:05.0812232Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.0812791Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.broadcast_to(tmp2, [XBLOCK])
2025-12-04T12:15:05.0813298Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.0813765Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp1 * tmp3
2025-12-04T12:15:05.0814220Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = -448.0
2025-12-04T12:15:05.0814782Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = triton_helpers.maximum(tmp4, tmp5)
2025-12-04T12:15:05.0815225Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 448.0
2025-12-04T12:15:05.0815794Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = triton_helpers.minimum(tmp6, tmp7)
2025-12-04T12:15:05.0816373Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp8.to(tl.float8e4nv)
2025-12-04T12:15:05.0816931Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp9, None)
2025-12-04T12:15:05.0817293Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.0819269Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.0819866Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.0820932Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.0821564Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.0822474Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.0823151Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.0824042Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.0824859Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.0825471Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.0826281Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.0826651Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.0827555Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.0827663Z FAILED [0.5994s] [100%]
2025-12-04T12:15:05.0827671Z 
2025-12-04T12:15:05.0827816Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.0828162Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T12:15:05.0828289Z Traceback (most recent call last):
2025-12-04T12:15:05.0828744Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T12:15:05.0829001Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.0829489Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.0829750Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.0830262Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.0830457Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.0831029Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.0831177Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.0831724Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.0832046Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.0832596Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.0832756Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.0833269Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.0833394Z     return self._compile_to_module()
2025-12-04T12:15:05.0833893Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.0834059Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.0834588Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.0834718Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.0835215Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.0835459Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.0836048Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.0836189Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.0836701Z   File "/tmp/tmpcovv30vq/l2/cl2pfyu3yerwqajaunnn6bdchrm5jjmcj5tpr5xrpkyrtxpk3qwp.py", line 168, in <module>
2025-12-04T12:15:05.0837195Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.0837320Z     kernel.precompile(
2025-12-04T12:15:05.0837876Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.0837994Z     self._precompile_worker()
2025-12-04T12:15:05.0838599Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.0838781Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.0839388Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.0839590Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.0840041Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.0840301Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.0840746Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.0841090Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.0841317Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.0841678Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.0841781Z ^
2025-12-04T12:15:05.0842243Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.0842249Z 
2025-12-04T12:15:05.0842976Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.0842984Z 
2025-12-04T12:15:05.0842989Z 
2025-12-04T12:15:05.0843237Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.0843883Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:05.0843901Z 
2025-12-04T12:15:05.0844169Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.0844425Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.0844549Z frames [('total', 1)]
2025-12-04T12:15:05.0844669Z stats [('calls_captured', 7)]
2025-12-04T12:15:05.0844935Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.0845170Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.0845272Z graph_break []
2025-12-04T12:15:05.0845606Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T12:15:05.0845743Z Traceback (most recent call last):
2025-12-04T12:15:05.0846201Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T12:15:05.0846457Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.0846945Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.0847197Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.0847721Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.0847918Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.0848439Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.0848621Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.0849710Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.0850051Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.0850580Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.0850731Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.0851228Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.0851354Z     return self._compile_to_module()
2025-12-04T12:15:05.0851851Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.0852019Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.0852536Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.0852683Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.0853177Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.0853420Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.0854004Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.0854134Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.0854659Z   File "/tmp/tmp1rnwc7cl/kt/cktt3qo6gdwctexez2ytmxkqwi3slhgisa7ujqlzt5l7gsfou4mu.py", line 168, in <module>
2025-12-04T12:15:05.0855121Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.0855237Z     kernel.precompile(
2025-12-04T12:15:05.0855849Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.0855970Z     self._precompile_worker()
2025-12-04T12:15:05.0856668Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.0856864Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.0857461Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.0857714Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.0858201Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.0858464Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.0858914Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.0859252Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.0859498Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.0859859Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.0859952Z ^
2025-12-04T12:15:05.0860427Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.0860435Z 
2025-12-04T12:15:05.0861150Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.0861157Z 
2025-12-04T12:15:05.0861162Z 
2025-12-04T12:15:05.0861430Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.0862219Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:05.0862226Z 
2025-12-04T12:15:05.0862512Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.0862742Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.0862852Z frames [('total', 1)]
2025-12-04T12:15:05.0862994Z stats [('calls_captured', 7)]
2025-12-04T12:15:05.0863234Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.0863457Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.0863577Z graph_break []
2025-12-04T12:15:05.0863798Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.0863919Z frames [('total', 1)]
2025-12-04T12:15:05.0864043Z stats [('calls_captured', 7)]
2025-12-04T12:15:05.0864264Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.0864515Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.0864619Z graph_break []
2025-12-04T12:15:05.0864769Z =================================== FAILURES ===================================
2025-12-04T12:15:05.0865122Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T12:15:05.0865252Z Traceback (most recent call last):
2025-12-04T12:15:05.0865709Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T12:15:05.0865973Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.0866466Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.0866732Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.0867305Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.0867505Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.0868032Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.0868182Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.0868735Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.0869107Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.0869663Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.0869830Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.0870313Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.0870454Z     return self._compile_to_module()
2025-12-04T12:15:05.0871103Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.0871271Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.0871807Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.0871941Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.0872438Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.0872687Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.0873273Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.0873512Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.0874003Z   File "/tmp/tmpjp_ii3nz/pg/cpgmhtrqxu7r366x6ajfqgjb7chhlm5bjwvw77kf2nla77uk5ley.py", line 168, in <module>
2025-12-04T12:15:05.0874468Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.0874593Z     kernel.precompile(
2025-12-04T12:15:05.0875141Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.0875276Z     self._precompile_worker()
2025-12-04T12:15:05.0875878Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.0876057Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.0876667Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.0876869Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.0877325Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.0877585Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.0878027Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.0878375Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.0878601Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.0878966Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.0879071Z ^
2025-12-04T12:15:05.0879527Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.0879536Z 
2025-12-04T12:15:05.0880313Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.0880321Z 
2025-12-04T12:15:05.0880325Z 
2025-12-04T12:15:05.0880545Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.0881189Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:05.0881252Z 
2025-12-04T12:15:05.0881525Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.0881797Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.0881918Z frames [('total', 1)]
2025-12-04T12:15:05.0882035Z stats [('calls_captured', 7)]
2025-12-04T12:15:05.0882278Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.0882512Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.0882616Z graph_break []
2025-12-04T12:15:05.0882833Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.0882952Z frames [('total', 1)]
2025-12-04T12:15:05.0883068Z stats [('calls_captured', 7)]
2025-12-04T12:15:05.0883299Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.0883535Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.0883637Z graph_break []
2025-12-04T12:15:05.0883872Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.0883979Z frames [('total', 1)]
2025-12-04T12:15:05.0884095Z stats [('calls_captured', 7)]
2025-12-04T12:15:05.0884329Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.0884596Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.0884696Z graph_break []
2025-12-04T12:15:05.0885358Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-642548938a706c13.xml -
2025-12-04T12:15:05.0885535Z =========================== short test summary info ============================
2025-12-04T12:15:05.0886351Z FAILED [0.5994s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.0886715Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.0886808Z ^
2025-12-04T12:15:05.0887280Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.0887285Z 
2025-12-04T12:15:05.0887994Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.0888002Z 
2025-12-04T12:15:05.0888009Z 
2025-12-04T12:15:05.0888246Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.0888897Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:05.0888903Z 
2025-12-04T12:15:05.0889186Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.0889370Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.0889574Z ================== 1 failed, 187 deselected, 2 rerun in 4.97s ==================
2025-12-04T12:15:05.0889693Z Got exit code 1
2025-12-04T12:15:05.0890262Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:05.0890695Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:15:05.0891205Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3087aa3d89d0a96b.xml
2025-12-04T12:15:05.0891374Z ============================= test session starts ==============================
2025-12-04T12:15:05.0891739Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.0891852Z cachedir: .pytest_cache
2025-12-04T12:15:05.0892412Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.0892553Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.0892769Z configfile: pytest.ini
2025-12-04T12:15:05.0893379Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.0893605Z collecting ... collected 188 items / 5 deselected / 183 selected
2025-12-04T12:15:05.0893750Z stepcurrent: skipping 5 already run items.
2025-12-04T12:15:05.0893884Z Running 183 items in this shard
2025-12-04T12:15:05.0893889Z 
2025-12-04T12:15:05.0894396Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,1,15_cuda PASSED [3.3894s] [  0%]
2025-12-04T12:15:05.0894896Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,15_cuda PASSED [0.2990s] [  1%]
2025-12-04T12:15:05.0895419Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,4096_cuda PASSED [0.6567s] [  1%]
2025-12-04T12:15:05.0895925Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,512_cuda PASSED [0.3272s] [  2%]
2025-12-04T12:15:05.0896521Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_4,2048,4096_cuda PASSED [0.7233s] [  2%]
2025-12-04T12:15:05.0897717Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T12:15:05.0898610Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.0899050Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.0899495Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T12:15:05.0900361Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T12:15:05.0900835Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.0901385Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.0901924Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.0902510Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.0903113Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.0903669Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.0904130Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.0904694Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.0905168Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.0905642Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.0906120Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.0906813Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.0907338Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.0907898Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T12:15:05.0908399Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:05.0908979Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.0909563Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T12:15:05.0910189Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.0910706Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp5.to(tl.float32)
2025-12-04T12:15:05.0911207Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T12:15:05.0911653Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T12:15:05.0912233Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T12:15:05.0912675Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T12:15:05.0913261Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T12:15:05.0913795Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T12:15:05.0914513Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None)
2025-12-04T12:15:05.0914894Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.0916876Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.0917483Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.0918572Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.0919223Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.0920157Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.0920909Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.0921837Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.0922627Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.0923303Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.0924205Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.0924597Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.0925544Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.0925692Z ('RERUN', {'yellow': True}) [0.1761s] [  3%]
2025-12-04T12:15:05.0926846Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T12:15:05.0927745Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.0928181Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.0928624Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T12:15:05.0929178Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T12:15:05.0929646Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.0930199Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.0930757Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.0931343Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.0931943Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.0932539Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.0932996Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.0933517Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.0934038Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.0934538Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.0934992Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.0935666Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.0936194Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.0936840Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T12:15:05.0937343Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:05.0937937Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.0938529Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T12:15:05.0939198Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.0939722Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp5.to(tl.float32)
2025-12-04T12:15:05.0940195Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T12:15:05.0940649Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T12:15:05.0941247Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T12:15:05.0941688Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T12:15:05.0942289Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T12:15:05.0942824Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T12:15:05.0943536Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None)
2025-12-04T12:15:05.0943916Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.0945892Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.0946446Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.0947491Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.0948201Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.0949099Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.0949798Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.0950683Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.0951464Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.0952083Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.0952959Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.0953379Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.0954279Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.0954431Z ('RERUN', {'yellow': True}) [0.3661s] [  3%]
2025-12-04T12:15:05.0955580Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T12:15:05.0956449Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.0956906Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.0957356Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T12:15:05.0957894Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T12:15:05.0958360Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.0958915Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.0959457Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.0960047Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.0960682Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.0961239Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.0961696Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.0962277Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.0962751Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.0963237Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.0963689Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.0964351Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.0964881Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.0965430Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T12:15:05.0965944Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:05.0966527Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.0967146Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T12:15:05.0967773Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.0968280Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp5.to(tl.float32)
2025-12-04T12:15:05.0968768Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T12:15:05.0969215Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T12:15:05.0969805Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T12:15:05.0970254Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T12:15:05.0970829Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T12:15:05.0971566Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T12:15:05.0972280Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None)
2025-12-04T12:15:05.0972661Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.0974685Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.0975248Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.0976464Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.0977117Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.0978037Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.0978719Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.0979620Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.0980404Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.0981075Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.0981959Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.0982339Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.0983232Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.0983349Z FAILED [0.3408s] [  3%]
2025-12-04T12:15:05.0983374Z 
2025-12-04T12:15:05.0983525Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.0983820Z _____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _____
2025-12-04T12:15:05.0983962Z Traceback (most recent call last):
2025-12-04T12:15:05.0984367Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T12:15:05.0984527Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T12:15:05.0985033Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.0985285Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.0985814Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.0986010Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.0986523Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.0986683Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.0987225Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.0987587Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.0988121Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.0988272Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.0988762Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.0988921Z     return self._compile_to_module()
2025-12-04T12:15:05.0989436Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.0989615Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.0990132Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.0990281Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.0990778Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.0991013Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.0991620Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.0991754Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.0992239Z   File "/tmp/tmp03_ztxxq/cx/ccxjqkpnzscpe7jvzeho2zrfnk4hkkr4ok5z7cgtyhamznr5ph5i.py", line 58, in <module>
2025-12-04T12:15:05.0992722Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.0992837Z     kernel.precompile(
2025-12-04T12:15:05.0993438Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.0993562Z     self._precompile_worker()
2025-12-04T12:15:05.0994160Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.0994364Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.0994957Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.0995172Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.0995630Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.0995877Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.0996337Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.0996677Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.0996907Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.0997350Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.0997442Z ^
2025-12-04T12:15:05.0997916Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.0997924Z 
2025-12-04T12:15:05.0998642Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.0998648Z 
2025-12-04T12:15:05.0998653Z 
2025-12-04T12:15:05.0998883Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.0999460Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T12:15:05.0999466Z 
2025-12-04T12:15:05.0999765Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.1000005Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.1000112Z frames [('total', 1)]
2025-12-04T12:15:05.1000230Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.1000467Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.1000736Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.1000848Z graph_break []
2025-12-04T12:15:05.1001171Z _____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _____
2025-12-04T12:15:05.1001298Z Traceback (most recent call last):
2025-12-04T12:15:05.1001708Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T12:15:05.1001867Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T12:15:05.1002359Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.1002642Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.1003156Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.1003364Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.1003877Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.1004025Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.1004574Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.1004896Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.1005462Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.1005616Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.1006100Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.1006244Z     return self._compile_to_module()
2025-12-04T12:15:05.1006730Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.1006898Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.1007428Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.1007561Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.1008068Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.1008308Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.1008893Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.1009039Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.1009545Z   File "/tmp/tmpzu2vv2w2/pq/cpqops675htgquucs5stc3cdhr6yh67ew3vhngbdxmjnohi4f2kc.py", line 58, in <module>
2025-12-04T12:15:05.1010028Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.1010141Z     kernel.precompile(
2025-12-04T12:15:05.1010700Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.1010835Z     self._precompile_worker()
2025-12-04T12:15:05.1011433Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.1011644Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.1012258Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.1012460Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.1013336Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.1013633Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.1014111Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.1014465Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.1014716Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.1015167Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1015259Z ^
2025-12-04T12:15:05.1015726Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.1015732Z 
2025-12-04T12:15:05.1016532Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.1016542Z 
2025-12-04T12:15:05.1016547Z 
2025-12-04T12:15:05.1016767Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.1017361Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T12:15:05.1017367Z 
2025-12-04T12:15:05.1017638Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.1017904Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.1018032Z frames [('total', 1)]
2025-12-04T12:15:05.1018151Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.1018391Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.1018631Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.1018735Z graph_break []
2025-12-04T12:15:05.1018969Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.1019080Z frames [('total', 1)]
2025-12-04T12:15:05.1019198Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.1019436Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.1019676Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.1019779Z graph_break []
2025-12-04T12:15:05.1019946Z =================================== FAILURES ===================================
2025-12-04T12:15:05.1020238Z _____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _____
2025-12-04T12:15:05.1020380Z Traceback (most recent call last):
2025-12-04T12:15:05.1020783Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T12:15:05.1020943Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T12:15:05.1021454Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.1021759Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.1022424Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.1022624Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.1023135Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.1023301Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.1023913Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.1024238Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.1024773Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.1024959Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.1025453Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.1025824Z     return self._compile_to_module()
2025-12-04T12:15:05.1026319Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.1026502Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.1027024Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.1027172Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.1027673Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.1027909Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.1028516Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.1028644Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.1029126Z   File "/tmp/tmp7_bgkn17/w4/cw4vyp64zvdbwybls4u4tmfkgzxoykbl65jou4flvp457v3ti6uc.py", line 58, in <module>
2025-12-04T12:15:05.1029610Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.1029760Z     kernel.precompile(
2025-12-04T12:15:05.1030336Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.1030455Z     self._precompile_worker()
2025-12-04T12:15:05.1031051Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.1031245Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.1031840Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.1032057Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.1032508Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.1032758Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.1033218Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.1033553Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.1033785Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.1034229Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1034324Z ^
2025-12-04T12:15:05.1034799Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.1034805Z 
2025-12-04T12:15:05.1035516Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.1035525Z 
2025-12-04T12:15:05.1035530Z 
2025-12-04T12:15:05.1035758Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.1036370Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T12:15:05.1036377Z 
2025-12-04T12:15:05.1036649Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.1036890Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.1036997Z frames [('total', 1)]
2025-12-04T12:15:05.1037146Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.1037386Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.1037651Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.1037768Z graph_break []
2025-12-04T12:15:05.1037989Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.1038096Z frames [('total', 1)]
2025-12-04T12:15:05.1038229Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.1038447Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.1038684Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.1038799Z graph_break []
2025-12-04T12:15:05.1039017Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.1039121Z frames [('total', 1)]
2025-12-04T12:15:05.1039250Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.1039468Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.1039713Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.1039814Z graph_break []
2025-12-04T12:15:05.1040469Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3087aa3d89d0a96b.xml -
2025-12-04T12:15:05.1040660Z =========================== short test summary info ============================
2025-12-04T12:15:05.1041409Z FAILED [0.3408s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.1041853Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1041945Z ^
2025-12-04T12:15:05.1042407Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.1042415Z 
2025-12-04T12:15:05.1043135Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.1043141Z 
2025-12-04T12:15:05.1043148Z 
2025-12-04T12:15:05.1043368Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.1043955Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T12:15:05.1043963Z 
2025-12-04T12:15:05.1044235Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.1044417Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.1044646Z ============== 1 failed, 5 passed, 5 deselected, 2 rerun in 6.33s ==============
2025-12-04T12:15:05.1044749Z Got exit code 1
2025-12-04T12:15:05.1044871Z Retrying single test...
2025-12-04T12:15:05.1045346Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5fb4c628c04a1cdc.xml
2025-12-04T12:15:05.1045515Z ============================= test session starts ==============================
2025-12-04T12:15:05.1045885Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.1045998Z cachedir: .pytest_cache
2025-12-04T12:15:05.1046523Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.1046663Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.1046806Z configfile: pytest.ini
2025-12-04T12:15:05.1047409Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.1047634Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:05.1048309Z stepcurrent: skipping 10 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T12:15:05.1048476Z Running 1 items in this shard
2025-12-04T12:15:05.1048481Z 
2025-12-04T12:15:05.1049654Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T12:15:05.1050545Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1050977Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.1051422Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T12:15:05.1051953Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T12:15:05.1052419Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.1052972Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.1053548Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.1054144Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.1054726Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.1055285Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.1055743Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.1056261Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.1056842Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.1057307Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.1057754Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.1058420Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.1058945Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.1059501Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T12:15:05.1060001Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:05.1060620Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.1061210Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T12:15:05.1061838Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.1062417Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp5.to(tl.float32)
2025-12-04T12:15:05.1062884Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T12:15:05.1063331Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T12:15:05.1063918Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T12:15:05.1064357Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T12:15:05.1064945Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T12:15:05.1065481Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T12:15:05.1066213Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None)
2025-12-04T12:15:05.1066622Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.1068557Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.1069109Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.1070159Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.1070807Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.1071900Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.1072598Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.1073489Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.1074273Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.1074964Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.1075840Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1076265Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.1077214Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.1077364Z ('RERUN', {'yellow': True}) [3.2740s] [100%]
2025-12-04T12:15:05.1078504Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T12:15:05.1079386Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1079821Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.1080270Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T12:15:05.1080797Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T12:15:05.1081304Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.1081856Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.1082397Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.1082983Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.1083584Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.1084235Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.1084737Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.1085257Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.1085733Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.1086210Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.1086659Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.1087323Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.1087845Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.1088437Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T12:15:05.1088951Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:05.1089540Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.1090154Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T12:15:05.1090814Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.1091336Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp5.to(tl.float32)
2025-12-04T12:15:05.1091815Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T12:15:05.1092264Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T12:15:05.1092848Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T12:15:05.1093294Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T12:15:05.1093885Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T12:15:05.1094417Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T12:15:05.1095163Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None)
2025-12-04T12:15:05.1095542Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.1097547Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.1098103Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.1099167Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.1099815Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.1100708Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.1101406Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.1102333Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.1103104Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.1103723Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.1104732Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1105123Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.1106018Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.1106168Z ('RERUN', {'yellow': True}) [0.3258s] [100%]
2025-12-04T12:15:05.1107334Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T12:15:05.1108211Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1108659Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.1109163Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T12:15:05.1109695Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T12:15:05.1110159Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.1110694Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.1111255Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.1111850Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.1112449Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.1113012Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.1113457Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.1113991Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.1114468Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.1114948Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.1115392Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.1116068Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.1116608Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.1117152Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T12:15:05.1117670Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:05.1118328Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.1118910Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T12:15:05.1119541Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.1120049Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp5.to(tl.float32)
2025-12-04T12:15:05.1120528Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T12:15:05.1120974Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T12:15:05.1121563Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T12:15:05.1122002Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T12:15:05.1122608Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T12:15:05.1123157Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T12:15:05.1123867Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None)
2025-12-04T12:15:05.1124247Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.1126180Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.1126724Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.1127764Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.1128407Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.1129294Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.1130007Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.1130924Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.1131696Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.1132377Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.1133251Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1133636Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.1134529Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.1134642Z FAILED [0.3241s] [100%]
2025-12-04T12:15:05.1134649Z 
2025-12-04T12:15:05.1134816Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.1135107Z _____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _____
2025-12-04T12:15:05.1135256Z Traceback (most recent call last):
2025-12-04T12:15:05.1135661Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T12:15:05.1135849Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T12:15:05.1136424Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.1136679Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.1137192Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.1137399Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.1137912Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.1138077Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.1138614Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.1138941Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.1139478Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.1139635Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.1140131Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.1140259Z     return self._compile_to_module()
2025-12-04T12:15:05.1140743Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.1140922Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.1141440Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.1141571Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.1142086Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.1142321Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.1142965Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.1143096Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.1143596Z   File "/tmp/tmpkmloqxox/sm/csm67ljxmul4unwsrhd45cdsj7wa7asxlqevhown7jvo2dnz77xa.py", line 58, in <module>
2025-12-04T12:15:05.1144071Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.1144222Z     kernel.precompile(
2025-12-04T12:15:05.1144821Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.1144942Z     self._precompile_worker()
2025-12-04T12:15:05.1145559Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.1145759Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.1146359Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.1146561Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.1147028Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.1147278Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.1147733Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.1148072Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.1148302Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.1148779Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1148874Z ^
2025-12-04T12:15:05.1149346Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.1149351Z 
2025-12-04T12:15:05.1150070Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.1150079Z 
2025-12-04T12:15:05.1150084Z 
2025-12-04T12:15:05.1150303Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.1150897Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T12:15:05.1150902Z 
2025-12-04T12:15:05.1151176Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.1151415Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.1151522Z frames [('total', 1)]
2025-12-04T12:15:05.1151644Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.1151896Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.1152116Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.1152217Z graph_break []
2025-12-04T12:15:05.1152523Z _____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _____
2025-12-04T12:15:05.1152649Z Traceback (most recent call last):
2025-12-04T12:15:05.1153059Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T12:15:05.1153220Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T12:15:05.1153711Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.1153978Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.1154524Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.1154733Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.1155241Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.1155388Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.1155936Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.1156294Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.1156849Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.1157015Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.1157501Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.1157644Z     return self._compile_to_module()
2025-12-04T12:15:05.1158130Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.1158295Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.1158823Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.1158959Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.1159472Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.1159705Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.1160292Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.1160470Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.1160947Z   File "/tmp/tmpm_8a6myn/7s/c7scoabjmz4lq7otdr2zlv55wkha6upjywo2ftq6bgeoee6simbd.py", line 58, in <module>
2025-12-04T12:15:05.1161413Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.1161540Z     kernel.precompile(
2025-12-04T12:15:05.1162098Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.1162232Z     self._precompile_worker()
2025-12-04T12:15:05.1162828Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.1163006Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.1163617Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.1163819Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.1164281Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.1164528Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.1164970Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.1165322Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.1165549Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.1165986Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1166097Z ^
2025-12-04T12:15:05.1166557Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.1166562Z 
2025-12-04T12:15:05.1167328Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.1167335Z 
2025-12-04T12:15:05.1167339Z 
2025-12-04T12:15:05.1167558Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.1168151Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T12:15:05.1168204Z 
2025-12-04T12:15:05.1168474Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.1168727Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.1168854Z frames [('total', 1)]
2025-12-04T12:15:05.1168973Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.1169218Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.1169464Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.1169566Z graph_break []
2025-12-04T12:15:05.1169804Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.1169913Z frames [('total', 1)]
2025-12-04T12:15:05.1170028Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.1170262Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.1170500Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.1170602Z graph_break []
2025-12-04T12:15:05.1170762Z =================================== FAILURES ===================================
2025-12-04T12:15:05.1171261Z _____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _____
2025-12-04T12:15:05.1171403Z Traceback (most recent call last):
2025-12-04T12:15:05.1171891Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T12:15:05.1172050Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T12:15:05.1172558Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.1172809Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.1173322Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.1173532Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.1174046Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.1174209Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.1174750Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.1175073Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.1175617Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.1175765Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.1176273Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.1176465Z     return self._compile_to_module()
2025-12-04T12:15:05.1176960Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.1177144Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.1177666Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.1177801Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.1178370Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.1178605Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.1179205Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.1179335Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.1179846Z   File "/tmp/tmpuhdzuvz8/hi/chiiftdpysrhhsk6nfs3sptp5gxmdbmcgijuvcbktwfco6sf4d5u.py", line 58, in <module>
2025-12-04T12:15:05.1180371Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.1180530Z     kernel.precompile(
2025-12-04T12:15:05.1181177Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.1181344Z     self._precompile_worker()
2025-12-04T12:15:05.1182001Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.1182199Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.1182796Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.1182998Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.1183471Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.1183718Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.1184181Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.1184522Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.1184798Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.1185249Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1185343Z ^
2025-12-04T12:15:05.1185818Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.1185825Z 
2025-12-04T12:15:05.1186544Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.1186552Z 
2025-12-04T12:15:05.1186557Z 
2025-12-04T12:15:05.1186781Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.1187372Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T12:15:05.1187380Z 
2025-12-04T12:15:05.1187649Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.1187889Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.1187998Z frames [('total', 1)]
2025-12-04T12:15:05.1188116Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.1188369Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.1188590Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.1188694Z graph_break []
2025-12-04T12:15:05.1188926Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.1189029Z frames [('total', 1)]
2025-12-04T12:15:05.1189162Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.1189384Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.1189619Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.1189735Z graph_break []
2025-12-04T12:15:05.1189949Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.1190089Z frames [('total', 1)]
2025-12-04T12:15:05.1190223Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.1190437Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.1190685Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.1190784Z graph_break []
2025-12-04T12:15:05.1191829Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5fb4c628c04a1cdc.xml -
2025-12-04T12:15:05.1192090Z =========================== short test summary info ============================
2025-12-04T12:15:05.1192843Z FAILED [0.3241s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.1193280Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1193386Z ^
2025-12-04T12:15:05.1193844Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.1193850Z 
2025-12-04T12:15:05.1194580Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.1194586Z 
2025-12-04T12:15:05.1194593Z 
2025-12-04T12:15:05.1194815Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.1195406Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T12:15:05.1195412Z 
2025-12-04T12:15:05.1195679Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.1195896Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.1196118Z ================== 1 failed, 187 deselected, 2 rerun in 3.97s ==================
2025-12-04T12:15:05.1196220Z Got exit code 1
2025-12-04T12:15:05.1196330Z Retrying single test...
2025-12-04T12:15:05.1196816Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0d752e0bfa5071ea.xml
2025-12-04T12:15:05.1196981Z ============================= test session starts ==============================
2025-12-04T12:15:05.1197348Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.1197461Z cachedir: .pytest_cache
2025-12-04T12:15:05.1197985Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.1198129Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.1198240Z configfile: pytest.ini
2025-12-04T12:15:05.1198834Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.1199072Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:05.1199724Z stepcurrent: skipping 10 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T12:15:05.1199857Z Running 1 items in this shard
2025-12-04T12:15:05.1199862Z 
2025-12-04T12:15:05.1201008Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T12:15:05.1201901Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1202337Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.1202811Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T12:15:05.1203339Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T12:15:05.1203876Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.1204528Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.1205123Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.1205715Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.1206314Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.1206872Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.1207327Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.1207852Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.1208328Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.1208804Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.1209290Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.1209955Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.1210476Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.1211029Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T12:15:05.1211534Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:05.1212116Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.1212709Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T12:15:05.1213339Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.1213860Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp5.to(tl.float32)
2025-12-04T12:15:05.1214333Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T12:15:05.1214782Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T12:15:05.1215365Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T12:15:05.1215872Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T12:15:05.1216563Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T12:15:05.1217100Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T12:15:05.1217814Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None)
2025-12-04T12:15:05.1218270Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.1220214Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.1220766Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.1221816Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.1222460Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.1223387Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.1224078Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.1224958Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.1225733Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.1226358Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.1227233Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1227617Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.1228510Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.1228664Z ('RERUN', {'yellow': True}) [3.2742s] [100%]
2025-12-04T12:15:05.1229800Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T12:15:05.1230708Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1231152Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.1231861Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T12:15:05.1232435Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T12:15:05.1232931Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.1233465Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.1234026Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.1234610Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.1235205Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.1235762Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.1236221Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.1236737Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.1237245Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.1237719Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.1238168Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.1238826Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.1239356Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.1239897Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T12:15:05.1240410Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:05.1240994Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.1241577Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T12:15:05.1242206Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.1242717Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp5.to(tl.float32)
2025-12-04T12:15:05.1243195Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T12:15:05.1243644Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T12:15:05.1244262Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T12:15:05.1244701Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T12:15:05.1245277Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T12:15:05.1245856Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T12:15:05.1246598Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None)
2025-12-04T12:15:05.1246985Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.1248930Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.1249487Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.1250532Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.1251218Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.1252110Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.1252791Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.1253697Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.1254474Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.1255094Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.1255970Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1256417Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.1257480Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.1257621Z ('RERUN', {'yellow': True}) [0.3327s] [100%]
2025-12-04T12:15:05.1258819Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T12:15:05.1259715Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1260222Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.1260841Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T12:15:05.1261511Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T12:15:05.1261977Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.1262514Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.1263068Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.1263651Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.1264249Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.1264807Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.1265311Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.1265846Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.1266315Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.1266787Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.1267236Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.1267885Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.1268421Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.1268973Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T12:15:05.1269489Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:05.1270072Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.1270647Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T12:15:05.1271475Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.1271986Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp5.to(tl.float32)
2025-12-04T12:15:05.1272559Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T12:15:05.1273011Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T12:15:05.1273595Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T12:15:05.1274079Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T12:15:05.1274691Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T12:15:05.1275240Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T12:15:05.1275955Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None)
2025-12-04T12:15:05.1276337Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.1278274Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.1278826Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.1279919Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.1280554Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.1281473Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.1282154Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.1283060Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.1283832Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.1284454Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.1285330Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1285699Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.1286642Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.1286750Z FAILED [0.3320s] [100%]
2025-12-04T12:15:05.1286756Z 
2025-12-04T12:15:05.1286917Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.1287211Z _____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _____
2025-12-04T12:15:05.1287338Z Traceback (most recent call last):
2025-12-04T12:15:05.1287781Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T12:15:05.1287940Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T12:15:05.1288474Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.1288727Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.1289248Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.1289461Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.1289971Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.1290132Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.1290667Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.1290992Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.1291530Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.1291680Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.1292293Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.1292452Z     return self._compile_to_module()
2025-12-04T12:15:05.1292962Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.1293145Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.1293662Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.1293795Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.1294304Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.1294539Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.1295140Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.1295271Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.1295779Z   File "/tmp/tmpp8o7idow/xn/cxnwjbv4kkjuhrs3bzu23uu66vz7hji6zr45at7jbbnlflwtqy2z.py", line 58, in <module>
2025-12-04T12:15:05.1296257Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.1296438Z     kernel.precompile(
2025-12-04T12:15:05.1296993Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.1297128Z     self._precompile_worker()
2025-12-04T12:15:05.1297726Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.1297921Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.1298519Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.1298721Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.1299232Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.1299480Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.1299933Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.1300301Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.1300536Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.1301015Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1301110Z ^
2025-12-04T12:15:05.1301569Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.1301591Z 
2025-12-04T12:15:05.1302313Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.1302320Z 
2025-12-04T12:15:05.1302325Z 
2025-12-04T12:15:05.1302544Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.1303139Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T12:15:05.1303149Z 
2025-12-04T12:15:05.1303421Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.1303663Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.1303771Z frames [('total', 1)]
2025-12-04T12:15:05.1303889Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.1304172Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.1304397Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.1304500Z graph_break []
2025-12-04T12:15:05.1304805Z _____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _____
2025-12-04T12:15:05.1304930Z Traceback (most recent call last):
2025-12-04T12:15:05.1305342Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T12:15:05.1305502Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T12:15:05.1305992Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.1306254Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.1306765Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.1306962Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.1307493Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.1307642Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.1308191Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.1308515Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.1309039Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.1309203Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.1309687Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.1309824Z     return self._compile_to_module()
2025-12-04T12:15:05.1310313Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.1310590Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.1311129Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.1311261Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.1311761Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.1312041Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.1312676Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.1313145Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.1313660Z   File "/tmp/tmpgee3xitr/ei/ceixziejlt5nrat6w2xlvumtvbh4hkg5o4amqsjju4anlnub3a3z.py", line 58, in <module>
2025-12-04T12:15:05.1314205Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.1314362Z     kernel.precompile(
2025-12-04T12:15:05.1314921Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.1315052Z     self._precompile_worker()
2025-12-04T12:15:05.1315650Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.1315833Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.1316445Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.1316646Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.1317098Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.1317591Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.1318095Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.1318499Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.1318900Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.1319339Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1319463Z ^
2025-12-04T12:15:05.1320043Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.1320050Z 
2025-12-04T12:15:05.1320773Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.1320782Z 
2025-12-04T12:15:05.1320787Z 
2025-12-04T12:15:05.1321007Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.1321583Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T12:15:05.1321602Z 
2025-12-04T12:15:05.1321871Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.1322095Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.1322217Z frames [('total', 1)]
2025-12-04T12:15:05.1322339Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.1322581Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.1322816Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.1322918Z graph_break []
2025-12-04T12:15:05.1323140Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.1323256Z frames [('total', 1)]
2025-12-04T12:15:05.1323428Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.1323664Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.1323897Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.1323997Z graph_break []
2025-12-04T12:15:05.1324159Z =================================== FAILURES ===================================
2025-12-04T12:15:05.1324483Z _____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _____
2025-12-04T12:15:05.1324608Z Traceback (most recent call last):
2025-12-04T12:15:05.1325059Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T12:15:05.1325217Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T12:15:05.1325722Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.1325976Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.1326492Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.1326699Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.1327209Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.1327372Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.1327905Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.1328230Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.1328766Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.1328946Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.1329429Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.1329569Z     return self._compile_to_module()
2025-12-04T12:15:05.1330056Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.1330239Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.1330761Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.1330892Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.1331407Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.1331642Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.1332250Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.1332379Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.1332854Z   File "/tmp/tmpm_wfro_n/ug/cugkxgnqbwfx3x2whxlt522gco5jleyjoobigwqugfoemjvcekri.py", line 58, in <module>
2025-12-04T12:15:05.1333329Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.1333443Z     kernel.precompile(
2025-12-04T12:15:05.1334001Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.1334132Z     self._precompile_worker()
2025-12-04T12:15:05.1334730Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.1334923Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.1335550Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.1335752Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.1336217Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.1336555Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.1337052Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.1337388Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.1337647Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.1338098Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1338193Z ^
2025-12-04T12:15:05.1338653Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.1338659Z 
2025-12-04T12:15:05.1339389Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.1339395Z 
2025-12-04T12:15:05.1339400Z 
2025-12-04T12:15:05.1339619Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.1340217Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T12:15:05.1340222Z 
2025-12-04T12:15:05.1340495Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.1340734Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.1340873Z frames [('total', 1)]
2025-12-04T12:15:05.1340990Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.1341247Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.1341472Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.1341576Z graph_break []
2025-12-04T12:15:05.1341815Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.1341922Z frames [('total', 1)]
2025-12-04T12:15:05.1342041Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.1342279Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.1342514Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.1342628Z graph_break []
2025-12-04T12:15:05.1342848Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.1342956Z frames [('total', 1)]
2025-12-04T12:15:05.1343091Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.1343312Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.1343549Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.1343666Z graph_break []
2025-12-04T12:15:05.1344327Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0d752e0bfa5071ea.xml -
2025-12-04T12:15:05.1344518Z =========================== short test summary info ============================
2025-12-04T12:15:05.1345242Z FAILED [0.3320s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.1345684Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1345791Z ^
2025-12-04T12:15:05.1346252Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.1346260Z 
2025-12-04T12:15:05.1347029Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.1347036Z 
2025-12-04T12:15:05.1347040Z 
2025-12-04T12:15:05.1347260Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.1347840Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T12:15:05.1347890Z 
2025-12-04T12:15:05.1348160Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.1348373Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.1348594Z ================== 1 failed, 187 deselected, 2 rerun in 3.98s ==================
2025-12-04T12:15:05.1348697Z Got exit code 1
2025-12-04T12:15:05.1349191Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T12:15:05.1349621Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:15:05.1350099Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-fedbc7df4b1c2869.xml
2025-12-04T12:15:05.1350281Z ============================= test session starts ==============================
2025-12-04T12:15:05.1350632Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.1350747Z cachedir: .pytest_cache
2025-12-04T12:15:05.1351281Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.1351407Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.1351516Z configfile: pytest.ini
2025-12-04T12:15:05.1352124Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.1352387Z collecting ... collected 188 items / 11 deselected / 177 selected
2025-12-04T12:15:05.1352544Z stepcurrent: skipping 11 already run items.
2025-12-04T12:15:05.1352662Z Running 177 items in this shard
2025-12-04T12:15:05.1352667Z 
2025-12-04T12:15:05.1353812Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T12:15:05.1354698Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1355127Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.1355592Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T12:15:05.1356113Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T12:15:05.1356578Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.1357125Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.1357667Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.1358272Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.1358856Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.1359456Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.1359900Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.1360417Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.1360952Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.1361439Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.1361903Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.1362561Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.1363082Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.1363634Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T12:15:05.1364137Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:05.1364737Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.1365308Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T12:15:05.1365967Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.1366490Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp5.to(tl.float32)
2025-12-04T12:15:05.1366957Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T12:15:05.1367414Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T12:15:05.1367983Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T12:15:05.1368421Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T12:15:05.1369016Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T12:15:05.1369549Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T12:15:05.1370277Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None)
2025-12-04T12:15:05.1370701Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.1372987Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.1373536Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.1374723Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.1375462Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.1376688Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.1377669Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.1378839Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.1379631Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.1380243Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.1381134Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1381598Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.1382492Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.1382649Z ('RERUN', {'yellow': True}) [3.2672s] [  0%]
2025-12-04T12:15:05.1383798Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T12:15:05.1384677Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1385112Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.1385572Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T12:15:05.1386091Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T12:15:05.1386554Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.1387104Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.1387645Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.1388280Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.1388867Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.1389425Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.1389919Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.1390465Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.1390953Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.1391418Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.1391861Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.1392522Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.1393041Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.1393599Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T12:15:05.1394097Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:05.1394708Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.1395291Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T12:15:05.1395921Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.1396442Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp5.to(tl.float32)
2025-12-04T12:15:05.1396911Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T12:15:05.1397372Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T12:15:05.1397943Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T12:15:05.1398384Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T12:15:05.1398970Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T12:15:05.1399502Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T12:15:05.1400232Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None)
2025-12-04T12:15:05.1400592Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.1402556Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.1403136Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.1404222Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.1404870Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.1405769Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.1406461Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.1407346Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.1408124Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.1408762Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.1409627Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1410010Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.1410904Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.1411051Z ('RERUN', {'yellow': True}) [0.3237s] [  0%]
2025-12-04T12:15:05.1412187Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T12:15:05.1413066Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1413497Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.1413948Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T12:15:05.1414482Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T12:15:05.1414942Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.1415495Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.1416086Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.1416747Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.1417341Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.1417964Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.1418425Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.1418950Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.1419427Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.1419905Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.1420356Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.1421017Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.1421541Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.1422095Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T12:15:05.1422630Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:05.1423211Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.1423796Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T12:15:05.1424425Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.1424940Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp5.to(tl.float32)
2025-12-04T12:15:05.1425409Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T12:15:05.1432629Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T12:15:05.1433508Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T12:15:05.1433962Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T12:15:05.1434903Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T12:15:05.1435558Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T12:15:05.1436327Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None)
2025-12-04T12:15:05.1436850Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.1438874Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.1439472Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.1440523Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.1441165Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.1442061Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.1442758Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.1443643Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.1444457Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.1445075Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.1445948Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1446332Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.1447230Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.1447338Z FAILED [0.3204s] [  0%]
2025-12-04T12:15:05.1447359Z 
2025-12-04T12:15:05.1447510Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.1447803Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _____
2025-12-04T12:15:05.1447945Z Traceback (most recent call last):
2025-12-04T12:15:05.1448345Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T12:15:05.1448505Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T12:15:05.1449011Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.1449267Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.1449798Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.1449996Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.1450540Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.1450703Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.1451238Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.1451558Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.1452128Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.1452306Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.1452799Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.1452927Z     return self._compile_to_module()
2025-12-04T12:15:05.1453416Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.1453590Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.1454108Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.1454253Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.1454747Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.1454984Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.1455587Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.1455713Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.1456241Z   File "/tmp/tmpx1q1l5d9/jp/cjp7bmi7v7ypzibnpt4wclabccxrx6672ohcappxd44gst6cvgcr.py", line 58, in <module>
2025-12-04T12:15:05.1456816Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.1456930Z     kernel.precompile(
2025-12-04T12:15:05.1457496Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.1457613Z     self._precompile_worker()
2025-12-04T12:15:05.1458211Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.1458407Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.1459003Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.1459220Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.1459672Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.1459922Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.1460380Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.1460715Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.1460943Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.1461391Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1461486Z ^
2025-12-04T12:15:05.1461954Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.1461961Z 
2025-12-04T12:15:05.1462674Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.1462720Z 
2025-12-04T12:15:05.1462726Z 
2025-12-04T12:15:05.1462961Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.1463544Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T12:15:05.1463550Z 
2025-12-04T12:15:05.1463817Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.1464089Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.1464198Z frames [('total', 1)]
2025-12-04T12:15:05.1464349Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.1464602Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.1464823Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.1464940Z graph_break []
2025-12-04T12:15:05.1465232Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _____
2025-12-04T12:15:05.1465356Z Traceback (most recent call last):
2025-12-04T12:15:05.1465772Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T12:15:05.1465928Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T12:15:05.1466416Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.1466678Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.1467192Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.1467393Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.1467899Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.1468091Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.1468634Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.1468952Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.1469484Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.1469634Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.1470112Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.1470248Z     return self._compile_to_module()
2025-12-04T12:15:05.1470730Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.1470897Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.1471615Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.1471748Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.1472261Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.1472496Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.1473088Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.1473233Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.1473735Z   File "/tmp/tmpfqj4hjli/ze/cze3y35l3qkwon2z22jt6gic5gihofiimccweugivfjhak65tw7a.py", line 58, in <module>
2025-12-04T12:15:05.1474212Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.1474331Z     kernel.precompile(
2025-12-04T12:15:05.1474974Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.1475108Z     self._precompile_worker()
2025-12-04T12:15:05.1475706Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.1475887Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.1476541Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.1476788Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.1477255Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.1477502Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.1477945Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.1478297Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.1478526Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.1478973Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1479067Z ^
2025-12-04T12:15:05.1479519Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.1479525Z 
2025-12-04T12:15:05.1480254Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.1480261Z 
2025-12-04T12:15:05.1480312Z 
2025-12-04T12:15:05.1480531Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.1481126Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T12:15:05.1481132Z 
2025-12-04T12:15:05.1481401Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.1481622Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.1481741Z frames [('total', 1)]
2025-12-04T12:15:05.1481903Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.1482153Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.1482372Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.1482471Z graph_break []
2025-12-04T12:15:05.1482704Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.1482813Z frames [('total', 1)]
2025-12-04T12:15:05.1482927Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.1483160Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.1483399Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.1483509Z graph_break []
2025-12-04T12:15:05.1483656Z =================================== FAILURES ===================================
2025-12-04T12:15:05.1484034Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _____
2025-12-04T12:15:05.1484213Z Traceback (most recent call last):
2025-12-04T12:15:05.1484645Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T12:15:05.1484801Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T12:15:05.1485300Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.1485549Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.1486119Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.1486317Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.1486829Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.1486986Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.1487521Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.1487873Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.1488432Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.1488582Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.1489074Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.1489203Z     return self._compile_to_module()
2025-12-04T12:15:05.1489727Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.1489968Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.1490538Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.1490683Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.1491722Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.1491998Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.1492714Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.1493001Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.1493834Z   File "/tmp/tmpbalprfne/ll/cllr2iscaoldjg5hjwmklegpzmgueaoxwb5heq27w6iuupzc5wju.py", line 58, in <module>
2025-12-04T12:15:05.1494345Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.1494478Z     kernel.precompile(
2025-12-04T12:15:05.1495070Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.1495195Z     self._precompile_worker()
2025-12-04T12:15:05.1495790Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.1496039Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.1497104Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.1497457Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.1498139Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.1498396Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.1498852Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.1499192Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.1499511Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.1499974Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1500084Z ^
2025-12-04T12:15:05.1500545Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.1500555Z 
2025-12-04T12:15:05.1501346Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.1501353Z 
2025-12-04T12:15:05.1501358Z 
2025-12-04T12:15:05.1501588Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.1502169Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T12:15:05.1502213Z 
2025-12-04T12:15:05.1502482Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.1502755Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.1502863Z frames [('total', 1)]
2025-12-04T12:15:05.1502980Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.1503294Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.1503571Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.1503692Z graph_break []
2025-12-04T12:15:05.1503916Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.1504021Z frames [('total', 1)]
2025-12-04T12:15:05.1504152Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.1504374Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.1504611Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.1504731Z graph_break []
2025-12-04T12:15:05.1505001Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.1505122Z frames [('total', 1)]
2025-12-04T12:15:05.1505238Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.1505455Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.1505750Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.1505852Z graph_break []
2025-12-04T12:15:05.1506515Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-fedbc7df4b1c2869.xml -
2025-12-04T12:15:05.1506705Z =========================== short test summary info ============================
2025-12-04T12:15:05.1507426Z FAILED [0.3204s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.1507879Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1507970Z ^
2025-12-04T12:15:05.1508436Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.1508441Z 
2025-12-04T12:15:05.1509169Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.1509178Z 
2025-12-04T12:15:05.1509185Z 
2025-12-04T12:15:05.1509403Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.1510000Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T12:15:05.1510005Z 
2025-12-04T12:15:05.1510273Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.1510460Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.1510678Z ================== 1 failed, 11 deselected, 2 rerun in 3.95s ===================
2025-12-04T12:15:05.1510785Z Got exit code 1
2025-12-04T12:15:05.1510906Z Retrying single test...
2025-12-04T12:15:05.1511385Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3cf6d62b643bfad7.xml
2025-12-04T12:15:05.1511555Z ============================= test session starts ==============================
2025-12-04T12:15:05.1511957Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.1512100Z cachedir: .pytest_cache
2025-12-04T12:15:05.1512622Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.1512759Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.1512871Z configfile: pytest.ini
2025-12-04T12:15:05.1513507Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.1513781Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:05.1514441Z stepcurrent: skipping 11 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T12:15:05.1514574Z Running 1 items in this shard
2025-12-04T12:15:05.1514579Z 
2025-12-04T12:15:05.1515722Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T12:15:05.1516613Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1517052Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.1517512Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T12:15:05.1518033Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T12:15:05.1518605Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.1519154Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.1519697Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.1520298Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.1520890Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.1521444Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.1521907Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.1522425Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.1522909Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.1523367Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.1523815Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.1524472Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.1525025Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.1525581Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T12:15:05.1526076Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:05.1526657Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.1527295Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T12:15:05.1527920Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.1528442Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp5.to(tl.float32)
2025-12-04T12:15:05.1528907Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T12:15:05.1529349Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T12:15:05.1529937Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T12:15:05.1530378Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T12:15:05.1530968Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T12:15:05.1531823Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T12:15:05.1532561Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None)
2025-12-04T12:15:05.1532922Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.1534853Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.1535412Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.1536526Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.1537170Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.1538062Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.1538751Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.1539688Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.1540521Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.1541186Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.1542135Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1542516Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.1543417Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.1543565Z ('RERUN', {'yellow': True}) [3.2888s] [100%]
2025-12-04T12:15:05.1544699Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T12:15:05.1545580Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1546011Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.1546492Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T12:15:05.1547027Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T12:15:05.1547489Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.1548038Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.1548581Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.1549320Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.1550355Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.1551241Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.1551998Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.1552631Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.1553158Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.1553635Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.1554083Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.1554821Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.1555349Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.1555894Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T12:15:05.1556449Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:05.1557108Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.1557700Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T12:15:05.1558330Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.1558859Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp5.to(tl.float32)
2025-12-04T12:15:05.1559330Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T12:15:05.1559777Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T12:15:05.1560406Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T12:15:05.1560886Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T12:15:05.1561563Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T12:15:05.1562102Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T12:15:05.1563223Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None)
2025-12-04T12:15:05.1563867Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.1567191Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.1568127Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.1569205Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.1569855Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.1570746Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.1571741Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.1572630Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.1573477Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.1574154Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.1575032Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1575406Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.1576351Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.1576508Z ('RERUN', {'yellow': True}) [0.3350s] [100%]
2025-12-04T12:15:05.1577657Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T12:15:05.1578523Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1579023Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.1579472Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T12:15:05.1580000Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T12:15:05.1580464Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.1581003Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.1581561Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.1582150Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.1582745Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.1583305Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.1583765Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.1584282Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.1584754Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.1585259Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.1585709Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.1586365Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.1586920Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.1587490Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T12:15:05.1588001Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:05.1588587Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.1589169Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T12:15:05.1589794Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.1590306Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp5.to(tl.float32)
2025-12-04T12:15:05.1590786Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T12:15:05.1591230Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T12:15:05.1591854Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T12:15:05.1592297Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T12:15:05.1592869Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T12:15:05.1593413Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T12:15:05.1594132Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None)
2025-12-04T12:15:05.1594515Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.1596468Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.1597016Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.1598060Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.1598798Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.1599731Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.1600412Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.1601370Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.1602145Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.1602770Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.1603641Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1604021Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.1604925Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.1605031Z FAILED [0.3360s] [100%]
2025-12-04T12:15:05.1605038Z 
2025-12-04T12:15:05.1605198Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.1605523Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _____
2025-12-04T12:15:05.1605665Z Traceback (most recent call last):
2025-12-04T12:15:05.1606066Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T12:15:05.1606224Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T12:15:05.1606729Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.1606981Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.1607578Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.1607895Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.1608412Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.1608587Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.1609124Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.1609524Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.1610430Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.1610677Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.1611218Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.1611348Z     return self._compile_to_module()
2025-12-04T12:15:05.1611878Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.1612098Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.1613021Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.1613232Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.1613929Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.1614205Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.1614951Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.1615129Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.1615709Z   File "/tmp/tmp3yt0jipj/ly/clyw34eolqozc3qd2w5eu6cjthldi2jqg5jtgupxlfwg2wb4jphf.py", line 58, in <module>
2025-12-04T12:15:05.1616230Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.1616425Z     kernel.precompile(
2025-12-04T12:15:05.1617003Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.1617127Z     self._precompile_worker()
2025-12-04T12:15:05.1617726Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.1617971Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.1618608Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.1618817Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.1619325Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.1619576Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.1620079Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.1620421Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.1620650Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.1621100Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1621191Z ^
2025-12-04T12:15:05.1621666Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.1621672Z 
2025-12-04T12:15:05.1622481Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.1622488Z 
2025-12-04T12:15:05.1622493Z 
2025-12-04T12:15:05.1622714Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.1623321Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T12:15:05.1623327Z 
2025-12-04T12:15:05.1623673Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.1623956Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.1624067Z frames [('total', 1)]
2025-12-04T12:15:05.1624187Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.1624447Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.1624672Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.1624790Z graph_break []
2025-12-04T12:15:05.1625085Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _____
2025-12-04T12:15:05.1625210Z Traceback (most recent call last):
2025-12-04T12:15:05.1625623Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T12:15:05.1625844Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T12:15:05.1626334Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.1626957Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.1627524Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.1627827Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.1628418Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.1628568Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.1629159Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.1629522Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.1630100Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.1630250Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.1630731Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.1630869Z     return self._compile_to_module()
2025-12-04T12:15:05.1631358Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.1631529Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.1632149Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.1632322Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.1632878Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.1633146Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.1633773Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.1633915Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.1634454Z   File "/tmp/tmp5d1dhucf/w3/cw3g23zyjiveu3xexvruyv4swrhhon324rpzi3cas6hxw53qffpi.py", line 58, in <module>
2025-12-04T12:15:05.1635101Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.1635216Z     kernel.precompile(
2025-12-04T12:15:05.1635772Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.1635907Z     self._precompile_worker()
2025-12-04T12:15:05.1636511Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.1636756Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.1637405Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.1637643Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.1638151Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.1638399Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.1638844Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.1639190Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.1639421Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.1639911Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1640004Z ^
2025-12-04T12:15:05.1640461Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.1640468Z 
2025-12-04T12:15:05.1641195Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.1641233Z 
2025-12-04T12:15:05.1641238Z 
2025-12-04T12:15:05.1641489Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.1642085Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T12:15:05.1642093Z 
2025-12-04T12:15:05.1642361Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.1642589Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.1642712Z frames [('total', 1)]
2025-12-04T12:15:05.1642830Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.1643081Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.1643307Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.1643412Z graph_break []
2025-12-04T12:15:05.1643647Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.1643755Z frames [('total', 1)]
2025-12-04T12:15:05.1643876Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.1644113Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.1644351Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.1644489Z graph_break []
2025-12-04T12:15:05.1644649Z =================================== FAILURES ===================================
2025-12-04T12:15:05.1644945Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _____
2025-12-04T12:15:05.1645084Z Traceback (most recent call last):
2025-12-04T12:15:05.1645486Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T12:15:05.1645642Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T12:15:05.1646150Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.1646398Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.1646915Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.1647120Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.1647633Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.1647795Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.1648330Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.1648651Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.1649186Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.1649337Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.1649829Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.1649951Z     return self._compile_to_module()
2025-12-04T12:15:05.1650434Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.1650651Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.1651170Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.1651300Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.1651811Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.1652077Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.1652707Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.1652837Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.1653345Z   File "/tmp/tmp3khqosgp/7z/c7zlrawrzs2ffa3mrvgu5rfsrgbpcxqqoafrhdaq5t7pkuovwzzf.py", line 58, in <module>
2025-12-04T12:15:05.1653824Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.1653942Z     kernel.precompile(
2025-12-04T12:15:05.1654512Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.1654635Z     self._precompile_worker()
2025-12-04T12:15:05.1655287Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.1655484Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.1656078Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.1656278Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.1656815Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.1657107Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.1657566Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.1657903Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.1658131Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.1658573Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1658666Z ^
2025-12-04T12:15:05.1659140Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.1659145Z 
2025-12-04T12:15:05.1659857Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.1659865Z 
2025-12-04T12:15:05.1659870Z 
2025-12-04T12:15:05.1660092Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.1660684Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T12:15:05.1660690Z 
2025-12-04T12:15:05.1660957Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.1661193Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.1661302Z frames [('total', 1)]
2025-12-04T12:15:05.1661419Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.1661674Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.1661895Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.1662009Z graph_break []
2025-12-04T12:15:05.1662233Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.1662338Z frames [('total', 1)]
2025-12-04T12:15:05.1662466Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.1662720Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.1662957Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.1663075Z graph_break []
2025-12-04T12:15:05.1663294Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.1663403Z frames [('total', 1)]
2025-12-04T12:15:05.1663568Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.1663783Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.1664075Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.1664216Z graph_break []
2025-12-04T12:15:05.1664913Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3cf6d62b643bfad7.xml -
2025-12-04T12:15:05.1665148Z =========================== short test summary info ============================
2025-12-04T12:15:05.1665918Z FAILED [0.3360s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.1666391Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1666502Z ^
2025-12-04T12:15:05.1666966Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.1666975Z 
2025-12-04T12:15:05.1667746Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.1667752Z 
2025-12-04T12:15:05.1667756Z 
2025-12-04T12:15:05.1667976Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.1668642Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T12:15:05.1668648Z 
2025-12-04T12:15:05.1668918Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.1669100Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.1669341Z ================== 1 failed, 187 deselected, 2 rerun in 4.00s ==================
2025-12-04T12:15:05.1669469Z Got exit code 1
2025-12-04T12:15:05.1669579Z Retrying single test...
2025-12-04T12:15:05.1670070Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e2e744a24cd2751e.xml
2025-12-04T12:15:05.1670236Z ============================= test session starts ==============================
2025-12-04T12:15:05.1670644Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.1670763Z cachedir: .pytest_cache
2025-12-04T12:15:05.1671590Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.1671739Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.1671888Z configfile: pytest.ini
2025-12-04T12:15:05.1672542Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.1672802Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:05.1673474Z stepcurrent: skipping 11 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T12:15:05.1673607Z Running 1 items in this shard
2025-12-04T12:15:05.1673613Z 
2025-12-04T12:15:05.1674815Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T12:15:05.1675893Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1676367Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.1676876Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T12:15:05.1677468Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T12:15:05.1677938Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.1678491Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.1679036Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.1679619Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.1680219Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.1680782Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.1681246Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.1681817Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.1682310Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.1682774Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.1683226Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.1683892Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.1684417Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.1684979Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T12:15:05.1685486Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:05.1686069Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.1686654Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T12:15:05.1687282Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.1687809Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp5.to(tl.float32)
2025-12-04T12:15:05.1688281Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T12:15:05.1688772Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T12:15:05.1689366Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T12:15:05.1689807Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T12:15:05.1690427Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T12:15:05.1690990Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T12:15:05.1691706Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None)
2025-12-04T12:15:05.1692091Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.1694029Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.1694584Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.1695634Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.1696375Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.1697270Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.1697973Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.1698860Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.1699632Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.1700251Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.1701129Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1701512Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.1702400Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.1702552Z ('RERUN', {'yellow': True}) [3.2738s] [100%]
2025-12-04T12:15:05.1703733Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T12:15:05.1704601Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1705105Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.1705551Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T12:15:05.1706085Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T12:15:05.1706556Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.1707110Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.1707652Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.1708235Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.1708837Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.1709394Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.1709898Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.1710415Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.1710889Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.1711362Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.1711811Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.1712472Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.1712996Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.1713543Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T12:15:05.1714057Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:05.1714639Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.1715220Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T12:15:05.1715845Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.1716404Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp5.to(tl.float32)
2025-12-04T12:15:05.1716885Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T12:15:05.1717331Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T12:15:05.1717914Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T12:15:05.1718433Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T12:15:05.1719004Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T12:15:05.1719551Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T12:15:05.1720264Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None)
2025-12-04T12:15:05.1720643Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.1722578Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.1723169Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.1724208Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.1724847Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.1725740Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.1726416Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.1727314Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.1728090Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.1728715Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.1729590Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1729975Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.1730911Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.1731049Z ('RERUN', {'yellow': True}) [0.3303s] [100%]
2025-12-04T12:15:05.1732194Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T12:15:05.1733203Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1733653Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.1734105Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T12:15:05.1734641Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T12:15:05.1735105Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.1735642Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.1736204Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.1736855Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.1737502Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.1738061Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.1738505Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.1739040Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.1739515Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.1739989Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.1740438Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.1741087Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.1741621Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.1742164Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T12:15:05.1742678Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:05.1743258Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.1743843Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T12:15:05.1744507Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.1745012Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp5.to(tl.float32)
2025-12-04T12:15:05.1745495Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T12:15:05.1745969Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T12:15:05.1746582Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T12:15:05.1747022Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T12:15:05.1747599Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T12:15:05.1748144Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T12:15:05.1748856Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None)
2025-12-04T12:15:05.1749237Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.1751164Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.1751749Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.1752794Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.1753426Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.1754327Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.1755010Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.1755921Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.1756693Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.1757314Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.1758225Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1758608Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.1759501Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.1759639Z FAILED [0.3260s] [100%]
2025-12-04T12:15:05.1759645Z 
2025-12-04T12:15:05.1759805Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.1760130Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _____
2025-12-04T12:15:05.1760258Z Traceback (most recent call last):
2025-12-04T12:15:05.1760674Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T12:15:05.1760835Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T12:15:05.1761343Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.1761592Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.1762108Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.1762317Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.1762832Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.1762998Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.1763534Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.1763890Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.1764427Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.1764577Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.1765053Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.1765194Z     return self._compile_to_module()
2025-12-04T12:15:05.1765685Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.1765867Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.1766388Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.1766518Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.1767031Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.1767267Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.1767868Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.1767998Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.1768474Z   File "/tmp/tmpiq_xyrgc/qw/cqwfwgqwf53dh6to6cphxvynpeqqb2g6xdq27nvoojjd6baluz5v.py", line 58, in <module>
2025-12-04T12:15:05.1768953Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.1769069Z     kernel.precompile(
2025-12-04T12:15:05.1769625Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.1769759Z     self._precompile_worker()
2025-12-04T12:15:05.1770400Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.1770596Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.1771386Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.1771586Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.1772156Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.1772404Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.1772903Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.1773240Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.1773470Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.1773922Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1774016Z ^
2025-12-04T12:15:05.1774474Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.1774504Z 
2025-12-04T12:15:05.1775221Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.1775230Z 
2025-12-04T12:15:05.1775235Z 
2025-12-04T12:15:05.1775457Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.1776055Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T12:15:05.1776860Z 
2025-12-04T12:15:05.1777148Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.1777394Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.1777503Z frames [('total', 1)]
2025-12-04T12:15:05.1777623Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.1777882Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.1778110Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.1778213Z graph_break []
2025-12-04T12:15:05.1778526Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _____
2025-12-04T12:15:05.1778653Z Traceback (most recent call last):
2025-12-04T12:15:05.1779071Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T12:15:05.1779230Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T12:15:05.1779720Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.1779991Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.1780506Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.1780715Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.1781231Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.1781384Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.1781936Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.1782262Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.1782783Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.1782952Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.1783494Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.1783634Z     return self._compile_to_module()
2025-12-04T12:15:05.1784122Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.1784288Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.1784863Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.1785028Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.1785541Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.1785775Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.1786368Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.1786516Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.1786994Z   File "/tmp/tmpxx_20eu0/en/cenlg47guevxwfvbzco5q43q2pmysmyoe44t3zdvbfokblfu7w4q.py", line 58, in <module>
2025-12-04T12:15:05.1787457Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.1787591Z     kernel.precompile(
2025-12-04T12:15:05.1788147Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.1788285Z     self._precompile_worker()
2025-12-04T12:15:05.1788883Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.1789102Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.1789721Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.1789924Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.1790387Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.1790632Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.1791079Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.1791431Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.1791657Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.1792091Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1792197Z ^
2025-12-04T12:15:05.1792654Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.1792660Z 
2025-12-04T12:15:05.1793399Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.1793406Z 
2025-12-04T12:15:05.1793410Z 
2025-12-04T12:15:05.1793627Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.1794225Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T12:15:05.1794233Z 
2025-12-04T12:15:05.1794500Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.1794722Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.1794841Z frames [('total', 1)]
2025-12-04T12:15:05.1794959Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.1795231Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.1795468Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.1795569Z graph_break []
2025-12-04T12:15:05.1795803Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.1795908Z frames [('total', 1)]
2025-12-04T12:15:05.1796025Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.1796292Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.1796530Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.1796665Z graph_break []
2025-12-04T12:15:05.1796830Z =================================== FAILURES ===================================
2025-12-04T12:15:05.1797121Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _____
2025-12-04T12:15:05.1797250Z Traceback (most recent call last):
2025-12-04T12:15:05.1797666Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T12:15:05.1797826Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T12:15:05.1798326Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.1798576Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.1799089Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.1799296Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.1799808Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.1799968Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.1800538Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.1800859Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.1801388Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.1801537Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.1802017Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.1802157Z     return self._compile_to_module()
2025-12-04T12:15:05.1802642Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.1802819Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.1803334Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.1803469Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.1803983Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.1804213Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.1804810Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.1804942Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.1805444Z   File "/tmp/tmpiwa3423i/up/cupqm6wvqnoow64ss6ujrfpfoqsauevwfpwscdjoflk47zwsmo52.py", line 58, in <module>
2025-12-04T12:15:05.1805923Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.1806034Z     kernel.precompile(
2025-12-04T12:15:05.1806594Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.1806764Z     self._precompile_worker()
2025-12-04T12:15:05.1807359Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.1807556Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.1808152Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.1808383Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.1808876Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.1809124Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.1809580Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.1809921Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.1810148Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.1810592Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1810681Z ^
2025-12-04T12:15:05.1811141Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.1811161Z 
2025-12-04T12:15:05.1811874Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.1811880Z 
2025-12-04T12:15:05.1811885Z 
2025-12-04T12:15:05.1812101Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.1812724Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T12:15:05.1812730Z 
2025-12-04T12:15:05.1813000Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.1813233Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.1813339Z frames [('total', 1)]
2025-12-04T12:15:05.1813456Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.1813709Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.1813935Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.1814035Z graph_break []
2025-12-04T12:15:05.1814269Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.1814375Z frames [('total', 1)]
2025-12-04T12:15:05.1814504Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.1814722Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.1814959Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.1815072Z graph_break []
2025-12-04T12:15:05.1815288Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.1815395Z frames [('total', 1)]
2025-12-04T12:15:05.1815523Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.1815739Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.1815973Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.1816088Z graph_break []
2025-12-04T12:15:05.1816845Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e2e744a24cd2751e.xml -
2025-12-04T12:15:05.1817040Z =========================== short test summary info ============================
2025-12-04T12:15:05.1817759Z FAILED [0.3260s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.1818252Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1818363Z ^
2025-12-04T12:15:05.1818821Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.1818827Z 
2025-12-04T12:15:05.1819548Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.1819586Z 
2025-12-04T12:15:05.1819590Z 
2025-12-04T12:15:05.1819810Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.1820441Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T12:15:05.1820448Z 
2025-12-04T12:15:05.1820718Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.1820902Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.1821120Z ================== 1 failed, 187 deselected, 2 rerun in 3.97s ==================
2025-12-04T12:15:05.1821223Z Got exit code 1
2025-12-04T12:15:05.1821720Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T12:15:05.1822145Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:15:05.1822614Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6de5a411a3f65f82.xml
2025-12-04T12:15:05.1822793Z ============================= test session starts ==============================
2025-12-04T12:15:05.1823144Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.1823289Z cachedir: .pytest_cache
2025-12-04T12:15:05.1823827Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.1823954Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.1824063Z configfile: pytest.ini
2025-12-04T12:15:05.1824666Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.1824894Z collecting ... collected 188 items / 12 deselected / 176 selected
2025-12-04T12:15:05.1825056Z stepcurrent: skipping 12 already run items.
2025-12-04T12:15:05.1825173Z Running 176 items in this shard
2025-12-04T12:15:05.1825178Z 
2025-12-04T12:15:05.1826331Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1
2025-12-04T12:15:05.1827233Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1827665Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.1828120Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5
2025-12-04T12:15:05.1828635Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 8
2025-12-04T12:15:05.1829114Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.1829648Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.1830224Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.1830826Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.1831413Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.1832017Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.1832491Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.1833012Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.1833496Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.1833958Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.1834417Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.1835012Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0)
2025-12-04T12:15:05.1835535Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.1836093Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T12:15:05.1836677Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.1837298Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.where(r0_mask, tmp1, float("-inf"))
2025-12-04T12:15:05.1837922Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.1838440Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp4.to(tl.float32)
2025-12-04T12:15:05.1838910Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T12:15:05.1839356Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T12:15:05.1839931Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T12:15:05.1840373Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T12:15:05.1840959Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T12:15:05.1841492Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T12:15:05.1842208Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T12:15:05.1842590Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.1844541Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.1845096Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.1846203Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.1846846Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.1847743Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.1848439Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.1849334Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.1850111Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.1850773Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.1851652Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1852036Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.1852929Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.1853069Z ('RERUN', {'yellow': True}) [3.5894s] [  0%]
2025-12-04T12:15:05.1854227Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1
2025-12-04T12:15:05.1855105Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1855554Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.1855995Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5
2025-12-04T12:15:05.1856585Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 8
2025-12-04T12:15:05.1857051Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.1857584Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.1858179Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.1858767Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.1859370Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.1859957Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.1860432Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.1860966Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.1861446Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.1861927Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.1862377Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.1862972Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0)
2025-12-04T12:15:05.1863512Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.1864052Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T12:15:05.1864694Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.1865264Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.where(r0_mask, tmp1, float("-inf"))
2025-12-04T12:15:05.1865903Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.1866418Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp4.to(tl.float32)
2025-12-04T12:15:05.1866891Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T12:15:05.1867352Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T12:15:05.1867925Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T12:15:05.1868372Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T12:15:05.1868948Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T12:15:05.1869481Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T12:15:05.1870203Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T12:15:05.1870564Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.1872751Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.1873328Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.1874434Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.1875071Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.1875985Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.1876664Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.1877548Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.1878329Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.1878993Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.1879875Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1880239Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.1881145Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.1881280Z ('RERUN', {'yellow': True}) [0.4605s] [  0%]
2025-12-04T12:15:05.1882435Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1
2025-12-04T12:15:05.1883326Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1883755Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.1884207Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5
2025-12-04T12:15:05.1884719Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 8
2025-12-04T12:15:05.1885179Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.1885764Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.1886311Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.1886909Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.1887527Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.1888128Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.1888576Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.1889101Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.1889584Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.1890045Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.1890505Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.1891104Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0)
2025-12-04T12:15:05.1891624Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.1892211Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T12:15:05.1892797Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.1893379Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.where(r0_mask, tmp1, float("-inf"))
2025-12-04T12:15:05.1894004Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.1894514Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp4.to(tl.float32)
2025-12-04T12:15:05.1894993Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T12:15:05.1895436Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T12:15:05.1896016Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T12:15:05.1896513Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T12:15:05.1897085Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T12:15:05.1897642Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T12:15:05.1898359Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T12:15:05.1898741Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.1900716Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.1901329Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.1902375Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.1903022Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.1903908Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.1904589Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.1905498Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.1906267Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.1906923Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.1907791Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1908170Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.1909061Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.1909166Z FAILED [0.4598s] [  0%]
2025-12-04T12:15:05.1909175Z 
2025-12-04T12:15:05.1909333Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.1909631Z ___ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda ____
2025-12-04T12:15:05.1909770Z Traceback (most recent call last):
2025-12-04T12:15:05.1910169Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T12:15:05.1910325Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T12:15:05.1910825Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.1911077Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.1911587Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.1911791Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.1912303Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.1912497Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.1913031Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.1913351Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.1913881Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.1914063Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.1914599Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.1914724Z     return self._compile_to_module()
2025-12-04T12:15:05.1915208Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.1915390Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.1915910Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.1916041Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.1916554Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.1916786Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.1917387Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.1917518Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.1918010Z   File "/tmp/tmpn0ip3q_o/pf/cpfhhofptfqdvu2nzxhygme3zjx2l52n2mjges6etyvlnownz6iv.py", line 113, in <module>
2025-12-04T12:15:05.1918517Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.1918632Z     kernel.precompile(
2025-12-04T12:15:05.1919195Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.1919315Z     self._precompile_worker()
2025-12-04T12:15:05.1919912Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.1920110Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.1920705Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.1920907Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.1921370Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.1921619Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.1922074Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.1922410Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.1922636Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.1923078Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1923172Z ^
2025-12-04T12:15:05.1923640Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.1923648Z 
2025-12-04T12:15:05.1924368Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.1924377Z 
2025-12-04T12:15:05.1924382Z 
2025-12-04T12:15:05.1924635Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.1925239Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T12:15:05.1925245Z 
2025-12-04T12:15:05.1925513Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.1925751Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.1925892Z frames [('total', 1)]
2025-12-04T12:15:05.1926011Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.1926263Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.1926517Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.1926636Z graph_break []
2025-12-04T12:15:05.1926931Z ___ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda ____
2025-12-04T12:15:05.1927060Z Traceback (most recent call last):
2025-12-04T12:15:05.1927473Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T12:15:05.1927629Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T12:15:05.1928122Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.1928386Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.1928903Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.1929109Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.1929618Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.1929767Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.1930348Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.1930672Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.1931207Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.1931357Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.1931838Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.1931979Z     return self._compile_to_module()
2025-12-04T12:15:05.1932469Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.1932632Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.1933163Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.1933298Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.1933805Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.1934036Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.1934619Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.1934773Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.1935284Z   File "/tmp/tmp4xvsuzkt/xr/cxrldus6vp6ouqkiqrbuchtbgu7kt3szztotmi4e7s3w5iwdga2z.py", line 113, in <module>
2025-12-04T12:15:05.1935762Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.1935876Z     kernel.precompile(
2025-12-04T12:15:05.1936508Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.1936687Z     self._precompile_worker()
2025-12-04T12:15:05.1937287Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.1937469Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.1938081Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.1938384Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.1938884Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.1939135Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.1939582Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.1939937Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.1940166Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.1940617Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1940709Z ^
2025-12-04T12:15:05.1941169Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.1941178Z 
2025-12-04T12:15:05.1941911Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.1941917Z 
2025-12-04T12:15:05.1941921Z 
2025-12-04T12:15:05.1942143Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.1942783Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T12:15:05.1942789Z 
2025-12-04T12:15:05.1943061Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.1943286Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.1943410Z frames [('total', 1)]
2025-12-04T12:15:05.1943531Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.1943769Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.1944011Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.1944113Z graph_break []
2025-12-04T12:15:05.1944351Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.1944458Z frames [('total', 1)]
2025-12-04T12:15:05.1944577Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.1944813Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.1945053Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.1945158Z graph_break []
2025-12-04T12:15:05.1945323Z =================================== FAILURES ===================================
2025-12-04T12:15:05.1945617Z ___ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda ____
2025-12-04T12:15:05.1945756Z Traceback (most recent call last):
2025-12-04T12:15:05.1946154Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T12:15:05.1946313Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T12:15:05.1946824Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.1947078Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.1947595Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.1947806Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.1948353Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.1948516Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.1949054Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.1949380Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.1949945Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.1950123Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.1950620Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.1950744Z     return self._compile_to_module()
2025-12-04T12:15:05.1951229Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.1951405Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.1951922Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.1952052Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.1952561Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.1952795Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.1953391Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.1953517Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.1954048Z   File "/tmp/tmphoo1dnbk/cm/ccmb5pmh2v4krc2fx4we57durl5bvualdjg4u2iphyynhog5lbjw.py", line 113, in <module>
2025-12-04T12:15:05.1954525Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.1954636Z     kernel.precompile(
2025-12-04T12:15:05.1955198Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.1955316Z     self._precompile_worker()
2025-12-04T12:15:05.1955912Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.1956102Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.1956698Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.1956896Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.1957362Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.1957607Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.1958063Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.1958398Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.1958629Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.1959075Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1959165Z ^
2025-12-04T12:15:05.1959631Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.1959636Z 
2025-12-04T12:15:05.1960383Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.1960390Z 
2025-12-04T12:15:05.1960394Z 
2025-12-04T12:15:05.1960614Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.1961219Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T12:15:05.1961225Z 
2025-12-04T12:15:05.1961494Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.1961779Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.1961885Z frames [('total', 1)]
2025-12-04T12:15:05.1962031Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.1962281Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.1962503Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.1962619Z graph_break []
2025-12-04T12:15:05.1962843Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.1962948Z frames [('total', 1)]
2025-12-04T12:15:05.1963077Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.1963295Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.1963528Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.1963642Z graph_break []
2025-12-04T12:15:05.1963857Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.1963964Z frames [('total', 1)]
2025-12-04T12:15:05.1964095Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.1964314Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.1964558Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.1964659Z graph_break []
2025-12-04T12:15:05.1965344Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6de5a411a3f65f82.xml -
2025-12-04T12:15:05.1965536Z =========================== short test summary info ============================
2025-12-04T12:15:05.1966265Z FAILED [0.4598s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.1966696Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1966803Z ^
2025-12-04T12:15:05.1967259Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.1967264Z 
2025-12-04T12:15:05.1967987Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.1967995Z 
2025-12-04T12:15:05.1967999Z 
2025-12-04T12:15:05.1968217Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.1968819Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T12:15:05.1968824Z 
2025-12-04T12:15:05.1969090Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.1969270Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.1969487Z ================== 1 failed, 12 deselected, 2 rerun in 4.55s ===================
2025-12-04T12:15:05.1969590Z Got exit code 1
2025-12-04T12:15:05.1969700Z Retrying single test...
2025-12-04T12:15:05.1970189Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d2f3621583fff098.xml
2025-12-04T12:15:05.1970353Z ============================= test session starts ==============================
2025-12-04T12:15:05.1970719Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.1970862Z cachedir: .pytest_cache
2025-12-04T12:15:05.1971574Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.1971720Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.1971832Z configfile: pytest.ini
2025-12-04T12:15:05.1972433Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.1972729Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:05.1973442Z stepcurrent: skipping 12 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T12:15:05.1973576Z Running 1 items in this shard
2025-12-04T12:15:05.1973584Z 
2025-12-04T12:15:05.1974743Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1
2025-12-04T12:15:05.1975635Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.1976069Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.1976585Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5
2025-12-04T12:15:05.1977111Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 8
2025-12-04T12:15:05.1977632Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.1978181Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.1978720Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.1979307Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.1979910Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.1980466Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.1980925Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.1981448Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.1981934Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.1982393Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.1982841Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.1983447Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0)
2025-12-04T12:15:05.1983964Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.1984563Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T12:15:05.1985145Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.1985713Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.where(r0_mask, tmp1, float("-inf"))
2025-12-04T12:15:05.1986384Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.1986920Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp4.to(tl.float32)
2025-12-04T12:15:05.1987402Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T12:15:05.1987847Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T12:15:05.1988410Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T12:15:05.1988862Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T12:15:05.1989432Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T12:15:05.1989983Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T12:15:05.1990695Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T12:15:05.1991100Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.1993048Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.1993595Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.1994650Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.1995284Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.1996190Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.1996872Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.1997769Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.1998576Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.1999199Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.2000069Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.2000468Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.2001404Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.2001546Z ('RERUN', {'yellow': True}) [3.6178s] [100%]
2025-12-04T12:15:05.2002709Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1
2025-12-04T12:15:05.2003578Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.2004006Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.2004458Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5
2025-12-04T12:15:05.2004970Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 8
2025-12-04T12:15:05.2005481Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.2006013Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.2006552Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.2007150Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.2007742Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.2008311Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.2008759Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.2009286Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.2009759Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.2010220Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.2010681Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.2011277Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0)
2025-12-04T12:15:05.2011810Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.2012408Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T12:15:05.2012987Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.2013570Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.where(r0_mask, tmp1, float("-inf"))
2025-12-04T12:15:05.2014259Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.2014781Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp4.to(tl.float32)
2025-12-04T12:15:05.2015249Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T12:15:05.2015693Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T12:15:05.2016274Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T12:15:05.2016780Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T12:15:05.2017375Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T12:15:05.2017909Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T12:15:05.2018625Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T12:15:05.2019047Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.2020983Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.2021535Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.2022587Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.2023235Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.2024131Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.2024835Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.2025718Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.2026538Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.2027148Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.2028019Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.2028464Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.2029926Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.2031100Z ('RERUN', {'yellow': True}) [0.4823s] [100%]
2025-12-04T12:15:05.2032522Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1
2025-12-04T12:15:05.2034704Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.2036166Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.2037194Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5
2025-12-04T12:15:05.2038314Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 8
2025-12-04T12:15:05.2039471Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.2040617Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.2041836Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.2043095Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.2044406Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.2045691Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.2046838Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.2047937Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.2049084Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.2050170Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.2051231Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.2052403Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0)
2025-12-04T12:15:05.2053709Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.2054927Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T12:15:05.2056191Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.2057604Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.where(r0_mask, tmp1, float("-inf"))
2025-12-04T12:15:05.2058994Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.2060279Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp4.to(tl.float32)
2025-12-04T12:15:05.2061405Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T12:15:05.2062462Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T12:15:05.2063607Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T12:15:05.2064768Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T12:15:05.2065934Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T12:15:05.2067188Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T12:15:05.2068606Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T12:15:05.2069826Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.2072467Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.2075074Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.2076814Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.2078631Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.2080303Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.2082028Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.2083816Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.2085615Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.2087135Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.2088881Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.2090260Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.2091685Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.2092830Z FAILED [0.4675s] [100%]
2025-12-04T12:15:05.2093016Z 
2025-12-04T12:15:05.2093163Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.2093760Z ___ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda ____
2025-12-04T12:15:05.2094325Z Traceback (most recent call last):
2025-12-04T12:15:05.2094973Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T12:15:05.2095668Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T12:15:05.2096525Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.2097428Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.2098411Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.2099276Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.2100131Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.2100949Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.2101779Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.2102795Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.2103789Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.2104613Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.2105396Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.2106156Z     return self._compile_to_module()
2025-12-04T12:15:05.2106884Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.2107696Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.2108527Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.2109340Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.2110098Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.2110981Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.2111958Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.2112817Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.2113659Z   File "/tmp/tmpuge66jqu/j5/cj5eczgvcwmdynhzqwwuhshosypcq5osnr2lopm66l3olxca4x73.py", line 113, in <module>
2025-12-04T12:15:05.2114790Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.2115528Z     kernel.precompile(
2025-12-04T12:15:05.2116272Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.2117141Z     self._precompile_worker()
2025-12-04T12:15:05.2118001Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.2118928Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.2119847Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.2120806Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.2121605Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.2122439Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.2123280Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.2124213Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.2124927Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.2125714Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.2126379Z ^
2025-12-04T12:15:05.2126975Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.2127614Z 
2025-12-04T12:15:05.2128350Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.2129200Z 
2025-12-04T12:15:05.2129204Z 
2025-12-04T12:15:05.2129425Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.2130377Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T12:15:05.2131117Z 
2025-12-04T12:15:05.2131389Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.2132036Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.2132501Z frames [('total', 1)]
2025-12-04T12:15:05.2132806Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.2133271Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.2133861Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.2134335Z graph_break []
2025-12-04T12:15:05.2134794Z ___ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda ____
2025-12-04T12:15:05.2135363Z Traceback (most recent call last):
2025-12-04T12:15:05.2135993Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T12:15:05.2136763Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T12:15:05.2137561Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.2138431Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.2139353Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.2140207Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.2141134Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.2141922Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.2142746Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.2143748Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.2144741Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.2145582Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.2146388Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.2147145Z     return self._compile_to_module()
2025-12-04T12:15:05.2147876Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.2148675Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.2149499Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.2150294Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.2151044Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.2151925Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.2152893Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.2153754Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.2154506Z   File "/tmp/tmpsh441ywf/2l/c2leueaeuuq4sigwf2uransvivrn3z2dtdo55t6sxaal2qj3uz5d.py", line 113, in <module>
2025-12-04T12:15:05.2155669Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.2156395Z     kernel.precompile(
2025-12-04T12:15:05.2157135Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.2157955Z     self._precompile_worker()
2025-12-04T12:15:05.2158775Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.2159701Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.2160609Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.2161546Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.2162340Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.2163186Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.2164014Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.2164937Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.2165646Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.2166428Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.2167110Z ^
2025-12-04T12:15:05.2167705Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.2168304Z 
2025-12-04T12:15:05.2169026Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.2169873Z 
2025-12-04T12:15:05.2169878Z 
2025-12-04T12:15:05.2170155Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.2171282Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T12:15:05.2172021Z 
2025-12-04T12:15:05.2172293Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.2172932Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.2173467Z frames [('total', 1)]
2025-12-04T12:15:05.2180495Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.2181134Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.2181737Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.2182213Z graph_break []
2025-12-04T12:15:05.2182614Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.2183093Z frames [('total', 1)]
2025-12-04T12:15:05.2183387Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.2183834Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.2184435Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.2184904Z graph_break []
2025-12-04T12:15:05.2185216Z =================================== FAILURES ===================================
2025-12-04T12:15:05.2185815Z ___ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda ____
2025-12-04T12:15:05.2186364Z Traceback (most recent call last):
2025-12-04T12:15:05.2187020Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T12:15:05.2187720Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T12:15:05.2188502Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.2189443Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.2190355Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.2191211Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.2192058Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.2192852Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.2193675Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.2194676Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.2195654Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.2196468Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.2197234Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.2197975Z     return self._compile_to_module()
2025-12-04T12:15:05.2198688Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.2199485Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.2200305Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.2201101Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.2201847Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.2202724Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.2203746Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.2204586Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.2205360Z   File "/tmp/tmpjeys9fpk/qt/cqtngethmeezmzzpbyza2yhzfoppgoh6nj6yoahjjox5zrv76dl5.py", line 113, in <module>
2025-12-04T12:15:05.2206481Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.2207200Z     kernel.precompile(
2025-12-04T12:15:05.2207972Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.2208787Z     self._precompile_worker()
2025-12-04T12:15:05.2209660Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.2210576Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.2211480Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.2212420Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.2213210Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.2214038Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.2214876Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.2215801Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.2216622Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.2217413Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.2218134Z ^
2025-12-04T12:15:05.2218724Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.2219320Z 
2025-12-04T12:15:05.2220051Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.2220899Z 
2025-12-04T12:15:05.2220904Z 
2025-12-04T12:15:05.2221126Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.2222079Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T12:15:05.2222806Z 
2025-12-04T12:15:05.2223079Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.2223711Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.2224176Z frames [('total', 1)]
2025-12-04T12:15:05.2224476Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.2224931Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.2225510Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.2225975Z graph_break []
2025-12-04T12:15:05.2226354Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.2226820Z frames [('total', 1)]
2025-12-04T12:15:05.2227106Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.2227544Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.2228136Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.2228602Z graph_break []
2025-12-04T12:15:05.2228974Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.2229435Z frames [('total', 1)]
2025-12-04T12:15:05.2229718Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.2230152Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.2230786Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.2231259Z graph_break []
2025-12-04T12:15:05.2232058Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d2f3621583fff098.xml -
2025-12-04T12:15:05.2233021Z =========================== short test summary info ============================
2025-12-04T12:15:05.2234081Z FAILED [0.4675s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.2235448Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.2236103Z ^
2025-12-04T12:15:05.2236694Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.2237291Z 
2025-12-04T12:15:05.2238019Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.2238861Z 
2025-12-04T12:15:05.2238866Z 
2025-12-04T12:15:05.2239095Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.2240023Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T12:15:05.2240757Z 
2025-12-04T12:15:05.2241024Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.2241613Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.2242146Z ================== 1 failed, 187 deselected, 2 rerun in 4.61s ==================
2025-12-04T12:15:05.2242582Z Got exit code 1
2025-12-04T12:15:05.2242892Z Retrying single test...
2025-12-04T12:15:05.2243554Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-319cee3df6121e1a.xml
2025-12-04T12:15:05.2244323Z ============================= test session starts ==============================
2025-12-04T12:15:05.2244983Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.2245586Z cachedir: .pytest_cache
2025-12-04T12:15:05.2246300Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.2247082Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.2247435Z configfile: pytest.ini
2025-12-04T12:15:05.2248219Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.2249165Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:05.2250199Z stepcurrent: skipping 12 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T12:15:05.2251120Z Running 1 items in this shard
2025-12-04T12:15:05.2251336Z 
2025-12-04T12:15:05.2252507Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1
2025-12-04T12:15:05.2254691Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.2256132Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.2257220Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5
2025-12-04T12:15:05.2258376Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 8
2025-12-04T12:15:05.2259502Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.2260634Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.2261844Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.2263182Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.2264494Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.2265770Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.2266918Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.2268025Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.2269170Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.2270248Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.2271523Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.2272804Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0)
2025-12-04T12:15:05.2274059Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.2275247Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T12:15:05.2276505Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.2277793Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.where(r0_mask, tmp1, float("-inf"))
2025-12-04T12:15:05.2279126Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.2280405Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp4.to(tl.float32)
2025-12-04T12:15:05.2281511Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T12:15:05.2282562Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T12:15:05.2283718Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T12:15:05.2284868Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T12:15:05.2286006Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T12:15:05.2287252Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T12:15:05.2288785Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T12:15:05.2290001Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.2292495Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.2295119Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.2296894Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.2298703Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.2300379Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.2302083Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.2303816Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.2305608Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.2307123Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.2309032Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.2310420Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.2311828Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.2313009Z ('RERUN', {'yellow': True}) [3.5974s] [100%]
2025-12-04T12:15:05.2314432Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1
2025-12-04T12:15:05.2316655Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.2318094Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.2319099Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5
2025-12-04T12:15:05.2320243Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 8
2025-12-04T12:15:05.2321367Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.2322509Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.2323795Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.2325095Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.2326421Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.2327722Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.2328878Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.2329980Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.2331128Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.2332222Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.2333300Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.2334581Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0)
2025-12-04T12:15:05.2335875Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.2337179Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T12:15:05.2338484Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.2339786Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.where(r0_mask, tmp1, float("-inf"))
2025-12-04T12:15:05.2341138Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.2342439Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp4.to(tl.float32)
2025-12-04T12:15:05.2343567Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T12:15:05.2344621Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T12:15:05.2345782Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T12:15:05.2346953Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T12:15:05.2348123Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T12:15:05.2349481Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T12:15:05.2350873Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T12:15:05.2352092Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.2354616Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.2357252Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.2358988Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.2360792Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.2362453Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.2364176Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.2365932Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.2367731Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.2369270Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.2370888Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.2372508Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.2373933Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.2375110Z ('RERUN', {'yellow': True}) [0.4623s] [100%]
2025-12-04T12:15:05.2376587Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1
2025-12-04T12:15:05.2378758Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.2380224Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.2381390Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5
2025-12-04T12:15:05.2382494Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 8
2025-12-04T12:15:05.2383601Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.2384794Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.2386062Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.2387337Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.2388642Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.2389926Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.2391081Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.2392191Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.2393312Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.2394393Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.2395510Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.2396703Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0)
2025-12-04T12:15:05.2397953Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.2399168Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T12:15:05.2400446Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.2401738Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.where(r0_mask, tmp1, float("-inf"))
2025-12-04T12:15:05.2403067Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.2404336Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp4.to(tl.float32)
2025-12-04T12:15:05.2405452Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T12:15:05.2406499Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T12:15:05.2407645Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T12:15:05.2408789Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T12:15:05.2409939Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T12:15:05.2411259Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T12:15:05.2412654Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T12:15:05.2413863Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.2416494Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.2419101Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.2420822Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.2422633Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.2424289Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.2426014Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.2427715Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.2429509Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.2431029Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.2432649Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.2434019Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.2435439Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.2436573Z FAILED [0.4611s] [100%]
2025-12-04T12:15:05.2436755Z 
2025-12-04T12:15:05.2436918Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.2437497Z ___ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda ____
2025-12-04T12:15:05.2438064Z Traceback (most recent call last):
2025-12-04T12:15:05.2438710Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T12:15:05.2439409Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T12:15:05.2440182Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.2441133Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.2442043Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.2442888Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.2443732Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.2444578Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.2445431Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.2446424Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.2447422Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.2448252Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.2449022Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.2449758Z     return self._compile_to_module()
2025-12-04T12:15:05.2450496Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.2451291Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.2452098Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.2452890Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.2453649Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.2454567Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.2455516Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.2456437Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.2457215Z   File "/tmp/tmp4yizaawy/hq/chqaaow35caxdduunmhgnbhq3jxqvs2ryjclfayhdkcyja4wlspi.py", line 113, in <module>
2025-12-04T12:15:05.2458341Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.2459053Z     kernel.precompile(
2025-12-04T12:15:05.2459810Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.2460632Z     self._precompile_worker()
2025-12-04T12:15:05.2461442Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.2462367Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.2463284Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.2464233Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.2465016Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.2465866Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.2466701Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.2467633Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.2468330Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.2469129Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.2469798Z ^
2025-12-04T12:15:05.2470433Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.2471240Z 
2025-12-04T12:15:05.2471958Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.2472890Z 
2025-12-04T12:15:05.2472895Z 
2025-12-04T12:15:05.2473116Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.2474117Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T12:15:05.2474837Z 
2025-12-04T12:15:05.2475123Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.2475756Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.2476235Z frames [('total', 1)]
2025-12-04T12:15:05.2476544Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.2476996Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.2477605Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.2478076Z graph_break []
2025-12-04T12:15:05.2478525Z ___ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda ____
2025-12-04T12:15:05.2479096Z Traceback (most recent call last):
2025-12-04T12:15:05.2479744Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T12:15:05.2480450Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T12:15:05.2481228Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.2482173Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.2483090Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.2483947Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.2484780Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.2485595Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.2486427Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.2487421Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.2488422Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.2489243Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.2490021Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.2490765Z     return self._compile_to_module()
2025-12-04T12:15:05.2491506Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.2492311Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.2493131Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.2493911Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.2494678Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.2495555Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.2496560Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.2497426Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.2498249Z   File "/tmp/tmpgaidouh7/67/c67ognwug75odwxkol235o5w234fejsuwsmki46wz2slnxhn2r57.py", line 113, in <module>
2025-12-04T12:15:05.2499357Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.2500071Z     kernel.precompile(
2025-12-04T12:15:05.2500821Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.2501674Z     self._precompile_worker()
2025-12-04T12:15:05.2502532Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.2503442Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.2504362Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.2505308Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.2506089Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.2506935Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.2507766Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.2508697Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.2509394Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.2510197Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.2510860Z ^
2025-12-04T12:15:05.2511490Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.2512082Z 
2025-12-04T12:15:05.2512799Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.2513657Z 
2025-12-04T12:15:05.2513662Z 
2025-12-04T12:15:05.2513884Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.2514835Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T12:15:05.2515566Z 
2025-12-04T12:15:05.2515851Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.2516482Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.2516958Z frames [('total', 1)]
2025-12-04T12:15:05.2517257Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.2517718Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.2518307Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.2518774Z graph_break []
2025-12-04T12:15:05.2519154Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.2519610Z frames [('total', 1)]
2025-12-04T12:15:05.2519909Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.2520348Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.2520937Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.2521410Z graph_break []
2025-12-04T12:15:05.2521718Z =================================== FAILURES ===================================
2025-12-04T12:15:05.2522311Z ___ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda ____
2025-12-04T12:15:05.2522858Z Traceback (most recent call last):
2025-12-04T12:15:05.2523501Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T12:15:05.2524201Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T12:15:05.2525010Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.2525889Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.2526790Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.2527678Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.2528511Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.2529366Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.2530187Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.2531191Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.2532175Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.2532994Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.2533766Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.2533891Z     return self._compile_to_module()
2025-12-04T12:15:05.2534383Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.2534566Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.2535084Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.2535299Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.2535795Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.2536030Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.2536709Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.2536840Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.2537358Z   File "/tmp/tmp1hojn4gp/lj/cljnn3iatte42t6knrtce34dmmsy5s4wq326ajh4ngohnbd535vb.py", line 113, in <module>
2025-12-04T12:15:05.2537827Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.2537940Z     kernel.precompile(
2025-12-04T12:15:05.2538509Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.2538628Z     self._precompile_worker()
2025-12-04T12:15:05.2539229Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.2539423Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.2540017Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.2540234Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.2540687Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.2540932Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.2541391Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.2541727Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.2541972Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.2542457Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.2542550Z ^
2025-12-04T12:15:05.2543018Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.2543026Z 
2025-12-04T12:15:05.2543744Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.2543782Z 
2025-12-04T12:15:05.2543786Z 
2025-12-04T12:15:05.2544048Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.2544637Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T12:15:05.2544645Z 
2025-12-04T12:15:05.2544928Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.2545169Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.2545277Z frames [('total', 1)]
2025-12-04T12:15:05.2545410Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.2545651Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.2545878Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.2545997Z graph_break []
2025-12-04T12:15:05.2546219Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.2546325Z frames [('total', 1)]
2025-12-04T12:15:05.2546456Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.2546679Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.2546911Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.2547058Z graph_break []
2025-12-04T12:15:05.2547279Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.2547404Z frames [('total', 1)]
2025-12-04T12:15:05.2547529Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.2547747Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.2547994Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.2548097Z graph_break []
2025-12-04T12:15:05.2548751Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-319cee3df6121e1a.xml -
2025-12-04T12:15:05.2548947Z =========================== short test summary info ============================
2025-12-04T12:15:05.2549688Z FAILED [0.4611s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.2550137Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.2550232Z ^
2025-12-04T12:15:05.2550694Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.2550699Z 
2025-12-04T12:15:05.2551426Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.2551432Z 
2025-12-04T12:15:05.2551439Z 
2025-12-04T12:15:05.2551659Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.2552265Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T12:15:05.2552270Z 
2025-12-04T12:15:05.2552541Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.2552742Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.2552944Z ================== 1 failed, 187 deselected, 2 rerun in 4.56s ==================
2025-12-04T12:15:05.2553081Z Got exit code 1
2025-12-04T12:15:05.2553606Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T12:15:05.2554017Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:15:05.2554487Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-452be63c68b4eb35.xml
2025-12-04T12:15:05.2554697Z ============================= test session starts ==============================
2025-12-04T12:15:05.2555082Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.2555212Z cachedir: .pytest_cache
2025-12-04T12:15:05.2555732Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.2555867Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.2555997Z configfile: pytest.ini
2025-12-04T12:15:05.2556588Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.2556819Z collecting ... collected 188 items / 13 deselected / 175 selected
2025-12-04T12:15:05.2556982Z stepcurrent: skipping 13 already run items.
2025-12-04T12:15:05.2557099Z Running 175 items in this shard
2025-12-04T12:15:05.2557107Z 
2025-12-04T12:15:05.2558276Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T12:15:05.2559252Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.2559727Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.2560179Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T12:15:05.2560637Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.2561186Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.2561731Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.2562324Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.2562913Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.2563467Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.2563926Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.2564560Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.2565153Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.2565688Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.2566241Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.2566749Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.2567227Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.2567741Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_0 = r0_index
2025-12-04T12:15:05.2568532Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.2569065Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:05.2569655Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.2570225Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = triton_helpers.maximum(_tmp3, tmp2)
2025-12-04T12:15:05.2570792Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp3 = tl.where(r0_mask, tmp4, _tmp3)
2025-12-04T12:15:05.2571645Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = triton_helpers.max2(_tmp3, 1)[:, None]
2025-12-04T12:15:05.2572184Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.2572811Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T12:15:05.2573323Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp3.to(tl.float32)
2025-12-04T12:15:05.2573805Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T12:15:05.2574251Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T12:15:05.2574835Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T12:15:05.2575278Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T12:15:05.2575853Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T12:15:05.2576469Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T12:15:05.2577185Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T12:15:05.2577562Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.2579799Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.2580352Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.2581399Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.2582142Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.2583044Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.2583730Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.2584626Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.2585398Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.2586026Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.2587007Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.2587443Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.2588340Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.2588478Z ('RERUN', {'yellow': True}) [3.2675s] [  0%]
2025-12-04T12:15:05.2589643Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T12:15:05.2590614Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.2591064Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.2591516Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T12:15:05.2591989Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.2592528Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.2593071Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.2593671Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.2594303Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.2594877Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.2595326Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.2595986Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.2596609Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.2597141Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.2597687Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.2598174Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.2598670Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.2599136Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_0 = r0_index
2025-12-04T12:15:05.2599900Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.2600464Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:05.2601058Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.2601642Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = triton_helpers.maximum(_tmp3, tmp2)
2025-12-04T12:15:05.2602194Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp3 = tl.where(r0_mask, tmp4, _tmp3)
2025-12-04T12:15:05.2602765Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = triton_helpers.max2(_tmp3, 1)[:, None]
2025-12-04T12:15:05.2603298Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.2603840Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T12:15:05.2604365Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp3.to(tl.float32)
2025-12-04T12:15:05.2604830Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T12:15:05.2605270Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T12:15:05.2605854Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T12:15:05.2606292Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T12:15:05.2606875Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T12:15:05.2607442Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T12:15:05.2608155Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T12:15:05.2608533Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.2610775Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.2611326Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.2612374Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.2613014Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.2613903Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.2614630Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.2615510Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.2616350Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.2616969Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.2617939Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.2618327Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.2619222Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.2619371Z ('RERUN', {'yellow': True}) [0.3303s] [  0%]
2025-12-04T12:15:05.2620523Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T12:15:05.2621507Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.2621989Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.2622442Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T12:15:05.2622916Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.2623482Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.2624066Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.2624654Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.2625240Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.2625807Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.2626255Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.2626901Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.2627485Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.2628016Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.2628675Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.2629164Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.2629662Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.2630137Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_0 = r0_index
2025-12-04T12:15:05.2630914Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.2631431Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:05.2632026Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.2633230Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = triton_helpers.maximum(_tmp3, tmp2)
2025-12-04T12:15:05.2633785Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp3 = tl.where(r0_mask, tmp4, _tmp3)
2025-12-04T12:15:05.2634375Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = triton_helpers.max2(_tmp3, 1)[:, None]
2025-12-04T12:15:05.2634900Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.2635442Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T12:15:05.2636020Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp3.to(tl.float32)
2025-12-04T12:15:05.2636486Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T12:15:05.2636940Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T12:15:05.2637557Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T12:15:05.2638022Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T12:15:05.2638614Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T12:15:05.2639151Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T12:15:05.2639867Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T12:15:05.2640232Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.2642415Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.2642991Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.2644041Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.2644672Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.2645707Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.2646407Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.2647291Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.2648084Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.2648701Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.2649685Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.2650097Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.2650988Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.2651107Z FAILED [0.3306s] [  0%]
2025-12-04T12:15:05.2651143Z 
2025-12-04T12:15:05.2651294Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.2651600Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda ____
2025-12-04T12:15:05.2651758Z Traceback (most recent call last):
2025-12-04T12:15:05.2652162Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T12:15:05.2652340Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T12:15:05.2652839Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.2653106Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.2653622Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.2653818Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.2654346Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.2654499Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.2655039Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.2655376Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.2655937Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.2656101Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.2656666Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.2656794Z     return self._compile_to_module()
2025-12-04T12:15:05.2657297Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.2657467Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.2657999Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.2658131Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.2658629Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.2658879Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.2659470Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.2659612Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.2660119Z   File "/tmp/tmpvldyvac4/7d/c7dmblsqrrhzhkzyttxoa6fsfxh7miolefeq4fdp2uvexshr7vii.py", line 58, in <module>
2025-12-04T12:15:05.2660582Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.2660714Z     kernel.precompile(
2025-12-04T12:15:05.2661274Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.2661394Z     self._precompile_worker()
2025-12-04T12:15:05.2662002Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.2662236Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.2662848Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.2663048Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.2663500Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.2663792Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.2664263Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.2664608Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.2664835Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.2665375Z def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.2665484Z ^
2025-12-04T12:15:05.2665943Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.2665949Z 
2025-12-04T12:15:05.2666674Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.2666683Z 
2025-12-04T12:15:05.2666688Z 
2025-12-04T12:15:05.2666906Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.2667499Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T12:15:05.2667505Z 
2025-12-04T12:15:05.2667789Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.2668048Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.2668170Z frames [('total', 1)]
2025-12-04T12:15:05.2668294Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.2668533Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.2668769Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.2668875Z graph_break []
2025-12-04T12:15:05.2669167Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda ____
2025-12-04T12:15:05.2669309Z Traceback (most recent call last):
2025-12-04T12:15:05.2669709Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T12:15:05.2669882Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T12:15:05.2670372Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.2670622Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.2671325Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.2671521Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.2672031Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.2672195Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.2672730Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.2673069Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.2673590Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.2673743Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.2674318Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.2674443Z     return self._compile_to_module()
2025-12-04T12:15:05.2675251Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.2675422Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.2675938Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.2676153Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.2676691Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.2676925Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.2677525Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.2677657Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.2678179Z   File "/tmp/tmpxzdjushk/sb/csbslejr7dl6ckros4dlobemrdlaidbvcnpnityrtogll3ghyblm.py", line 58, in <module>
2025-12-04T12:15:05.2678646Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.2678759Z     kernel.precompile(
2025-12-04T12:15:05.2679334Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.2679456Z     self._precompile_worker()
2025-12-04T12:15:05.2680069Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.2680252Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.2680896Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.2681116Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.2681568Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.2681813Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.2682270Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.2682611Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.2682858Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.2683397Z def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.2683492Z ^
2025-12-04T12:15:05.2683973Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.2683979Z 
2025-12-04T12:15:05.2684687Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.2684693Z 
2025-12-04T12:15:05.2684699Z 
2025-12-04T12:15:05.2684931Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.2685517Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T12:15:05.2685523Z 
2025-12-04T12:15:05.2685807Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.2686032Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.2686140Z frames [('total', 1)]
2025-12-04T12:15:05.2686268Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.2686509Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.2686783Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.2686899Z graph_break []
2025-12-04T12:15:05.2687119Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.2687223Z frames [('total', 1)]
2025-12-04T12:15:05.2687353Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.2687570Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.2687846Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.2687948Z graph_break []
2025-12-04T12:15:05.2688127Z =================================== FAILURES ===================================
2025-12-04T12:15:05.2688436Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda ____
2025-12-04T12:15:05.2688567Z Traceback (most recent call last):
2025-12-04T12:15:05.2688969Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T12:15:05.2689142Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T12:15:05.2689633Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.2689898Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.2690412Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.2690605Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.2691130Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.2691279Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.2691824Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.2692181Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.2692701Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.2692863Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.2693341Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.2693465Z     return self._compile_to_module()
2025-12-04T12:15:05.2693964Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.2694132Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.2694658Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.2694792Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.2695291Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.2695542Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.2696127Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.2696266Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.2696843Z   File "/tmp/tmpn7g629zu/4t/c4tpztc2cf4nouw5kqxpychvhy47k5gxlv56f5ydrankt6acdhtg.py", line 58, in <module>
2025-12-04T12:15:05.2697308Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.2697435Z     kernel.precompile(
2025-12-04T12:15:05.2697990Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.2698110Z     self._precompile_worker()
2025-12-04T12:15:05.2698762Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.2698944Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.2699553Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.2699753Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.2700237Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.2700525Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.2700972Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.2701324Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.2701556Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.2702094Z def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.2702200Z ^
2025-12-04T12:15:05.2702658Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.2702666Z 
2025-12-04T12:15:05.2703388Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.2703397Z 
2025-12-04T12:15:05.2703402Z 
2025-12-04T12:15:05.2703622Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.2704207Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T12:15:05.2704251Z 
2025-12-04T12:15:05.2704539Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.2704765Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.2704884Z frames [('total', 1)]
2025-12-04T12:15:05.2705001Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.2705242Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.2705478Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.2705579Z graph_break []
2025-12-04T12:15:05.2705797Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.2705916Z frames [('total', 1)]
2025-12-04T12:15:05.2706031Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.2706249Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.2706500Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.2706599Z graph_break []
2025-12-04T12:15:05.2706833Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.2706938Z frames [('total', 1)]
2025-12-04T12:15:05.2707052Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.2707282Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.2707517Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.2707619Z graph_break []
2025-12-04T12:15:05.2708285Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-452be63c68b4eb35.xml -
2025-12-04T12:15:05.2708461Z =========================== short test summary info ============================
2025-12-04T12:15:05.2709220Z FAILED [0.3306s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.2709793Z def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.2709885Z ^
2025-12-04T12:15:05.2710360Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.2710365Z 
2025-12-04T12:15:05.2711073Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.2711111Z 
2025-12-04T12:15:05.2711116Z 
2025-12-04T12:15:05.2711352Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.2711964Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T12:15:05.2711970Z 
2025-12-04T12:15:05.2712255Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.2712442Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.2712645Z ================== 1 failed, 13 deselected, 2 rerun in 3.97s ===================
2025-12-04T12:15:05.2712770Z Got exit code 1
2025-12-04T12:15:05.2712881Z Retrying single test...
2025-12-04T12:15:05.2713355Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5a49841d6a2b730b.xml
2025-12-04T12:15:05.2713537Z ============================= test session starts ==============================
2025-12-04T12:15:05.2713898Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.2714027Z cachedir: .pytest_cache
2025-12-04T12:15:05.2714549Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.2714711Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.2714834Z configfile: pytest.ini
2025-12-04T12:15:05.2715429Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.2715654Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:05.2716332Z stepcurrent: skipping 13 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T12:15:05.2716454Z Running 1 items in this shard
2025-12-04T12:15:05.2716459Z 
2025-12-04T12:15:05.2717634Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T12:15:05.2718618Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.2719072Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.2719528Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T12:15:05.2719995Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.2720551Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.2721099Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.2721696Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.2722325Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.2722890Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.2723356Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.2724017Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.2724643Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.2725181Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.2725709Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.2726213Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.2726695Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.2727177Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_0 = r0_index
2025-12-04T12:15:05.2727951Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.2728532Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:05.2729122Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.2729699Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = triton_helpers.maximum(_tmp3, tmp2)
2025-12-04T12:15:05.2730262Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp3 = tl.where(r0_mask, tmp4, _tmp3)
2025-12-04T12:15:05.2730836Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = triton_helpers.max2(_tmp3, 1)[:, None]
2025-12-04T12:15:05.2731370Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.2731911Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T12:15:05.2732416Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp3.to(tl.float32)
2025-12-04T12:15:05.2732896Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T12:15:05.2733342Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T12:15:05.2733923Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T12:15:05.2734363Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T12:15:05.2734938Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T12:15:05.2735547Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T12:15:05.2736269Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T12:15:05.2736719Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.2738983Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.2739541Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.2740590Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.2741248Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.2742156Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.2742893Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.2743789Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.2744556Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.2745181Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.2746152Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.2746544Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.2747441Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.2747576Z ('RERUN', {'yellow': True}) [3.3292s] [100%]
2025-12-04T12:15:05.2748738Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T12:15:05.2749710Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.2750196Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.2750649Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T12:15:05.2751122Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.2751692Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.2752267Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.2752871Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.2753461Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.2754030Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.2754480Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.2755110Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.2755708Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.2756245Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.2756824Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.2757314Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.2757809Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.2758279Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_0 = r0_index
2025-12-04T12:15:05.2759041Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.2759575Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:05.2760164Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.2760750Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = triton_helpers.maximum(_tmp3, tmp2)
2025-12-04T12:15:05.2761304Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp3 = tl.where(r0_mask, tmp4, _tmp3)
2025-12-04T12:15:05.2761879Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = triton_helpers.max2(_tmp3, 1)[:, None]
2025-12-04T12:15:05.2762418Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.2762959Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T12:15:05.2763525Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp3.to(tl.float32)
2025-12-04T12:15:05.2763995Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T12:15:05.2764437Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T12:15:05.2765047Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T12:15:05.2765518Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T12:15:05.2766110Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T12:15:05.2766651Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T12:15:05.2767366Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T12:15:05.2767748Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.2769929Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.2770513Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.2771844Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.2772502Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.2773395Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.2774092Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.2774973Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.2775756Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.2776422Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.2777398Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.2777878Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.2778775Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.2778927Z ('RERUN', {'yellow': True}) [0.3379s] [100%]
2025-12-04T12:15:05.2780180Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T12:15:05.2781169Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.2781606Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.2782057Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T12:15:05.2782539Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.2783081Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.2783638Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.2784221Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.2784862Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.2785429Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.2785878Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.2786525Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.2787107Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.2787638Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.2788182Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.2788667Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.2789161Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.2789630Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_0 = r0_index
2025-12-04T12:15:05.2790410Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.2790924Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:05.2791546Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.2792133Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = triton_helpers.maximum(_tmp3, tmp2)
2025-12-04T12:15:05.2792684Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp3 = tl.where(r0_mask, tmp4, _tmp3)
2025-12-04T12:15:05.2793300Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = triton_helpers.max2(_tmp3, 1)[:, None]
2025-12-04T12:15:05.2793853Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.2794396Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T12:15:05.2794915Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp3.to(tl.float32)
2025-12-04T12:15:05.2795380Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T12:15:05.2795834Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T12:15:05.2796404Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T12:15:05.2796844Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T12:15:05.2797431Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T12:15:05.2798001Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T12:15:05.2798726Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T12:15:05.2799089Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.2801280Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.2801819Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.2802879Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.2803517Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.2804410Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.2805142Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.2806037Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.2806825Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.2807496Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.2808486Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.2808859Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.2809747Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.2809866Z FAILED [0.3367s] [100%]
2025-12-04T12:15:05.2809874Z 
2025-12-04T12:15:05.2810020Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.2810325Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda ____
2025-12-04T12:15:05.2810453Z Traceback (most recent call last):
2025-12-04T12:15:05.2810851Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T12:15:05.2811061Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T12:15:05.2811554Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.2811819Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.2812331Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.2812525Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.2813049Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.2813200Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.2813736Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.2814071Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.2814595Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.2814759Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.2815239Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.2815363Z     return self._compile_to_module()
2025-12-04T12:15:05.2815861Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.2816030Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.2816627Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.2816762Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.2817259Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.2817513Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.2818142Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.2818287Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.2818793Z   File "/tmp/tmpjh3s7agi/2g/c2gosmniklmk24r4ta44yvlqiqotibdvbnl4trn6ppqt2jvhd2xc.py", line 58, in <module>
2025-12-04T12:15:05.2819253Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.2819413Z     kernel.precompile(
2025-12-04T12:15:05.2820002Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.2820124Z     self._precompile_worker()
2025-12-04T12:15:05.2820740Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.2820928Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.2821536Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.2821738Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.2822191Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.2822459Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.2822907Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.2823259Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.2823488Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.2824058Z def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.2824167Z ^
2025-12-04T12:15:05.2824629Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.2824635Z 
2025-12-04T12:15:05.2825363Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.2825372Z 
2025-12-04T12:15:05.2825377Z 
2025-12-04T12:15:05.2825596Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.2826187Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T12:15:05.2826193Z 
2025-12-04T12:15:05.2826478Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.2826707Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.2826833Z frames [('total', 1)]
2025-12-04T12:15:05.2826951Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.2827189Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.2827426Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.2827530Z graph_break []
2025-12-04T12:15:05.2827826Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda ____
2025-12-04T12:15:05.2827966Z Traceback (most recent call last):
2025-12-04T12:15:05.2828363Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T12:15:05.2828531Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T12:15:05.2829023Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.2829275Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.2829842Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.2830038Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.2830547Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.2830706Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.2831287Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.2831735Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.2832255Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.2832406Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.2832902Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.2833027Z     return self._compile_to_module()
2025-12-04T12:15:05.2833532Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.2833697Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.2834212Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.2834362Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.2834864Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.2835096Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.2835738Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.2835867Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.2836382Z   File "/tmp/tmptyhyzyxe/ru/crutiqfrgl6bg6v6h4bwcclkmn75z6da5uuud2wvt4dqyfn2fu2u.py", line 58, in <module>
2025-12-04T12:15:05.2836845Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.2836959Z     kernel.precompile(
2025-12-04T12:15:05.2837527Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.2837645Z     self._precompile_worker()
2025-12-04T12:15:05.2838256Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.2838437Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.2839036Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.2839255Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.2839704Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.2839950Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.2840407Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.2840745Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.2840987Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.2841518Z def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.2841612Z ^
2025-12-04T12:15:05.2842114Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.2842121Z 
2025-12-04T12:15:05.2842836Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.2842843Z 
2025-12-04T12:15:05.2842847Z 
2025-12-04T12:15:05.2843079Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.2843696Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T12:15:05.2843702Z 
2025-12-04T12:15:05.2844012Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.2844237Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.2844345Z frames [('total', 1)]
2025-12-04T12:15:05.2844475Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.2844713Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.2844937Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.2845050Z graph_break []
2025-12-04T12:15:05.2845267Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.2845371Z frames [('total', 1)]
2025-12-04T12:15:05.2845499Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.2845717Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.2845968Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.2846071Z graph_break []
2025-12-04T12:15:05.2846220Z =================================== FAILURES ===================================
2025-12-04T12:15:05.2846522Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda ____
2025-12-04T12:15:05.2846680Z Traceback (most recent call last):
2025-12-04T12:15:05.2847080Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T12:15:05.2847252Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T12:15:05.2847742Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.2848003Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.2848518Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.2848713Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.2849239Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.2849388Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.2849936Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.2850264Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.2850783Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.2850943Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.2851424Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.2851548Z     return self._compile_to_module()
2025-12-04T12:15:05.2852049Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.2852216Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.2852747Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.2852879Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.2853417Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.2853663Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.2854245Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.2854384Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.2854919Z   File "/tmp/tmp925yvxqb/sb/csbnfjpd6wqmkeluniuygzgi6oasdo3qgjnbe6bsaweodo7s66dy.py", line 58, in <module>
2025-12-04T12:15:05.2855410Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.2855536Z     kernel.precompile(
2025-12-04T12:15:05.2856089Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.2856210Z     self._precompile_worker()
2025-12-04T12:15:05.2856890Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.2857079Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.2857689Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.2857893Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.2858342Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.2858604Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.2859045Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.2859438Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.2859672Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.2860208Z def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.2860313Z ^
2025-12-04T12:15:05.2860770Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.2860778Z 
2025-12-04T12:15:05.2861502Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.2861511Z 
2025-12-04T12:15:05.2861516Z 
2025-12-04T12:15:05.2861733Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.2862318Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T12:15:05.2862326Z 
2025-12-04T12:15:05.2862609Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.2862829Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.2862948Z frames [('total', 1)]
2025-12-04T12:15:05.2863064Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.2863302Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.2863539Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.2863641Z graph_break []
2025-12-04T12:15:05.2863859Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.2863977Z frames [('total', 1)]
2025-12-04T12:15:05.2864092Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.2864315Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.2864565Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.2864665Z graph_break []
2025-12-04T12:15:05.2864929Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.2865038Z frames [('total', 1)]
2025-12-04T12:15:05.2865151Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.2865383Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.2865616Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.2865750Z graph_break []
2025-12-04T12:15:05.2866415Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5a49841d6a2b730b.xml -
2025-12-04T12:15:05.2866620Z =========================== short test summary info ============================
2025-12-04T12:15:05.2867366Z FAILED [0.3367s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.2867900Z def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.2867990Z ^
2025-12-04T12:15:05.2868461Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.2868467Z 
2025-12-04T12:15:05.2869175Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.2869184Z 
2025-12-04T12:15:05.2869189Z 
2025-12-04T12:15:05.2869427Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.2870012Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T12:15:05.2870018Z 
2025-12-04T12:15:05.2870334Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.2870520Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.2870723Z ================== 1 failed, 187 deselected, 2 rerun in 4.05s ==================
2025-12-04T12:15:05.2870838Z Got exit code 1
2025-12-04T12:15:05.2871167Z Retrying single test...
2025-12-04T12:15:05.2871642Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f1313a025d30dc09.xml
2025-12-04T12:15:05.2871820Z ============================= test session starts ==============================
2025-12-04T12:15:05.2872176Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.2872303Z cachedir: .pytest_cache
2025-12-04T12:15:05.2872823Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.2872952Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.2873076Z configfile: pytest.ini
2025-12-04T12:15:05.2873673Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.2873897Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:05.2874578Z stepcurrent: skipping 13 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T12:15:05.2874710Z Running 1 items in this shard
2025-12-04T12:15:05.2874715Z 
2025-12-04T12:15:05.2875868Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T12:15:05.2876961Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.2877417Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.2877872Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T12:15:05.2878335Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.2878947Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.2879535Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.2880134Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.2880726Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.2881278Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.2881743Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.2882376Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.2882977Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.2883557Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.2884087Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.2884596Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.2885079Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.2885565Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_0 = r0_index
2025-12-04T12:15:05.2886327Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.2886866Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:05.2887462Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.2888587Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = triton_helpers.maximum(_tmp3, tmp2)
2025-12-04T12:15:05.2889161Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp3 = tl.where(r0_mask, tmp4, _tmp3)
2025-12-04T12:15:05.2889744Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = triton_helpers.max2(_tmp3, 1)[:, None]
2025-12-04T12:15:05.2890282Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.2890828Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T12:15:05.2891384Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp3.to(tl.float32)
2025-12-04T12:15:05.2891868Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T12:15:05.2892314Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T12:15:05.2892933Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T12:15:05.2893403Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T12:15:05.2893981Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T12:15:05.2894533Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T12:15:05.2895250Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T12:15:05.2895632Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.2898050Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.2898659Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.2899711Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.2900361Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.2901256Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.2901944Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.2902844Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.2903613Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.2904276Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.2905245Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.2905664Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.2906553Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.2906689Z ('RERUN', {'yellow': True}) [3.2936s] [100%]
2025-12-04T12:15:05.2907918Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T12:15:05.2908901Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.2909346Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.2909796Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T12:15:05.2910267Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.2910802Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.2911344Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.2911941Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.2912561Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.2913127Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.2913577Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.2914207Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.2914801Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.2915338Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.2915876Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.2916368Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.2916858Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.2917328Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_0 = r0_index
2025-12-04T12:15:05.2918096Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.2918629Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:05.2919262Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.2919850Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = triton_helpers.maximum(_tmp3, tmp2)
2025-12-04T12:15:05.2920402Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp3 = tl.where(r0_mask, tmp4, _tmp3)
2025-12-04T12:15:05.2921039Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = triton_helpers.max2(_tmp3, 1)[:, None]
2025-12-04T12:15:05.2921571Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.2922114Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T12:15:05.2922638Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp3.to(tl.float32)
2025-12-04T12:15:05.2923104Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T12:15:05.2923543Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T12:15:05.2924124Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T12:15:05.2924566Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T12:15:05.2925152Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T12:15:05.2925736Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T12:15:05.2926449Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T12:15:05.2926822Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.2928995Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.2929552Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.2930594Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.2931238Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.2932129Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.2932860Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.2933749Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.2934543Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.2935218Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.2936191Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.2936641Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.2937534Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.2937688Z ('RERUN', {'yellow': True}) [0.3379s] [100%]
2025-12-04T12:15:05.2938834Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T12:15:05.2939813Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.2940296Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.2940749Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T12:15:05.2941226Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.2941765Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.2942318Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.2942900Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.2943483Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.2944047Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.2944495Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.2945139Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.2945718Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.2946246Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.2946819Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.2947309Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.2947804Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.2948302Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_0 = r0_index
2025-12-04T12:15:05.2949109Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.2949621Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tl_math.abs(tmp0)
2025-12-04T12:15:05.2950211Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.2950797Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = triton_helpers.maximum(_tmp3, tmp2)
2025-12-04T12:15:05.2951350Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp3 = tl.where(r0_mask, tmp4, _tmp3)
2025-12-04T12:15:05.2951936Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = triton_helpers.max2(_tmp3, 1)[:, None]
2025-12-04T12:15:05.2952457Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.2953033Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T12:15:05.2953553Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp3.to(tl.float32)
2025-12-04T12:15:05.2954016Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T12:15:05.2954470Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T12:15:05.2955040Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T12:15:05.2955480Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T12:15:05.2956064Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T12:15:05.2956601Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T12:15:05.2957325Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T12:15:05.2957691Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.2959918Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.2960461Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.2961511Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.2962207Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.2963099Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.2963793Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.2964671Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.2965456Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.2966066Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.2967052Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.2967452Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.2968345Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.2968465Z FAILED [0.3385s] [100%]
2025-12-04T12:15:05.2968473Z 
2025-12-04T12:15:05.2968621Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.2968925Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda ____
2025-12-04T12:15:05.2969055Z Traceback (most recent call last):
2025-12-04T12:15:05.2969452Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T12:15:05.2969626Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T12:15:05.2970116Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.2970381Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.2970892Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.2971294Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.2971825Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.2971974Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.2972507Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.2972845Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.2973463Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.2973627Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.2974109Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.2974233Z     return self._compile_to_module()
2025-12-04T12:15:05.2974735Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.2974952Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.2975540Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.2975674Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.2976171Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.2976480Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.2977068Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.2984026Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.2984621Z   File "/tmp/tmpqlyuc1go/wa/cwa3dm4i26la4q3qwl4q7xvx252yxnfo3tasybuwmubg533wb23j.py", line 58, in <module>
2025-12-04T12:15:05.2985103Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.2985244Z     kernel.precompile(
2025-12-04T12:15:05.2985811Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.2985936Z     self._precompile_worker()
2025-12-04T12:15:05.2986548Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.2986888Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.2987489Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.2987711Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.2988165Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.2988429Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.2988879Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.2989220Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.2989467Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.2990009Z def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.2990118Z ^
2025-12-04T12:15:05.2990579Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.2990586Z 
2025-12-04T12:15:05.2991302Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.2991313Z 
2025-12-04T12:15:05.2991332Z 
2025-12-04T12:15:05.2991555Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.2992142Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T12:15:05.2992147Z 
2025-12-04T12:15:05.2992431Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.2992664Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.2992816Z frames [('total', 1)]
2025-12-04T12:15:05.2992950Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.2993193Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.2993431Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.2993535Z graph_break []
2025-12-04T12:15:05.2993829Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda ____
2025-12-04T12:15:05.2994022Z Traceback (most recent call last):
2025-12-04T12:15:05.2994424Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T12:15:05.2994631Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T12:15:05.2995139Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.2995395Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.2995924Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.2996118Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.2996628Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.2996794Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.2997334Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.2997670Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.2998195Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.2998380Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.2998875Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.2998999Z     return self._compile_to_module()
2025-12-04T12:15:05.2999486Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.2999665Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.3000181Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.3000325Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.3000824Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.3001055Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.3001656Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.3001785Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.3002294Z   File "/tmp/tmpg82tku6q/wa/cwasncznhdh27jaye6cm62ypurp26i5ly6adgfhkyfckfxttkofh.py", line 58, in <module>
2025-12-04T12:15:05.3002759Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.3002872Z     kernel.precompile(
2025-12-04T12:15:05.3003439Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.3003558Z     self._precompile_worker()
2025-12-04T12:15:05.3004160Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.3004351Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.3004945Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.3005193Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.3005645Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.3005890Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.3006348Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.3006714Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.3006986Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.3007521Z def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.3007618Z ^
2025-12-04T12:15:05.3008091Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.3008097Z 
2025-12-04T12:15:05.3008808Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.3008814Z 
2025-12-04T12:15:05.3008819Z 
2025-12-04T12:15:05.3009051Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.3009637Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T12:15:05.3009643Z 
2025-12-04T12:15:05.3009911Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.3010148Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.3010289Z frames [('total', 1)]
2025-12-04T12:15:05.3010421Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.3010663Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.3010887Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.3011001Z graph_break []
2025-12-04T12:15:05.3011222Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.3011327Z frames [('total', 1)]
2025-12-04T12:15:05.3011455Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.3011674Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.3011912Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.3012026Z graph_break []
2025-12-04T12:15:05.3012174Z =================================== FAILURES ===================================
2025-12-04T12:15:05.3012476Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda ____
2025-12-04T12:15:05.3012603Z Traceback (most recent call last):
2025-12-04T12:15:05.3013000Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T12:15:05.3013172Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T12:15:05.3013665Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.3013915Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.3014441Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.3014640Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.3015166Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.3015314Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.3015850Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.3016228Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.3016898Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.3017071Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.3017556Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.3017736Z     return self._compile_to_module()
2025-12-04T12:15:05.3018268Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.3018435Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.3018964Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.3019098Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.3019597Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.3019844Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.3020434Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.3020561Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.3021085Z   File "/tmp/tmp8lyshc2k/dk/cdkrpphdgfcpc657znenumilapgmbhcho7h5ybgzvvpgztpo4ori.py", line 58, in <module>
2025-12-04T12:15:05.3021550Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.3021676Z     kernel.precompile(
2025-12-04T12:15:05.3022229Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.3022388Z     self._precompile_worker()
2025-12-04T12:15:05.3023001Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.3023181Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.3023789Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.3023990Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.3024442Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.3024702Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.3025145Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.3025481Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.3025724Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.3026257Z def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.3026362Z ^
2025-12-04T12:15:05.3026820Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.3026828Z 
2025-12-04T12:15:05.3027543Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.3027566Z 
2025-12-04T12:15:05.3027570Z 
2025-12-04T12:15:05.3027788Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.3028373Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T12:15:05.3028381Z 
2025-12-04T12:15:05.3028698Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.3028922Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.3029029Z frames [('total', 1)]
2025-12-04T12:15:05.3029159Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.3029396Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.3029688Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.3029791Z graph_break []
2025-12-04T12:15:05.3030011Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.3030162Z frames [('total', 1)]
2025-12-04T12:15:05.3030277Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.3030496Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.3030746Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.3030846Z graph_break []
2025-12-04T12:15:05.3031083Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.3031187Z frames [('total', 1)]
2025-12-04T12:15:05.3031302Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.3031533Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.3031764Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.3031866Z graph_break []
2025-12-04T12:15:05.3032528Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f1313a025d30dc09.xml -
2025-12-04T12:15:05.3032707Z =========================== short test summary info ============================
2025-12-04T12:15:05.3033452Z FAILED [0.3385s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.3034018Z def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.3034108Z ^
2025-12-04T12:15:05.3034581Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.3034587Z 
2025-12-04T12:15:05.3035293Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.3035301Z 
2025-12-04T12:15:05.3035305Z 
2025-12-04T12:15:05.3035537Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.3036124Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T12:15:05.3036132Z 
2025-12-04T12:15:05.3036403Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.3036600Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.3036803Z ================== 1 failed, 187 deselected, 2 rerun in 4.01s ==================
2025-12-04T12:15:05.3036922Z Got exit code 1
2025-12-04T12:15:05.3037429Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T12:15:05.3037838Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:15:05.3038325Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-03aedafc0832726c.xml
2025-12-04T12:15:05.3038493Z ============================= test session starts ==============================
2025-12-04T12:15:05.3038858Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.3038972Z cachedir: .pytest_cache
2025-12-04T12:15:05.3039525Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.3039666Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.3039774Z configfile: pytest.ini
2025-12-04T12:15:05.3040363Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.3040605Z collecting ... collected 188 items / 14 deselected / 174 selected
2025-12-04T12:15:05.3040855Z stepcurrent: skipping 14 already run items.
2025-12-04T12:15:05.3040985Z Running 174 items in this shard
2025-12-04T12:15:05.3040991Z 
2025-12-04T12:15:05.3042191Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1
2025-12-04T12:15:05.3043075Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.3043520Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.3043965Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 160
2025-12-04T12:15:05.3044503Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T12:15:05.3044969Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.3045505Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.3046098Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.3046684Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.3047282Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.3047841Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.3048298Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.3048817Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.3049296Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.3049771Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.3050218Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.3050828Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0)
2025-12-04T12:15:05.3051352Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.3051899Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T12:15:05.3052498Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.3053101Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.where(r0_mask, tmp1, float("-inf"))
2025-12-04T12:15:05.3053744Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.3054252Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp4.to(tl.float32)
2025-12-04T12:15:05.3054753Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T12:15:05.3055246Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T12:15:05.3055812Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T12:15:05.3056271Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T12:15:05.3056922Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T12:15:05.3057459Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T12:15:05.3058197Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T12:15:05.3058560Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.3060625Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.3061205Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.3062261Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.3062893Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.3063803Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.3064487Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.3065370Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.3066161Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.3066805Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.3067700Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.3068067Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.3069615Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.3069755Z ('RERUN', {'yellow': True}) [3.6567s] [  0%]
2025-12-04T12:15:05.3070922Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1
2025-12-04T12:15:05.3072156Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.3072584Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.3073049Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 160
2025-12-04T12:15:05.3073570Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T12:15:05.3074046Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.3074669Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.3075210Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.3075809Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.3076394Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.3076968Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.3077406Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.3077922Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.3078407Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.3078865Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.3079324Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.3079919Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0)
2025-12-04T12:15:05.3080442Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.3080995Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T12:15:05.3081646Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.3082234Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.where(r0_mask, tmp1, float("-inf"))
2025-12-04T12:15:05.3082857Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.3083411Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp4.to(tl.float32)
2025-12-04T12:15:05.3083934Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T12:15:05.3084380Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T12:15:05.3084962Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T12:15:05.3085401Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T12:15:05.3085985Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T12:15:05.3086517Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T12:15:05.3087227Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T12:15:05.3087608Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.3089676Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.3090237Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.3091283Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.3091931Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.3092825Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.3093514Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.3094403Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.3095173Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.3095824Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.3096764Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.3097186Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.3098105Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.3098257Z ('RERUN', {'yellow': True}) [0.4800s] [  0%]
2025-12-04T12:15:05.3099420Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1
2025-12-04T12:15:05.3100283Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.3100726Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.3101173Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 160
2025-12-04T12:15:05.3101704Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T12:15:05.3102198Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.3102733Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.3103288Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.3103869Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.3104465Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.3105025Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.3105466Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.3106006Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.3106478Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.3106952Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.3107402Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.3108000Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0)
2025-12-04T12:15:05.3108540Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.3109121Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T12:15:05.3109716Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.3110287Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.where(r0_mask, tmp1, float("-inf"))
2025-12-04T12:15:05.3110959Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.3111497Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp4.to(tl.float32)
2025-12-04T12:15:05.3111966Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T12:15:05.3112425Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T12:15:05.3112992Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T12:15:05.3113442Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T12:15:05.3114013Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T12:15:05.3114546Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T12:15:05.3115275Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T12:15:05.3115680Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.3117714Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.3118253Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.3119309Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.3119936Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.3120842Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.3121526Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.3122406Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.3123219Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.3123828Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.3124706Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.3125135Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.3126047Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.3126156Z FAILED [0.4755s] [  0%]
2025-12-04T12:15:05.3126163Z 
2025-12-04T12:15:05.3126311Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.3126623Z __ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda ___
2025-12-04T12:15:05.3126749Z Traceback (most recent call last):
2025-12-04T12:15:05.3127161Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T12:15:05.3127321Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T12:15:05.3127812Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.3128078Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.3128591Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.3128834Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.3129359Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.3129507Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.3130052Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.3130374Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.3130896Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.3131060Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.3131540Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.3131678Z     return self._compile_to_module()
2025-12-04T12:15:05.3132166Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.3132331Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.3132859Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.3132991Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.3133484Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.3133730Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.3134318Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.3134458Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.3134979Z   File "/tmp/tmp9ew304tp/px/cpxpbyg6h77wdwmfjopa4olavns7dy7budjyv3xi56ypvqtxsi7x.py", line 118, in <module>
2025-12-04T12:15:05.3135480Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.3135608Z     kernel.precompile(
2025-12-04T12:15:05.3136162Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.3136370Z     self._precompile_worker()
2025-12-04T12:15:05.3136977Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.3137201Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.3137845Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.3138045Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.3138500Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.3138764Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.3139208Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.3139558Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.3139788Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.3140222Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.3140328Z ^
2025-12-04T12:15:05.3140785Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.3140791Z 
2025-12-04T12:15:05.3141516Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.3141558Z 
2025-12-04T12:15:05.3141563Z 
2025-12-04T12:15:05.3141781Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.3142378Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:05.3142396Z 
2025-12-04T12:15:05.3142669Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.3142897Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.3143016Z frames [('total', 1)]
2025-12-04T12:15:05.3143134Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.3143372Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.3143608Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.3143709Z graph_break []
2025-12-04T12:15:05.3144004Z __ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda ___
2025-12-04T12:15:05.3144143Z Traceback (most recent call last):
2025-12-04T12:15:05.3144540Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T12:15:05.3144709Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T12:15:05.3145196Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.3145446Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.3145994Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.3146188Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.3146713Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.3146865Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.3147439Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.3147780Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.3148300Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.3148480Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.3148975Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.3149133Z     return self._compile_to_module()
2025-12-04T12:15:05.3149635Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.3149804Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.3150326Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.3150474Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.3150974Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.3151224Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.3151812Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.3151943Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.3152457Z   File "/tmp/tmpi22i3h0l/ow/cowqt4322c7owwcibq4p5y2imfc6k52zopnev5y5666tvgnjkyzz.py", line 118, in <module>
2025-12-04T12:15:05.3152920Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.3153073Z     kernel.precompile(
2025-12-04T12:15:05.3153643Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.3153762Z     self._precompile_worker()
2025-12-04T12:15:05.3154373Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.3154554Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.3155153Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.3155373Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.3155826Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.3156086Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.3156538Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.3156872Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.3157115Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.3157549Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.3157644Z ^
2025-12-04T12:15:05.3158120Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.3158126Z 
2025-12-04T12:15:05.3158840Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.3158846Z 
2025-12-04T12:15:05.3158853Z 
2025-12-04T12:15:05.3159088Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.3159721Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:05.3159730Z 
2025-12-04T12:15:05.3160018Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.3160246Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.3160354Z frames [('total', 1)]
2025-12-04T12:15:05.3160518Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.3160756Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.3161009Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.3161124Z graph_break []
2025-12-04T12:15:05.3161351Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.3161468Z frames [('total', 1)]
2025-12-04T12:15:05.3161583Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.3161799Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.3162050Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.3162149Z graph_break []
2025-12-04T12:15:05.3162297Z =================================== FAILURES ===================================
2025-12-04T12:15:05.3162601Z __ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda ___
2025-12-04T12:15:05.3162726Z Traceback (most recent call last):
2025-12-04T12:15:05.3163128Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T12:15:05.3163295Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T12:15:05.3163784Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.3164044Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.3164588Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.3164782Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.3165303Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.3165450Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.3165997Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.3166322Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.3166842Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.3167006Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.3167489Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.3167626Z     return self._compile_to_module()
2025-12-04T12:15:05.3168113Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.3168278Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.3168811Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.3168944Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.3169444Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.3169690Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.3170277Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.3170419Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.3171160Z   File "/tmp/tmpljfnr47f/fd/cfdvrn6jloli4lt23dbt7sa6q4q3a6x3p4dfntgshsjvgzhlvbzs.py", line 118, in <module>
2025-12-04T12:15:05.3171631Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.3171757Z     kernel.precompile(
2025-12-04T12:15:05.3172310Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.3172580Z     self._precompile_worker()
2025-12-04T12:15:05.3173242Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.3173425Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.3174032Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.3174855Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.3175314Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.3175573Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.3176020Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.3176438Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.3176674Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.3177109Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.3177219Z ^
2025-12-04T12:15:05.3177678Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.3177765Z 
2025-12-04T12:15:05.3178496Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.3178502Z 
2025-12-04T12:15:05.3178507Z 
2025-12-04T12:15:05.3178727Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.3179322Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:05.3179345Z 
2025-12-04T12:15:05.3179618Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.3179849Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.3179974Z frames [('total', 1)]
2025-12-04T12:15:05.3180092Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.3180332Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.3180570Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.3180675Z graph_break []
2025-12-04T12:15:05.3180897Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.3181013Z frames [('total', 1)]
2025-12-04T12:15:05.3181131Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.3181362Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.3181599Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.3181701Z graph_break []
2025-12-04T12:15:05.3181930Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.3182035Z frames [('total', 1)]
2025-12-04T12:15:05.3182150Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.3182379Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.3182614Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.3182712Z graph_break []
2025-12-04T12:15:05.3183430Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-03aedafc0832726c.xml -
2025-12-04T12:15:05.3183608Z =========================== short test summary info ============================
2025-12-04T12:15:05.3184367Z FAILED [0.4755s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.3184835Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.3184926Z ^
2025-12-04T12:15:05.3185431Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.3185437Z 
2025-12-04T12:15:05.3186151Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.3186160Z 
2025-12-04T12:15:05.3186167Z 
2025-12-04T12:15:05.3186400Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.3187031Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:05.3187040Z 
2025-12-04T12:15:05.3187418Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.3187606Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.3187808Z ================== 1 failed, 14 deselected, 2 rerun in 4.66s ===================
2025-12-04T12:15:05.3187929Z Got exit code 1
2025-12-04T12:15:05.3188038Z Retrying single test...
2025-12-04T12:15:05.3188507Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-89171bcc48f05a69.xml
2025-12-04T12:15:05.3188723Z ============================= test session starts ==============================
2025-12-04T12:15:05.3189080Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.3189203Z cachedir: .pytest_cache
2025-12-04T12:15:05.3189723Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.3189848Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.3189973Z configfile: pytest.ini
2025-12-04T12:15:05.3190564Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.3190790Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:05.3191484Z stepcurrent: skipping 14 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:05.3191604Z Running 1 items in this shard
2025-12-04T12:15:05.3191609Z 
2025-12-04T12:15:05.3192785Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1
2025-12-04T12:15:05.3193660Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.3194106Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.3194553Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 160
2025-12-04T12:15:05.3195071Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T12:15:05.3195590Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.3196129Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.3196681Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.3197294Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.3197915Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.3198485Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.3198936Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.3199467Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.3199938Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.3200398Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.3200866Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.3201459Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0)
2025-12-04T12:15:05.3202031Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.3202576Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T12:15:05.3203169Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.3203738Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.where(r0_mask, tmp1, float("-inf"))
2025-12-04T12:15:05.3204368Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.3204895Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp4.to(tl.float32)
2025-12-04T12:15:05.3205368Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T12:15:05.3205826Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T12:15:05.3206395Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T12:15:05.3206835Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T12:15:05.3207424Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T12:15:05.3207953Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T12:15:05.3208726Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T12:15:05.3209092Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.3211162Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.3211735Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.3212804Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.3213434Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.3214321Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.3215018Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.3215900Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.3216796Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.3217411Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.3218299Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.3218669Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.3219896Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.3220055Z ('RERUN', {'yellow': True}) [3.6558s] [100%]
2025-12-04T12:15:05.3221219Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1
2025-12-04T12:15:05.3222106Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.3222537Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.3222986Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 160
2025-12-04T12:15:05.3223580Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T12:15:05.3224048Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.3224593Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.3225183Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.3225817Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.3226400Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.3226966Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.3227426Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.3227942Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.3228428Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.3228895Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.3229342Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.3229994Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0)
2025-12-04T12:15:05.3230521Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.3231082Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T12:15:05.3231664Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.3232237Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.where(r0_mask, tmp1, float("-inf"))
2025-12-04T12:15:05.3232883Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.3233399Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp4.to(tl.float32)
2025-12-04T12:15:05.3233882Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T12:15:05.3234324Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T12:15:05.3234894Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T12:15:05.3235355Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T12:15:05.3235933Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T12:15:05.3236481Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T12:15:05.3237228Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T12:15:05.3237613Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.3239682Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.3240264Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.3241306Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.3241937Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.3242843Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.3243523Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.3244462Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.3245232Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.3245855Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.3246734Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.3247103Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.3248003Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.3248141Z ('RERUN', {'yellow': True}) [0.4912s] [100%]
2025-12-04T12:15:05.3249312Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1
2025-12-04T12:15:05.3250187Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.3250626Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.3251206Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 160
2025-12-04T12:15:05.3251731Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T12:15:05.3252205Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.3252739Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.3253371Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.3253953Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.3254543Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.3255115Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.3255559Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.3256089Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.3256652Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.3257110Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.3257617Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.3258212Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0)
2025-12-04T12:15:05.3258743Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.3259286Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T12:15:05.3259867Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.3260452Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.where(r0_mask, tmp1, float("-inf"))
2025-12-04T12:15:05.3261077Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.3261600Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp4.to(tl.float32)
2025-12-04T12:15:05.3262065Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T12:15:05.3262517Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T12:15:05.3263087Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T12:15:05.3263531Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T12:15:05.3264115Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T12:15:05.3264687Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T12:15:05.3265427Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T12:15:05.3265790Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.3267882Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.3268438Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.3269480Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.3270124Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.3271208Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.3271997Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.3272880Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.3273661Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.3274275Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.3275142Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.3275526Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.3276421Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.3276544Z FAILED [0.4915s] [100%]
2025-12-04T12:15:05.3276550Z 
2025-12-04T12:15:05.3276700Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.3277015Z __ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda ___
2025-12-04T12:15:05.3277143Z Traceback (most recent call last):
2025-12-04T12:15:05.3277548Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T12:15:05.3277719Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T12:15:05.3278216Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.3278531Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.3279061Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.3279258Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.3279783Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.3279973Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.3280553Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.3280891Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.3281415Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.3281581Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.3282061Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.3282185Z     return self._compile_to_module()
2025-12-04T12:15:05.3282680Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.3282847Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.3283363Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.3283508Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.3284005Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.3284307Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.3284897Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.3285030Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.3285555Z   File "/tmp/tmp01ubla4e/kp/ckpzfcagqn7xfv6oamrbmiph5ypibzgec26gkdhczk653eo546h4.py", line 118, in <module>
2025-12-04T12:15:05.3286024Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.3286153Z     kernel.precompile(
2025-12-04T12:15:05.3286710Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.3286829Z     self._precompile_worker()
2025-12-04T12:15:05.3287442Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.3287624Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.3288220Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.3288436Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.3288888Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.3289149Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.3289597Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.3289932Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.3290174Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.3290608Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.3290713Z ^
2025-12-04T12:15:05.3291211Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.3291217Z 
2025-12-04T12:15:05.3291933Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.3291940Z 
2025-12-04T12:15:05.3291976Z 
2025-12-04T12:15:05.3292207Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.3292838Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:05.3292845Z 
2025-12-04T12:15:05.3293125Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.3293353Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.3293459Z frames [('total', 1)]
2025-12-04T12:15:05.3293591Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.3293828Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.3294050Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.3294163Z graph_break []
2025-12-04T12:15:05.3294456Z __ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda ___
2025-12-04T12:15:05.3294596Z Traceback (most recent call last):
2025-12-04T12:15:05.3294992Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T12:15:05.3295147Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T12:15:05.3295652Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.3295902Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.3296568Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.3296767Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.3297275Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.3297439Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.3297977Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.3298302Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.3298842Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.3298991Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.3299494Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.3299618Z     return self._compile_to_module()
2025-12-04T12:15:05.3300104Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.3300283Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.3300798Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.3300947Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.3301448Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.3301680Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.3302279Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.3302412Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.3302971Z   File "/tmp/tmpkysq29mq/fh/cfh37imijtyduvrv3zgukdrayd3uijhppizkbtkqlz5rozpxioov.py", line 118, in <module>
2025-12-04T12:15:05.3303448Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.3303560Z     kernel.precompile(
2025-12-04T12:15:05.3304129Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.3304285Z     self._precompile_worker()
2025-12-04T12:15:05.3304910Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.3305104Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.3305698Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.3305915Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.3306365Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.3306614Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.3307068Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.3307406Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.3307632Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.3308080Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.3308170Z ^
2025-12-04T12:15:05.3308659Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.3308703Z 
2025-12-04T12:15:05.3309422Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.3309429Z 
2025-12-04T12:15:05.3309434Z 
2025-12-04T12:15:05.3309664Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.3310268Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:05.3310276Z 
2025-12-04T12:15:05.3310547Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.3310788Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.3310896Z frames [('total', 1)]
2025-12-04T12:15:05.3311015Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.3311271Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.3311497Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.3311614Z graph_break []
2025-12-04T12:15:05.3311834Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.3311939Z frames [('total', 1)]
2025-12-04T12:15:05.3312069Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.3312289Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.3312525Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.3312644Z graph_break []
2025-12-04T12:15:05.3312793Z =================================== FAILURES ===================================
2025-12-04T12:15:05.3313105Z __ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda ___
2025-12-04T12:15:05.3313233Z Traceback (most recent call last):
2025-12-04T12:15:05.3313633Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T12:15:05.3313804Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T12:15:05.3314361Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.3314616Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.3315146Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.3315377Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.3316287Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.3316482Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.3317026Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.3317369Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.3317892Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.3318054Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.3318537Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.3318661Z     return self._compile_to_module()
2025-12-04T12:15:05.3319168Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.3319341Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.3319861Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.3320013Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.3320568Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.3320818Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.3321404Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.3321538Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.3322064Z   File "/tmp/tmpfg502zcq/db/cdbad4etsaltwm57df5tkxb7v74k6iqxee6c3vhydwqlnukcffo2.py", line 118, in <module>
2025-12-04T12:15:05.3322532Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.3322660Z     kernel.precompile(
2025-12-04T12:15:05.3323217Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.3323337Z     self._precompile_worker()
2025-12-04T12:15:05.3323951Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.3324132Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.3324726Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.3324936Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.3325387Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.3325643Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.3326088Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.3326420Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.3326660Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.3327126Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.3327229Z ^
2025-12-04T12:15:05.3327686Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.3327692Z 
2025-12-04T12:15:05.3328405Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.3328449Z 
2025-12-04T12:15:05.3328454Z 
2025-12-04T12:15:05.3328712Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.3329308Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:05.3329316Z 
2025-12-04T12:15:05.3329791Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.3330022Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.3330128Z frames [('total', 1)]
2025-12-04T12:15:05.3330260Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.3330496Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.3330717Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.3330838Z graph_break []
2025-12-04T12:15:05.3331058Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.3331180Z frames [('total', 1)]
2025-12-04T12:15:05.3331296Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.3331519Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.3331764Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.3331905Z graph_break []
2025-12-04T12:15:05.3332125Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.3332247Z frames [('total', 1)]
2025-12-04T12:15:05.3332361Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.3332588Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.3332819Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.3332917Z graph_break []
2025-12-04T12:15:05.3333581Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-89171bcc48f05a69.xml -
2025-12-04T12:15:05.3333757Z =========================== short test summary info ============================
2025-12-04T12:15:05.3334508Z FAILED [0.4915s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.3334949Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.3335041Z ^
2025-12-04T12:15:05.3335512Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.3335517Z 
2025-12-04T12:15:05.3336226Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.3336232Z 
2025-12-04T12:15:05.3336239Z 
2025-12-04T12:15:05.3336539Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.3337146Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:05.3337151Z 
2025-12-04T12:15:05.3337423Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.3337626Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.3337837Z ================== 1 failed, 187 deselected, 2 rerun in 4.68s ==================
2025-12-04T12:15:05.3337983Z Got exit code 1
2025-12-04T12:15:05.3338113Z Retrying single test...
2025-12-04T12:15:05.3338584Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6450e334481f0131.xml
2025-12-04T12:15:05.3338766Z ============================= test session starts ==============================
2025-12-04T12:15:05.3339121Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.3339269Z cachedir: .pytest_cache
2025-12-04T12:15:05.3339835Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.3339962Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.3340073Z configfile: pytest.ini
2025-12-04T12:15:05.3340683Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.3340912Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:05.3341599Z stepcurrent: skipping 14 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:05.3341719Z Running 1 items in this shard
2025-12-04T12:15:05.3341725Z 
2025-12-04T12:15:05.3342896Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1
2025-12-04T12:15:05.3343784Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.3344248Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.3344706Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 160
2025-12-04T12:15:05.3345226Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T12:15:05.3345702Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.3346238Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.3346783Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.3347378Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.3347962Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.3348532Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.3348973Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.3349491Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.3349983Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.3350439Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.3350936Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.3351536Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0)
2025-12-04T12:15:05.3352057Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.3352641Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T12:15:05.3353255Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.3353843Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.where(r0_mask, tmp1, float("-inf"))
2025-12-04T12:15:05.3354477Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.3354994Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp4.to(tl.float32)
2025-12-04T12:15:05.3355460Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T12:15:05.3355905Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T12:15:05.3356488Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T12:15:05.3356924Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T12:15:05.3357546Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T12:15:05.3358082Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T12:15:05.3358792Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T12:15:05.3359173Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.3361202Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.3361751Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.3362794Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.3363437Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.3364331Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.3365061Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.3365947Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.3366747Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.3367443Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.3368323Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.3368704Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.3369601Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.3369752Z ('RERUN', {'yellow': True}) [3.6549s] [100%]
2025-12-04T12:15:05.3370911Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1
2025-12-04T12:15:05.3371965Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.3372501Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.3372947Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 160
2025-12-04T12:15:05.3373482Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T12:15:05.3373950Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.3374484Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.3375045Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.3375637Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.3376233Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.3376848Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.3377311Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.3377835Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.3378305Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.3378842Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.3379296Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.3379907Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0)
2025-12-04T12:15:05.3380431Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.3381065Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T12:15:05.3381662Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.3382239Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.where(r0_mask, tmp1, float("-inf"))
2025-12-04T12:15:05.3382872Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.3383380Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp4.to(tl.float32)
2025-12-04T12:15:05.3383846Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T12:15:05.3384306Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T12:15:05.3384867Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T12:15:05.3385357Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T12:15:05.3385931Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T12:15:05.3386468Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T12:15:05.3387195Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T12:15:05.3387561Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.3389593Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.3390132Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.3391204Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.3391832Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.3392790Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.3393470Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.3394353Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.3395205Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.3395814Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.3396705Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.3397070Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.3397974Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.3398113Z ('RERUN', {'yellow': True}) [0.4917s] [100%]
2025-12-04T12:15:05.3399278Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1
2025-12-04T12:15:05.3400213Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.3400646Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.3401109Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 160
2025-12-04T12:15:05.3401631Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T12:15:05.3402117Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.3402654Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.3403198Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.3406779Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.3407402Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.3407969Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.3408429Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.3408948Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.3409504Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.3409966Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.3410449Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.3411048Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0)
2025-12-04T12:15:05.3411623Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.3412166Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T12:15:05.3412766Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.3413342Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.where(r0_mask, tmp1, float("-inf"))
2025-12-04T12:15:05.3413974Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.3414495Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp4.to(tl.float32)
2025-12-04T12:15:05.3414972Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T12:15:05.3415429Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T12:15:05.3416040Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T12:15:05.3416595Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T12:15:05.3417190Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T12:15:05.3417722Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T12:15:05.3418450Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T12:15:05.3418813Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.3420944Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.3421482Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.3422523Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.3423169Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.3424125Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.3424819Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.3425733Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.3426515Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.3427127Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.3428009Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.3428377Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.3429269Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.3429390Z FAILED [0.4894s] [100%]
2025-12-04T12:15:05.3429397Z 
2025-12-04T12:15:05.3429546Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.3429894Z __ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda ___
2025-12-04T12:15:05.3430025Z Traceback (most recent call last):
2025-12-04T12:15:05.3430421Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T12:15:05.3430596Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T12:15:05.3431090Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.3431342Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.3431872Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.3432066Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.3432589Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.3432742Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.3433279Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.3433613Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.3434191Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.3434356Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.3434838Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.3434960Z     return self._compile_to_module()
2025-12-04T12:15:05.3435456Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.3435620Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.3436175Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.3436319Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.3436826Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.3437075Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.3437664Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.3437825Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.3438323Z   File "/tmp/tmpfscsojs_/ot/cot6ucnhr5vlnabl6vdrqw37vysspn5uhkni7ol5os7ojegf4fnd.py", line 118, in <module>
2025-12-04T12:15:05.3438787Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.3438916Z     kernel.precompile(
2025-12-04T12:15:05.3439472Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.3439592Z     self._precompile_worker()
2025-12-04T12:15:05.3440207Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.3440391Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.3440989Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.3441204Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.3441656Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.3441918Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.3442400Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.3442743Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.3442983Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.3443419Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.3443526Z ^
2025-12-04T12:15:05.3443984Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.3443990Z 
2025-12-04T12:15:05.3444699Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.3444705Z 
2025-12-04T12:15:05.3444722Z 
2025-12-04T12:15:05.3444944Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.3445543Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:05.3445549Z 
2025-12-04T12:15:05.3445832Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.3446099Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.3446207Z frames [('total', 1)]
2025-12-04T12:15:05.3446339Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.3446580Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.3446815Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.3446917Z graph_break []
2025-12-04T12:15:05.3447211Z __ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda ___
2025-12-04T12:15:05.3447348Z Traceback (most recent call last):
2025-12-04T12:15:05.3447750Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T12:15:05.3447939Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T12:15:05.3448445Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.3448698Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.3449223Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.3449452Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.3449965Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.3450124Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.3450655Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.3450981Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.3451513Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.3451663Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.3452157Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.3452282Z     return self._compile_to_module()
2025-12-04T12:15:05.3452769Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.3452947Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.3453464Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.3453718Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.3454219Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.3454454Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.3455055Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.3455183Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.3455662Z   File "/tmp/tmp_07vpckh/cq/ccqyax4rpybxh3lrtsr6g6wsyqiu2bsp56wq25f7dxxumkgep4du.py", line 118, in <module>
2025-12-04T12:15:05.3456141Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.3456254Z     kernel.precompile(
2025-12-04T12:15:05.3456901Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.3457023Z     self._precompile_worker()
2025-12-04T12:15:05.3457622Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.3457816Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.3458454Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.3458671Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.3459125Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.3459372Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.3459833Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.3460168Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.3460414Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.3460875Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.3460971Z ^
2025-12-04T12:15:05.3461441Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.3461447Z 
2025-12-04T12:15:05.3462158Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.3462195Z 
2025-12-04T12:15:05.3462200Z 
2025-12-04T12:15:05.3462431Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.3463027Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:05.3463035Z 
2025-12-04T12:15:05.3463306Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.3463547Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.3463652Z frames [('total', 1)]
2025-12-04T12:15:05.3463782Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.3464022Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.3464241Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.3464357Z graph_break []
2025-12-04T12:15:05.3464576Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.3464680Z frames [('total', 1)]
2025-12-04T12:15:05.3464808Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.3465025Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.3465258Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.3465403Z graph_break []
2025-12-04T12:15:05.3465551Z =================================== FAILURES ===================================
2025-12-04T12:15:05.3465860Z __ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda ___
2025-12-04T12:15:05.3465984Z Traceback (most recent call last):
2025-12-04T12:15:05.3466385Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T12:15:05.3466556Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T12:15:05.3467051Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.3467301Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.3467827Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.3468021Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.3468544Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.3468694Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.3469284Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.3469620Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.3470139Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.3470306Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.3470783Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.3471187Z     return self._compile_to_module()
2025-12-04T12:15:05.3471679Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.3471946Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.3472480Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.3472617Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.3473130Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.3473407Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.3473993Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.3474136Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.3474645Z   File "/tmp/tmpywtwoc7l/gs/cgscbnakwbloqkvnt62ojqax3g45mhzel7n3jitvuxdbb575ndqo.py", line 118, in <module>
2025-12-04T12:15:05.3475133Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.3475251Z     kernel.precompile(
2025-12-04T12:15:05.3475805Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.3475944Z     self._precompile_worker()
2025-12-04T12:15:05.3476540Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.3476724Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.3477331Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.3477532Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.3478000Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.3478296Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.3478746Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.3479105Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.3479334Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.3479782Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.3479878Z ^
2025-12-04T12:15:05.3480339Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.3480344Z 
2025-12-04T12:15:05.3481070Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.3481079Z 
2025-12-04T12:15:05.3481084Z 
2025-12-04T12:15:05.3481308Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.3481915Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:05.3481972Z 
2025-12-04T12:15:05.3482247Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.3482473Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.3482602Z frames [('total', 1)]
2025-12-04T12:15:05.3482722Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.3482975Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.3483199Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.3483300Z graph_break []
2025-12-04T12:15:05.3483536Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.3483644Z frames [('total', 1)]
2025-12-04T12:15:05.3483797Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.3484033Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.3484267Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.3484372Z graph_break []
2025-12-04T12:15:05.3484605Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.3484711Z frames [('total', 1)]
2025-12-04T12:15:05.3484878Z stats [('calls_captured', 6)]
2025-12-04T12:15:05.3485099Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.3485329Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.3485447Z graph_break []
2025-12-04T12:15:05.3486095Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6450e334481f0131.xml -
2025-12-04T12:15:05.3486273Z =========================== short test summary info ============================
2025-12-04T12:15:05.3487041Z FAILED [0.4894s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.3487473Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.3487576Z ^
2025-12-04T12:15:05.3488034Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.3488043Z 
2025-12-04T12:15:05.3488747Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.3488765Z 
2025-12-04T12:15:05.3488770Z 
2025-12-04T12:15:05.3488987Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.3489624Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:05.3489630Z 
2025-12-04T12:15:05.3489908Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.3490091Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.3490306Z ================== 1 failed, 187 deselected, 2 rerun in 4.68s ==================
2025-12-04T12:15:05.3490410Z Got exit code 1
2025-12-04T12:15:05.3490923Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:05.3491346Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:15:05.3491818Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f7999da795e3cf34.xml
2025-12-04T12:15:05.3491988Z ============================= test session starts ==============================
2025-12-04T12:15:05.3492353Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.3492465Z cachedir: .pytest_cache
2025-12-04T12:15:05.3493035Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.3493163Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.3493274Z configfile: pytest.ini
2025-12-04T12:15:05.3493887Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.3494116Z collecting ... collected 188 items / 15 deselected / 173 selected
2025-12-04T12:15:05.3494261Z stepcurrent: skipping 15 already run items.
2025-12-04T12:15:05.3494391Z Running 173 items in this shard
2025-12-04T12:15:05.3494399Z 
2025-12-04T12:15:05.3494852Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,1,15_cuda PASSED [3.3639s] [  0%]
2025-12-04T12:15:05.3495346Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,10,15_cuda PASSED [0.2656s] [  1%]
2025-12-04T12:15:05.3495803Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,10,4096_cuda PASSED [0.5887s] [  1%]
2025-12-04T12:15:05.3496251Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,10,512_cuda PASSED [0.2871s] [  2%]
2025-12-04T12:15:05.3496840Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_4,2048,4096_cuda PASSED [0.4899s] [  2%]
2025-12-04T12:15:05.3497435Z inductor/test_fp8.py::TestFP8TypesCUDA::test_bad_cast_cuda SKIPPED [0.0004s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [  3%]
2025-12-04T12:15:05.3498316Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 W1204 11:53:06.009000 115060 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T12:15:05.3498456Z ('RERUN', {'yellow': True}) [0.4279s] [  4%]
2025-12-04T12:15:05.3498967Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4763s] [  4%]
2025-12-04T12:15:05.3499402Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 FAILED [0.4524s] [  4%]
2025-12-04T12:15:05.3499408Z 
2025-12-04T12:15:05.3499552Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.3499857Z _________ TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 __________
2025-12-04T12:15:05.3499984Z Traceback (most recent call last):
2025-12-04T12:15:05.3500381Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback
2025-12-04T12:15:05.3500543Z     y_fp8 = compiled_fp8_matmul(x)  # noqa: F841
2025-12-04T12:15:05.3501075Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper
2025-12-04T12:15:05.3501205Z     return fn(*args, **kwargs)
2025-12-04T12:15:05.3501600Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 113, in fp8_matmul_unwrapped
2025-12-04T12:15:05.3501716Z     output = torch._scaled_mm(
2025-12-04T12:15:05.3502194Z RuntimeError: torch._scaled_mm is only supported on CUDA devices with compute capability >= 9.0 or 8.9, or ROCm MI300+
2025-12-04T12:15:05.3502200Z 
2025-12-04T12:15:05.3502421Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.3502970Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16
2025-12-04T12:15:05.3502976Z 
2025-12-04T12:15:05.3503246Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.3503469Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.3503601Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:15:05.3503718Z stats [('calls_captured', 11)]
2025-12-04T12:15:05.3503941Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.3504295Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.3504431Z graph_break []
2025-12-04T12:15:05.3504625Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T12:15:05.3504843Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:15:05.3505587Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:15:05.3505707Z   warnings.warn(
2025-12-04T12:15:05.3505994Z _________ TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 __________
2025-12-04T12:15:05.3506118Z Traceback (most recent call last):
2025-12-04T12:15:05.3506527Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback
2025-12-04T12:15:05.3506705Z     y_fp8 = compiled_fp8_matmul(x)  # noqa: F841
2025-12-04T12:15:05.3507208Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper
2025-12-04T12:15:05.3507325Z     return fn(*args, **kwargs)
2025-12-04T12:15:05.3507720Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 113, in fp8_matmul_unwrapped
2025-12-04T12:15:05.3507879Z     output = torch._scaled_mm(
2025-12-04T12:15:05.3508340Z RuntimeError: torch._scaled_mm is only supported on CUDA devices with compute capability >= 9.0 or 8.9, or ROCm MI300+
2025-12-04T12:15:05.3508346Z 
2025-12-04T12:15:05.3508561Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.3509109Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16
2025-12-04T12:15:05.3509117Z 
2025-12-04T12:15:05.3509385Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.3509623Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.3509739Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:15:05.3509856Z stats [('calls_captured', 11)]
2025-12-04T12:15:05.3510093Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.3510433Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.3510547Z graph_break []
2025-12-04T12:15:05.3510724Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T12:15:05.3510941Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:15:05.3512279Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:15:05.3512461Z   warnings.warn(
2025-12-04T12:15:05.3512678Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.3512808Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:15:05.3512926Z stats [('calls_captured', 11)]
2025-12-04T12:15:05.3513165Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.3513508Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.3513609Z graph_break []
2025-12-04T12:15:05.3513803Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T12:15:05.3514018Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:15:05.3514748Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:15:05.3514864Z   warnings.warn(
2025-12-04T12:15:05.3515015Z =================================== FAILURES ===================================
2025-12-04T12:15:05.3515313Z _________ TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 __________
2025-12-04T12:15:05.3515437Z Traceback (most recent call last):
2025-12-04T12:15:05.3515834Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback
2025-12-04T12:15:05.3516039Z     y_fp8 = compiled_fp8_matmul(x)  # noqa: F841
2025-12-04T12:15:05.3516529Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper
2025-12-04T12:15:05.3516644Z     return fn(*args, **kwargs)
2025-12-04T12:15:05.3517050Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 113, in fp8_matmul_unwrapped
2025-12-04T12:15:05.3517167Z     output = torch._scaled_mm(
2025-12-04T12:15:05.3517637Z RuntimeError: torch._scaled_mm is only supported on CUDA devices with compute capability >= 9.0 or 8.9, or ROCm MI300+
2025-12-04T12:15:05.3517646Z 
2025-12-04T12:15:05.3517865Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.3518436Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16
2025-12-04T12:15:05.3518442Z 
2025-12-04T12:15:05.3518727Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.3518945Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.3519075Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:15:05.3519224Z stats [('calls_captured', 11)]
2025-12-04T12:15:05.3519447Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.3519799Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.3519900Z graph_break []
2025-12-04T12:15:05.3520075Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T12:15:05.3520309Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:15:05.3521040Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:15:05.3521158Z   warnings.warn(
2025-12-04T12:15:05.3521382Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.3521496Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:15:05.3521627Z stats [('calls_captured', 11)]
2025-12-04T12:15:05.3521851Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.3522190Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.3522310Z graph_break []
2025-12-04T12:15:05.3522486Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T12:15:05.3522703Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:15:05.3523482Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:15:05.3523586Z   warnings.warn(
2025-12-04T12:15:05.3523811Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.3523926Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:15:05.3524040Z stats [('calls_captured', 11)]
2025-12-04T12:15:05.3524358Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.3524759Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.3524859Z graph_break []
2025-12-04T12:15:05.3525048Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T12:15:05.3525265Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:15:05.3526002Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:15:05.3526107Z   warnings.warn(
2025-12-04T12:15:05.3526759Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f7999da795e3cf34.xml -
2025-12-04T12:15:05.3526995Z =========================== short test summary info ============================
2025-12-04T12:15:05.3527952Z FAILED [0.4524s] inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 - RuntimeError: torch._scaled_mm is only supported on CUDA devices with compute capability >= 9.0 or 8.9, or ROCm MI300+
2025-12-04T12:15:05.3527961Z 
2025-12-04T12:15:05.3528193Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.3528731Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16
2025-12-04T12:15:05.3528737Z 
2025-12-04T12:15:05.3529009Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.3529242Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.3529485Z ======== 1 failed, 5 passed, 1 skipped, 15 deselected, 2 rerun in 6.41s ========
2025-12-04T12:15:05.3529605Z Got exit code 1
2025-12-04T12:15:05.3529716Z Retrying single test...
2025-12-04T12:15:05.3530198Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ad7a38726bbc8b50.xml
2025-12-04T12:15:05.3530379Z ============================= test session starts ==============================
2025-12-04T12:15:05.3530765Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.3530875Z cachedir: .pytest_cache
2025-12-04T12:15:05.3531413Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.3531543Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.3531672Z configfile: pytest.ini
2025-12-04T12:15:05.3532268Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.3532494Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:05.3533131Z stepcurrent: skipping 21 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16
2025-12-04T12:15:05.3533251Z Running 1 items in this shard
2025-12-04T12:15:05.3533259Z 
2025-12-04T12:15:05.3534207Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 [W1204 11:53:23.880322369 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3534213Z 
2025-12-04T12:15:05.3534732Z [W1204 11:53:39.749253585 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3534778Z 
2025-12-04T12:15:05.3535310Z [W1204 11:53:39.749501944 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3535316Z 
2025-12-04T12:15:05.3535834Z [W1204 11:53:39.752604165 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3535839Z 
2025-12-04T12:15:05.3536425Z [W1204 11:53:39.752803535 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3536434Z 
2025-12-04T12:15:05.3536958Z [W1204 11:53:39.754748216 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3536963Z 
2025-12-04T12:15:05.3537473Z [W1204 11:53:39.755102951 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3537480Z 
2025-12-04T12:15:05.3538008Z [W1204 11:53:39.755275209 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3538013Z 
2025-12-04T12:15:05.3538586Z [W1204 11:53:39.755806379 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3538591Z 
2025-12-04T12:15:05.3539115Z [W1204 11:53:39.755992289 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3539122Z 
2025-12-04T12:15:05.3539633Z [W1204 11:53:39.756564136 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3539638Z 
2025-12-04T12:15:05.3540161Z [W1204 11:53:39.756744702 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3540169Z 
2025-12-04T12:15:05.3540715Z [W1204 11:53:39.757161730 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3540720Z 
2025-12-04T12:15:05.3541232Z [W1204 11:53:39.757340073 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3541254Z 
2025-12-04T12:15:05.3541765Z [W1204 11:53:39.757717761 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3541801Z 
2025-12-04T12:15:05.3542314Z [W1204 11:53:39.757895374 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3542319Z 
2025-12-04T12:15:05.3542846Z [W1204 11:53:39.758276180 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3542853Z 
2025-12-04T12:15:05.3543360Z [W1204 11:53:39.758456309 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3543368Z 
2025-12-04T12:15:05.3543850Z W1204 11:53:39.719000 115336 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T12:15:05.3544363Z [W1204 11:53:39.127990794 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3544368Z 
2025-12-04T12:15:05.3544522Z ('RERUN', {'yellow': True}) [19.3476s] [100%]
2025-12-04T12:15:05.3545450Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 [W1204 11:53:40.742179499 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3545456Z 
2025-12-04T12:15:05.3545970Z [W1204 11:53:40.742608328 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3546024Z 
2025-12-04T12:15:05.3546537Z [W1204 11:53:40.742792868 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3546542Z 
2025-12-04T12:15:05.3547059Z [W1204 11:53:40.743381348 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3547064Z 
2025-12-04T12:15:05.3547586Z [W1204 11:53:40.743573437 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3547594Z 
2025-12-04T12:15:05.3548105Z [W1204 11:53:40.743937530 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3548110Z 
2025-12-04T12:15:05.3548637Z [W1204 11:53:40.744226615 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3548644Z 
2025-12-04T12:15:05.3549158Z [W1204 11:53:40.744393221 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3549162Z 
2025-12-04T12:15:05.3549726Z [W1204 11:53:40.744877476 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3549732Z 
2025-12-04T12:15:05.3550244Z [W1204 11:53:40.745056507 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3550251Z 
2025-12-04T12:15:05.3550775Z [W1204 11:53:40.745498131 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3550780Z 
2025-12-04T12:15:05.3551289Z [W1204 11:53:40.745678025 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3551296Z 
2025-12-04T12:15:05.3551848Z [W1204 11:53:40.746064328 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3551853Z 
2025-12-04T12:15:05.3552376Z [W1204 11:53:40.746240294 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3552383Z 
2025-12-04T12:15:05.3552892Z [W1204 11:53:40.746601410 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3552949Z 
2025-12-04T12:15:05.3553474Z [W1204 11:53:40.746777402 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3553479Z 
2025-12-04T12:15:05.3553982Z [W1204 11:53:40.747147147 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3553989Z 
2025-12-04T12:15:05.3554506Z [W1204 11:53:40.747323648 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3554513Z 
2025-12-04T12:15:05.3555023Z [W1204 11:53:40.841731853 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3555028Z 
2025-12-04T12:15:05.3555172Z ('RERUN', {'yellow': True}) [0.4822s] [100%]
2025-12-04T12:15:05.3556100Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 [W1204 11:53:40.201171908 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3556109Z 
2025-12-04T12:15:05.3556949Z [W1204 11:53:40.201604707 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3556969Z 
2025-12-04T12:15:05.3557484Z [W1204 11:53:40.201790233 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3557545Z 
2025-12-04T12:15:05.3558057Z [W1204 11:53:40.202360700 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3558062Z 
2025-12-04T12:15:05.3558588Z [W1204 11:53:40.202554983 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3558594Z 
2025-12-04T12:15:05.3559103Z [W1204 11:53:40.202913615 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3559111Z 
2025-12-04T12:15:05.3559633Z [W1204 11:53:40.203212713 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3559638Z 
2025-12-04T12:15:05.3560149Z [W1204 11:53:40.203380198 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3560156Z 
2025-12-04T12:15:05.3560678Z [W1204 11:53:40.203833442 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3560683Z 
2025-12-04T12:15:05.3561236Z [W1204 11:53:40.204013243 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3561242Z 
2025-12-04T12:15:05.3561754Z [W1204 11:53:40.204462945 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3561775Z 
2025-12-04T12:15:05.3562286Z [W1204 11:53:40.204657249 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3562291Z 
2025-12-04T12:15:05.3562800Z [W1204 11:53:40.205044154 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3562808Z 
2025-12-04T12:15:05.3563371Z [W1204 11:53:40.205221330 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3563377Z 
2025-12-04T12:15:05.3563890Z [W1204 11:53:40.205582023 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3563896Z 
2025-12-04T12:15:05.3564417Z [W1204 11:53:40.205756718 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3564457Z 
2025-12-04T12:15:05.3564964Z [W1204 11:53:40.206118057 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3564969Z 
2025-12-04T12:15:05.3565486Z [W1204 11:53:40.206292667 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3565494Z 
2025-12-04T12:15:05.3566002Z [W1204 11:53:40.302170471 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3566007Z 
2025-12-04T12:15:05.3566124Z FAILED [0.4588s] [100%]
2025-12-04T12:15:05.3566129Z 
2025-12-04T12:15:05.3566275Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.3566564Z _________ TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 __________
2025-12-04T12:15:05.3566702Z Traceback (most recent call last):
2025-12-04T12:15:05.3567102Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback
2025-12-04T12:15:05.3567249Z     y_fp8 = compiled_fp8_matmul(x)  # noqa: F841
2025-12-04T12:15:05.3567756Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper
2025-12-04T12:15:05.3567869Z     return fn(*args, **kwargs)
2025-12-04T12:15:05.3568316Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 113, in fp8_matmul_unwrapped
2025-12-04T12:15:05.3568433Z     output = torch._scaled_mm(
2025-12-04T12:15:05.3568899Z RuntimeError: torch._scaled_mm is only supported on CUDA devices with compute capability >= 9.0 or 8.9, or ROCm MI300+
2025-12-04T12:15:05.3569567Z Exception raised from _scaled_mm_out_cuda at /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/ScaledBlas.cpp:492 (most recent call first):
2025-12-04T12:15:05.3569681Z C++ CapturedTraceback:
2025-12-04T12:15:05.3571240Z #4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0
2025-12-04T12:15:05.3571731Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0
2025-12-04T12:15:05.3572074Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0
2025-12-04T12:15:05.3573049Z #7 at::native::_scaled_mm_out_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::ScalarType>, bool, at::Tensor&) from ??:0
2025-12-04T12:15:05.3573855Z #8 at::native::_scaled_mm_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::ScalarType>, bool) from ??:0
2025-12-04T12:15:05.3574977Z #9 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_mm(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::ScalarType>, bool) from RegisterCUDA_0.cpp:0
2025-12-04T12:15:05.3578365Z #10 c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::ScalarType>, bool), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_mm>, at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::ScalarType>, bool> >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from RegisterCUDA_0.cpp:0
2025-12-04T12:15:05.3579320Z #11 torch::autograd::autogradNotImplementedFallbackImpl(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from autograd_not_implemented_fallback.cpp:0
2025-12-04T12:15:05.3580117Z #12 at::_ops::_scaled_mm::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::ScalarType>, bool) from ??:0
2025-12-04T12:15:05.3580534Z #13 torch::autograd::THPVariable__scaled_mm(_object*, _object*, _object*) from python_torch_functions_2.cpp:0
2025-12-04T12:15:05.3580859Z #14 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543
2025-12-04T12:15:05.3581184Z #15 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T12:15:05.3581600Z #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T12:15:05.3581730Z #17 dynamo__custom_eval_frame from :0
2025-12-04T12:15:05.3582126Z #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3582392Z #19 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3582820Z #20 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3583239Z #21 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3583615Z #22 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3583925Z #23 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T12:15:05.3584188Z #24 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3584560Z #25 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3584832Z #26 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3585201Z #27 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3585476Z #28 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3585848Z #29 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3586105Z #30 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3586529Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3586936Z #32 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3587322Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3587727Z #34 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3588093Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3588516Z #36 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3588918Z #37 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3589327Z #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3589712Z #39 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3590116Z #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3590534Z #41 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3590830Z #42 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T12:15:05.3591087Z #43 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3591469Z #44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3591821Z #45 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T12:15:05.3592142Z #46 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T12:15:05.3592441Z #47 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T12:15:05.3592745Z #48 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T12:15:05.3593164Z #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T12:15:05.3593536Z #50 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3593952Z #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3594323Z #52 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3594618Z #53 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3595001Z #54 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3595406Z #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3595776Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3596195Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3596567Z #58 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3596927Z #59 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T12:15:05.3597234Z #60 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T12:15:05.3597530Z #61 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T12:15:05.3597815Z #62 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T12:15:05.3598075Z #63 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3598502Z #64 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3598909Z #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3599283Z #66 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3599706Z #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3600076Z #68 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3600339Z #69 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3600729Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3601188Z #71 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3601574Z #72 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3601980Z #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3602382Z #74 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3602657Z #75 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3603028Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3603449Z #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3603822Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3604230Z #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3604612Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3604962Z #81 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T12:15:05.3605280Z #82 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T12:15:05.3605576Z #83 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T12:15:05.3605879Z #84 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T12:15:05.3606294Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T12:15:05.3606694Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3606957Z #87 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3607343Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3607751Z #89 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3608130Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3608535Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3608903Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3609284Z #93 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T12:15:05.3609593Z #94 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T12:15:05.3609906Z #95 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T12:15:05.3610210Z #96 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T12:15:05.3610672Z #97 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T12:15:05.3611063Z #98 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3611472Z #99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3611872Z #100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3612291Z #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3612672Z #102 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3612959Z #103 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3613372Z #104 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3613793Z #105 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3614188Z #106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3614639Z #107 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3615032Z #108 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3615393Z #109 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T12:15:05.3615705Z #110 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T12:15:05.3616031Z #111 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T12:15:05.3616440Z #112 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T12:15:05.3616874Z #113 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T12:15:05.3617263Z #114 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3617679Z #115 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3618080Z #116 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3618497Z #117 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3618891Z #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3619342Z #119 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3619727Z #120 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3620157Z #121 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3620537Z #122 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3620831Z #123 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134
2025-12-04T12:15:05.3621159Z #124 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291
2025-12-04T12:15:05.3621434Z #125 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312
2025-12-04T12:15:05.3621733Z #126 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208
2025-12-04T12:15:05.3622092Z #127 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456
2025-12-04T12:15:05.3622424Z #128 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90
2025-12-04T12:15:05.3622732Z #129 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357
2025-12-04T12:15:05.3623037Z #130 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090
2025-12-04T12:15:05.3623324Z #131 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58
2025-12-04T12:15:05.3623525Z #132 __libc_start_main_impl from ./csu/../csu/libc-start.c:392
2025-12-04T12:15:05.3623633Z #133 _start from ??:0
2025-12-04T12:15:05.3623772Z #134 <unwind unsupported> from ??:0
2025-12-04T12:15:05.3623778Z 
2025-12-04T12:15:05.3623783Z 
2025-12-04T12:15:05.3624004Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.3624546Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16
2025-12-04T12:15:05.3624568Z 
2025-12-04T12:15:05.3624838Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.3625098Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.3625233Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:15:05.3625349Z stats [('calls_captured', 11)]
2025-12-04T12:15:05.3625693Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.3625933Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.3626063Z graph_break []
2025-12-04T12:15:05.3626252Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T12:15:05.3626471Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:15:05.3627679Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T12:15:05.3627814Z   if out == self.unknown_value:
2025-12-04T12:15:05.3628541Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:15:05.3628648Z   warnings.warn(
2025-12-04T12:15:05.3628949Z _________ TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 __________
2025-12-04T12:15:05.3629075Z Traceback (most recent call last):
2025-12-04T12:15:05.3629485Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback
2025-12-04T12:15:05.3629630Z     y_fp8 = compiled_fp8_matmul(x)  # noqa: F841
2025-12-04T12:15:05.3630125Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper
2025-12-04T12:15:05.3630251Z     return fn(*args, **kwargs)
2025-12-04T12:15:05.3630684Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 113, in fp8_matmul_unwrapped
2025-12-04T12:15:05.3630799Z     output = torch._scaled_mm(
2025-12-04T12:15:05.3631275Z RuntimeError: torch._scaled_mm is only supported on CUDA devices with compute capability >= 9.0 or 8.9, or ROCm MI300+
2025-12-04T12:15:05.3631933Z Exception raised from _scaled_mm_out_cuda at /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/ScaledBlas.cpp:492 (most recent call first):
2025-12-04T12:15:05.3632058Z C++ CapturedTraceback:
2025-12-04T12:15:05.3633376Z #4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0
2025-12-04T12:15:05.3633879Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0
2025-12-04T12:15:05.3634221Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0
2025-12-04T12:15:05.3635130Z #7 at::native::_scaled_mm_out_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::ScalarType>, bool, at::Tensor&) from ??:0
2025-12-04T12:15:05.3635949Z #8 at::native::_scaled_mm_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::ScalarType>, bool) from ??:0
2025-12-04T12:15:05.3637059Z #9 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_mm(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::ScalarType>, bool) from RegisterCUDA_0.cpp:0
2025-12-04T12:15:05.3640357Z #10 c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::ScalarType>, bool), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_mm>, at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::ScalarType>, bool> >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from RegisterCUDA_0.cpp:0
2025-12-04T12:15:05.3641366Z #11 torch::autograd::autogradNotImplementedFallbackImpl(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from autograd_not_implemented_fallback.cpp:0
2025-12-04T12:15:05.3642173Z #12 at::_ops::_scaled_mm::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::ScalarType>, bool) from ??:0
2025-12-04T12:15:05.3642578Z #13 torch::autograd::THPVariable__scaled_mm(_object*, _object*, _object*) from python_torch_functions_2.cpp:0
2025-12-04T12:15:05.3642917Z #14 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543
2025-12-04T12:15:05.3643226Z #15 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T12:15:05.3643638Z #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T12:15:05.3643777Z #17 dynamo__custom_eval_frame from :0
2025-12-04T12:15:05.3644153Z #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3644430Z #19 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3644838Z #20 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3645246Z #21 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3645636Z #22 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3645931Z #23 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T12:15:05.3646208Z #24 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3646580Z #25 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3646838Z #26 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3647224Z #27 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3647485Z #28 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3647860Z #29 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3648130Z #30 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3648557Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3648977Z #32 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3649351Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3649756Z #34 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3650141Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3650550Z #36 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3650964Z #37 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3651369Z #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3651740Z #39 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3652156Z #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3652558Z #41 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3652865Z #42 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T12:15:05.3653125Z #43 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3653492Z #44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3653859Z #45 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T12:15:05.3654169Z #46 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T12:15:05.3654466Z #47 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T12:15:05.3654780Z #48 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T12:15:05.3655184Z #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T12:15:05.3655573Z #50 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3655977Z #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3656416Z #52 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3656729Z #53 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3657102Z #54 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3657524Z #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3657897Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3658302Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3658692Z #58 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3659041Z #59 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T12:15:05.3659357Z #60 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T12:15:05.3659655Z #61 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T12:15:05.3659926Z #62 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T12:15:05.3660198Z #63 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3660600Z #64 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3661006Z #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3661394Z #66 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3661798Z #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3662178Z #68 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3662437Z #69 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3662811Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3663263Z #71 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3663635Z #72 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3664053Z #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3664456Z #74 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3664715Z #75 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3665097Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3665502Z #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3665887Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3666293Z #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3666663Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3667026Z #81 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T12:15:05.3667331Z #82 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T12:15:05.3667627Z #83 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T12:15:05.3667941Z #84 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T12:15:05.3668348Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T12:15:05.3668763Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3669027Z #87 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3669397Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3669818Z #89 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3670191Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3670613Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3671160Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3671515Z #93 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T12:15:05.3671838Z #94 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T12:15:05.3672135Z #95 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T12:15:05.3672453Z #96 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T12:15:05.3672939Z #97 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T12:15:05.3673313Z #98 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3673736Z #99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3674119Z #100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3674532Z #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3674925Z #102 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3675195Z #103 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3675639Z #104 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3676057Z #105 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3676436Z #106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3682613Z #107 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3683058Z #108 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3683423Z #109 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T12:15:05.3683751Z #110 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T12:15:05.3684063Z #111 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T12:15:05.3684388Z #112 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T12:15:05.3684803Z #113 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T12:15:05.3685187Z #114 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3685616Z #115 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3686001Z #116 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3686428Z #117 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3686804Z #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3687343Z #119 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3687740Z #120 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3688156Z #121 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3688552Z #122 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3688845Z #123 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134
2025-12-04T12:15:05.3689158Z #124 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291
2025-12-04T12:15:05.3689967Z #125 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312
2025-12-04T12:15:05.3690259Z #126 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208
2025-12-04T12:15:05.3690615Z #127 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456
2025-12-04T12:15:05.3690960Z #128 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90
2025-12-04T12:15:05.3691253Z #129 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357
2025-12-04T12:15:05.3691603Z #130 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090
2025-12-04T12:15:05.3691872Z #131 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58
2025-12-04T12:15:05.3692072Z #132 __libc_start_main_impl from ./csu/../csu/libc-start.c:392
2025-12-04T12:15:05.3692195Z #133 _start from ??:0
2025-12-04T12:15:05.3692322Z #134 <unwind unsupported> from ??:0
2025-12-04T12:15:05.3692329Z 
2025-12-04T12:15:05.3692334Z 
2025-12-04T12:15:05.3692569Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.3693111Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16
2025-12-04T12:15:05.3693121Z 
2025-12-04T12:15:05.3693391Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.3693671Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.3693790Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:15:05.3693911Z stats [('calls_captured', 11)]
2025-12-04T12:15:05.3694273Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.3694498Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.3694649Z graph_break []
2025-12-04T12:15:05.3694825Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T12:15:05.3695048Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:15:05.3696278Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T12:15:05.3696480Z   if out == self.unknown_value:
2025-12-04T12:15:05.3697223Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:15:05.3697334Z   warnings.warn(
2025-12-04T12:15:05.3697557Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.3697684Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:15:05.3697802Z stats [('calls_captured', 11)]
2025-12-04T12:15:05.3698028Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.3698382Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.3698480Z graph_break []
2025-12-04T12:15:05.3698656Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T12:15:05.3698948Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:15:05.3699677Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:15:05.3699797Z   warnings.warn(
2025-12-04T12:15:05.3699943Z =================================== FAILURES ===================================
2025-12-04T12:15:05.3700226Z _________ TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 __________
2025-12-04T12:15:05.3700363Z Traceback (most recent call last):
2025-12-04T12:15:05.3700758Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback
2025-12-04T12:15:05.3700915Z     y_fp8 = compiled_fp8_matmul(x)  # noqa: F841
2025-12-04T12:15:05.3701403Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper
2025-12-04T12:15:05.3701519Z     return fn(*args, **kwargs)
2025-12-04T12:15:05.3701923Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 113, in fp8_matmul_unwrapped
2025-12-04T12:15:05.3702038Z     output = torch._scaled_mm(
2025-12-04T12:15:05.3702499Z RuntimeError: torch._scaled_mm is only supported on CUDA devices with compute capability >= 9.0 or 8.9, or ROCm MI300+
2025-12-04T12:15:05.3703202Z Exception raised from _scaled_mm_out_cuda at /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/ScaledBlas.cpp:492 (most recent call first):
2025-12-04T12:15:05.3703317Z C++ CapturedTraceback:
2025-12-04T12:15:05.3704656Z #4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0
2025-12-04T12:15:05.3705142Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0
2025-12-04T12:15:05.3705513Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0
2025-12-04T12:15:05.3706398Z #7 at::native::_scaled_mm_out_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::ScalarType>, bool, at::Tensor&) from ??:0
2025-12-04T12:15:05.3707192Z #8 at::native::_scaled_mm_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::ScalarType>, bool) from ??:0
2025-12-04T12:15:05.3708578Z #9 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_mm(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::ScalarType>, bool) from RegisterCUDA_0.cpp:0
2025-12-04T12:15:05.3711827Z #10 c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::ScalarType>, bool), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_mm>, at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::ScalarType>, bool> >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from RegisterCUDA_0.cpp:0
2025-12-04T12:15:05.3712741Z #11 torch::autograd::autogradNotImplementedFallbackImpl(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from autograd_not_implemented_fallback.cpp:0
2025-12-04T12:15:05.3713577Z #12 at::_ops::_scaled_mm::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::ScalarType>, bool) from ??:0
2025-12-04T12:15:05.3713994Z #13 torch::autograd::THPVariable__scaled_mm(_object*, _object*, _object*) from python_torch_functions_2.cpp:0
2025-12-04T12:15:05.3714319Z #14 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543
2025-12-04T12:15:05.3714636Z #15 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T12:15:05.3715047Z #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T12:15:05.3715167Z #17 dynamo__custom_eval_frame from :0
2025-12-04T12:15:05.3715553Z #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3715818Z #19 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3716204Z #20 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3716612Z #21 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3717018Z #22 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3717328Z #23 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T12:15:05.3717590Z #24 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3717962Z #25 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3718233Z #26 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3718600Z #27 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3718870Z #28 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3719276Z #29 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3719534Z #30 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3719918Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3720327Z #32 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3720745Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3721152Z #34 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3721518Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3721938Z #36 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3722310Z #37 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3722724Z #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3723094Z #39 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3723503Z #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3723890Z #41 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3724184Z #42 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T12:15:05.3724438Z #43 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3724849Z #44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3725200Z #45 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T12:15:05.3725515Z #46 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T12:15:05.3725812Z #47 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T12:15:05.3726109Z #48 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T12:15:05.3726529Z #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T12:15:05.3726903Z #50 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3727318Z #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3727687Z #52 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3727949Z #53 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3728332Z #54 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3728766Z #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3729145Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3729552Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3729920Z #58 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3730282Z #59 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T12:15:05.3730587Z #60 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T12:15:05.3730881Z #61 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T12:15:05.3731192Z #62 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T12:15:05.3731450Z #63 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3731831Z #64 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3732236Z #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3732636Z #66 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3733049Z #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3733420Z #68 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3733691Z #69 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3734063Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3734464Z #71 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3734846Z #72 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3735250Z #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3735629Z #74 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3735886Z #75 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3736253Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3736745Z #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3737154Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3737558Z #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3737938Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3738287Z #81 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T12:15:05.3738603Z #82 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T12:15:05.3738895Z #83 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T12:15:05.3739196Z #84 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T12:15:05.3739610Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T12:15:05.3739984Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3740254Z #87 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3740629Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3741071Z #89 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3741454Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3741860Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3742240Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3742588Z #93 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T12:15:05.3742892Z #94 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T12:15:05.3743230Z #95 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T12:15:05.3743532Z #96 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T12:15:05.3743944Z #97 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T12:15:05.3744323Z #98 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3744774Z #99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3745167Z #100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3745581Z #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3745963Z #102 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3746243Z #103 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3746621Z #104 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3747047Z #105 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3747422Z #106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3747836Z #107 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3748231Z #108 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3748585Z #109 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T12:15:05.3748910Z #110 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T12:15:05.3749243Z #111 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T12:15:05.3749551Z #112 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T12:15:05.3749975Z #113 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T12:15:05.3750355Z #114 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3750771Z #115 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3751159Z #116 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3751572Z #117 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3751961Z #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3752374Z #119 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3752750Z #120 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3753203Z #121 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3753586Z #122 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3753894Z #123 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134
2025-12-04T12:15:05.3754202Z #124 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291
2025-12-04T12:15:05.3754472Z #125 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312
2025-12-04T12:15:05.3754770Z #126 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208
2025-12-04T12:15:05.3755125Z #127 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456
2025-12-04T12:15:05.3755481Z #128 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90
2025-12-04T12:15:05.3755789Z #129 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357
2025-12-04T12:15:05.3756072Z #130 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090
2025-12-04T12:15:05.3756355Z #131 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58
2025-12-04T12:15:05.3756583Z #132 __libc_start_main_impl from ./csu/../csu/libc-start.c:392
2025-12-04T12:15:05.3756692Z #133 _start from ??:0
2025-12-04T12:15:05.3756830Z #134 <unwind unsupported> from ??:0
2025-12-04T12:15:05.3756836Z 
2025-12-04T12:15:05.3756842Z 
2025-12-04T12:15:05.3757062Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.3757619Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16
2025-12-04T12:15:05.3757627Z 
2025-12-04T12:15:05.3757903Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.3758126Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.3758257Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:15:05.3758381Z stats [('calls_captured', 11)]
2025-12-04T12:15:05.3758724Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.3758962Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.3759063Z graph_break []
2025-12-04T12:15:05.3759255Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T12:15:05.3759478Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:15:05.3760694Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T12:15:05.3760863Z   if out == self.unknown_value:
2025-12-04T12:15:05.3761596Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:15:05.3761714Z   warnings.warn(
2025-12-04T12:15:05.3761935Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.3762054Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:15:05.3762185Z stats [('calls_captured', 11)]
2025-12-04T12:15:05.3762410Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.3762750Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.3762866Z graph_break []
2025-12-04T12:15:05.3763046Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T12:15:05.3763279Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:15:05.3764012Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:15:05.3764116Z   warnings.warn(
2025-12-04T12:15:05.3764383Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.3764497Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:15:05.3764615Z stats [('calls_captured', 11)]
2025-12-04T12:15:05.3764856Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.3765194Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.3765305Z graph_break []
2025-12-04T12:15:05.3765482Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T12:15:05.3765699Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:15:05.3766472Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:15:05.3766579Z   warnings.warn(
2025-12-04T12:15:05.3767234Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ad7a38726bbc8b50.xml -
2025-12-04T12:15:05.3767418Z =========================== short test summary info ============================
2025-12-04T12:15:05.3768392Z FAILED [0.4588s] inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 - RuntimeError: torch._scaled_mm is only supported on CUDA devices with compute capability >= 9.0 or 8.9, or ROCm MI300+
2025-12-04T12:15:05.3769059Z Exception raised from _scaled_mm_out_cuda at /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/ScaledBlas.cpp:492 (most recent call first):
2025-12-04T12:15:05.3769175Z C++ CapturedTraceback:
2025-12-04T12:15:05.3770494Z #4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0
2025-12-04T12:15:05.3771199Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0
2025-12-04T12:15:05.3771536Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0
2025-12-04T12:15:05.3772421Z #7 at::native::_scaled_mm_out_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::ScalarType>, bool, at::Tensor&) from ??:0
2025-12-04T12:15:05.3773216Z #8 at::native::_scaled_mm_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::ScalarType>, bool) from ??:0
2025-12-04T12:15:05.3774429Z #9 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_mm(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::ScalarType>, bool) from RegisterCUDA_0.cpp:0
2025-12-04T12:15:05.3777724Z #10 c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::ScalarType>, bool), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_mm>, at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::ScalarType>, bool> >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from RegisterCUDA_0.cpp:0
2025-12-04T12:15:05.3778697Z #11 torch::autograd::autogradNotImplementedFallbackImpl(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from autograd_not_implemented_fallback.cpp:0
2025-12-04T12:15:05.3779486Z #12 at::_ops::_scaled_mm::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::ScalarType>, bool) from ??:0
2025-12-04T12:15:05.3779899Z #13 torch::autograd::THPVariable__scaled_mm(_object*, _object*, _object*) from python_torch_functions_2.cpp:0
2025-12-04T12:15:05.3780224Z #14 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543
2025-12-04T12:15:05.3780533Z #15 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T12:15:05.3781003Z #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T12:15:05.3781127Z #17 dynamo__custom_eval_frame from :0
2025-12-04T12:15:05.3781518Z #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3781780Z #19 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3782195Z #20 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3782615Z #21 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3782988Z #22 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3783296Z #23 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T12:15:05.3783557Z #24 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3783928Z #25 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3784204Z #26 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3784574Z #27 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3784833Z #28 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3785216Z #29 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3785476Z #30 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3785858Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3786264Z #32 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3786679Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3787094Z #34 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3787465Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3787880Z #36 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3788249Z #37 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3788653Z #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3789031Z #39 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3789436Z #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3789819Z #41 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3790111Z #42 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T12:15:05.3790402Z #43 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3790784Z #44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3791136Z #45 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T12:15:05.3791439Z #46 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T12:15:05.3791744Z #47 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T12:15:05.3792047Z #48 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T12:15:05.3792470Z #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T12:15:05.3792891Z #50 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3793296Z #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3793675Z #52 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3793932Z #53 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3794342Z #54 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3794749Z #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3795118Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3795538Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3795909Z #58 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3796269Z #59 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T12:15:05.3796577Z #60 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T12:15:05.3796870Z #61 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T12:15:05.3797158Z #62 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T12:15:05.3797417Z #63 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3797788Z #64 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3798205Z #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3798603Z #66 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3799023Z #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3799393Z #68 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3799650Z #69 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3800035Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3800442Z #71 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3800823Z #72 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3801225Z #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3801594Z #74 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3801865Z #75 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3802233Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3802685Z #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3803058Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3803462Z #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3803848Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3804196Z #81 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T12:15:05.3804502Z #82 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T12:15:05.3804839Z #83 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T12:15:05.3805139Z #84 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T12:15:05.3805563Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T12:15:05.3805935Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3806223Z #87 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3806606Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3807005Z #89 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3807382Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3807790Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3808156Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3808519Z #93 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T12:15:05.3808823Z #94 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T12:15:05.3809130Z #95 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T12:15:05.3809432Z #96 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T12:15:05.3809833Z #97 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T12:15:05.3810212Z #98 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3810654Z #99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3811040Z #100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3811475Z #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3811850Z #102 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3812134Z #103 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3812511Z #104 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3812921Z #105 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3813314Z #106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3813727Z #107 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3814120Z #108 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3814536Z #109 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T12:15:05.3814849Z #110 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T12:15:05.3815165Z #111 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T12:15:05.3815473Z #112 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T12:15:05.3815884Z #113 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T12:15:05.3816276Z #114 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3816767Z #115 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3817200Z #116 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3817613Z #117 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3817994Z #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3818419Z #119 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3818829Z #120 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3819249Z #121 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3819625Z #122 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3819918Z #123 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134
2025-12-04T12:15:05.3820243Z #124 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291
2025-12-04T12:15:05.3820512Z #125 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312
2025-12-04T12:15:05.3820815Z #126 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208
2025-12-04T12:15:05.3821166Z #127 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456
2025-12-04T12:15:05.3821493Z #128 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90
2025-12-04T12:15:05.3821798Z #129 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357
2025-12-04T12:15:05.3822070Z #130 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090
2025-12-04T12:15:05.3822339Z #131 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58
2025-12-04T12:15:05.3822589Z #132 __libc_start_main_impl from ./csu/../csu/libc-start.c:392
2025-12-04T12:15:05.3822693Z #133 _start from ??:0
2025-12-04T12:15:05.3822831Z #134 <unwind unsupported> from ??:0
2025-12-04T12:15:05.3822838Z 
2025-12-04T12:15:05.3822843Z 
2025-12-04T12:15:05.3823077Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.3823620Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16
2025-12-04T12:15:05.3823625Z 
2025-12-04T12:15:05.3823912Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.3824095Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.3824315Z ================= 1 failed, 187 deselected, 2 rerun in 20.33s ==================
2025-12-04T12:15:05.3824418Z Got exit code 1
2025-12-04T12:15:05.3824530Z Retrying single test...
2025-12-04T12:15:05.3825016Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b434424093647de3.xml
2025-12-04T12:15:05.3825186Z ============================= test session starts ==============================
2025-12-04T12:15:05.3825539Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.3825665Z cachedir: .pytest_cache
2025-12-04T12:15:05.3826223Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.3826370Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.3826483Z configfile: pytest.ini
2025-12-04T12:15:05.3827079Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.3827323Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:05.3827946Z stepcurrent: skipping 21 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16
2025-12-04T12:15:05.3828084Z Running 1 items in this shard
2025-12-04T12:15:05.3828121Z 
2025-12-04T12:15:05.3829055Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 [W1204 11:53:57.520878802 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3829062Z 
2025-12-04T12:15:05.3829581Z [W1204 11:54:12.862438374 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3829634Z 
2025-12-04T12:15:05.3830147Z [W1204 11:54:12.862694584 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3830153Z 
2025-12-04T12:15:05.3830664Z [W1204 11:54:12.865804269 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3830673Z 
2025-12-04T12:15:05.3831207Z [W1204 11:54:12.866009267 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3831212Z 
2025-12-04T12:15:05.3831724Z [W1204 11:54:12.867991971 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3831728Z 
2025-12-04T12:15:05.3832254Z [W1204 11:54:12.868359051 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3832261Z 
2025-12-04T12:15:05.3832773Z [W1204 11:54:12.868545447 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3832778Z 
2025-12-04T12:15:05.3833300Z [W1204 11:54:12.869083168 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3833336Z 
2025-12-04T12:15:05.3833852Z [W1204 11:54:12.869269987 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3833857Z 
2025-12-04T12:15:05.3834368Z [W1204 11:54:12.869830376 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3834389Z 
2025-12-04T12:15:05.3834900Z [W1204 11:54:12.870048273 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3834908Z 
2025-12-04T12:15:05.3835414Z [W1204 11:54:12.870512249 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3835419Z 
2025-12-04T12:15:05.3835942Z [W1204 11:54:12.870690203 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3835948Z 
2025-12-04T12:15:05.3836459Z [W1204 11:54:12.871068054 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3836466Z 
2025-12-04T12:15:05.3836994Z [W1204 11:54:12.871243652 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3836999Z 
2025-12-04T12:15:05.3837558Z [W1204 11:54:12.871612440 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3837566Z 
2025-12-04T12:15:05.3838089Z [W1204 11:54:12.871788476 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3838093Z 
2025-12-04T12:15:05.3838563Z W1204 11:54:12.829000 115512 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T12:15:05.3839089Z [W1204 11:54:12.238848376 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3839096Z 
2025-12-04T12:15:05.3839345Z ('RERUN', {'yellow': True}) [18.8552s] [100%]
2025-12-04T12:15:05.3840274Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 [W1204 11:54:13.869221007 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3840280Z 
2025-12-04T12:15:05.3840810Z [W1204 11:54:13.869639665 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3840846Z 
2025-12-04T12:15:05.3841358Z [W1204 11:54:13.869828176 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3841363Z 
2025-12-04T12:15:05.3841890Z [W1204 11:54:13.870436268 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3841897Z 
2025-12-04T12:15:05.3842412Z [W1204 11:54:13.870634727 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3842418Z 
2025-12-04T12:15:05.3842947Z [W1204 11:54:13.871003669 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3842952Z 
2025-12-04T12:15:05.3843461Z [W1204 11:54:13.871296948 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3843468Z 
2025-12-04T12:15:05.3843986Z [W1204 11:54:13.871463691 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3843991Z 
2025-12-04T12:15:05.3844496Z [W1204 11:54:13.871926564 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3844531Z 
2025-12-04T12:15:05.3845039Z [W1204 11:54:13.872107125 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3845060Z 
2025-12-04T12:15:05.3845568Z [W1204 11:54:13.872561805 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3845575Z 
2025-12-04T12:15:05.3846082Z [W1204 11:54:13.872741038 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3846089Z 
2025-12-04T12:15:05.3846608Z [W1204 11:54:13.873125893 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3846613Z 
2025-12-04T12:15:05.3847120Z [W1204 11:54:13.873303238 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3847125Z 
2025-12-04T12:15:05.3847644Z [W1204 11:54:13.873661680 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3847649Z 
2025-12-04T12:15:05.3848156Z [W1204 11:54:13.873837544 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3848161Z 
2025-12-04T12:15:05.3848708Z [W1204 11:54:13.874199063 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3848714Z 
2025-12-04T12:15:05.3849226Z [W1204 11:54:13.874373983 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3849230Z 
2025-12-04T12:15:05.3849736Z [W1204 11:54:13.966294254 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3849753Z 
2025-12-04T12:15:05.3849884Z ('RERUN', {'yellow': True}) [0.4767s] [100%]
2025-12-04T12:15:05.3850850Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 [W1204 11:54:13.320873368 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3850856Z 
2025-12-04T12:15:05.3851382Z [W1204 11:54:13.321318706 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3851387Z 
2025-12-04T12:15:05.3851893Z [W1204 11:54:13.321503287 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3851926Z 
2025-12-04T12:15:05.3852450Z [W1204 11:54:13.322088136 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3852454Z 
2025-12-04T12:15:05.3852962Z [W1204 11:54:13.322280672 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3852970Z 
2025-12-04T12:15:05.3853493Z [W1204 11:54:13.322644089 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3853499Z 
2025-12-04T12:15:05.3854008Z [W1204 11:54:13.322935634 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3854013Z 
2025-12-04T12:15:05.3854533Z [W1204 11:54:13.323104183 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3854541Z 
2025-12-04T12:15:05.3855048Z [W1204 11:54:13.323578800 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3855053Z 
2025-12-04T12:15:05.3855563Z [W1204 11:54:13.323759318 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3855598Z 
2025-12-04T12:15:05.3856121Z [W1204 11:54:13.324205490 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3856125Z 
2025-12-04T12:15:05.3856722Z [W1204 11:54:13.324385347 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3856727Z 
2025-12-04T12:15:05.3857253Z [W1204 11:54:13.324786851 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3857260Z 
2025-12-04T12:15:05.3857773Z [W1204 11:54:13.324966001 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3857777Z 
2025-12-04T12:15:05.3858298Z [W1204 11:54:13.325325441 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3858305Z 
2025-12-04T12:15:05.3858816Z [W1204 11:54:13.325502454 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3858821Z 
2025-12-04T12:15:05.3859341Z [W1204 11:54:13.325862894 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3859385Z 
2025-12-04T12:15:05.3859898Z [W1204 11:54:13.326039400 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3859905Z 
2025-12-04T12:15:05.3860413Z [W1204 11:54:14.423402432 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.3860431Z 
2025-12-04T12:15:05.3860535Z FAILED [0.4551s] [100%]
2025-12-04T12:15:05.3860540Z 
2025-12-04T12:15:05.3860687Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.3860992Z _________ TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 __________
2025-12-04T12:15:05.3861119Z Traceback (most recent call last):
2025-12-04T12:15:05.3861552Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback
2025-12-04T12:15:05.3861714Z     y_fp8 = compiled_fp8_matmul(x)  # noqa: F841
2025-12-04T12:15:05.3862213Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper
2025-12-04T12:15:05.3862340Z     return fn(*args, **kwargs)
2025-12-04T12:15:05.3862767Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 113, in fp8_matmul_unwrapped
2025-12-04T12:15:05.3862882Z     output = torch._scaled_mm(
2025-12-04T12:15:05.3863361Z RuntimeError: torch._scaled_mm is only supported on CUDA devices with compute capability >= 9.0 or 8.9, or ROCm MI300+
2025-12-04T12:15:05.3864010Z Exception raised from _scaled_mm_out_cuda at /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/ScaledBlas.cpp:492 (most recent call first):
2025-12-04T12:15:05.3864126Z C++ CapturedTraceback:
2025-12-04T12:15:05.3865452Z #4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0
2025-12-04T12:15:05.3865936Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0
2025-12-04T12:15:05.3866290Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0
2025-12-04T12:15:05.3867164Z #7 at::native::_scaled_mm_out_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::ScalarType>, bool, at::Tensor&) from ??:0
2025-12-04T12:15:05.3868010Z #8 at::native::_scaled_mm_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::ScalarType>, bool) from ??:0
2025-12-04T12:15:05.3869125Z #9 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_mm(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::ScalarType>, bool) from RegisterCUDA_0.cpp:0
2025-12-04T12:15:05.3872573Z #10 c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::ScalarType>, bool), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_mm>, at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::ScalarType>, bool> >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from RegisterCUDA_0.cpp:0
2025-12-04T12:15:05.3873553Z #11 torch::autograd::autogradNotImplementedFallbackImpl(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from autograd_not_implemented_fallback.cpp:0
2025-12-04T12:15:05.3874367Z #12 at::_ops::_scaled_mm::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::ScalarType>, bool) from ??:0
2025-12-04T12:15:05.3874775Z #13 torch::autograd::THPVariable__scaled_mm(_object*, _object*, _object*) from python_torch_functions_2.cpp:0
2025-12-04T12:15:05.3875122Z #14 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543
2025-12-04T12:15:05.3875473Z #15 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T12:15:05.3876360Z #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T12:15:05.3876505Z #17 dynamo__custom_eval_frame from :0
2025-12-04T12:15:05.3876885Z #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3877149Z #19 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3877594Z #20 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3878000Z #21 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3878388Z #22 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3878687Z #23 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T12:15:05.3878953Z #24 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3879335Z #25 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3879597Z #26 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3879985Z #27 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3880247Z #28 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3880619Z #29 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3880896Z #30 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3881266Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3881745Z #32 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3882117Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3882525Z #34 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3882907Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3883312Z #36 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3883685Z #37 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3884108Z #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3884476Z #39 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3884898Z #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3885269Z #41 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3885568Z #42 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T12:15:05.3885876Z #43 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3886255Z #44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3886619Z #45 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T12:15:05.3886924Z #46 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T12:15:05.3887217Z #47 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T12:15:05.3887537Z #48 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T12:15:05.3887976Z #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T12:15:05.3888425Z #50 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3888908Z #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3889297Z #52 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3889624Z #53 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3889994Z #54 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3890399Z #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3890783Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3891194Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3891575Z #58 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3891927Z #59 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T12:15:05.3892231Z #60 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T12:15:05.3892545Z #61 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T12:15:05.3892814Z #62 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T12:15:05.3893085Z #63 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3893458Z #64 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3893905Z #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3894290Z #66 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3894699Z #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3895073Z #68 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3895350Z #69 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3895722Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3896139Z #71 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3896587Z #72 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3896996Z #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3897381Z #74 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3897640Z #75 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3898076Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3898483Z #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3898857Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3899276Z #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3899645Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3900008Z #81 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T12:15:05.3900350Z #82 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T12:15:05.3900646Z #83 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T12:15:05.3900962Z #84 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T12:15:05.3901371Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T12:15:05.3901774Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3902050Z #87 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3902422Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3902840Z #89 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3903213Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3903620Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3904006Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3904357Z #93 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T12:15:05.3904675Z #94 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T12:15:05.3904972Z #95 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T12:15:05.3905278Z #96 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T12:15:05.3905698Z #97 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T12:15:05.3906120Z #98 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3906542Z #99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3906928Z #100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3907347Z #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3907745Z #102 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3908018Z #103 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3908399Z #104 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3908832Z #105 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3909216Z #106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3909651Z #107 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3910031Z #108 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3910428Z #109 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T12:15:05.3910756Z #110 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T12:15:05.3911062Z #111 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T12:15:05.3911386Z #112 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T12:15:05.3911808Z #113 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T12:15:05.3912190Z #114 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3912657Z #115 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3913038Z #116 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3913472Z #117 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3913854Z #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3914310Z #119 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3914705Z #120 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3915120Z #121 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3915501Z #122 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3915816Z #123 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134
2025-12-04T12:15:05.3916129Z #124 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291
2025-12-04T12:15:05.3916414Z #125 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312
2025-12-04T12:15:05.3916705Z #126 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208
2025-12-04T12:15:05.3917057Z #127 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456
2025-12-04T12:15:05.3917409Z #128 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90
2025-12-04T12:15:05.3917706Z #129 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357
2025-12-04T12:15:05.3917999Z #130 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090
2025-12-04T12:15:05.3918310Z #131 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58
2025-12-04T12:15:05.3918515Z #132 __libc_start_main_impl from ./csu/../csu/libc-start.c:392
2025-12-04T12:15:05.3918637Z #133 _start from ??:0
2025-12-04T12:15:05.3918764Z #134 <unwind unsupported> from ??:0
2025-12-04T12:15:05.3918770Z 
2025-12-04T12:15:05.3918776Z 
2025-12-04T12:15:05.3919003Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.3919562Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16
2025-12-04T12:15:05.3919570Z 
2025-12-04T12:15:05.3919842Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.3920083Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.3920201Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:15:05.3920318Z stats [('calls_captured', 11)]
2025-12-04T12:15:05.3920678Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.3920901Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.3921015Z graph_break []
2025-12-04T12:15:05.3921195Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T12:15:05.3921453Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:15:05.3922684Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T12:15:05.3922806Z   if out == self.unknown_value:
2025-12-04T12:15:05.3923534Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:15:05.3923650Z   warnings.warn(
2025-12-04T12:15:05.3924280Z _________ TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 __________
2025-12-04T12:15:05.3924423Z Traceback (most recent call last):
2025-12-04T12:15:05.3924882Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback
2025-12-04T12:15:05.3925033Z     y_fp8 = compiled_fp8_matmul(x)  # noqa: F841
2025-12-04T12:15:05.3925539Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper
2025-12-04T12:15:05.3925655Z     return fn(*args, **kwargs)
2025-12-04T12:15:05.3926103Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 113, in fp8_matmul_unwrapped
2025-12-04T12:15:05.3926235Z     output = torch._scaled_mm(
2025-12-04T12:15:05.3926692Z RuntimeError: torch._scaled_mm is only supported on CUDA devices with compute capability >= 9.0 or 8.9, or ROCm MI300+
2025-12-04T12:15:05.3927372Z Exception raised from _scaled_mm_out_cuda at /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/ScaledBlas.cpp:492 (most recent call first):
2025-12-04T12:15:05.3927491Z C++ CapturedTraceback:
2025-12-04T12:15:05.3928812Z #4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0
2025-12-04T12:15:05.3929314Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0
2025-12-04T12:15:05.3929655Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0
2025-12-04T12:15:05.3930544Z #7 at::native::_scaled_mm_out_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::ScalarType>, bool, at::Tensor&) from ??:0
2025-12-04T12:15:05.3931385Z #8 at::native::_scaled_mm_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::ScalarType>, bool) from ??:0
2025-12-04T12:15:05.3932502Z #9 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_mm(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::ScalarType>, bool) from RegisterCUDA_0.cpp:0
2025-12-04T12:15:05.3935734Z #10 c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::ScalarType>, bool), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_mm>, at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::ScalarType>, bool> >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from RegisterCUDA_0.cpp:0
2025-12-04T12:15:05.3936772Z #11 torch::autograd::autogradNotImplementedFallbackImpl(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from autograd_not_implemented_fallback.cpp:0
2025-12-04T12:15:05.3937565Z #12 at::_ops::_scaled_mm::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::ScalarType>, bool) from ??:0
2025-12-04T12:15:05.3937984Z #13 torch::autograd::THPVariable__scaled_mm(_object*, _object*, _object*) from python_torch_functions_2.cpp:0
2025-12-04T12:15:05.3938311Z #14 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543
2025-12-04T12:15:05.3938651Z #15 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T12:15:05.3939075Z #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T12:15:05.3939203Z #17 dynamo__custom_eval_frame from :0
2025-12-04T12:15:05.3939594Z #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3939858Z #19 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3940267Z #20 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3940690Z #21 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3941065Z #22 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3941374Z #23 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T12:15:05.3941641Z #24 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3942015Z #25 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3942287Z #26 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3942661Z #27 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3942922Z #28 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3943309Z #29 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3943570Z #30 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3943951Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3944398Z #32 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3944773Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3945197Z #34 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3945567Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3945987Z #36 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3946358Z #37 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3946763Z #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3947149Z #39 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3947557Z #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3947945Z #41 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3948278Z #42 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T12:15:05.3948537Z #43 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3948918Z #44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3949271Z #45 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T12:15:05.3949576Z #46 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T12:15:05.3949884Z #47 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T12:15:05.3950188Z #48 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T12:15:05.3950642Z #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T12:15:05.3951013Z #50 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3951421Z #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3951805Z #52 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3952096Z #53 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3952480Z #54 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3952884Z #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3953257Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3953678Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3954051Z #58 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3954417Z #59 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T12:15:05.3954721Z #60 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T12:15:05.3955019Z #61 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T12:15:05.3955299Z #62 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T12:15:05.3955557Z #63 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3955927Z #64 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3956382Z #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3956753Z #66 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3957168Z #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3957541Z #68 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3957800Z #69 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3958184Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3958586Z #71 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3958963Z #72 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3959371Z #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3959741Z #74 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3960016Z #75 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3960420Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3960840Z #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3961212Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3961615Z #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3961997Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3962346Z #81 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T12:15:05.3962683Z #82 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T12:15:05.3962990Z #83 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T12:15:05.3963296Z #84 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T12:15:05.3963713Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T12:15:05.3964119Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3964381Z #87 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3964766Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3965171Z #89 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3965556Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3965968Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3966340Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3966706Z #93 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T12:15:05.3967016Z #94 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T12:15:05.3967313Z #95 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T12:15:05.3967631Z #96 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T12:15:05.3968038Z #97 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T12:15:05.3968457Z #98 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3968863Z #99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3969247Z #100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3969678Z #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3970059Z #102 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3970341Z #103 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.3970723Z #104 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3971357Z #105 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3971756Z #106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3972172Z #107 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3972565Z #108 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3973030Z #109 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T12:15:05.3973345Z #110 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T12:15:05.3973664Z #111 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T12:15:05.3973973Z #112 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T12:15:05.3974390Z #113 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T12:15:05.3974786Z #114 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3975273Z #115 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3975669Z #116 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3976086Z #117 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3976547Z #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3977065Z #119 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3977449Z #120 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3977877Z #121 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.3978260Z #122 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.3978558Z #123 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134
2025-12-04T12:15:05.3978886Z #124 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291
2025-12-04T12:15:05.3979161Z #125 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312
2025-12-04T12:15:05.3979463Z #126 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208
2025-12-04T12:15:05.3979817Z #127 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456
2025-12-04T12:15:05.3980149Z #128 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90
2025-12-04T12:15:05.3980454Z #129 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357
2025-12-04T12:15:05.3980729Z #130 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090
2025-12-04T12:15:05.3981047Z #131 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58
2025-12-04T12:15:05.3981263Z #132 __libc_start_main_impl from ./csu/../csu/libc-start.c:392
2025-12-04T12:15:05.3981368Z #133 _start from ??:0
2025-12-04T12:15:05.3981511Z #134 <unwind unsupported> from ??:0
2025-12-04T12:15:05.3981518Z 
2025-12-04T12:15:05.3981523Z 
2025-12-04T12:15:05.3981749Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.3982296Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16
2025-12-04T12:15:05.3982304Z 
2025-12-04T12:15:05.3982593Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.3982822Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.3982954Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:15:05.3983078Z stats [('calls_captured', 11)]
2025-12-04T12:15:05.3983425Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.3983667Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.3983771Z graph_break []
2025-12-04T12:15:05.3983951Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T12:15:05.3984222Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:15:05.3985441Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T12:15:05.3985631Z   if out == self.unknown_value:
2025-12-04T12:15:05.3986360Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:15:05.3986466Z   warnings.warn(
2025-12-04T12:15:05.3986703Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.3986820Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:15:05.3986969Z stats [('calls_captured', 11)]
2025-12-04T12:15:05.3987206Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.3987549Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.3987663Z graph_break []
2025-12-04T12:15:05.3987844Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T12:15:05.3988100Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:15:05.3988852Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:15:05.3988963Z   warnings.warn(
2025-12-04T12:15:05.3989112Z =================================== FAILURES ===================================
2025-12-04T12:15:05.3989415Z _________ TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 __________
2025-12-04T12:15:05.3989541Z Traceback (most recent call last):
2025-12-04T12:15:05.3989951Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback
2025-12-04T12:15:05.3990097Z     y_fp8 = compiled_fp8_matmul(x)  # noqa: F841
2025-12-04T12:15:05.3990593Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper
2025-12-04T12:15:05.3990725Z     return fn(*args, **kwargs)
2025-12-04T12:15:05.3991124Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 113, in fp8_matmul_unwrapped
2025-12-04T12:15:05.3991249Z     output = torch._scaled_mm(
2025-12-04T12:15:05.3991707Z RuntimeError: torch._scaled_mm is only supported on CUDA devices with compute capability >= 9.0 or 8.9, or ROCm MI300+
2025-12-04T12:15:05.3992360Z Exception raised from _scaled_mm_out_cuda at /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/ScaledBlas.cpp:492 (most recent call first):
2025-12-04T12:15:05.3992523Z C++ CapturedTraceback:
2025-12-04T12:15:05.3993847Z #4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0
2025-12-04T12:15:05.3994343Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0
2025-12-04T12:15:05.3994681Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0
2025-12-04T12:15:05.3995551Z #7 at::native::_scaled_mm_out_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::ScalarType>, bool, at::Tensor&) from ??:0
2025-12-04T12:15:05.3996366Z #8 at::native::_scaled_mm_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::ScalarType>, bool) from ??:0
2025-12-04T12:15:05.3997506Z #9 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_mm(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::ScalarType>, bool) from RegisterCUDA_0.cpp:0
2025-12-04T12:15:05.4000789Z #10 c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::ScalarType>, bool), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_mm>, at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::ScalarType>, bool> >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from RegisterCUDA_0.cpp:0
2025-12-04T12:15:05.4001691Z #11 torch::autograd::autogradNotImplementedFallbackImpl(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from autograd_not_implemented_fallback.cpp:0
2025-12-04T12:15:05.4002518Z #12 at::_ops::_scaled_mm::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::ScalarType>, bool) from ??:0
2025-12-04T12:15:05.4002920Z #13 torch::autograd::THPVariable__scaled_mm(_object*, _object*, _object*) from python_torch_functions_2.cpp:0
2025-12-04T12:15:05.4003260Z #14 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543
2025-12-04T12:15:05.4003571Z #15 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T12:15:05.4003997Z #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T12:15:05.4004125Z #17 dynamo__custom_eval_frame from :0
2025-12-04T12:15:05.4004503Z #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4004782Z #19 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.4005155Z #20 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4005560Z #21 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.4005944Z #22 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4006274Z #23 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T12:15:05.4006552Z #24 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.4006924Z #25 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4007186Z #26 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.4007571Z #27 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4007832Z #28 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.4008217Z #29 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4008476Z #30 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.4008845Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4009267Z #32 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.4009636Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4010093Z #34 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.4010478Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4010888Z #36 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.4011268Z #37 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4011674Z #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.4012043Z #39 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4012496Z #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.4012865Z #41 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4013174Z #42 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T12:15:05.4013436Z #43 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.4013806Z #44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4014205Z #45 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T12:15:05.4014510Z #46 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T12:15:05.4014825Z #47 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T12:15:05.4015131Z #48 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T12:15:05.4015543Z #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T12:15:05.4015928Z #50 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4016417Z #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.4016790Z #52 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4017072Z #53 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.4017443Z #54 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4017863Z #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.4018276Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4018685Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.4019069Z #58 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4019421Z #59 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T12:15:05.4019742Z #60 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T12:15:05.4020040Z #61 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T12:15:05.4020314Z #62 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T12:15:05.4020588Z #63 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.4020960Z #64 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4021367Z #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.4021751Z #66 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4022216Z #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.4022597Z #68 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4022859Z #69 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.4023230Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4023652Z #71 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.4024021Z #72 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4024437Z #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.4024843Z #74 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4025105Z #75 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.4025492Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4025897Z #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.4026315Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4026720Z #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.4027089Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4027454Z #81 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T12:15:05.4027761Z #82 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T12:15:05.4028055Z #83 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T12:15:05.4028375Z #84 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T12:15:05.4028779Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T12:15:05.4029167Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4029424Z #87 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.4029794Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4030212Z #89 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.4030618Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4031032Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.4031404Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4031753Z #93 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T12:15:05.4032069Z #94 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T12:15:05.4032363Z #95 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T12:15:05.4032678Z #96 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T12:15:05.4033081Z #97 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T12:15:05.4033454Z #98 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4033874Z #99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.4034366Z #100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4034780Z #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.4035171Z #102 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4035439Z #103 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.4035831Z #104 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4036248Z #105 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.4036630Z #106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4037092Z #107 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.4037475Z #108 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4037850Z #109 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T12:15:05.4038165Z #110 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T12:15:05.4038497Z #111 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T12:15:05.4038818Z #112 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T12:15:05.4039234Z #113 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T12:15:05.4039626Z #114 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4040046Z #115 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.4040425Z #116 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4040853Z #117 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.4041230Z #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4041645Z #119 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.4042033Z #120 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4042442Z #121 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.4043362Z #122 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4043659Z #123 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134
2025-12-04T12:15:05.4043967Z #124 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291
2025-12-04T12:15:05.4044256Z #125 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312
2025-12-04T12:15:05.4044542Z #126 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208
2025-12-04T12:15:05.4044910Z #127 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456
2025-12-04T12:15:05.4045239Z #128 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90
2025-12-04T12:15:05.4045532Z #129 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357
2025-12-04T12:15:05.4045820Z #130 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090
2025-12-04T12:15:05.4046090Z #131 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58
2025-12-04T12:15:05.4046291Z #132 __libc_start_main_impl from ./csu/../csu/libc-start.c:392
2025-12-04T12:15:05.4046408Z #133 _start from ??:0
2025-12-04T12:15:05.4046535Z #134 <unwind unsupported> from ??:0
2025-12-04T12:15:05.4046541Z 
2025-12-04T12:15:05.4046546Z 
2025-12-04T12:15:05.4046814Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.4047354Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16
2025-12-04T12:15:05.4047362Z 
2025-12-04T12:15:05.4047634Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.4047873Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.4047989Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:15:05.4048126Z stats [('calls_captured', 11)]
2025-12-04T12:15:05.4048477Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.4048732Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.4048851Z graph_break []
2025-12-04T12:15:05.4049031Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T12:15:05.4049259Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:15:05.4050487Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T12:15:05.4050643Z   if out == self.unknown_value:
2025-12-04T12:15:05.4051387Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:15:05.4051501Z   warnings.warn(
2025-12-04T12:15:05.4051722Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.4051856Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:15:05.4051977Z stats [('calls_captured', 11)]
2025-12-04T12:15:05.4052202Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.4052560Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.4052663Z graph_break []
2025-12-04T12:15:05.4052858Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T12:15:05.4053082Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:15:05.4053815Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:15:05.4053938Z   warnings.warn(
2025-12-04T12:15:05.4054604Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.4054783Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:15:05.4054903Z stats [('calls_captured', 11)]
2025-12-04T12:15:05.4055129Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.4055484Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.4055589Z graph_break []
2025-12-04T12:15:05.4055768Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T12:15:05.4055998Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:15:05.4056821Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:15:05.4056946Z   warnings.warn(
2025-12-04T12:15:05.4057600Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b434424093647de3.xml -
2025-12-04T12:15:05.4057778Z =========================== short test summary info ============================
2025-12-04T12:15:05.4058751Z FAILED [0.4551s] inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 - RuntimeError: torch._scaled_mm is only supported on CUDA devices with compute capability >= 9.0 or 8.9, or ROCm MI300+
2025-12-04T12:15:05.4059450Z Exception raised from _scaled_mm_out_cuda at /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/ScaledBlas.cpp:492 (most recent call first):
2025-12-04T12:15:05.4059584Z C++ CapturedTraceback:
2025-12-04T12:15:05.4060886Z #4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0
2025-12-04T12:15:05.4061370Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0
2025-12-04T12:15:05.4061753Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0
2025-12-04T12:15:05.4062636Z #7 at::native::_scaled_mm_out_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::ScalarType>, bool, at::Tensor&) from ??:0
2025-12-04T12:15:05.4063447Z #8 at::native::_scaled_mm_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::ScalarType>, bool) from ??:0
2025-12-04T12:15:05.4064580Z #9 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_mm(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::ScalarType>, bool) from RegisterCUDA_0.cpp:0
2025-12-04T12:15:05.4067844Z #10 c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::ScalarType>, bool), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_mm>, at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::ScalarType>, bool> >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from RegisterCUDA_0.cpp:0
2025-12-04T12:15:05.4068750Z #11 torch::autograd::autogradNotImplementedFallbackImpl(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from autograd_not_implemented_fallback.cpp:0
2025-12-04T12:15:05.4069602Z #12 at::_ops::_scaled_mm::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::ScalarType>, bool) from ??:0
2025-12-04T12:15:05.4070006Z #13 torch::autograd::THPVariable__scaled_mm(_object*, _object*, _object*) from python_torch_functions_2.cpp:0
2025-12-04T12:15:05.4070344Z #14 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543
2025-12-04T12:15:05.4070652Z #15 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T12:15:05.4071256Z #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T12:15:05.4071399Z #17 dynamo__custom_eval_frame from :0
2025-12-04T12:15:05.4071775Z #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4072043Z #19 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.4072432Z #20 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4072842Z #21 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.4073519Z #22 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4073826Z #23 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T12:15:05.4074096Z #24 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.4074480Z #25 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4074739Z #26 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.4075126Z #27 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4075384Z #28 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.4075813Z #29 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4076085Z #30 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.4076460Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4076882Z #32 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.4077297Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4077703Z #34 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.4078086Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4078495Z #36 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.4078866Z #37 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4079286Z #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.4079658Z #39 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4080080Z #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.4080452Z #41 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4080747Z #42 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T12:15:05.4081020Z #43 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.4081453Z #44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4081817Z #45 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T12:15:05.4082122Z #46 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T12:15:05.4082423Z #47 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T12:15:05.4082742Z #48 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T12:15:05.4083152Z #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T12:15:05.4083534Z #50 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4083941Z #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.4084312Z #52 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4084589Z #53 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.4084962Z #54 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4085400Z #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.4085787Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4086194Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.4086576Z #58 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4086927Z #59 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T12:15:05.4087231Z #60 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T12:15:05.4087541Z #61 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T12:15:05.4087842Z #62 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T12:15:05.4088118Z #63 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.4088491Z #64 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4088897Z #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.4089311Z #66 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4089715Z #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.4090086Z #68 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4090360Z #69 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.4090730Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4091147Z #71 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.4091519Z #72 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4091923Z #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.4092307Z #74 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4092568Z #75 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.4092950Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4093354Z #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.4093760Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4094177Z #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.4094548Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4094908Z #81 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T12:15:05.4095213Z #82 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T12:15:05.4095507Z #83 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T12:15:05.4095821Z #84 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T12:15:05.4096229Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T12:15:05.4096687Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4096967Z #87 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.4097340Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4097798Z #89 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.4098171Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4098582Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.4098967Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4099315Z #93 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T12:15:05.4099632Z #94 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T12:15:05.4099957Z #95 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T12:15:05.4100259Z #96 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T12:15:05.4100684Z #97 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T12:15:05.4101054Z #98 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4101508Z #99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.4101891Z #100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4102306Z #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.4102697Z #102 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4102967Z #103 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T12:15:05.4103344Z #104 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4103770Z #105 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.4104145Z #106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4104570Z #107 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.4104947Z #108 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4105304Z #109 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T12:15:05.4105662Z #110 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T12:15:05.4105965Z #111 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T12:15:05.4106283Z #112 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T12:15:05.4106700Z #113 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T12:15:05.4107078Z #114 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4107506Z #115 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.4107885Z #116 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4108310Z #117 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.4108688Z #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4109102Z #119 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.4109498Z #120 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4109941Z #121 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T12:15:05.4110322Z #122 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T12:15:05.4110630Z #123 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134
2025-12-04T12:15:05.4110940Z #124 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291
2025-12-04T12:15:05.4111224Z #125 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312
2025-12-04T12:15:05.4111512Z #126 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208
2025-12-04T12:15:05.4111867Z #127 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456
2025-12-04T12:15:05.4112255Z #128 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90
2025-12-04T12:15:05.4112550Z #129 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357
2025-12-04T12:15:05.4112838Z #130 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090
2025-12-04T12:15:05.4113110Z #131 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58
2025-12-04T12:15:05.4113357Z #132 __libc_start_main_impl from ./csu/../csu/libc-start.c:392
2025-12-04T12:15:05.4113476Z #133 _start from ??:0
2025-12-04T12:15:05.4113600Z #134 <unwind unsupported> from ??:0
2025-12-04T12:15:05.4113606Z 
2025-12-04T12:15:05.4113611Z 
2025-12-04T12:15:05.4113830Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.4114386Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16
2025-12-04T12:15:05.4114394Z 
2025-12-04T12:15:05.4114665Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.4114858Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.4115066Z ================= 1 failed, 187 deselected, 2 rerun in 19.83s ==================
2025-12-04T12:15:05.4115167Z Got exit code 1
2025-12-04T12:15:05.4115641Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16
2025-12-04T12:15:05.4116054Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:15:05.4116536Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ccd966f4e119e833.xml
2025-12-04T12:15:05.4116701Z ============================= test session starts ==============================
2025-12-04T12:15:05.4117086Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.4117212Z cachedir: .pytest_cache
2025-12-04T12:15:05.4117753Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.4117892Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.4118007Z configfile: pytest.ini
2025-12-04T12:15:05.4118603Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.4118847Z collecting ... collected 188 items / 22 deselected / 166 selected
2025-12-04T12:15:05.4118995Z stepcurrent: skipping 22 already run items.
2025-12-04T12:15:05.4119112Z Running 166 items in this shard
2025-12-04T12:15:05.4119118Z 
2025-12-04T12:15:05.4120004Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 W1204 11:54:30.500000 115688 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T12:15:05.4120768Z E1204 11:54:31.105000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0
2025-12-04T12:15:05.4121760Z E1204 11:54:31.105000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.4122312Z E1204 11:54:31.105000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.4122887Z E1204 11:54:31.105000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:05.4123382Z E1204 11:54:31.105000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:05.4123846Z E1204 11:54:31.105000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:05.4124336Z E1204 11:54:31.105000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x1 = (xindex % ks1)
2025-12-04T12:15:05.4124940Z E1204 11:54:31.105000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x2 = triton_helpers.div_floor_integer(xindex,  ks1)
2025-12-04T12:15:05.4125510Z E1204 11:54:31.105000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + load_seed_offset)
2025-12-04T12:15:05.4125970Z E1204 11:54:31.105000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = x0
2025-12-04T12:15:05.4126531Z E1204 11:54:31.105000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.rand(tmp0, (tmp1).to(tl.uint32))
2025-12-04T12:15:05.4127055Z E1204 11:54:31.105000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp2.to(tl.float32)
2025-12-04T12:15:05.4127585Z E1204 11:54:31.105000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float8e4nv)
2025-12-04T12:15:05.4128293Z E1204 11:54:31.105000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x1 + x2*((1) * ((1) >= (ks1)) + (ks1) * ((ks1) > (1)))), tmp4, xmask)
2025-12-04T12:15:05.4128658Z E1204 11:54:31.105000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.4130580Z E1204 11:54:31.105000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*i64', 'out_ptr1': '*fp8e4nv', 'load_seed_offset': 'constexpr', 'ks1': 'i64', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'load_seed_offset': 1, 'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.4131148Z E1204 11:54:31.105000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.4132199Z E1204 11:54:31.105000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.4132848Z E1204 11:54:31.105000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.4133740Z E1204 11:54:31.105000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.4134438Z E1204 11:54:31.105000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.4135356Z E1204 11:54:31.105000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.4136143Z E1204 11:54:31.105000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.4136836Z E1204 11:54:31.105000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.4137784Z E1204 11:54:31.105000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.4138196Z E1204 11:54:31.105000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.4139091Z E1204 11:54:31.105000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.4139242Z ('RERUN', {'yellow': True}) [3.8305s] [  0%]
2025-12-04T12:15:05.4140429Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 E1204 11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0
2025-12-04T12:15:05.4141367Z E1204 11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.4141915Z E1204 11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.4142477Z E1204 11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:05.4142989Z E1204 11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:05.4143425Z E1204 11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:05.4143912Z E1204 11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x1 = (xindex % ks1)
2025-12-04T12:15:05.4144507Z E1204 11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x2 = triton_helpers.div_floor_integer(xindex,  ks1)
2025-12-04T12:15:05.4145117Z E1204 11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + load_seed_offset)
2025-12-04T12:15:05.4145539Z E1204 11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = x0
2025-12-04T12:15:05.4146099Z E1204 11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.rand(tmp0, (tmp1).to(tl.uint32))
2025-12-04T12:15:05.4146615Z E1204 11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp2.to(tl.float32)
2025-12-04T12:15:05.4147142Z E1204 11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float8e4nv)
2025-12-04T12:15:05.4147822Z E1204 11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x1 + x2*((1) * ((1) >= (ks1)) + (ks1) * ((ks1) > (1)))), tmp4, xmask)
2025-12-04T12:15:05.4148191Z E1204 11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.4150130Z E1204 11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*i64', 'out_ptr1': '*fp8e4nv', 'load_seed_offset': 'constexpr', 'ks1': 'i64', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'load_seed_offset': 1, 'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.4150683Z E1204 11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.4151733Z E1204 11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.4152404Z E1204 11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.4153298Z E1204 11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.4154026Z E1204 11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.4154910Z E1204 11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.4155697Z E1204 11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.4156307Z E1204 11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.4157242Z E1204 11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.4157629Z E1204 11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.4158519Z E1204 11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.4158696Z ('RERUN', {'yellow': True}) [0.5737s] [  0%]
2025-12-04T12:15:05.4159841Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 E1204 11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0
2025-12-04T12:15:05.4160786Z E1204 11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.4161331Z E1204 11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.4161892Z E1204 11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:05.4162402Z E1204 11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:05.4162835Z E1204 11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:05.4163346Z E1204 11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x1 = (xindex % ks1)
2025-12-04T12:15:05.4163942Z E1204 11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x2 = triton_helpers.div_floor_integer(xindex,  ks1)
2025-12-04T12:15:05.4164502Z E1204 11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + load_seed_offset)
2025-12-04T12:15:05.4164940Z E1204 11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = x0
2025-12-04T12:15:05.4165498Z E1204 11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.rand(tmp0, (tmp1).to(tl.uint32))
2025-12-04T12:15:05.4166065Z E1204 11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp2.to(tl.float32)
2025-12-04T12:15:05.4166590Z E1204 11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float8e4nv)
2025-12-04T12:15:05.4167262Z E1204 11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x1 + x2*((1) * ((1) >= (ks1)) + (ks1) * ((ks1) > (1)))), tmp4, xmask)
2025-12-04T12:15:05.4167671Z E1204 11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.4169569Z E1204 11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*i64', 'out_ptr1': '*fp8e4nv', 'load_seed_offset': 'constexpr', 'ks1': 'i64', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'load_seed_offset': 1, 'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.4170122Z E1204 11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.4171350Z E1204 11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.4171995Z E1204 11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.4172885Z E1204 11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.4173648Z E1204 11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.4174533Z E1204 11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.4175319Z E1204 11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.4175927Z E1204 11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.4176936Z E1204 11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.4177324Z E1204 11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.4178268Z E1204 11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.4178402Z FAILED [0.5948s] [  0%]
2025-12-04T12:15:05.4178408Z 
2025-12-04T12:15:05.4178558Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.4178842Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________
2025-12-04T12:15:05.4178986Z Traceback (most recent call last):
2025-12-04T12:15:05.4179387Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback
2025-12-04T12:15:05.4179552Z     y_fp8 = compiled_fp8_matmul(x)  # noqa: F841
2025-12-04T12:15:05.4180093Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.4180348Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.4180881Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.4181077Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.4181629Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.4181793Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.4182327Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.4182665Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.4183188Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.4183337Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.4183836Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.4183961Z     return self._compile_to_module()
2025-12-04T12:15:05.4184463Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.4184628Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.4185145Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.4185290Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.4185827Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.4186061Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.4186661Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.4186788Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.4187303Z   File "/tmp/tmpjb957dc9/jn/cjnexhgq3vjxjlw7edwbtm6qytyqlitar2t5iw5kgvsxyibeqhnf.py", line 60, in <module>
2025-12-04T12:15:05.4187769Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.4187881Z     kernel.precompile(
2025-12-04T12:15:05.4188449Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.4188570Z     self._precompile_worker()
2025-12-04T12:15:05.4189180Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.4189361Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.4189992Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.4190208Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.4190662Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.4190910Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.4191364Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.4191699Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.4191938Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.4192467Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.4192560Z ^
2025-12-04T12:15:05.4193033Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.4193040Z 
2025-12-04T12:15:05.4193752Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.4193789Z 
2025-12-04T12:15:05.4193794Z 
2025-12-04T12:15:05.4194024Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.4194548Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16
2025-12-04T12:15:05.4194556Z 
2025-12-04T12:15:05.4194843Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.4195074Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.4195183Z frames [('total', 1)]
2025-12-04T12:15:05.4195316Z stats [('calls_captured', 11)]
2025-12-04T12:15:05.4195851Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.4196074Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.4196190Z graph_break []
2025-12-04T12:15:05.4196370Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T12:15:05.4196669Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________
2025-12-04T12:15:05.4196795Z Traceback (most recent call last):
2025-12-04T12:15:05.4197191Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback
2025-12-04T12:15:05.4197384Z     y_fp8 = compiled_fp8_matmul(x)  # noqa: F841
2025-12-04T12:15:05.4197877Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.4198128Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.4198661Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.4198854Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.4199379Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.4199531Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.4200067Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.4200408Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.4200933Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.4201099Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.4201612Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.4201736Z     return self._compile_to_module()
2025-12-04T12:15:05.4202234Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.4202403Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.4202919Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.4203067Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.4203566Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.4203843Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.4204435Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.4204564Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.4205087Z   File "/tmp/tmp0jedbhwp/ay/cay2dvspppsjmoss6vkxbgpgym75gkayiiwjmtjezgn52evwc76g.py", line 60, in <module>
2025-12-04T12:15:05.4205583Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.4205706Z     kernel.precompile(
2025-12-04T12:15:05.4206262Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.4206382Z     self._precompile_worker()
2025-12-04T12:15:05.4206992Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.4207174Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.4207770Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.4207986Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.4208437Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.4208696Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.4209139Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.4209473Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.4209746Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.4210248Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.4210352Z ^
2025-12-04T12:15:05.4210809Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.4210818Z 
2025-12-04T12:15:05.4211531Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.4211539Z 
2025-12-04T12:15:05.4211544Z 
2025-12-04T12:15:05.4211780Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.4212306Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16
2025-12-04T12:15:05.4212312Z 
2025-12-04T12:15:05.4212596Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.4212823Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.4212933Z frames [('total', 1)]
2025-12-04T12:15:05.4213069Z stats [('calls_captured', 11)]
2025-12-04T12:15:05.4213700Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.4213941Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.4214046Z graph_break []
2025-12-04T12:15:05.4214231Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T12:15:05.4214470Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.4214577Z frames [('total', 1)]
2025-12-04T12:15:05.4214696Z stats [('calls_captured', 11)]
2025-12-04T12:15:05.4214934Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.4215468Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.4215621Z graph_break []
2025-12-04T12:15:05.4215805Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T12:15:05.4215953Z =================================== FAILURES ===================================
2025-12-04T12:15:05.4216254Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________
2025-12-04T12:15:05.4216487Z Traceback (most recent call last):
2025-12-04T12:15:05.4216930Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback
2025-12-04T12:15:05.4217095Z     y_fp8 = compiled_fp8_matmul(x)  # noqa: F841
2025-12-04T12:15:05.4217586Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.4217850Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.4218370Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.4218566Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.4219092Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.4219246Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.4219784Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.4220124Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.4220643Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.4220805Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.4221321Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.4221447Z     return self._compile_to_module()
2025-12-04T12:15:05.4221953Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.4222123Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.4222655Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.4222788Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.4223285Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.4223537Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.4224128Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.4224257Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.4224777Z   File "/tmp/tmp317z82xt/3z/c3zhiqe2blftfihjcp7wonscomrqgbh4l4xighgjwf3blmoz6ce3.py", line 60, in <module>
2025-12-04T12:15:05.4225241Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.4225400Z     kernel.precompile(
2025-12-04T12:15:05.4225959Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.4226079Z     self._precompile_worker()
2025-12-04T12:15:05.4226687Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.4226865Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.4227936Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.4228143Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.4228637Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.4228904Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.4229349Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.4229689Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.4229962Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.4230459Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.4230568Z ^
2025-12-04T12:15:05.4231029Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.4231038Z 
2025-12-04T12:15:05.4231753Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.4231774Z 
2025-12-04T12:15:05.4231778Z 
2025-12-04T12:15:05.4231999Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.4232522Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16
2025-12-04T12:15:05.4232530Z 
2025-12-04T12:15:05.4232814Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.4233036Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.4233157Z frames [('total', 1)]
2025-12-04T12:15:05.4233278Z stats [('calls_captured', 11)]
2025-12-04T12:15:05.4233811Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.4234168Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.4234271Z graph_break []
2025-12-04T12:15:05.4234453Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T12:15:05.4234693Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.4234800Z frames [('total', 1)]
2025-12-04T12:15:05.4234918Z stats [('calls_captured', 11)]
2025-12-04T12:15:05.4235152Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.4235688Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.4235802Z graph_break []
2025-12-04T12:15:05.4235979Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T12:15:05.4236197Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.4236316Z frames [('total', 1)]
2025-12-04T12:15:05.4236434Z stats [('calls_captured', 11)]
2025-12-04T12:15:05.4236657Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.4237231Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.4237333Z graph_break []
2025-12-04T12:15:05.4237521Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T12:15:05.4238179Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ccd966f4e119e833.xml -
2025-12-04T12:15:05.4238353Z =========================== short test summary info ============================
2025-12-04T12:15:05.4239065Z FAILED [0.5948s] inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.4239596Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.4239705Z ^
2025-12-04T12:15:05.4240303Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.4240309Z 
2025-12-04T12:15:05.4241023Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.4241066Z 
2025-12-04T12:15:05.4241071Z 
2025-12-04T12:15:05.4241303Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.4241829Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16
2025-12-04T12:15:05.4241835Z 
2025-12-04T12:15:05.4242117Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.4242302Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.4242507Z ================== 1 failed, 22 deselected, 2 rerun in 5.04s ===================
2025-12-04T12:15:05.4242622Z Got exit code 1
2025-12-04T12:15:05.4242730Z Retrying single test...
2025-12-04T12:15:05.4243219Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d16f18ba4de45d90.xml
2025-12-04T12:15:05.4243385Z ============================= test session starts ==============================
2025-12-04T12:15:05.4243740Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.4243866Z cachedir: .pytest_cache
2025-12-04T12:15:05.4244384Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.4244512Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.4244668Z configfile: pytest.ini
2025-12-04T12:15:05.4245263Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.4245499Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:05.4246106Z stepcurrent: skipping 22 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16
2025-12-04T12:15:05.4246223Z Running 1 items in this shard
2025-12-04T12:15:05.4246228Z 
2025-12-04T12:15:05.4247163Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 [W1204 11:54:49.329949221 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4247169Z 
2025-12-04T12:15:05.4247687Z [W1204 11:55:05.232876621 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4247695Z 
2025-12-04T12:15:05.4248222Z [W1204 11:55:05.233142957 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4248227Z 
2025-12-04T12:15:05.4248739Z [W1204 11:55:05.236255432 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4248779Z 
2025-12-04T12:15:05.4249301Z [W1204 11:55:05.236458970 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4249309Z 
2025-12-04T12:15:05.4249820Z [W1204 11:55:05.238470603 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4249825Z 
2025-12-04T12:15:05.4250344Z [W1204 11:55:05.238833323 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4250351Z 
2025-12-04T12:15:05.4250862Z [W1204 11:55:05.239006537 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4250898Z 
2025-12-04T12:15:05.4251409Z [W1204 11:55:05.239571280 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4251426Z 
2025-12-04T12:15:05.4251941Z [W1204 11:55:05.239765830 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4251976Z 
2025-12-04T12:15:05.4252486Z [W1204 11:55:05.240399667 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4252491Z 
2025-12-04T12:15:05.4253013Z [W1204 11:55:05.240613813 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4253018Z 
2025-12-04T12:15:05.4253529Z [W1204 11:55:05.241045787 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4253534Z 
2025-12-04T12:15:05.4254061Z [W1204 11:55:05.241224039 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4254066Z 
2025-12-04T12:15:05.4254576Z [W1204 11:55:05.241603480 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4254581Z 
2025-12-04T12:15:05.4255104Z [W1204 11:55:05.241780784 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4255109Z 
2025-12-04T12:15:05.4255620Z [W1204 11:55:05.242167135 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4255625Z 
2025-12-04T12:15:05.4256133Z [W1204 11:55:05.242344300 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4256212Z 
2025-12-04T12:15:05.4256772Z W1204 11:55:06.210000 115886 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T12:15:05.4256913Z ('RERUN', {'yellow': True}) [19.8220s] [100%]
2025-12-04T12:15:05.4257853Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 [W1204 11:55:07.680953394 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4257861Z 
2025-12-04T12:15:05.4258379Z [W1204 11:55:07.681414332 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4258383Z 
2025-12-04T12:15:05.4258907Z [W1204 11:55:07.681609327 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4258915Z 
2025-12-04T12:15:05.4259428Z [W1204 11:55:07.682206823 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4259433Z 
2025-12-04T12:15:05.4259954Z [W1204 11:55:07.682401354 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4260007Z 
2025-12-04T12:15:05.4260518Z [W1204 11:55:07.682771955 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4260526Z 
2025-12-04T12:15:05.4261031Z [W1204 11:55:07.683074694 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4261048Z 
2025-12-04T12:15:05.4261556Z [W1204 11:55:07.683245103 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4261563Z 
2025-12-04T12:15:05.4262070Z [W1204 11:55:07.683706512 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4262105Z 
2025-12-04T12:15:05.4262628Z [W1204 11:55:07.683886981 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4262633Z 
2025-12-04T12:15:05.4263144Z [W1204 11:55:07.684341405 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4263182Z 
2025-12-04T12:15:05.4263704Z [W1204 11:55:07.684523240 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4263709Z 
2025-12-04T12:15:05.4264216Z [W1204 11:55:07.684930294 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4264220Z 
2025-12-04T12:15:05.4264745Z [W1204 11:55:07.685108262 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4264750Z 
2025-12-04T12:15:05.4265263Z [W1204 11:55:07.685473853 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4265268Z 
2025-12-04T12:15:05.4265789Z [W1204 11:55:07.685650715 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4265794Z 
2025-12-04T12:15:05.4266303Z [W1204 11:55:07.686018333 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4266308Z 
2025-12-04T12:15:05.4266817Z [W1204 11:55:07.686196060 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4266834Z 
2025-12-04T12:15:05.4266969Z ('RERUN', {'yellow': True}) [0.8658s] [100%]
2025-12-04T12:15:05.4267923Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 [W1204 11:55:08.540588649 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4267929Z 
2025-12-04T12:15:05.4268732Z [W1204 11:55:08.541130007 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4268738Z 
2025-12-04T12:15:05.4269253Z [W1204 11:55:08.541323048 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4269260Z 
2025-12-04T12:15:05.4269787Z [W1204 11:55:08.541906082 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4269791Z 
2025-12-04T12:15:05.4270298Z [W1204 11:55:08.542102830 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4270307Z 
2025-12-04T12:15:05.4270834Z [W1204 11:55:08.542459564 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4270839Z 
2025-12-04T12:15:05.4271644Z [W1204 11:55:08.542763509 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4271651Z 
2025-12-04T12:15:05.4272163Z [W1204 11:55:08.542935138 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4272186Z 
2025-12-04T12:15:05.4272697Z [W1204 11:55:08.543397112 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4272702Z 
2025-12-04T12:15:05.4273209Z [W1204 11:55:08.543580272 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4273217Z 
2025-12-04T12:15:05.4273796Z [W1204 11:55:08.544024160 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4273801Z 
2025-12-04T12:15:05.4274318Z [W1204 11:55:08.544207224 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4274325Z 
2025-12-04T12:15:05.4274852Z [W1204 11:55:08.544601688 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4274902Z 
2025-12-04T12:15:05.4275412Z [W1204 11:55:08.544782826 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4275418Z 
2025-12-04T12:15:05.4275938Z [W1204 11:55:08.545144186 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4275946Z 
2025-12-04T12:15:05.4276458Z [W1204 11:55:08.545330823 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4276466Z 
2025-12-04T12:15:05.4276992Z [W1204 11:55:08.545695284 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4276997Z 
2025-12-04T12:15:05.4277506Z [W1204 11:55:08.545873675 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4277514Z 
2025-12-04T12:15:05.4277620Z FAILED [0.9733s] [100%]
2025-12-04T12:15:05.4277625Z 
2025-12-04T12:15:05.4277782Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.4278067Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________
2025-12-04T12:15:05.4278205Z Traceback (most recent call last):
2025-12-04T12:15:05.4278605Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback
2025-12-04T12:15:05.4278807Z     y_fp8 = compiled_fp8_matmul(x)  # noqa: F841
2025-12-04T12:15:05.4279317Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.4279567Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.4280086Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.4280293Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.4280823Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.4280987Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.4281526Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.4281850Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.4282391Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.4282541Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.4283075Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.4283202Z     return self._compile_to_module()
2025-12-04T12:15:05.4283692Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.4283876Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.4284394Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.4284523Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.4285040Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.4285311Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.4285920Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.4286050Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.4286532Z   File "/tmp/tmpz_1dg954/ya/cyapkdhf5yoal5x65ohvfufmwbn7mtthu52mkwtqnsx4utahfmjk.py", line 193, in <module>
2025-12-04T12:15:05.4287039Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait
2025-12-04T12:15:05.4287159Z     self._wait_futures(scope)
2025-12-04T12:15:05.4287669Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures
2025-12-04T12:15:05.4287792Z     kernel = result.result()
2025-12-04T12:15:05.4288243Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result
2025-12-04T12:15:05.4288376Z     return self.result_fn()
2025-12-04T12:15:05.4288857Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result
2025-12-04T12:15:05.4288993Z     raise e.with_name(kernel_name) from e
2025-12-04T12:15:05.4289392Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T12:15:05.4289401Z 
2025-12-04T12:15:05.4289612Z Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0
2025-12-04T12:15:05.4289754Z Traceback (most recent call last):
2025-12-04T12:15:05.4290299Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T12:15:05.4290403Z     result = job()
2025-12-04T12:15:05.4291014Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T12:15:05.4291198Z     kernel.precompile(warm_cache_only=True)
2025-12-04T12:15:05.4291768Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T12:15:05.4291887Z     self._precompile_worker()
2025-12-04T12:15:05.4292486Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.4292685Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.4293279Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.4293480Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.4293950Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.4294201Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.4294662Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.4295000Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.4295217Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.4295734Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.4295829Z ^
2025-12-04T12:15:05.4296394Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.4296401Z 
2025-12-04T12:15:05.4296405Z 
2025-12-04T12:15:05.4297117Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.4297125Z 
2025-12-04T12:15:05.4297129Z 
2025-12-04T12:15:05.4297387Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.4297928Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16
2025-12-04T12:15:05.4297933Z 
2025-12-04T12:15:05.4298203Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.4298442Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.4298601Z frames [('total', 1)]
2025-12-04T12:15:05.4298719Z stats [('calls_captured', 11)]
2025-12-04T12:15:05.4299389Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.4299613Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.4299733Z graph_break []
2025-12-04T12:15:05.4299909Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T12:15:05.4300130Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:15:05.4301354Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T12:15:05.4301474Z   if out == self.unknown_value:
2025-12-04T12:15:05.4301772Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________
2025-12-04T12:15:05.4301898Z Traceback (most recent call last):
2025-12-04T12:15:05.4302294Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback
2025-12-04T12:15:05.4302455Z     y_fp8 = compiled_fp8_matmul(x)  # noqa: F841
2025-12-04T12:15:05.4302944Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.4303234Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.4303760Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.4303957Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.4304481Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.4304631Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.4305167Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.4305504Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.4306025Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.4306189Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.4306674Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.4306797Z     return self._compile_to_module()
2025-12-04T12:15:05.4307332Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.4307498Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.4308016Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.4308157Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.4308657Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.4308901Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.4309517Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.4309650Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.4310177Z   File "/tmp/tmpnyyinwfq/lc/clc5kr7ofp6ipkdzxvgogfgretxbny23pz4cfqnpum2ef7mimh4t.py", line 193, in <module>
2025-12-04T12:15:05.4310632Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait
2025-12-04T12:15:05.4310789Z     self._wait_futures(scope)
2025-12-04T12:15:05.4311287Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures
2025-12-04T12:15:05.4311405Z     kernel = result.result()
2025-12-04T12:15:05.4311860Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result
2025-12-04T12:15:05.4311976Z     return self.result_fn()
2025-12-04T12:15:05.4312457Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result
2025-12-04T12:15:05.4312604Z     raise e.with_name(kernel_name) from e
2025-12-04T12:15:05.4312985Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T12:15:05.4312991Z 
2025-12-04T12:15:05.4313214Z Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0
2025-12-04T12:15:05.4313339Z Traceback (most recent call last):
2025-12-04T12:15:05.4313881Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T12:15:05.4314000Z     result = job()
2025-12-04T12:15:05.4314592Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T12:15:05.4314734Z     kernel.precompile(warm_cache_only=True)
2025-12-04T12:15:05.4315342Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T12:15:05.4315464Z     self._precompile_worker()
2025-12-04T12:15:05.4316071Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.4316256Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.4316851Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.4317065Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.4317519Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.4317777Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.4318222Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.4318564Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.4318761Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.4319289Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.4319380Z ^
2025-12-04T12:15:05.4319852Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.4319860Z 
2025-12-04T12:15:05.4319865Z 
2025-12-04T12:15:05.4320576Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.4320582Z 
2025-12-04T12:15:05.4320587Z 
2025-12-04T12:15:05.4320817Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.4321342Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16
2025-12-04T12:15:05.4321381Z 
2025-12-04T12:15:05.4321664Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.4321886Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.4321994Z frames [('total', 1)]
2025-12-04T12:15:05.4322127Z stats [('calls_captured', 11)]
2025-12-04T12:15:05.4322780Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.4323048Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.4323149Z graph_break []
2025-12-04T12:15:05.4323327Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T12:15:05.4323560Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:15:05.4324771Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T12:15:05.4324888Z   if out == self.unknown_value:
2025-12-04T12:15:05.4325120Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.4325225Z frames [('total', 1)]
2025-12-04T12:15:05.4325353Z stats [('calls_captured', 11)]
2025-12-04T12:15:05.4325578Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.4326232Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.4326348Z graph_break []
2025-12-04T12:15:05.4326524Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T12:15:05.4326703Z =================================== FAILURES ===================================
2025-12-04T12:15:05.4327000Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________
2025-12-04T12:15:05.4327124Z Traceback (most recent call last):
2025-12-04T12:15:05.4327532Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback
2025-12-04T12:15:05.4327678Z     y_fp8 = compiled_fp8_matmul(x)  # noqa: F841
2025-12-04T12:15:05.4328165Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.4328428Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.4328938Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.4329142Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.4329656Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.4329805Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.4330350Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.4330703Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.4331227Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.4331391Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.4331872Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.4332007Z     return self._compile_to_module()
2025-12-04T12:15:05.4332493Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.4332661Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.4333229Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.4333365Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.4333877Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.4334111Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.4334740Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.4334880Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.4335386Z   File "/tmp/tmpzltiuvop/kw/ckwcj562jw3lbuugkfy6lz46lt2sifvytood63xi4pmhwrznmhdn.py", line 193, in <module>
2025-12-04T12:15:05.4335847Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait
2025-12-04T12:15:05.4335980Z     self._wait_futures(scope)
2025-12-04T12:15:05.4336561Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures
2025-12-04T12:15:05.4336692Z     kernel = result.result()
2025-12-04T12:15:05.4337142Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result
2025-12-04T12:15:05.4337256Z     return self.result_fn()
2025-12-04T12:15:05.4337757Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result
2025-12-04T12:15:05.4337889Z     raise e.with_name(kernel_name) from e
2025-12-04T12:15:05.4338271Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T12:15:05.4338291Z 
2025-12-04T12:15:05.4338500Z Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0
2025-12-04T12:15:05.4338674Z Traceback (most recent call last):
2025-12-04T12:15:05.4346391Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T12:15:05.4346611Z     result = job()
2025-12-04T12:15:05.4347255Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T12:15:05.4347416Z     kernel.precompile(warm_cache_only=True)
2025-12-04T12:15:05.4347979Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T12:15:05.4348107Z     self._precompile_worker()
2025-12-04T12:15:05.4348721Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.4348900Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.4349513Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.4349713Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.4350169Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.4350549Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.4350999Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.4351355Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.4351544Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.4352501Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.4352612Z ^
2025-12-04T12:15:05.4353072Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.4353079Z 
2025-12-04T12:15:05.4353171Z 
2025-12-04T12:15:05.4353909Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.4353915Z 
2025-12-04T12:15:05.4353920Z 
2025-12-04T12:15:05.4354139Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.4354705Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16
2025-12-04T12:15:05.4354726Z 
2025-12-04T12:15:05.4354998Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.4355228Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.4355350Z frames [('total', 1)]
2025-12-04T12:15:05.4355477Z stats [('calls_captured', 11)]
2025-12-04T12:15:05.4356139Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.4356377Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.4356480Z graph_break []
2025-12-04T12:15:05.4356674Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T12:15:05.4356898Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:15:05.4358115Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T12:15:05.4358251Z   if out == self.unknown_value:
2025-12-04T12:15:05.4358470Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.4358617Z frames [('total', 1)]
2025-12-04T12:15:05.4358751Z stats [('calls_captured', 11)]
2025-12-04T12:15:05.4358976Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.4359642Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.4359742Z graph_break []
2025-12-04T12:15:05.4359918Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T12:15:05.4360153Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.4360260Z frames [('total', 1)]
2025-12-04T12:15:05.4360378Z stats [('calls_captured', 11)]
2025-12-04T12:15:05.4360611Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.4361263Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.4361378Z graph_break []
2025-12-04T12:15:05.4361555Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T12:15:05.4362208Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d16f18ba4de45d90.xml -
2025-12-04T12:15:05.4362425Z =========================== short test summary info ============================
2025-12-04T12:15:05.4363281Z FAILED [0.9733s] inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 - torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T12:15:05.4363290Z 
2025-12-04T12:15:05.4363513Z Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0
2025-12-04T12:15:05.4363638Z Traceback (most recent call last):
2025-12-04T12:15:05.4364183Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T12:15:05.4364478Z     result = job()
2025-12-04T12:15:05.4365122Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T12:15:05.4365265Z     kernel.precompile(warm_cache_only=True)
2025-12-04T12:15:05.4365834Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T12:15:05.4365951Z     self._precompile_worker()
2025-12-04T12:15:05.4366592Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.4366768Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.4367360Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.4367575Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.4368030Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.4368284Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.4368731Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.4369065Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.4369260Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.4369756Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.4369847Z ^
2025-12-04T12:15:05.4370313Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.4370320Z 
2025-12-04T12:15:05.4370364Z 
2025-12-04T12:15:05.4371284Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.4371291Z 
2025-12-04T12:15:05.4371296Z 
2025-12-04T12:15:05.4371528Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.4372053Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16
2025-12-04T12:15:05.4372058Z 
2025-12-04T12:15:05.4372337Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.4372519Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.4372727Z ================= 1 failed, 187 deselected, 2 rerun in 21.71s ==================
2025-12-04T12:15:05.4372842Z Got exit code 1
2025-12-04T12:15:05.4372952Z Retrying single test...
2025-12-04T12:15:05.4373440Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4078dca354f1c797.xml
2025-12-04T12:15:05.4373608Z ============================= test session starts ==============================
2025-12-04T12:15:05.4373960Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.4374080Z cachedir: .pytest_cache
2025-12-04T12:15:05.4374687Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.4374817Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.4374940Z configfile: pytest.ini
2025-12-04T12:15:05.4375530Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.4375765Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:05.4376469Z stepcurrent: skipping 22 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16
2025-12-04T12:15:05.4376591Z Running 1 items in this shard
2025-12-04T12:15:05.4376654Z 
2025-12-04T12:15:05.4377587Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 [W1204 11:55:25.665307544 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4377593Z 
2025-12-04T12:15:05.4378110Z [W1204 11:55:41.632861756 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4378163Z 
2025-12-04T12:15:05.4378686Z [W1204 11:55:41.633115095 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4378691Z 
2025-12-04T12:15:05.4379200Z [W1204 11:55:41.636190118 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4379208Z 
2025-12-04T12:15:05.4379735Z [W1204 11:55:41.636379338 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4379740Z 
2025-12-04T12:15:05.4380252Z [W1204 11:55:41.638308150 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4380260Z 
2025-12-04T12:15:05.4380768Z [W1204 11:55:41.638639530 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4380788Z 
2025-12-04T12:15:05.4381298Z [W1204 11:55:41.638807310 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4381304Z 
2025-12-04T12:15:05.4381812Z [W1204 11:55:41.639323280 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4381865Z 
2025-12-04T12:15:05.4382391Z [W1204 11:55:41.639504314 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4382398Z 
2025-12-04T12:15:05.4382906Z [W1204 11:55:41.640100353 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4382914Z 
2025-12-04T12:15:05.4383429Z [W1204 11:55:41.640300068 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4383438Z 
2025-12-04T12:15:05.4383943Z [W1204 11:55:41.640736483 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4383948Z 
2025-12-04T12:15:05.4384467Z [W1204 11:55:41.640913317 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4384471Z 
2025-12-04T12:15:05.4384982Z [W1204 11:55:41.641287051 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4384987Z 
2025-12-04T12:15:05.4385507Z [W1204 11:55:41.641460882 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4385512Z 
2025-12-04T12:15:05.4386051Z [W1204 11:55:41.641839376 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4386057Z 
2025-12-04T12:15:05.4386567Z [W1204 11:55:41.642012794 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4386586Z 
2025-12-04T12:15:05.4387051Z W1204 11:55:41.605000 116320 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T12:15:05.4387189Z ('RERUN', {'yellow': True}) [19.8783s] [100%]
2025-12-04T12:15:05.4388145Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 [W1204 11:55:42.047258113 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4388152Z 
2025-12-04T12:15:05.4388664Z [W1204 11:55:42.047705038 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4388669Z 
2025-12-04T12:15:05.4389193Z [W1204 11:55:42.047892171 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4389228Z 
2025-12-04T12:15:05.4389731Z [W1204 11:55:42.048485283 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4389736Z 
2025-12-04T12:15:05.4390255Z [W1204 11:55:42.048698011 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4390263Z 
2025-12-04T12:15:05.4390776Z [W1204 11:55:42.049065292 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4390781Z 
2025-12-04T12:15:05.4391284Z [W1204 11:55:42.049383283 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4391305Z 
2025-12-04T12:15:05.4391815Z [W1204 11:55:42.049554122 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4391823Z 
2025-12-04T12:15:05.4392325Z [W1204 11:55:42.050055056 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4392331Z 
2025-12-04T12:15:05.4392845Z [W1204 11:55:42.050242151 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4392885Z 
2025-12-04T12:15:05.4393392Z [W1204 11:55:42.050698827 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4393399Z 
2025-12-04T12:15:05.4393914Z [W1204 11:55:42.050878356 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4393919Z 
2025-12-04T12:15:05.4394430Z [W1204 11:55:42.051268046 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4394437Z 
2025-12-04T12:15:05.4394953Z [W1204 11:55:42.051444935 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4394958Z 
2025-12-04T12:15:05.4395466Z [W1204 11:55:42.051808164 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4395471Z 
2025-12-04T12:15:05.4395989Z [W1204 11:55:42.051983558 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4395994Z 
2025-12-04T12:15:05.4396503Z [W1204 11:55:42.052348990 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4396508Z 
2025-12-04T12:15:05.4397062Z [W1204 11:55:42.052526374 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4397067Z 
2025-12-04T12:15:05.4397213Z ('RERUN', {'yellow': True}) [0.8586s] [100%]
2025-12-04T12:15:05.4398119Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 [W1204 11:55:43.915440839 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4398124Z 
2025-12-04T12:15:05.4398643Z [W1204 11:55:43.915862192 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4398650Z 
2025-12-04T12:15:05.4399188Z [W1204 11:55:43.916046807 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4399194Z 
2025-12-04T12:15:05.4399713Z [W1204 11:55:43.916647303 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4399718Z 
2025-12-04T12:15:05.4400225Z [W1204 11:55:43.916847572 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4400261Z 
2025-12-04T12:15:05.4400783Z [W1204 11:55:43.917215255 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4400788Z 
2025-12-04T12:15:05.4401292Z [W1204 11:55:43.917505746 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4401300Z 
2025-12-04T12:15:05.4401807Z [W1204 11:55:43.917671384 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4401812Z 
2025-12-04T12:15:05.4402331Z [W1204 11:55:43.918138177 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4402336Z 
2025-12-04T12:15:05.4402840Z [W1204 11:55:43.918319093 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4402848Z 
2025-12-04T12:15:05.4403367Z [W1204 11:55:43.918761094 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4403372Z 
2025-12-04T12:15:05.4403875Z [W1204 11:55:43.918938689 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4403911Z 
2025-12-04T12:15:05.4404434Z [W1204 11:55:43.919322728 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4404439Z 
2025-12-04T12:15:05.4404944Z [W1204 11:55:43.919499414 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4404949Z 
2025-12-04T12:15:05.4405465Z [W1204 11:55:43.919853472 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4405473Z 
2025-12-04T12:15:05.4405981Z [W1204 11:55:43.920059241 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4405986Z 
2025-12-04T12:15:05.4406490Z [W1204 11:55:43.920446807 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4406506Z 
2025-12-04T12:15:05.4407014Z [W1204 11:55:43.920634939 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:15:05.4407021Z 
2025-12-04T12:15:05.4407125Z FAILED [0.9665s] [100%]
2025-12-04T12:15:05.4407130Z 
2025-12-04T12:15:05.4407286Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.4407621Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________
2025-12-04T12:15:05.4407759Z Traceback (most recent call last):
2025-12-04T12:15:05.4408159Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback
2025-12-04T12:15:05.4408311Z     y_fp8 = compiled_fp8_matmul(x)  # noqa: F841
2025-12-04T12:15:05.4408818Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.4409070Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.4409587Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.4409835Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.4410349Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.4410513Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.4411049Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.4411405Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.4411935Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.4412085Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.4412580Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.4412707Z     return self._compile_to_module()
2025-12-04T12:15:05.4413193Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.4413370Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.4413882Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.4414016Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.4414523Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.4414756Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.4415347Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.4415507Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.4415999Z   File "/tmp/tmpmt7jig_6/uf/cuflm2uv5axh3zr2dkqme6bl7pcttyxbrpkuupsvfhwpe45mxnhx.py", line 193, in <module>
2025-12-04T12:15:05.4416543Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait
2025-12-04T12:15:05.4416662Z     self._wait_futures(scope)
2025-12-04T12:15:05.4417170Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures
2025-12-04T12:15:05.4417290Z     kernel = result.result()
2025-12-04T12:15:05.4417734Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result
2025-12-04T12:15:05.4417856Z     return self.result_fn()
2025-12-04T12:15:05.4418333Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result
2025-12-04T12:15:05.4418464Z     raise e.with_name(kernel_name) from e
2025-12-04T12:15:05.4418861Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T12:15:05.4418869Z 
2025-12-04T12:15:05.4419079Z Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0
2025-12-04T12:15:05.4419217Z Traceback (most recent call last):
2025-12-04T12:15:05.4419800Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T12:15:05.4419903Z     result = job()
2025-12-04T12:15:05.4420509Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T12:15:05.4420651Z     kernel.precompile(warm_cache_only=True)
2025-12-04T12:15:05.4421204Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T12:15:05.4421334Z     self._precompile_worker()
2025-12-04T12:15:05.4421935Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.4422161Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.4422760Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.4422961Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.4423423Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.4423783Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.4424238Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.4424573Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.4424760Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.4425273Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.4425365Z ^
2025-12-04T12:15:05.4425828Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.4425846Z 
2025-12-04T12:15:05.4425851Z 
2025-12-04T12:15:05.4426566Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.4426574Z 
2025-12-04T12:15:05.4426579Z 
2025-12-04T12:15:05.4426801Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.4427337Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16
2025-12-04T12:15:05.4427377Z 
2025-12-04T12:15:05.4427646Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.4427889Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.4427996Z frames [('total', 1)]
2025-12-04T12:15:05.4428117Z stats [('calls_captured', 11)]
2025-12-04T12:15:05.4428795Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.4429018Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.4429136Z graph_break []
2025-12-04T12:15:05.4429315Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T12:15:05.4429538Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:15:05.4430756Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T12:15:05.4430880Z   if out == self.unknown_value:
2025-12-04T12:15:05.4431167Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________
2025-12-04T12:15:05.4431309Z Traceback (most recent call last):
2025-12-04T12:15:05.4431739Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback
2025-12-04T12:15:05.4431895Z     y_fp8 = compiled_fp8_matmul(x)  # noqa: F841
2025-12-04T12:15:05.4432386Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.4432637Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.4433162Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.4433360Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.4433908Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.4434054Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.4434593Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.4434925Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.4435472Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.4435621Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.4436112Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.4436232Z     return self._compile_to_module()
2025-12-04T12:15:05.4436730Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.4436898Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.4437414Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.4437558Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.4438056Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.4438299Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.4438885Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.4439013Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.4439528Z   File "/tmp/tmpld5b0idr/nx/cnxx7kaioecv5jz4hjemekyivxpjom344u7no6d4gmd7d27uagsq.py", line 193, in <module>
2025-12-04T12:15:05.4440028Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait
2025-12-04T12:15:05.4440147Z     self._wait_futures(scope)
2025-12-04T12:15:05.4440649Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures
2025-12-04T12:15:05.4440768Z     kernel = result.result()
2025-12-04T12:15:05.4441224Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result
2025-12-04T12:15:05.4441343Z     return self.result_fn()
2025-12-04T12:15:05.4441820Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result
2025-12-04T12:15:05.4441960Z     raise e.with_name(kernel_name) from e
2025-12-04T12:15:05.4442342Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T12:15:05.4442350Z 
2025-12-04T12:15:05.4442567Z Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0
2025-12-04T12:15:05.4442691Z Traceback (most recent call last):
2025-12-04T12:15:05.4443232Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T12:15:05.4443343Z     result = job()
2025-12-04T12:15:05.4443969Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T12:15:05.4444112Z     kernel.precompile(warm_cache_only=True)
2025-12-04T12:15:05.4444682Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T12:15:05.4444799Z     self._precompile_worker()
2025-12-04T12:15:05.4445408Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.4445590Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.4446216Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.4446429Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.4446885Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.4447143Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.4447618Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.4447952Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.4448146Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.4448646Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.4448738Z ^
2025-12-04T12:15:05.4449204Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.4449210Z 
2025-12-04T12:15:05.4449215Z 
2025-12-04T12:15:05.4449931Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.4449937Z 
2025-12-04T12:15:05.4449942Z 
2025-12-04T12:15:05.4450169Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.4450691Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16
2025-12-04T12:15:05.4450697Z 
2025-12-04T12:15:05.4450976Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.4451197Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.4451342Z frames [('total', 1)]
2025-12-04T12:15:05.4451477Z stats [('calls_captured', 11)]
2025-12-04T12:15:05.4452139Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.4452376Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.4452478Z graph_break []
2025-12-04T12:15:05.4452655Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T12:15:05.4452890Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:15:05.4454095Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T12:15:05.4454214Z   if out == self.unknown_value:
2025-12-04T12:15:05.4454448Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.4454553Z frames [('total', 1)]
2025-12-04T12:15:05.4454684Z stats [('calls_captured', 11)]
2025-12-04T12:15:05.4454904Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.4455597Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.4455710Z graph_break []
2025-12-04T12:15:05.4455888Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T12:15:05.4456039Z =================================== FAILURES ===================================
2025-12-04T12:15:05.4456442Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________
2025-12-04T12:15:05.4456573Z Traceback (most recent call last):
2025-12-04T12:15:05.4456981Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback
2025-12-04T12:15:05.4457131Z     y_fp8 = compiled_fp8_matmul(x)  # noqa: F841
2025-12-04T12:15:05.4457668Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.4457934Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.4458454Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.4458648Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.4459198Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.4459346Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.4459888Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.4460212Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.4460734Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.4460893Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.4461376Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.4461509Z     return self._compile_to_module()
2025-12-04T12:15:05.4461990Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.4462150Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.4462680Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.4462810Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.4463337Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.4463582Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.4464168Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.4464307Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.4464816Z   File "/tmp/tmp4aemnlh4/qc/cqcprnkar4ppm6efp767xjtczmidmbw3cmmyeqznbpp5lk67zulo.py", line 193, in <module>
2025-12-04T12:15:05.4465274Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait
2025-12-04T12:15:05.4465403Z     self._wait_futures(scope)
2025-12-04T12:15:05.4465896Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures
2025-12-04T12:15:05.4466022Z     kernel = result.result()
2025-12-04T12:15:05.4466465Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result
2025-12-04T12:15:05.4466579Z     return self.result_fn()
2025-12-04T12:15:05.4467067Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result
2025-12-04T12:15:05.4467232Z     raise e.with_name(kernel_name) from e
2025-12-04T12:15:05.4467614Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T12:15:05.4467630Z 
2025-12-04T12:15:05.4467841Z Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0
2025-12-04T12:15:05.4467963Z Traceback (most recent call last):
2025-12-04T12:15:05.4468512Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T12:15:05.4468611Z     result = job()
2025-12-04T12:15:05.4469205Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T12:15:05.4469361Z     kernel.precompile(warm_cache_only=True)
2025-12-04T12:15:05.4469948Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T12:15:05.4470081Z     self._precompile_worker()
2025-12-04T12:15:05.4470684Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.4470861Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.4471694Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.4471891Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.4472341Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.4472603Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.4473049Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.4473395Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.4473578Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.4474073Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.4474180Z ^
2025-12-04T12:15:05.4474635Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.4474640Z 
2025-12-04T12:15:05.4474645Z 
2025-12-04T12:15:05.4475371Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.4475475Z 
2025-12-04T12:15:05.4475480Z 
2025-12-04T12:15:05.4475698Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.4476236Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16
2025-12-04T12:15:05.4476242Z 
2025-12-04T12:15:05.4476513Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.4476735Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.4476859Z frames [('total', 1)]
2025-12-04T12:15:05.4476978Z stats [('calls_captured', 11)]
2025-12-04T12:15:05.4477634Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.4477869Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.4477970Z graph_break []
2025-12-04T12:15:05.4478162Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T12:15:05.4478383Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:15:05.4479651Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T12:15:05.4479786Z   if out == self.unknown_value:
2025-12-04T12:15:05.4480005Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.4480119Z frames [('total', 1)]
2025-12-04T12:15:05.4480238Z stats [('calls_captured', 11)]
2025-12-04T12:15:05.4480457Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.4481127Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.4481229Z graph_break []
2025-12-04T12:15:05.4481453Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T12:15:05.4481699Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.4481804Z frames [('total', 1)]
2025-12-04T12:15:05.4481924Z stats [('calls_captured', 11)]
2025-12-04T12:15:05.4482162Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.4482816Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.4483000Z graph_break []
2025-12-04T12:15:05.4483180Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T12:15:05.4483827Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4078dca354f1c797.xml -
2025-12-04T12:15:05.4484023Z =========================== short test summary info ============================
2025-12-04T12:15:05.4484890Z FAILED [0.9665s] inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 - torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T12:15:05.4484896Z 
2025-12-04T12:15:05.4485122Z Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0
2025-12-04T12:15:05.4485250Z Traceback (most recent call last):
2025-12-04T12:15:05.4485798Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T12:15:05.4485920Z     result = job()
2025-12-04T12:15:05.4486513Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T12:15:05.4486671Z     kernel.precompile(warm_cache_only=True)
2025-12-04T12:15:05.4487229Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T12:15:05.4487387Z     self._precompile_worker()
2025-12-04T12:15:05.4488000Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.4488184Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.4488781Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.4489000Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.4489453Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.4489717Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.4490161Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.4490501Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.4490704Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.4491236Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.4491341Z ^
2025-12-04T12:15:05.4491799Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.4491807Z 
2025-12-04T12:15:05.4491812Z 
2025-12-04T12:15:05.4492520Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.4492541Z 
2025-12-04T12:15:05.4492546Z 
2025-12-04T12:15:05.4492764Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.4493292Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16
2025-12-04T12:15:05.4493298Z 
2025-12-04T12:15:05.4493616Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.4493803Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.4494012Z ================= 1 failed, 187 deselected, 2 rerun in 21.75s ==================
2025-12-04T12:15:05.4494130Z Got exit code 1
2025-12-04T12:15:05.4494573Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16
2025-12-04T12:15:05.4495034Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:15:05.4495507Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7591ded94ad5fda9.xml
2025-12-04T12:15:05.4495675Z ============================= test session starts ==============================
2025-12-04T12:15:05.4496050Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.4496163Z cachedir: .pytest_cache
2025-12-04T12:15:05.4496783Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.4496915Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.4497026Z configfile: pytest.ini
2025-12-04T12:15:05.4497635Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.4497864Z collecting ... collected 188 items / 23 deselected / 165 selected
2025-12-04T12:15:05.4498007Z stepcurrent: skipping 23 already run items.
2025-12-04T12:15:05.4498136Z Running 165 items in this shard
2025-12-04T12:15:05.4498142Z 
2025-12-04T12:15:05.4499058Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e4m3fn_shape_4,2048,4096_keepdim_False_cuda SKIPPED [0.0004s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [  0%]
2025-12-04T12:15:05.4500011Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e4m3fn_shape_4,2048,4096_keepdim_True_cuda SKIPPED [0.0003s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [  1%]
2025-12-04T12:15:05.4500898Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e5m2_shape_4,2048,4096_keepdim_False_cuda SKIPPED [0.0004s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [  1%]
2025-12-04T12:15:05.4501796Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e5m2_shape_4,2048,4096_keepdim_True_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [  2%]
2025-12-04T12:15:05.4503219Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0
2025-12-04T12:15:05.4504357Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.4504794Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.4505239Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T12:15:05.4505767Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T12:15:05.4506228Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.4506806Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.4507350Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.4507934Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.4508566Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.4509122Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.4509577Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.4510099Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.4510573Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.4511047Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.4511495Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.4512155Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.4512682Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp30 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.4513265Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp31 = tl.broadcast_to(tmp30, [1, 1])
2025-12-04T12:15:05.4513786Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.4514366Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.4514913Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, 0)
2025-12-04T12:15:05.4515490Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.4516037Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.where(r0_mask, tmp5, 0)
2025-12-04T12:15:05.4516608Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.4517174Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tl.full([1, 1], 15, tl.int32)
2025-12-04T12:15:05.4517697Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:05.4518183Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = (tmp8 / tmp10)
2025-12-04T12:15:05.4518677Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tmp2 - tmp11
2025-12-04T12:15:05.4519155Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12 * tmp12
2025-12-04T12:15:05.4519776Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.4520327Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.where(r0_mask, tmp14, 0)
2025-12-04T12:15:05.4520903Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.4521396Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp1 - tmp11
2025-12-04T12:15:05.4521876Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = 15.0
2025-12-04T12:15:05.4522364Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = (tmp17 / tmp19)
2025-12-04T12:15:05.4522818Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 1e-05
2025-12-04T12:15:05.4523307Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tmp20 + tmp21
2025-12-04T12:15:05.4523848Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = libdevice.rsqrt(tmp22)
2025-12-04T12:15:05.4524332Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp18 * tmp23
2025-12-04T12:15:05.4524836Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp25 = tl_math.abs(tmp24)
2025-12-04T12:15:05.4525441Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.4526017Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp28 = tl.where(r0_mask, tmp26, float("-inf"))
2025-12-04T12:15:05.4526707Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.4527191Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp32 = tmp24 * tmp31
2025-12-04T12:15:05.4527646Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp33 = -448.0
2025-12-04T12:15:05.4528227Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp34 = triton_helpers.maximum(tmp32, tmp33)
2025-12-04T12:15:05.4528664Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp35 = 448.0
2025-12-04T12:15:05.4529244Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp36 = triton_helpers.minimum(tmp34, tmp35)
2025-12-04T12:15:05.4529780Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp37 = tmp36.to(tl.float8e4nv)
2025-12-04T12:15:05.4530306Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp38 = tmp29.to(tl.float32)
2025-12-04T12:15:05.4531041Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask)
2025-12-04T12:15:05.4531752Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None)
2025-12-04T12:15:05.4532128Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.4534267Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.4534846Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.4535890Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.4536617Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.4537512Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.4538205Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.4539089Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.4539860Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.4540529Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.4541619Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.4542002Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.4542896Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.4543046Z ('RERUN', {'yellow': True}) [3.3820s] [  3%]
2025-12-04T12:15:05.4544468Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0
2025-12-04T12:15:05.4545598Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.4546034Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.4546478Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T12:15:05.4547005Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T12:15:05.4547468Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.4548043Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.4548584Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.4549166Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.4549791Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.4550345Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.4550798Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.4551319Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.4551790Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.4552258Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.4552705Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.4553360Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.4553887Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp30 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.4554468Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp31 = tl.broadcast_to(tmp30, [1, 1])
2025-12-04T12:15:05.4554987Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.4555573Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.4556119Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, 0)
2025-12-04T12:15:05.4556698Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.4557241Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.where(r0_mask, tmp5, 0)
2025-12-04T12:15:05.4557815Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.4558381Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tl.full([1, 1], 15, tl.int32)
2025-12-04T12:15:05.4558905Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:05.4559391Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = (tmp8 / tmp10)
2025-12-04T12:15:05.4559887Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tmp2 - tmp11
2025-12-04T12:15:05.4560368Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12 * tmp12
2025-12-04T12:15:05.4561552Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.4562115Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.where(r0_mask, tmp14, 0)
2025-12-04T12:15:05.4562693Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.4563222Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp1 - tmp11
2025-12-04T12:15:05.4563660Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = 15.0
2025-12-04T12:15:05.4564150Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = (tmp17 / tmp19)
2025-12-04T12:15:05.4564607Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 1e-05
2025-12-04T12:15:05.4565093Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tmp20 + tmp21
2025-12-04T12:15:05.4565632Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = libdevice.rsqrt(tmp22)
2025-12-04T12:15:05.4566112Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp18 * tmp23
2025-12-04T12:15:05.4566621Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp25 = tl_math.abs(tmp24)
2025-12-04T12:15:05.4567223Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.4567801Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp28 = tl.where(r0_mask, tmp26, float("-inf"))
2025-12-04T12:15:05.4568491Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.4568973Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp32 = tmp24 * tmp31
2025-12-04T12:15:05.4569415Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp33 = -448.0
2025-12-04T12:15:05.4570004Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp34 = triton_helpers.maximum(tmp32, tmp33)
2025-12-04T12:15:05.4570444Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp35 = 448.0
2025-12-04T12:15:05.4571195Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp36 = triton_helpers.minimum(tmp34, tmp35)
2025-12-04T12:15:05.4571735Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp37 = tmp36.to(tl.float8e4nv)
2025-12-04T12:15:05.4572340Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp38 = tmp29.to(tl.float32)
2025-12-04T12:15:05.4573047Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask)
2025-12-04T12:15:05.4573763Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None)
2025-12-04T12:15:05.4574284Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.4576525Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.4577127Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.4578175Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.4578822Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.4579716Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.4580414Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.4581297Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.4582067Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.4582742Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.4583839Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.4584224Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.4585118Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.4585270Z ('RERUN', {'yellow': True}) [0.4131s] [  3%]
2025-12-04T12:15:05.4586703Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0
2025-12-04T12:15:05.4587842Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.4588290Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.4588734Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T12:15:05.4589262Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T12:15:05.4589763Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.4590314Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.4590860Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.4591448Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.4592086Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.4592644Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.4593108Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.4593628Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.4594106Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.4594579Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.4595030Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.4595693Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.4596257Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp30 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.4596807Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp31 = tl.broadcast_to(tmp30, [1, 1])
2025-12-04T12:15:05.4597330Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.4597910Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.4598459Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, 0)
2025-12-04T12:15:05.4599037Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.4599574Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.where(r0_mask, tmp5, 0)
2025-12-04T12:15:05.4600162Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.4600729Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tl.full([1, 1], 15, tl.int32)
2025-12-04T12:15:05.4601253Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:05.4601742Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = (tmp8 / tmp10)
2025-12-04T12:15:05.4602243Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tmp2 - tmp11
2025-12-04T12:15:05.4602721Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12 * tmp12
2025-12-04T12:15:05.4603682Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.4604241Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.where(r0_mask, tmp14, 0)
2025-12-04T12:15:05.4604820Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.4605359Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp1 - tmp11
2025-12-04T12:15:05.4605795Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = 15.0
2025-12-04T12:15:05.4606288Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = (tmp17 / tmp19)
2025-12-04T12:15:05.4606751Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 1e-05
2025-12-04T12:15:05.4607232Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tmp20 + tmp21
2025-12-04T12:15:05.4607781Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = libdevice.rsqrt(tmp22)
2025-12-04T12:15:05.4608261Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp18 * tmp23
2025-12-04T12:15:05.4608769Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp25 = tl_math.abs(tmp24)
2025-12-04T12:15:05.4609369Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.4609996Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp28 = tl.where(r0_mask, tmp26, float("-inf"))
2025-12-04T12:15:05.4610646Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.4611130Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp32 = tmp24 * tmp31
2025-12-04T12:15:05.4611575Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp33 = -448.0
2025-12-04T12:15:05.4612166Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp34 = triton_helpers.maximum(tmp32, tmp33)
2025-12-04T12:15:05.4612605Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp35 = 448.0
2025-12-04T12:15:05.4613201Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp36 = triton_helpers.minimum(tmp34, tmp35)
2025-12-04T12:15:05.4613734Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp37 = tmp36.to(tl.float8e4nv)
2025-12-04T12:15:05.4614311Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp38 = tmp29.to(tl.float32)
2025-12-04T12:15:05.4615037Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask)
2025-12-04T12:15:05.4615747Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None)
2025-12-04T12:15:05.4616130Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.4618359Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.4618949Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.4619995Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.4620643Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.4621542Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.4622237Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.4623125Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.4623893Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.4624569Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.4625657Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.4626036Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.4626929Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.4627050Z FAILED [0.4126s] [  3%]
2025-12-04T12:15:05.4627059Z 
2025-12-04T12:15:05.4627206Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.4627596Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda _
2025-12-04T12:15:05.4627732Z Traceback (most recent call last):
2025-12-04T12:15:05.4628201Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.4628445Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.4628953Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.4629203Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.4629728Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.4629927Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.4630548Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.4630720Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.4631255Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.4631591Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.4632149Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.4632303Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.4632799Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.4632927Z     return self._compile_to_module()
2025-12-04T12:15:05.4633416Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.4633601Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.4634119Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.4634264Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.4634761Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.4634997Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.4635601Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.4635732Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.4636251Z   File "/tmp/tmpz9880xg5/sk/cskb2xbapc5orxobaeeehsbmvlgzfbqdt5wuoetd2bvzx3hf7kwu.py", line 74, in <module>
2025-12-04T12:15:05.4636775Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.4636894Z     kernel.precompile(
2025-12-04T12:15:05.4637458Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.4637584Z     self._precompile_worker()
2025-12-04T12:15:05.4638183Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.4638379Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.4638970Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.4639180Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.4639630Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.4639882Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.4640342Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.4640715Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.4640958Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.4641610Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.4641702Z ^
2025-12-04T12:15:05.4642171Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.4642177Z 
2025-12-04T12:15:05.4642890Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.4642899Z 
2025-12-04T12:15:05.4642933Z 
2025-12-04T12:15:05.4643168Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.4643864Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda
2025-12-04T12:15:05.4643870Z 
2025-12-04T12:15:05.4644141Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.4644412Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.4644517Z frames [('total', 1)]
2025-12-04T12:15:05.4644649Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.4645113Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.4645337Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.4645454Z graph_break []
2025-12-04T12:15:05.4645843Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda _
2025-12-04T12:15:05.4645967Z Traceback (most recent call last):
2025-12-04T12:15:05.4646407Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.4646639Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.4647141Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.4647391Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.4647903Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.4648114Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.4648655Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.4648818Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.4649351Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.4649674Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.4650204Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.4650356Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.4650833Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.4650967Z     return self._compile_to_module()
2025-12-04T12:15:05.4651455Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.4651635Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.4652153Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.4652284Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.4652822Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.4653055Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.4653656Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.4653784Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.4654292Z   File "/tmp/tmpj0qljqxx/bh/cbhxnnqevtb5kwecrnfczthhoacmgjrdo6reybht6gqraqtjzvmv.py", line 74, in <module>
2025-12-04T12:15:05.4654771Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.4654915Z     kernel.precompile(
2025-12-04T12:15:05.4655472Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.4655609Z     self._precompile_worker()
2025-12-04T12:15:05.4656208Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.4656538Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.4657141Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.4657341Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.4657809Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.4658059Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.4658520Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.4658856Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.4659084Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.4659748Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.4659842Z ^
2025-12-04T12:15:05.4660315Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.4660321Z 
2025-12-04T12:15:05.4661030Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.4661085Z 
2025-12-04T12:15:05.4661090Z 
2025-12-04T12:15:05.4661312Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.4662023Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda
2025-12-04T12:15:05.4662028Z 
2025-12-04T12:15:05.4662298Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.4662539Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.4662644Z frames [('total', 1)]
2025-12-04T12:15:05.4662762Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.4663239Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.4663465Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.4663578Z graph_break []
2025-12-04T12:15:05.4663798Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.4663904Z frames [('total', 1)]
2025-12-04T12:15:05.4664035Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.4664254Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.4664763Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.4664881Z graph_break []
2025-12-04T12:15:05.4665030Z =================================== FAILURES ===================================
2025-12-04T12:15:05.4665418Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda _
2025-12-04T12:15:05.4665558Z Traceback (most recent call last):
2025-12-04T12:15:05.4665984Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.4666238Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.4666761Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.4667012Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.4667539Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.4667737Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.4668297Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.4668446Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.4668980Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.4669322Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.4669849Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.4670014Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.4670498Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.4670625Z     return self._compile_to_module()
2025-12-04T12:15:05.4671420Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.4671593Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.4672113Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.4672260Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.4672856Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.4673107Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.4673692Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.4673823Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.4674349Z   File "/tmp/tmpxs429tji/st/cstau35jzagywdq344ovuqcdoffasluumixuoleq23pvvnkx3o4i.py", line 74, in <module>
2025-12-04T12:15:05.4674815Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.4674944Z     kernel.precompile(
2025-12-04T12:15:05.4675501Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.4675619Z     self._precompile_worker()
2025-12-04T12:15:05.4676231Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.4676416Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.4677073Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.4677287Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.4677741Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.4678004Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.4678449Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.4678785Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.4679033Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.4679739Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.4679849Z ^
2025-12-04T12:15:05.4680311Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.4680317Z 
2025-12-04T12:15:05.4681026Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.4681101Z 
2025-12-04T12:15:05.4681106Z 
2025-12-04T12:15:05.4681340Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.4682033Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda
2025-12-04T12:15:05.4682041Z 
2025-12-04T12:15:05.4682328Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.4682553Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.4682657Z frames [('total', 1)]
2025-12-04T12:15:05.4682791Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.4683258Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.4683492Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.4683594Z graph_break []
2025-12-04T12:15:05.4683812Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.4683930Z frames [('total', 1)]
2025-12-04T12:15:05.4684047Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.4684264Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.4684774Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.4684873Z graph_break []
2025-12-04T12:15:05.4685105Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.4685208Z frames [('total', 1)]
2025-12-04T12:15:05.4685323Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.4685558Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.4686014Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.4686113Z graph_break []
2025-12-04T12:15:05.4686775Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7591ded94ad5fda9.xml -
2025-12-04T12:15:05.4686951Z =========================== short test summary info ============================
2025-12-04T12:15:05.4687805Z FAILED [0.4126s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.4688457Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.4688580Z ^
2025-12-04T12:15:05.4689058Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.4689067Z 
2025-12-04T12:15:05.4689776Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.4689783Z 
2025-12-04T12:15:05.4689788Z 
2025-12-04T12:15:05.4690020Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.4690712Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda
2025-12-04T12:15:05.4690720Z 
2025-12-04T12:15:05.4691037Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.4691221Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.4691442Z ============= 1 failed, 4 skipped, 23 deselected, 2 rerun in 4.26s =============
2025-12-04T12:15:05.4691556Z Got exit code 1
2025-12-04T12:15:05.4691665Z Retrying single test...
2025-12-04T12:15:05.4692171Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4955a88ef6b89264.xml
2025-12-04T12:15:05.4692351Z ============================= test session starts ==============================
2025-12-04T12:15:05.4692703Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.4692827Z cachedir: .pytest_cache
2025-12-04T12:15:05.4693348Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.4693478Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.4693601Z configfile: pytest.ini
2025-12-04T12:15:05.4694194Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.4694419Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:05.4695206Z stepcurrent: skipping 27 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda
2025-12-04T12:15:05.4695325Z Running 1 items in this shard
2025-12-04T12:15:05.4695331Z 
2025-12-04T12:15:05.4696862Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0
2025-12-04T12:15:05.4698001Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.4698448Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.4698893Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T12:15:05.4699409Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T12:15:05.4699887Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.4700429Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.4700985Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.4701603Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.4702192Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.4702763Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.4703206Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.4703770Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.4704243Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.4704705Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.4705162Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.4705835Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.4706376Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp30 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.4706924Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp31 = tl.broadcast_to(tmp30, [1, 1])
2025-12-04T12:15:05.4707445Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.4708027Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.4708556Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, 0)
2025-12-04T12:15:05.4709148Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.4709677Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.where(r0_mask, tmp5, 0)
2025-12-04T12:15:05.4710297Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.4710832Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tl.full([1, 1], 15, tl.int32)
2025-12-04T12:15:05.4711345Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:05.4711841Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = (tmp8 / tmp10)
2025-12-04T12:15:05.4712848Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tmp2 - tmp11
2025-12-04T12:15:05.4713345Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12 * tmp12
2025-12-04T12:15:05.4713928Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.4714476Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.where(r0_mask, tmp14, 0)
2025-12-04T12:15:05.4715112Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.4715592Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp1 - tmp11
2025-12-04T12:15:05.4716046Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = 15.0
2025-12-04T12:15:05.4716537Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = (tmp17 / tmp19)
2025-12-04T12:15:05.4716972Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 1e-05
2025-12-04T12:15:05.4717681Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tmp20 + tmp21
2025-12-04T12:15:05.4718215Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = libdevice.rsqrt(tmp22)
2025-12-04T12:15:05.4718707Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp18 * tmp23
2025-12-04T12:15:05.4719212Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp25 = tl_math.abs(tmp24)
2025-12-04T12:15:05.4719836Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.4720429Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp28 = tl.where(r0_mask, tmp26, float("-inf"))
2025-12-04T12:15:05.4721070Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.4721566Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp32 = tmp24 * tmp31
2025-12-04T12:15:05.4722010Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp33 = -448.0
2025-12-04T12:15:05.4722594Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp34 = triton_helpers.maximum(tmp32, tmp33)
2025-12-04T12:15:05.4723034Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp35 = 448.0
2025-12-04T12:15:05.4723602Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp36 = triton_helpers.minimum(tmp34, tmp35)
2025-12-04T12:15:05.4724187Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp37 = tmp36.to(tl.float8e4nv)
2025-12-04T12:15:05.4724704Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp38 = tmp29.to(tl.float32)
2025-12-04T12:15:05.4725421Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask)
2025-12-04T12:15:05.4726131Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None)
2025-12-04T12:15:05.4726498Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.4728635Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.4729173Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.4730232Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.4730865Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.4731820Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.4732502Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.4733405Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.4734207Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.4734829Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.4735918Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.4736361Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.4737279Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.4737414Z ('RERUN', {'yellow': True}) [3.3979s] [100%]
2025-12-04T12:15:05.4738862Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0
2025-12-04T12:15:05.4740006Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.4740447Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.4740893Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T12:15:05.4741407Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T12:15:05.4741884Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.4742422Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.4743006Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.4743589Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.4744176Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.4744748Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.4745190Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.4745752Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.4746227Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.4746689Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.4747146Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.4747822Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.4748358Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp30 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.4748911Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp31 = tl.broadcast_to(tmp30, [1, 1])
2025-12-04T12:15:05.4749420Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.4750019Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.4750548Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, 0)
2025-12-04T12:15:05.4751138Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.4751669Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.where(r0_mask, tmp5, 0)
2025-12-04T12:15:05.4752283Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.4752815Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tl.full([1, 1], 15, tl.int32)
2025-12-04T12:15:05.4753326Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:05.4753827Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = (tmp8 / tmp10)
2025-12-04T12:15:05.4754313Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tmp2 - tmp11
2025-12-04T12:15:05.4754808Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12 * tmp12
2025-12-04T12:15:05.4755399Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.4755941Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.where(r0_mask, tmp14, 0)
2025-12-04T12:15:05.4756559Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.4757040Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp1 - tmp11
2025-12-04T12:15:05.4757494Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = 15.0
2025-12-04T12:15:05.4757987Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = (tmp17 / tmp19)
2025-12-04T12:15:05.4758430Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 1e-05
2025-12-04T12:15:05.4758969Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tmp20 + tmp21
2025-12-04T12:15:05.4759503Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = libdevice.rsqrt(tmp22)
2025-12-04T12:15:05.4759998Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp18 * tmp23
2025-12-04T12:15:05.4760534Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp25 = tl_math.abs(tmp24)
2025-12-04T12:15:05.4761119Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.4761714Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp28 = tl.where(r0_mask, tmp26, float("-inf"))
2025-12-04T12:15:05.4762363Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.4762864Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp32 = tmp24 * tmp31
2025-12-04T12:15:05.4763315Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp33 = -448.0
2025-12-04T12:15:05.4763908Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp34 = triton_helpers.maximum(tmp32, tmp33)
2025-12-04T12:15:05.4764350Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp35 = 448.0
2025-12-04T12:15:05.4764924Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp36 = triton_helpers.minimum(tmp34, tmp35)
2025-12-04T12:15:05.4765508Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp37 = tmp36.to(tl.float8e4nv)
2025-12-04T12:15:05.4766020Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp38 = tmp29.to(tl.float32)
2025-12-04T12:15:05.4766746Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask)
2025-12-04T12:15:05.4767452Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None)
2025-12-04T12:15:05.4767816Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.4769942Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.4770488Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.4771728Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.4772359Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.4773360Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.4774042Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.4774986Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.4775759Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.4776436Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.4777554Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.4777920Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.4778828Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.4778968Z ('RERUN', {'yellow': True}) [0.4171s] [100%]
2025-12-04T12:15:05.4780412Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0
2025-12-04T12:15:05.4781555Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.4782004Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.4782446Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T12:15:05.4782957Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T12:15:05.4783429Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.4783966Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.4784575Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.4785160Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.4785741Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.4786306Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.4786745Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.4787305Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.4787779Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.4788241Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.4788754Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.4789398Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.4789934Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp30 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.4790487Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp31 = tl.broadcast_to(tmp30, [1, 1])
2025-12-04T12:15:05.4790995Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.4791589Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.4792122Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, 0)
2025-12-04T12:15:05.4792714Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.4793244Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.where(r0_mask, tmp5, 0)
2025-12-04T12:15:05.4793864Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.4794401Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tl.full([1, 1], 15, tl.int32)
2025-12-04T12:15:05.4794914Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:05.4795414Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = (tmp8 / tmp10)
2025-12-04T12:15:05.4795888Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tmp2 - tmp11
2025-12-04T12:15:05.4796380Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12 * tmp12
2025-12-04T12:15:05.4796969Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.4797504Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.where(r0_mask, tmp14, 0)
2025-12-04T12:15:05.4798121Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.4798601Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp1 - tmp11
2025-12-04T12:15:05.4799050Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = 15.0
2025-12-04T12:15:05.4799537Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = (tmp17 / tmp19)
2025-12-04T12:15:05.4799983Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 1e-05
2025-12-04T12:15:05.4800508Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tmp20 + tmp21
2025-12-04T12:15:05.4801039Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = libdevice.rsqrt(tmp22)
2025-12-04T12:15:05.4801531Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp18 * tmp23
2025-12-04T12:15:05.4802070Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp25 = tl_math.abs(tmp24)
2025-12-04T12:15:05.4802655Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.4803243Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp28 = tl.where(r0_mask, tmp26, float("-inf"))
2025-12-04T12:15:05.4803888Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.4804382Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp32 = tmp24 * tmp31
2025-12-04T12:15:05.4804828Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp33 = -448.0
2025-12-04T12:15:05.4805404Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp34 = triton_helpers.maximum(tmp32, tmp33)
2025-12-04T12:15:05.4805860Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp35 = 448.0
2025-12-04T12:15:05.4806432Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp36 = triton_helpers.minimum(tmp34, tmp35)
2025-12-04T12:15:05.4807013Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp37 = tmp36.to(tl.float8e4nv)
2025-12-04T12:15:05.4807530Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp38 = tmp29.to(tl.float32)
2025-12-04T12:15:05.4808246Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask)
2025-12-04T12:15:05.4808958Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None)
2025-12-04T12:15:05.4809321Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.4811454Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.4811994Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.4813047Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.4813676Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.4814612Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.4815297Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.4816220Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.4817073Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.4817692Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.4818873Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.4819238Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.4820150Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.4820257Z FAILED [0.4229s] [100%]
2025-12-04T12:15:05.4820263Z 
2025-12-04T12:15:05.4820461Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.4820852Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda _
2025-12-04T12:15:05.4820979Z Traceback (most recent call last):
2025-12-04T12:15:05.4821419Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.4821657Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.4822146Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.4822408Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.4822921Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.4823129Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.4823642Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.4823793Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.4824341Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.4824725Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.4825262Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.4825414Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.4825892Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.4826029Z     return self._compile_to_module()
2025-12-04T12:15:05.4826512Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.4826679Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.4827240Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.4827373Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.4827884Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.4828116Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.4828730Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.4828869Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.4829375Z   File "/tmp/tmpkbg3c47x/ry/crymrpytjiqolcjbegw2trazvtmrlfdw4y3cdd2mpwscx5v3qu57.py", line 74, in <module>
2025-12-04T12:15:05.4829852Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.4829962Z     kernel.precompile(
2025-12-04T12:15:05.4830519Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.4830652Z     self._precompile_worker()
2025-12-04T12:15:05.4831246Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.4831424Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.4832033Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.4832232Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.4832695Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.4832991Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.4833436Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.4833781Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.4834010Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.4834674Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.4834770Z ^
2025-12-04T12:15:05.4835230Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.4835236Z 
2025-12-04T12:15:05.4835960Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.4835969Z 
2025-12-04T12:15:05.4835974Z 
2025-12-04T12:15:05.4836193Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.4837014Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda
2025-12-04T12:15:05.4837022Z 
2025-12-04T12:15:05.4837292Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.4837519Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.4837643Z frames [('total', 1)]
2025-12-04T12:15:05.4837764Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.4838245Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.4838470Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.4838574Z graph_break []
2025-12-04T12:15:05.4838978Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda _
2025-12-04T12:15:05.4839134Z Traceback (most recent call last):
2025-12-04T12:15:05.4839561Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.4839810Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.4840298Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.4840593Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.4841106Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.4841300Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.4841827Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.4841979Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.4842525Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.4842850Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.4843370Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.4843535Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.4844015Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.4844151Z     return self._compile_to_module()
2025-12-04T12:15:05.4844635Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.4844834Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.4845368Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.4845514Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.4846012Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.4846258Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.4846845Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.4846986Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.4847484Z   File "/tmp/tmp5w0ergsv/w4/cw4ykyvlsy2aspauo47l6mizex6p44cq6otdsbdeg7dqhi7wl6ku.py", line 74, in <module>
2025-12-04T12:15:05.4847949Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.4848081Z     kernel.precompile(
2025-12-04T12:15:05.4848642Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.4848761Z     self._precompile_worker()
2025-12-04T12:15:05.4849398Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.4849580Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.4850192Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.4850394Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.4850847Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.4851110Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.4851586Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.4851936Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.4852169Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.4852819Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.4852962Z ^
2025-12-04T12:15:05.4853420Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.4853426Z 
2025-12-04T12:15:05.4854150Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.4854158Z 
2025-12-04T12:15:05.4854163Z 
2025-12-04T12:15:05.4854385Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.4855081Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda
2025-12-04T12:15:05.4855100Z 
2025-12-04T12:15:05.4855373Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.4855598Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.4855725Z frames [('total', 1)]
2025-12-04T12:15:05.4855845Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.4856420Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.4856663Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.4856813Z graph_break []
2025-12-04T12:15:05.4857049Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.4857159Z frames [('total', 1)]
2025-12-04T12:15:05.4857283Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.4857516Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.4857983Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.4858083Z graph_break []
2025-12-04T12:15:05.4858248Z =================================== FAILURES ===================================
2025-12-04T12:15:05.4858635Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda _
2025-12-04T12:15:05.4858761Z Traceback (most recent call last):
2025-12-04T12:15:05.4859203Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.4859436Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.4859947Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.4860199Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.4860739Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.4860949Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.4861458Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.4861621Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.4862151Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.4862471Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.4863005Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.4863230Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.4863724Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.4863847Z     return self._compile_to_module()
2025-12-04T12:15:05.4864332Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.4864543Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.4865062Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.4865191Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.4865696Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.4865934Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.4866533Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.4866660Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.4867161Z   File "/tmp/tmpos7maqea/bw/cbwqdwvki3qydy47fp6sn2g4opjc43xlg5yiy5lzvdxmwgnalbhq.py", line 74, in <module>
2025-12-04T12:15:05.4867636Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.4867752Z     kernel.precompile(
2025-12-04T12:15:05.4868318Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.4868436Z     self._precompile_worker()
2025-12-04T12:15:05.4869029Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.4869253Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.4869853Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.4870053Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.4870520Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.4870770Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.4871391Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.4871728Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.4871955Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.4872623Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.4872713Z ^
2025-12-04T12:15:05.4873181Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.4873268Z 
2025-12-04T12:15:05.4873980Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.4873990Z 
2025-12-04T12:15:05.4873995Z 
2025-12-04T12:15:05.4874212Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.4874917Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda
2025-12-04T12:15:05.4874923Z 
2025-12-04T12:15:05.4875197Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.4875480Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.4875587Z frames [('total', 1)]
2025-12-04T12:15:05.4875705Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.4876186Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.4876408Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.4876583Z graph_break []
2025-12-04T12:15:05.4876801Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.4876905Z frames [('total', 1)]
2025-12-04T12:15:05.4877038Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.4877257Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.4877715Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.4877832Z graph_break []
2025-12-04T12:15:05.4878051Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.4878168Z frames [('total', 1)]
2025-12-04T12:15:05.4878285Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.4878508Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.4878979Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.4879082Z graph_break []
2025-12-04T12:15:05.4879728Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4955a88ef6b89264.xml -
2025-12-04T12:15:05.4879915Z =========================== short test summary info ============================
2025-12-04T12:15:05.4880753Z FAILED [0.4229s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.4881464Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.4881555Z ^
2025-12-04T12:15:05.4882013Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.4882019Z 
2025-12-04T12:15:05.4882741Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.4882749Z 
2025-12-04T12:15:05.4882754Z 
2025-12-04T12:15:05.4882973Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.4883681Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda
2025-12-04T12:15:05.4883689Z 
2025-12-04T12:15:05.4883960Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.4884159Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.4884364Z ================== 1 failed, 187 deselected, 2 rerun in 4.28s ==================
2025-12-04T12:15:05.4884499Z Got exit code 1
2025-12-04T12:15:05.4884624Z Retrying single test...
2025-12-04T12:15:05.4885098Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2fae1650dec37ec0.xml
2025-12-04T12:15:05.4885265Z ============================= test session starts ==============================
2025-12-04T12:15:05.4885628Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.4885738Z cachedir: .pytest_cache
2025-12-04T12:15:05.4886267Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.4886397Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.4886540Z configfile: pytest.ini
2025-12-04T12:15:05.4887148Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.4887373Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:05.4888147Z stepcurrent: skipping 27 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda
2025-12-04T12:15:05.4888308Z Running 1 items in this shard
2025-12-04T12:15:05.4888313Z 
2025-12-04T12:15:05.4889735Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0
2025-12-04T12:15:05.4890841Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.4891275Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.4891732Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T12:15:05.4892250Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T12:15:05.4892711Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.4893329Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.4893873Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.4894471Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.4895055Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.4895613Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.4896065Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.4896667Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.4897158Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.4897663Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.4898126Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.4898773Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.4899296Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp30 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.4899853Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp31 = tl.broadcast_to(tmp30, [1, 1])
2025-12-04T12:15:05.4900392Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.4900984Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.4901517Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, 0)
2025-12-04T12:15:05.4902128Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.4902676Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.where(r0_mask, tmp5, 0)
2025-12-04T12:15:05.4903249Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.4903799Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tl.full([1, 1], 15, tl.int32)
2025-12-04T12:15:05.4904312Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:05.4904794Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = (tmp8 / tmp10)
2025-12-04T12:15:05.4905285Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tmp2 - tmp11
2025-12-04T12:15:05.4905766Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12 * tmp12
2025-12-04T12:15:05.4906363Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.4906940Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.where(r0_mask, tmp14, 0)
2025-12-04T12:15:05.4907515Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.4908012Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp1 - tmp11
2025-12-04T12:15:05.4908447Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = 15.0
2025-12-04T12:15:05.4908952Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = (tmp17 / tmp19)
2025-12-04T12:15:05.4909390Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 1e-05
2025-12-04T12:15:05.4909874Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tmp20 + tmp21
2025-12-04T12:15:05.4910415Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = libdevice.rsqrt(tmp22)
2025-12-04T12:15:05.4910927Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp18 * tmp23
2025-12-04T12:15:05.4911453Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp25 = tl_math.abs(tmp24)
2025-12-04T12:15:05.4912043Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.4912630Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp28 = tl.where(r0_mask, tmp26, float("-inf"))
2025-12-04T12:15:05.4913272Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.4913785Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp32 = tmp24 * tmp31
2025-12-04T12:15:05.4914245Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp33 = -448.0
2025-12-04T12:15:05.4914819Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp34 = triton_helpers.maximum(tmp32, tmp33)
2025-12-04T12:15:05.4915300Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp35 = 448.0
2025-12-04T12:15:05.4915868Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp36 = triton_helpers.minimum(tmp34, tmp35)
2025-12-04T12:15:05.4916400Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp37 = tmp36.to(tl.float8e4nv)
2025-12-04T12:15:05.4916929Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp38 = tmp29.to(tl.float32)
2025-12-04T12:15:05.4917635Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask)
2025-12-04T12:15:05.4918351Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None)
2025-12-04T12:15:05.4918715Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.4920830Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.4921401Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.4922465Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.4923098Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.4923992Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.4924718Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.4925599Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.4926391Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.4926998Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.4928183Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.4928554Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.4929455Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.4929637Z ('RERUN', {'yellow': True}) [3.3874s] [100%]
2025-12-04T12:15:05.4931062Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0
2025-12-04T12:15:05.4932176Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.4932608Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.4933067Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T12:15:05.4933586Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T12:15:05.4934049Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.4934638Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.4935183Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.4935792Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.4936448Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.4937012Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.4937472Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.4937997Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.4938486Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.4938986Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.4939439Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.4940103Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.4940630Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp30 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.4941194Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp31 = tl.broadcast_to(tmp30, [1, 1])
2025-12-04T12:15:05.4941731Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.4942325Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.4942859Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, 0)
2025-12-04T12:15:05.4943470Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.4944015Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.where(r0_mask, tmp5, 0)
2025-12-04T12:15:05.4944587Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.4945137Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tl.full([1, 1], 15, tl.int32)
2025-12-04T12:15:05.4945647Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:05.4946134Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = (tmp8 / tmp10)
2025-12-04T12:15:05.4946629Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tmp2 - tmp11
2025-12-04T12:15:05.4947112Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12 * tmp12
2025-12-04T12:15:05.4947712Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.4948288Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.where(r0_mask, tmp14, 0)
2025-12-04T12:15:05.4948867Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.4949357Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp1 - tmp11
2025-12-04T12:15:05.4949793Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = 15.0
2025-12-04T12:15:05.4950297Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = (tmp17 / tmp19)
2025-12-04T12:15:05.4950737Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 1e-05
2025-12-04T12:15:05.4951218Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tmp20 + tmp21
2025-12-04T12:15:05.4951761Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = libdevice.rsqrt(tmp22)
2025-12-04T12:15:05.4952268Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp18 * tmp23
2025-12-04T12:15:05.4952784Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp25 = tl_math.abs(tmp24)
2025-12-04T12:15:05.4953373Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.4953948Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp28 = tl.where(r0_mask, tmp26, float("-inf"))
2025-12-04T12:15:05.4954597Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.4955106Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp32 = tmp24 * tmp31
2025-12-04T12:15:05.4955564Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp33 = -448.0
2025-12-04T12:15:05.4956131Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp34 = triton_helpers.maximum(tmp32, tmp33)
2025-12-04T12:15:05.4956612Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp35 = 448.0
2025-12-04T12:15:05.4957185Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp36 = triton_helpers.minimum(tmp34, tmp35)
2025-12-04T12:15:05.4957720Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp37 = tmp36.to(tl.float8e4nv)
2025-12-04T12:15:05.4958247Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp38 = tmp29.to(tl.float32)
2025-12-04T12:15:05.4958950Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask)
2025-12-04T12:15:05.4959665Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None)
2025-12-04T12:15:05.4960029Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.4962130Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.4962692Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.4963734Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.4964377Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.4965275Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.4965997Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.4966874Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.4967659Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.4968266Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.4969405Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.4969774Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.4970678Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.4970854Z ('RERUN', {'yellow': True}) [0.4139s] [100%]
2025-12-04T12:15:05.4972455Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0
2025-12-04T12:15:05.4973556Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.4973984Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.4974439Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T12:15:05.4974952Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T12:15:05.4975413Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.4976033Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.4976641Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.4977235Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.4977821Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.4978376Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.4978830Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.4979353Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.4979839Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.4980371Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.4980820Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.4981478Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.4982002Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp30 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.4982608Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp31 = tl.broadcast_to(tmp30, [1, 1])
2025-12-04T12:15:05.4983121Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.4983705Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.4984251Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, 0)
2025-12-04T12:15:05.4984877Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.4985428Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.where(r0_mask, tmp5, 0)
2025-12-04T12:15:05.4986003Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.4986554Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tl.full([1, 1], 15, tl.int32)
2025-12-04T12:15:05.4987064Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:05.4987550Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = (tmp8 / tmp10)
2025-12-04T12:15:05.4988048Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tmp2 - tmp11
2025-12-04T12:15:05.4988531Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12 * tmp12
2025-12-04T12:15:05.4989158Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.4989696Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.where(r0_mask, tmp14, 0)
2025-12-04T12:15:05.4990272Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.4990759Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp1 - tmp11
2025-12-04T12:15:05.4991198Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = 15.0
2025-12-04T12:15:05.4991702Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = (tmp17 / tmp19)
2025-12-04T12:15:05.4992139Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 1e-05
2025-12-04T12:15:05.4992624Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tmp20 + tmp21
2025-12-04T12:15:05.4993165Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = libdevice.rsqrt(tmp22)
2025-12-04T12:15:05.4993675Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp18 * tmp23
2025-12-04T12:15:05.4994196Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp25 = tl_math.abs(tmp24)
2025-12-04T12:15:05.4994784Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.4995362Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp28 = tl.where(r0_mask, tmp26, float("-inf"))
2025-12-04T12:15:05.4996041Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.4996524Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp32 = tmp24 * tmp31
2025-12-04T12:15:05.4996984Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp33 = -448.0
2025-12-04T12:15:05.4997561Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp34 = triton_helpers.maximum(tmp32, tmp33)
2025-12-04T12:15:05.4998032Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp35 = 448.0
2025-12-04T12:15:05.4998622Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp36 = triton_helpers.minimum(tmp34, tmp35)
2025-12-04T12:15:05.4999154Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp37 = tmp36.to(tl.float8e4nv)
2025-12-04T12:15:05.4999687Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp38 = tmp29.to(tl.float32)
2025-12-04T12:15:05.5000396Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask)
2025-12-04T12:15:05.5001113Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None)
2025-12-04T12:15:05.5001478Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.5003580Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.5004142Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.5005185Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.5005826Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.5006719Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.5007439Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.5008320Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.5009101Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.5009708Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.5010839Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.5011211Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.5012139Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.5012257Z FAILED [0.4047s] [100%]
2025-12-04T12:15:05.5012264Z 
2025-12-04T12:15:05.5012410Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.5012809Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda _
2025-12-04T12:15:05.5012938Z Traceback (most recent call last):
2025-12-04T12:15:05.5013364Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.5013619Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.5014107Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.5014360Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.5014884Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.5015080Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.5015606Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.5015792Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.5016390Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.5016733Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.5017259Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.5017427Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.5017907Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.5018030Z     return self._compile_to_module()
2025-12-04T12:15:05.5018528Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.5018697Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.5019217Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.5019361Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.5019894Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.5020140Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.5020725Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.5020856Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.5021385Z   File "/tmp/tmp6r4qtrxw/pa/cpa6pnodeunjl3urwjgabufcb3mkfpptvgmgglw2mdae6d7fherb.py", line 74, in <module>
2025-12-04T12:15:05.5021847Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.5021977Z     kernel.precompile(
2025-12-04T12:15:05.5022558Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.5022680Z     self._precompile_worker()
2025-12-04T12:15:05.5023290Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.5023471Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.5024095Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.5024307Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.5024757Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.5025030Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.5025478Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.5025814Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.5026057Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.5026714Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.5026822Z ^
2025-12-04T12:15:05.5027280Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.5027286Z 
2025-12-04T12:15:05.5027998Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.5028066Z 
2025-12-04T12:15:05.5028071Z 
2025-12-04T12:15:05.5028288Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.5028986Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda
2025-12-04T12:15:05.5028992Z 
2025-12-04T12:15:05.5029281Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.5029510Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.5029636Z frames [('total', 1)]
2025-12-04T12:15:05.5029758Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.5030225Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.5030469Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.5030573Z graph_break []
2025-12-04T12:15:05.5030963Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda _
2025-12-04T12:15:05.5031108Z Traceback (most recent call last):
2025-12-04T12:15:05.5031540Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.5031777Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.5032349Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.5032603Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.5033133Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.5033330Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.5033841Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.5034010Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.5035099Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.5035444Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.5035971Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.5036122Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.5036663Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.5036788Z     return self._compile_to_module()
2025-12-04T12:15:05.5037290Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.5037456Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.5037980Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.5038133Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.5038631Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.5038867Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.5039477Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.5039608Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.5040124Z   File "/tmp/tmptz337sga/7z/c7zlikqiugzxlfqtxc6m6urum2mjncirlkshav32ot6wlx7nfe72.py", line 74, in <module>
2025-12-04T12:15:05.5040591Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.5040819Z     kernel.precompile(
2025-12-04T12:15:05.5041392Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.5041511Z     self._precompile_worker()
2025-12-04T12:15:05.5042126Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.5042305Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.5042901Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.5043119Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.5043572Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.5043815Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.5044275Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.5044610Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.5044849Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.5045531Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.5045624Z ^
2025-12-04T12:15:05.5046099Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.5046105Z 
2025-12-04T12:15:05.5046810Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.5046815Z 
2025-12-04T12:15:05.5046823Z 
2025-12-04T12:15:05.5047051Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.5047767Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda
2025-12-04T12:15:05.5047774Z 
2025-12-04T12:15:05.5048061Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.5048286Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.5048391Z frames [('total', 1)]
2025-12-04T12:15:05.5048559Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.5049028Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.5049252Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.5049364Z graph_break []
2025-12-04T12:15:05.5049584Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.5049706Z frames [('total', 1)]
2025-12-04T12:15:05.5049825Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.5050048Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.5050529Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.5050629Z graph_break []
2025-12-04T12:15:05.5050779Z =================================== FAILURES ===================================
2025-12-04T12:15:05.5051184Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda _
2025-12-04T12:15:05.5051310Z Traceback (most recent call last):
2025-12-04T12:15:05.5051749Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.5051981Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.5052502Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.5052764Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.5053279Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.5053475Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.5053998Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.5054148Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.5054692Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.5055011Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.5055529Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.5055694Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.5056175Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.5056408Z     return self._compile_to_module()
2025-12-04T12:15:05.5056939Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.5057107Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.5057644Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.5057777Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.5058275Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.5058527Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.5059148Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.5059293Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.5059791Z   File "/tmp/tmpgno1frxj/54/c545r7kzyhgpvzb52bfaz3yevfp3bk4rzevqy3tgwq7t43llz2n3.py", line 74, in <module>
2025-12-04T12:15:05.5060256Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.5060417Z     kernel.precompile(
2025-12-04T12:15:05.5060975Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.5061106Z     self._precompile_worker()
2025-12-04T12:15:05.5061699Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.5061881Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.5062486Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.5062685Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.5063137Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.5063395Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.5063844Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.5064192Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.5064419Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.5065068Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.5065202Z ^
2025-12-04T12:15:05.5065659Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.5065665Z 
2025-12-04T12:15:05.5066392Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.5066398Z 
2025-12-04T12:15:05.5066405Z 
2025-12-04T12:15:05.5066622Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.5067325Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda
2025-12-04T12:15:05.5067331Z 
2025-12-04T12:15:05.5067601Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.5067825Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.5067943Z frames [('total', 1)]
2025-12-04T12:15:05.5068065Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.5068533Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.5068799Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.5068902Z graph_break []
2025-12-04T12:15:05.5069131Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.5069238Z frames [('total', 1)]
2025-12-04T12:15:05.5069356Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.5069585Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.5070046Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.5070149Z graph_break []
2025-12-04T12:15:05.5070381Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.5070516Z frames [('total', 1)]
2025-12-04T12:15:05.5070632Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.5070868Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.5071528Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.5071646Z graph_break []
2025-12-04T12:15:05.5072399Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2fae1650dec37ec0.xml -
2025-12-04T12:15:05.5072574Z =========================== short test summary info ============================
2025-12-04T12:15:05.5073421Z FAILED [0.4047s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.5074072Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.5074176Z ^
2025-12-04T12:15:05.5074637Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.5074644Z 
2025-12-04T12:15:05.5075352Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.5075372Z 
2025-12-04T12:15:05.5075377Z 
2025-12-04T12:15:05.5075593Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.5076282Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda
2025-12-04T12:15:05.5076336Z 
2025-12-04T12:15:05.5076615Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.5076801Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.5077020Z ================== 1 failed, 187 deselected, 2 rerun in 4.25s ==================
2025-12-04T12:15:05.5077122Z Got exit code 1
2025-12-04T12:15:05.5077736Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda
2025-12-04T12:15:05.5078160Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:15:05.5078630Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0893388d06071d35.xml
2025-12-04T12:15:05.5078795Z ============================= test session starts ==============================
2025-12-04T12:15:05.5079162Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.5079275Z cachedir: .pytest_cache
2025-12-04T12:15:05.5079814Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.5079942Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.5080055Z configfile: pytest.ini
2025-12-04T12:15:05.5080710Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.5080941Z collecting ... collected 188 items / 28 deselected / 160 selected
2025-12-04T12:15:05.5081088Z stepcurrent: skipping 28 already run items.
2025-12-04T12:15:05.5081223Z Running 160 items in this shard
2025-12-04T12:15:05.5081228Z 
2025-12-04T12:15:05.5082666Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T12:15:05.5083885Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.5084320Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.5084815Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T12:15:05.5085337Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T12:15:05.5085797Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.5086351Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.5086891Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.5092296Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.5092941Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.5093506Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.5093965Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.5094564Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.5095052Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.5095517Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.5095964Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_2 = r0_index
2025-12-04T12:15:05.5096573Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_1 = r0_index // 15
2025-12-04T12:15:05.5097225Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.5097934Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.5098623Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.5099209Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.load(in_ptr3 + (0))
2025-12-04T12:15:05.5099766Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.broadcast_to(tmp16, [1, 1])
2025-12-04T12:15:05.5100278Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.5100771Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp1 - tmp2
2025-12-04T12:15:05.5101210Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = 15.0
2025-12-04T12:15:05.5101741Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = (tmp4 / tmp5)
2025-12-04T12:15:05.5102187Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 1e-05
2025-12-04T12:15:05.5102651Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6 + tmp7
2025-12-04T12:15:05.5103219Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T12:15:05.5103688Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp3 * tmp9
2025-12-04T12:15:05.5104202Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = tl_math.abs(tmp10)
2025-12-04T12:15:05.5104795Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.5105375Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.where(r0_mask, tmp12, float("-inf"))
2025-12-04T12:15:05.5106033Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.5106518Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp10 * tmp17
2025-12-04T12:15:05.5106975Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = -448.0
2025-12-04T12:15:05.5107546Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.maximum(tmp18, tmp19)
2025-12-04T12:15:05.5108022Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 448.0
2025-12-04T12:15:05.5108611Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = triton_helpers.minimum(tmp20, tmp21)
2025-12-04T12:15:05.5109148Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp22.to(tl.float8e4nv)
2025-12-04T12:15:05.5109673Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp15.to(tl.float32)
2025-12-04T12:15:05.5110382Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask)
2025-12-04T12:15:05.5111105Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None)
2025-12-04T12:15:05.5111477Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.5113886Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.5114445Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.5115524Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.5116170Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.5117097Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.5117791Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.5118682Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.5119465Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.5120069Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.5121231Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.5121595Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.5122525Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.5122671Z ('RERUN', {'yellow': True}) [3.6018s] [  0%]
2025-12-04T12:15:05.5124101Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T12:15:05.5125259Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.5125692Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.5126147Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T12:15:05.5126699Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T12:15:05.5127162Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.5127709Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.5128252Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.5128846Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.5129477Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.5130034Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.5130485Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.5131035Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.5131514Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.5131971Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.5132421Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_2 = r0_index
2025-12-04T12:15:05.5132914Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_1 = r0_index // 15
2025-12-04T12:15:05.5133559Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.5134257Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.5134944Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.5135516Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.load(in_ptr3 + (0))
2025-12-04T12:15:05.5136069Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.broadcast_to(tmp16, [1, 1])
2025-12-04T12:15:05.5136650Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.5137133Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp1 - tmp2
2025-12-04T12:15:05.5137571Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = 15.0
2025-12-04T12:15:05.5138061Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = (tmp4 / tmp5)
2025-12-04T12:15:05.5138497Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 1e-05
2025-12-04T12:15:05.5138963Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6 + tmp7
2025-12-04T12:15:05.5139490Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T12:15:05.5140001Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp3 * tmp9
2025-12-04T12:15:05.5140521Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = tl_math.abs(tmp10)
2025-12-04T12:15:05.5141116Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.5141691Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.where(r0_mask, tmp12, float("-inf"))
2025-12-04T12:15:05.5142382Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.5142869Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp10 * tmp17
2025-12-04T12:15:05.5143323Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = -448.0
2025-12-04T12:15:05.5143891Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.maximum(tmp18, tmp19)
2025-12-04T12:15:05.5144359Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 448.0
2025-12-04T12:15:05.5144943Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = triton_helpers.minimum(tmp20, tmp21)
2025-12-04T12:15:05.5145475Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp22.to(tl.float8e4nv)
2025-12-04T12:15:05.5146005Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp15.to(tl.float32)
2025-12-04T12:15:05.5146708Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask)
2025-12-04T12:15:05.5147414Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None)
2025-12-04T12:15:05.5147791Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.5150164Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.5150751Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.5151791Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.5152436Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.5153360Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.5154054Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.5154935Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.5155714Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.5156357Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.5157509Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.5157918Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.5158810Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.5158956Z ('RERUN', {'yellow': True}) [0.6032s] [  0%]
2025-12-04T12:15:05.5160396Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T12:15:05.5161554Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.5161986Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.5162428Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T12:15:05.5162957Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T12:15:05.5163454Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.5164000Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.5164539Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.5165137Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.5165719Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.5166275Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.5166732Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.5167245Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.5167761Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.5168221Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.5168664Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_2 = r0_index
2025-12-04T12:15:05.5169162Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_1 = r0_index // 15
2025-12-04T12:15:05.5169808Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.5170546Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.5171455Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.5172071Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.load(in_ptr3 + (0))
2025-12-04T12:15:05.5172635Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.broadcast_to(tmp16, [1, 1])
2025-12-04T12:15:05.5173138Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.5173634Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp1 - tmp2
2025-12-04T12:15:05.5174063Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = 15.0
2025-12-04T12:15:05.5174541Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = (tmp4 / tmp5)
2025-12-04T12:15:05.5174992Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 1e-05
2025-12-04T12:15:05.5175455Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6 + tmp7
2025-12-04T12:15:05.5175990Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T12:15:05.5176578Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp3 * tmp9
2025-12-04T12:15:05.5177106Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = tl_math.abs(tmp10)
2025-12-04T12:15:05.5177693Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.5178270Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.where(r0_mask, tmp12, float("-inf"))
2025-12-04T12:15:05.5178921Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.5179404Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp10 * tmp17
2025-12-04T12:15:05.5179862Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = -448.0
2025-12-04T12:15:05.5180438Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.maximum(tmp18, tmp19)
2025-12-04T12:15:05.5180954Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 448.0
2025-12-04T12:15:05.5181543Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = triton_helpers.minimum(tmp20, tmp21)
2025-12-04T12:15:05.5182079Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp22.to(tl.float8e4nv)
2025-12-04T12:15:05.5182606Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp15.to(tl.float32)
2025-12-04T12:15:05.5183307Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask)
2025-12-04T12:15:05.5184064Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None)
2025-12-04T12:15:05.5184446Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.5186821Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.5187405Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.5188455Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.5189095Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.5189990Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.5190711Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.5191599Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.5192387Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.5192999Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.5194146Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.5194530Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.5195449Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.5195572Z FAILED [0.6044s] [  0%]
2025-12-04T12:15:05.5195582Z 
2025-12-04T12:15:05.5195728Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.5196119Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda _
2025-12-04T12:15:05.5196259Z Traceback (most recent call last):
2025-12-04T12:15:05.5196682Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.5196931Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.5197455Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.5197707Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.5198235Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.5198428Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.5198984Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.5199131Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.5199669Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.5200005Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.5200526Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.5200686Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.5201166Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.5201289Z     return self._compile_to_module()
2025-12-04T12:15:05.5201785Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.5201950Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.5202466Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.5202609Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.5203137Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.5203387Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.5203979Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.5204107Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.5204631Z   File "/tmp/tmp1bn6eiho/ok/coklw6jhzyywldqoqlnuc7qwmoiowja3jt4to6qo2t4hp7gnezio.py", line 137, in <module>
2025-12-04T12:15:05.5205096Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.5205212Z     kernel.precompile(
2025-12-04T12:15:05.5205781Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.5205904Z     self._precompile_worker()
2025-12-04T12:15:05.5206516Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.5206701Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.5207329Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.5207544Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.5207997Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.5208260Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.5208704Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.5209040Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.5209285Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.5210026Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.5210132Z ^
2025-12-04T12:15:05.5210596Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.5210602Z 
2025-12-04T12:15:05.5211313Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.5211350Z 
2025-12-04T12:15:05.5211356Z 
2025-12-04T12:15:05.5211593Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.5212300Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda
2025-12-04T12:15:05.5212308Z 
2025-12-04T12:15:05.5212595Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.5212824Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.5212932Z frames [('total', 1)]
2025-12-04T12:15:05.5213069Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.5213538Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.5213787Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.5213892Z graph_break []
2025-12-04T12:15:05.5214283Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda _
2025-12-04T12:15:05.5214422Z Traceback (most recent call last):
2025-12-04T12:15:05.5214844Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.5215113Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.5215621Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.5215871Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.5216466Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.5216665Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.5217177Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.5217335Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.5217870Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.5218206Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.5218729Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.5218876Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.5219411Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.5219532Z     return self._compile_to_module()
2025-12-04T12:15:05.5220014Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.5220194Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.5220709Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.5220851Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.5221351Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.5221612Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.5222205Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.5222333Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.5222846Z   File "/tmp/tmp840scpu5/zs/czsvjiyapphd5wxcf5yncv246rn65nyv2s2d6t3fzqsnes5hem2f.py", line 137, in <module>
2025-12-04T12:15:05.5223342Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.5223455Z     kernel.precompile(
2025-12-04T12:15:05.5224020Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.5224138Z     self._precompile_worker()
2025-12-04T12:15:05.5224738Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.5224932Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.5225531Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.5225742Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.5226194Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.5226439Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.5226898Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.5227231Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.5227518Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.5228229Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.5228321Z ^
2025-12-04T12:15:05.5228800Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.5228806Z 
2025-12-04T12:15:05.5229515Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.5229524Z 
2025-12-04T12:15:05.5229529Z 
2025-12-04T12:15:05.5229756Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.5230458Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda
2025-12-04T12:15:05.5230466Z 
2025-12-04T12:15:05.5230732Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.5230970Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.5231075Z frames [('total', 1)]
2025-12-04T12:15:05.5231206Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.5231704Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.5231931Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.5232046Z graph_break []
2025-12-04T12:15:05.5232268Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.5232372Z frames [('total', 1)]
2025-12-04T12:15:05.5232501Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.5232717Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.5233190Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.5233321Z graph_break []
2025-12-04T12:15:05.5233470Z =================================== FAILURES ===================================
2025-12-04T12:15:05.5233880Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda _
2025-12-04T12:15:05.5234005Z Traceback (most recent call last):
2025-12-04T12:15:05.5234428Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.5234704Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.5235195Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.5235456Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.5235973Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.5236169Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.5236689Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.5236841Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.5237386Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.5237709Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.5238227Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.5238388Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.5238867Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.5239023Z     return self._compile_to_module()
2025-12-04T12:15:05.5239522Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.5239689Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.5240216Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.5240345Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.5240843Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.5241087Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.5241674Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.5241815Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.5242321Z   File "/tmp/tmpscng8l2m/7e/c7eckh5miwnuwcfxqtk2rg3a7uh4hwkjx2zmjj4tq7p24eezw6nj.py", line 137, in <module>
2025-12-04T12:15:05.5242781Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.5242906Z     kernel.precompile(
2025-12-04T12:15:05.5243495Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.5243615Z     self._precompile_worker()
2025-12-04T12:15:05.5244225Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.5244406Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.5245009Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.5245210Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.5245686Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.5245945Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.5246391Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.5246737Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.5247081Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.5247790Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.5247895Z ^
2025-12-04T12:15:05.5248352Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.5248360Z 
2025-12-04T12:15:05.5249088Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.5249093Z 
2025-12-04T12:15:05.5249097Z 
2025-12-04T12:15:05.5249319Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.5250014Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda
2025-12-04T12:15:05.5250037Z 
2025-12-04T12:15:05.5250307Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.5250530Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.5250653Z frames [('total', 1)]
2025-12-04T12:15:05.5250768Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.5251266Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.5251501Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.5251601Z graph_break []
2025-12-04T12:15:05.5251830Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.5251936Z frames [('total', 1)]
2025-12-04T12:15:05.5252054Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.5252286Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.5252750Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.5252851Z graph_break []
2025-12-04T12:15:05.5253079Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.5253183Z frames [('total', 1)]
2025-12-04T12:15:05.5253299Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.5253530Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.5253987Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.5254101Z graph_break []
2025-12-04T12:15:05.5254772Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0893388d06071d35.xml -
2025-12-04T12:15:05.5254950Z =========================== short test summary info ============================
2025-12-04T12:15:05.5255807Z FAILED [0.6044s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.5256585Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.5256695Z ^
2025-12-04T12:15:05.5257193Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.5257199Z 
2025-12-04T12:15:05.5257912Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.5257932Z 
2025-12-04T12:15:05.5257937Z 
2025-12-04T12:15:05.5258153Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.5258881Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda
2025-12-04T12:15:05.5258887Z 
2025-12-04T12:15:05.5259169Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.5259350Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.5259568Z ================== 1 failed, 28 deselected, 2 rerun in 4.85s ===================
2025-12-04T12:15:05.5259671Z Got exit code 1
2025-12-04T12:15:05.5259782Z Retrying single test...
2025-12-04T12:15:05.5260266Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b62e3abe6013e6ef.xml
2025-12-04T12:15:05.5260434Z ============================= test session starts ==============================
2025-12-04T12:15:05.5260788Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.5260914Z cachedir: .pytest_cache
2025-12-04T12:15:05.5261434Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.5261572Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.5261681Z configfile: pytest.ini
2025-12-04T12:15:05.5262271Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.5262542Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:05.5263328Z stepcurrent: skipping 28 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda
2025-12-04T12:15:05.5263446Z Running 1 items in this shard
2025-12-04T12:15:05.5263452Z 
2025-12-04T12:15:05.5264887Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T12:15:05.5266043Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.5266506Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.5266985Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T12:15:05.5267523Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T12:15:05.5267988Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.5268522Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.5269073Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.5269693Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.5270296Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.5270857Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.5271583Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.5272102Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.5272577Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.5273054Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.5273507Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_2 = r0_index
2025-12-04T12:15:05.5274008Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_1 = r0_index // 15
2025-12-04T12:15:05.5274663Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.5275358Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.5276058Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.5276663Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.load(in_ptr3 + (0))
2025-12-04T12:15:05.5277226Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.broadcast_to(tmp16, [1, 1])
2025-12-04T12:15:05.5277734Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.5278207Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp1 - tmp2
2025-12-04T12:15:05.5278653Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = 15.0
2025-12-04T12:15:05.5279135Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = (tmp4 / tmp5)
2025-12-04T12:15:05.5279588Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 1e-05
2025-12-04T12:15:05.5280057Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6 + tmp7
2025-12-04T12:15:05.5280624Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T12:15:05.5281108Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp3 * tmp9
2025-12-04T12:15:05.5281614Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = tl_math.abs(tmp10)
2025-12-04T12:15:05.5282216Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.5282795Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.where(r0_mask, tmp12, float("-inf"))
2025-12-04T12:15:05.5283479Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.5283985Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp10 * tmp17
2025-12-04T12:15:05.5284433Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = -448.0
2025-12-04T12:15:05.5285066Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.maximum(tmp18, tmp19)
2025-12-04T12:15:05.5285510Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 448.0
2025-12-04T12:15:05.5286099Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = triton_helpers.minimum(tmp20, tmp21)
2025-12-04T12:15:05.5286639Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp22.to(tl.float8e4nv)
2025-12-04T12:15:05.5287154Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp15.to(tl.float32)
2025-12-04T12:15:05.5287881Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask)
2025-12-04T12:15:05.5288591Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None)
2025-12-04T12:15:05.5288971Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.5291351Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.5291936Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.5292979Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.5293626Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.5294543Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.5295224Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.5296119Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.5296958Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.5297622Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.5298780Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.5299198Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.5300089Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.5300226Z ('RERUN', {'yellow': True}) [3.6016s] [100%]
2025-12-04T12:15:05.5301667Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T12:15:05.5302818Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.5303261Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.5303704Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T12:15:05.5304264Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T12:15:05.5304730Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.5305265Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.5305819Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.5306406Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.5306997Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.5307556Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.5307998Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.5308562Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.5309035Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.5309513Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.5309956Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_2 = r0_index
2025-12-04T12:15:05.5310435Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_1 = r0_index // 15
2025-12-04T12:15:05.5311124Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.5311818Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.5312515Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.5313073Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.load(in_ptr3 + (0))
2025-12-04T12:15:05.5313629Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.broadcast_to(tmp16, [1, 1])
2025-12-04T12:15:05.5314138Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.5314608Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp1 - tmp2
2025-12-04T12:15:05.5315055Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = 15.0
2025-12-04T12:15:05.5315528Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = (tmp4 / tmp5)
2025-12-04T12:15:05.5315980Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 1e-05
2025-12-04T12:15:05.5316444Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6 + tmp7
2025-12-04T12:15:05.5316962Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T12:15:05.5317483Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp3 * tmp9
2025-12-04T12:15:05.5317988Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = tl_math.abs(tmp10)
2025-12-04T12:15:05.5318591Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.5319169Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.where(r0_mask, tmp12, float("-inf"))
2025-12-04T12:15:05.5319805Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.5320302Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp10 * tmp17
2025-12-04T12:15:05.5320748Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = -448.0
2025-12-04T12:15:05.5321338Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.maximum(tmp18, tmp19)
2025-12-04T12:15:05.5321812Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 448.0
2025-12-04T12:15:05.5322381Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = triton_helpers.minimum(tmp20, tmp21)
2025-12-04T12:15:05.5322929Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp22.to(tl.float8e4nv)
2025-12-04T12:15:05.5323445Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp15.to(tl.float32)
2025-12-04T12:15:05.5324229Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask)
2025-12-04T12:15:05.5324937Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None)
2025-12-04T12:15:05.5325317Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.5327703Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.5328290Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.5329332Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.5329972Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.5330865Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.5331581Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.5332478Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.5333253Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.5333877Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.5335034Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.5335418Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.5336410Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.5336553Z ('RERUN', {'yellow': True}) [0.6071s] [100%]
2025-12-04T12:15:05.5337990Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T12:15:05.5339167Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.5339618Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.5340067Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T12:15:05.5340636Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T12:15:05.5341096Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.5341629Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.5342188Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.5342772Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.5343365Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.5343921Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.5344364Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.5344892Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.5345397Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.5345867Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.5346312Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_2 = r0_index
2025-12-04T12:15:05.5346793Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_1 = r0_index // 15
2025-12-04T12:15:05.5347452Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.5348138Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.5348838Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.5349392Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.load(in_ptr3 + (0))
2025-12-04T12:15:05.5349952Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.broadcast_to(tmp16, [1, 1])
2025-12-04T12:15:05.5350464Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.5350934Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp1 - tmp2
2025-12-04T12:15:05.5351376Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = 15.0
2025-12-04T12:15:05.5351896Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = (tmp4 / tmp5)
2025-12-04T12:15:05.5352345Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 1e-05
2025-12-04T12:15:05.5352813Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6 + tmp7
2025-12-04T12:15:05.5353330Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T12:15:05.5353846Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp3 * tmp9
2025-12-04T12:15:05.5354351Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = tl_math.abs(tmp10)
2025-12-04T12:15:05.5354947Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.5355531Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.where(r0_mask, tmp12, float("-inf"))
2025-12-04T12:15:05.5356171Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.5356666Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp10 * tmp17
2025-12-04T12:15:05.5357111Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = -448.0
2025-12-04T12:15:05.5357699Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.maximum(tmp18, tmp19)
2025-12-04T12:15:05.5358140Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 448.0
2025-12-04T12:15:05.5358747Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = triton_helpers.minimum(tmp20, tmp21)
2025-12-04T12:15:05.5359293Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp22.to(tl.float8e4nv)
2025-12-04T12:15:05.5359806Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp15.to(tl.float32)
2025-12-04T12:15:05.5360521Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask)
2025-12-04T12:15:05.5361225Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None)
2025-12-04T12:15:05.5361653Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.5364065Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.5364617Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.5365697Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.5366346Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.5367242Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.5367953Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.5368847Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.5369620Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.5370243Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.5371572Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.5371958Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.5372921Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.5373027Z FAILED [0.6075s] [100%]
2025-12-04T12:15:05.5373033Z 
2025-12-04T12:15:05.5373196Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.5373591Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda _
2025-12-04T12:15:05.5373731Z Traceback (most recent call last):
2025-12-04T12:15:05.5374158Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.5374392Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.5374894Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.5375145Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.5375659Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.5375868Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.5376506Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.5376674Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.5377208Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.5377533Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.5378067Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.5378220Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.5378721Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.5378889Z     return self._compile_to_module()
2025-12-04T12:15:05.5379378Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.5379559Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.5380078Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.5380257Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.5380770Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.5381003Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.5381603Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.5381735Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.5382220Z   File "/tmp/tmp3ow_8pip/nu/cnuadydymofs7cfakozh54wylilyzojo24q3sbdpr4da73bj2m6f.py", line 137, in <module>
2025-12-04T12:15:05.5382701Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.5382815Z     kernel.precompile(
2025-12-04T12:15:05.5383385Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.5383512Z     self._precompile_worker()
2025-12-04T12:15:05.5384107Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.5384305Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.5384899Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.5385134Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.5385598Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.5385848Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.5386311Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.5386651Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.5386879Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.5387601Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.5387694Z ^
2025-12-04T12:15:05.5388169Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.5388178Z 
2025-12-04T12:15:05.5388891Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.5388927Z 
2025-12-04T12:15:05.5388933Z 
2025-12-04T12:15:05.5389167Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.5389867Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda
2025-12-04T12:15:05.5389876Z 
2025-12-04T12:15:05.5390148Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.5390394Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.5390504Z frames [('total', 1)]
2025-12-04T12:15:05.5390623Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.5391134Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.5391361Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.5391478Z graph_break []
2025-12-04T12:15:05.5391873Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda _
2025-12-04T12:15:05.5392001Z Traceback (most recent call last):
2025-12-04T12:15:05.5392470Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.5392701Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.5393191Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.5393458Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.5393972Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.5394178Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.5394689Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.5394836Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.5395383Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.5395704Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.5396233Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.5396379Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.5396891Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.5397027Z     return self._compile_to_module()
2025-12-04T12:15:05.5397511Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.5397688Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.5398203Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.5398334Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.5398847Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.5399079Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.5399661Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.5399802Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.5400287Z   File "/tmp/tmp81s24w_8/j7/cj7zein4hfa3cnabppz3mxtx4kpx22bzur7ibnf2tol37wabiyvh.py", line 137, in <module>
2025-12-04T12:15:05.5400791Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.5400906Z     kernel.precompile(
2025-12-04T12:15:05.5401459Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.5401593Z     self._precompile_worker()
2025-12-04T12:15:05.5402189Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.5402385Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.5402977Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.5403184Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.5403677Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.5403927Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.5404366Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.5404744Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.5404971Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.5405691Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.5405783Z ^
2025-12-04T12:15:05.5406239Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.5406245Z 
2025-12-04T12:15:05.5406970Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.5406978Z 
2025-12-04T12:15:05.5406982Z 
2025-12-04T12:15:05.5407199Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.5407908Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda
2025-12-04T12:15:05.5407916Z 
2025-12-04T12:15:05.5408185Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.5408421Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.5408526Z frames [('total', 1)]
2025-12-04T12:15:05.5408679Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.5409158Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.5409380Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.5409480Z graph_break []
2025-12-04T12:15:05.5409714Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.5409819Z frames [('total', 1)]
2025-12-04T12:15:05.5409935Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.5410171Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.5410631Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.5410745Z graph_break []
2025-12-04T12:15:05.5410893Z =================================== FAILURES ===================================
2025-12-04T12:15:05.5411286Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda _
2025-12-04T12:15:05.5411425Z Traceback (most recent call last):
2025-12-04T12:15:05.5411851Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.5412100Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.5412620Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.5412870Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.5413394Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.5413587Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.5414094Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.5414258Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.5414820Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.5415155Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.5415680Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.5415828Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.5416431Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.5416557Z     return self._compile_to_module()
2025-12-04T12:15:05.5417056Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.5417224Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.5417743Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.5417899Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.5418395Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.5418628Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.5419236Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.5419366Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.5419882Z   File "/tmp/tmp0ive0a57/am/cam4c5zjqj46cwwdvx4zna4nvoqzxflr655jvqum53fitaq3jjdf.py", line 137, in <module>
2025-12-04T12:15:05.5420347Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.5420514Z     kernel.precompile(
2025-12-04T12:15:05.5421082Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.5421201Z     self._precompile_worker()
2025-12-04T12:15:05.5421807Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.5421986Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.5422578Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.5422793Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.5423243Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.5423489Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.5423945Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.5424285Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.5424525Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.5425265Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.5425359Z ^
2025-12-04T12:15:05.5425830Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.5425836Z 
2025-12-04T12:15:05.5426542Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.5426551Z 
2025-12-04T12:15:05.5426555Z 
2025-12-04T12:15:05.5426785Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.5427511Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda
2025-12-04T12:15:05.5427518Z 
2025-12-04T12:15:05.5427804Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.5428026Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.5428163Z frames [('total', 1)]
2025-12-04T12:15:05.5428294Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.5428754Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.5428976Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.5429089Z graph_break []
2025-12-04T12:15:05.5429308Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.5429431Z frames [('total', 1)]
2025-12-04T12:15:05.5429546Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.5429766Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.5430239Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.5430340Z graph_break []
2025-12-04T12:15:05.5430559Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.5430678Z frames [('total', 1)]
2025-12-04T12:15:05.5430793Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.5431010Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.5431477Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.5431575Z graph_break []
2025-12-04T12:15:05.5432279Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b62e3abe6013e6ef.xml -
2025-12-04T12:15:05.5432459Z =========================== short test summary info ============================
2025-12-04T12:15:05.5433298Z FAILED [0.6075s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.5434017Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.5434110Z ^
2025-12-04T12:15:05.5434582Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.5434587Z 
2025-12-04T12:15:05.5435291Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.5435300Z 
2025-12-04T12:15:05.5435304Z 
2025-12-04T12:15:05.5435539Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.5436270Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda
2025-12-04T12:15:05.5436276Z 
2025-12-04T12:15:05.5436543Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.5436740Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.5436945Z ================== 1 failed, 187 deselected, 2 rerun in 4.86s ==================
2025-12-04T12:15:05.5437056Z Got exit code 1
2025-12-04T12:15:05.5437164Z Retrying single test...
2025-12-04T12:15:05.5437637Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0fe50dbde6f69754.xml
2025-12-04T12:15:05.5437817Z ============================= test session starts ==============================
2025-12-04T12:15:05.5438196Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.5438310Z cachedir: .pytest_cache
2025-12-04T12:15:05.5438843Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.5438971Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.5439092Z configfile: pytest.ini
2025-12-04T12:15:05.5439716Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.5439956Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:05.5440750Z stepcurrent: skipping 28 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda
2025-12-04T12:15:05.5440870Z Running 1 items in this shard
2025-12-04T12:15:05.5440874Z 
2025-12-04T12:15:05.5442968Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T12:15:05.5444125Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.5444575Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.5445024Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T12:15:05.5445601Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T12:15:05.5446082Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.5446617Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.5447172Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.5447758Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.5448349Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.5448926Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.5449373Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.5449941Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.5450417Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.5450876Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.5451337Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_2 = r0_index
2025-12-04T12:15:05.5451824Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_1 = r0_index // 15
2025-12-04T12:15:05.5452597Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.5453294Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.5453985Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.5454657Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.load(in_ptr3 + (0))
2025-12-04T12:15:05.5455244Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.broadcast_to(tmp16, [1, 1])
2025-12-04T12:15:05.5455773Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.5456246Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp1 - tmp2
2025-12-04T12:15:05.5456790Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = 15.0
2025-12-04T12:15:05.5457275Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = (tmp4 / tmp5)
2025-12-04T12:15:05.5457712Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 1e-05
2025-12-04T12:15:05.5458191Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6 + tmp7
2025-12-04T12:15:05.5458709Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T12:15:05.5459243Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp3 * tmp9
2025-12-04T12:15:05.5459750Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = tl_math.abs(tmp10)
2025-12-04T12:15:05.5460340Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.5460933Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.where(r0_mask, tmp12, float("-inf"))
2025-12-04T12:15:05.5461574Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.5462076Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp10 * tmp17
2025-12-04T12:15:05.5462531Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = -448.0
2025-12-04T12:15:05.5463139Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.maximum(tmp18, tmp19)
2025-12-04T12:15:05.5463591Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 448.0
2025-12-04T12:15:05.5464166Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = triton_helpers.minimum(tmp20, tmp21)
2025-12-04T12:15:05.5464711Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp22.to(tl.float8e4nv)
2025-12-04T12:15:05.5465224Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp15.to(tl.float32)
2025-12-04T12:15:05.5465957Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask)
2025-12-04T12:15:05.5466678Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None)
2025-12-04T12:15:05.5467040Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.5469481Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.5470020Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.5471364Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.5472502Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.5474310Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.5475454Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.5476358Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.5477131Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.5477738Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.5478899Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.5479319Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.5480223Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.5480360Z ('RERUN', {'yellow': True}) [3.5989s] [100%]
2025-12-04T12:15:05.5481796Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T12:15:05.5482993Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.5483426Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.5483885Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T12:15:05.5484450Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T12:15:05.5484923Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.5485457Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.5486013Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.5486600Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.5487183Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.5487758Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.5488199Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.5488761Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.5489238Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.5489702Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.5490160Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_2 = r0_index
2025-12-04T12:15:05.5490639Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_1 = r0_index // 15
2025-12-04T12:15:05.5491293Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.5491984Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.5492671Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.5493249Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.load(in_ptr3 + (0))
2025-12-04T12:15:05.5493803Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.broadcast_to(tmp16, [1, 1])
2025-12-04T12:15:05.5494322Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.5494789Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp1 - tmp2
2025-12-04T12:15:05.5495224Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = 15.0
2025-12-04T12:15:05.5495751Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = (tmp4 / tmp5)
2025-12-04T12:15:05.5496193Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 1e-05
2025-12-04T12:15:05.5496749Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6 + tmp7
2025-12-04T12:15:05.5497311Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T12:15:05.5497785Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp3 * tmp9
2025-12-04T12:15:05.5498307Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = tl_math.abs(tmp10)
2025-12-04T12:15:05.5498898Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.5499498Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.where(r0_mask, tmp12, float("-inf"))
2025-12-04T12:15:05.5500137Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.5500636Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp10 * tmp17
2025-12-04T12:15:05.5501419Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = -448.0
2025-12-04T12:15:05.5501995Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.maximum(tmp18, tmp19)
2025-12-04T12:15:05.5502500Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 448.0
2025-12-04T12:15:05.5503082Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = triton_helpers.minimum(tmp20, tmp21)
2025-12-04T12:15:05.5503632Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp22.to(tl.float8e4nv)
2025-12-04T12:15:05.5504148Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp15.to(tl.float32)
2025-12-04T12:15:05.5504857Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask)
2025-12-04T12:15:05.5505580Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None)
2025-12-04T12:15:05.5505955Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.5508378Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.5508918Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.5510012Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.5510645Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.5511555Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.5512276Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.5513170Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.5513948Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.5514553Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.5515717Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.5516082Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.5517026Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.5517162Z ('RERUN', {'yellow': True}) [0.6077s] [100%]
2025-12-04T12:15:05.5518611Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T12:15:05.5519755Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.5520188Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.5520651Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T12:15:05.5521209Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T12:15:05.5521687Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.5522221Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.5522760Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.5523348Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.5523992Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.5524567Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.5525010Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.5525583Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.5526056Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.5526513Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.5526975Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_2 = r0_index
2025-12-04T12:15:05.5527460Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_1 = r0_index // 15
2025-12-04T12:15:05.5528121Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.5528810Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.5529497Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.5530032Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.load(in_ptr3 + (0))
2025-12-04T12:15:05.5530625Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.broadcast_to(tmp16, [1, 1])
2025-12-04T12:15:05.5531145Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.5531611Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp1 - tmp2
2025-12-04T12:15:05.5532042Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = 15.0
2025-12-04T12:15:05.5532532Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = (tmp4 / tmp5)
2025-12-04T12:15:05.5532968Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 1e-05
2025-12-04T12:15:05.5533447Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6 + tmp7
2025-12-04T12:15:05.5533967Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T12:15:05.5534474Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp3 * tmp9
2025-12-04T12:15:05.5534995Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = tl_math.abs(tmp10)
2025-12-04T12:15:05.5535585Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.5536171Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.where(r0_mask, tmp12, float("-inf"))
2025-12-04T12:15:05.5536883Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.5537422Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp10 * tmp17
2025-12-04T12:15:05.5537873Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = -448.0
2025-12-04T12:15:05.5538450Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.maximum(tmp18, tmp19)
2025-12-04T12:15:05.5538941Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 448.0
2025-12-04T12:15:05.5539516Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = triton_helpers.minimum(tmp20, tmp21)
2025-12-04T12:15:05.5540065Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp22.to(tl.float8e4nv)
2025-12-04T12:15:05.5540586Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp15.to(tl.float32)
2025-12-04T12:15:05.5541292Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask)
2025-12-04T12:15:05.5542006Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None)
2025-12-04T12:15:05.5542372Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.5544772Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.5545351Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.5546410Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.5547036Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.5547952Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.5548665Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.5549557Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.5550337Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.5550982Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.5552150Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.5552519Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.5553459Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.5553565Z FAILED [0.6071s] [100%]
2025-12-04T12:15:05.5553572Z 
2025-12-04T12:15:05.5553719Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.5554130Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda _
2025-12-04T12:15:05.5554261Z Traceback (most recent call last):
2025-12-04T12:15:05.5554702Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.5554944Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.5555435Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.5555702Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.5556215Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.5556424Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.5556937Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.5557126Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.5557678Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.5558001Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.5558523Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.5558691Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.5559174Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.5559312Z     return self._compile_to_module()
2025-12-04T12:15:05.5559797Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.5559965Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.5560500Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.5560634Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.5561184Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.5561419Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.5562007Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.5562153Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.5562667Z   File "/tmp/tmpe4lf01oz/ep/cepmldjvpqzthlxmqq3znbbpq4dlfr2pccils4j5vyxsk46xyz34.py", line 137, in <module>
2025-12-04T12:15:05.5563131Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.5563265Z     kernel.precompile(
2025-12-04T12:15:05.5563856Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.5563993Z     self._precompile_worker()
2025-12-04T12:15:05.5564591Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.5564773Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.5565417Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.5565617Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.5566081Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.5566331Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.5566775Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.5567123Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.5567353Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.5568077Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.5568171Z ^
2025-12-04T12:15:05.5568629Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.5568635Z 
2025-12-04T12:15:05.5569359Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.5569410Z 
2025-12-04T12:15:05.5569415Z 
2025-12-04T12:15:05.5569637Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.5570348Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda
2025-12-04T12:15:05.5570356Z 
2025-12-04T12:15:05.5570628Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.5570855Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.5571165Z frames [('total', 1)]
2025-12-04T12:15:05.5571288Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.5571770Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.5571994Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.5572098Z graph_break []
2025-12-04T12:15:05.5572508Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda _
2025-12-04T12:15:05.5572635Z Traceback (most recent call last):
2025-12-04T12:15:05.5573058Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.5573401Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.5573891Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.5574157Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.5574667Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.5574860Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.5575390Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.5575540Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.5576122Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.5576533Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.5577055Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.5577276Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.5577756Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.5577881Z     return self._compile_to_module()
2025-12-04T12:15:05.5578379Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.5578546Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.5579076Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.5579209Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.5579709Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.5579956Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.5580546Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.5580690Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.5581197Z   File "/tmp/tmp6vhdluxb/af/cafmv2us7q56iwaosl5v5h7evchdxekfd3w3uckrh4vrz5p6mlss.py", line 137, in <module>
2025-12-04T12:15:05.5581661Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.5581836Z     kernel.precompile(
2025-12-04T12:15:05.5582398Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.5582517Z     self._precompile_worker()
2025-12-04T12:15:05.5583129Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.5583309Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.5583913Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.5584110Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.5584561Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.5584823Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.5585268Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.5585614Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.5585877Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.5586589Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.5586696Z ^
2025-12-04T12:15:05.5587154Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.5587160Z 
2025-12-04T12:15:05.5587881Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.5587889Z 
2025-12-04T12:15:05.5587895Z 
2025-12-04T12:15:05.5588148Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.5588850Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda
2025-12-04T12:15:05.5588858Z 
2025-12-04T12:15:05.5589141Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.5589364Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.5589518Z frames [('total', 1)]
2025-12-04T12:15:05.5589637Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.5590103Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.5590340Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.5590448Z graph_break []
2025-12-04T12:15:05.5590667Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.5590787Z frames [('total', 1)]
2025-12-04T12:15:05.5590906Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.5591140Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.5591605Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.5591708Z graph_break []
2025-12-04T12:15:05.5591868Z =================================== FAILURES ===================================
2025-12-04T12:15:05.5592267Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda _
2025-12-04T12:15:05.5592394Z Traceback (most recent call last):
2025-12-04T12:15:05.5592832Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.5593068Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.5593622Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.5593869Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.5594811Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.5595025Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.5595533Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.5595699Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.5596234Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.5596557Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.5597095Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.5597245Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.5597779Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.5597915Z     return self._compile_to_module()
2025-12-04T12:15:05.5598398Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.5598576Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.5599092Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.5599221Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.5599729Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.5599965Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.5600595Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.5600726Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.5601231Z   File "/tmp/tmp74h9pahr/3r/c3rhqx7dwe25fhfgmap3vvnziwycxz4cgpkgvh7ubpuf7rc225wi.py", line 137, in <module>
2025-12-04T12:15:05.5601708Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.5601857Z     kernel.precompile(
2025-12-04T12:15:05.5602413Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.5602545Z     self._precompile_worker()
2025-12-04T12:15:05.5603142Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.5603338Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.5603930Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.5604130Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.5604591Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.5605012Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.5605470Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.5605805Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.5606031Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.5606814Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.5606908Z ^
2025-12-04T12:15:05.5607366Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.5607387Z 
2025-12-04T12:15:05.5608099Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.5608107Z 
2025-12-04T12:15:05.5608112Z 
2025-12-04T12:15:05.5608329Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.5609038Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda
2025-12-04T12:15:05.5609048Z 
2025-12-04T12:15:05.5609316Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.5609551Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.5609658Z frames [('total', 1)]
2025-12-04T12:15:05.5609776Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.5610291Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.5610514Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.5610618Z graph_break []
2025-12-04T12:15:05.5610855Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.5610962Z frames [('total', 1)]
2025-12-04T12:15:05.5611093Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.5611311Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.5611770Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.5611888Z graph_break []
2025-12-04T12:15:05.5612141Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.5612246Z frames [('total', 1)]
2025-12-04T12:15:05.5612382Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.5612601Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.5613069Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.5613203Z graph_break []
2025-12-04T12:15:05.5613856Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0fe50dbde6f69754.xml -
2025-12-04T12:15:05.5614043Z =========================== short test summary info ============================
2025-12-04T12:15:05.5614873Z FAILED [0.6071s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.5615600Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.5615691Z ^
2025-12-04T12:15:05.5616150Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.5616156Z 
2025-12-04T12:15:05.5616959Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.5616966Z 
2025-12-04T12:15:05.5616971Z 
2025-12-04T12:15:05.5617190Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.5617897Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda
2025-12-04T12:15:05.5617969Z 
2025-12-04T12:15:05.5618240Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.5618423Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.5618644Z ================== 1 failed, 187 deselected, 2 rerun in 4.86s ==================
2025-12-04T12:15:05.5618745Z Got exit code 1
2025-12-04T12:15:05.5619381Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda
2025-12-04T12:15:05.5619798Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:15:05.5620268Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1381d94cd6abec18.xml
2025-12-04T12:15:05.5620465Z ============================= test session starts ==============================
2025-12-04T12:15:05.5620823Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.5620953Z cachedir: .pytest_cache
2025-12-04T12:15:05.5621474Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.5621638Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.5621764Z configfile: pytest.ini
2025-12-04T12:15:05.5622357Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.5622589Z collecting ... collected 188 items / 29 deselected / 159 selected
2025-12-04T12:15:05.5622749Z stepcurrent: skipping 29 already run items.
2025-12-04T12:15:05.5622867Z Running 159 items in this shard
2025-12-04T12:15:05.5622872Z 
2025-12-04T12:15:05.5624264Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T12:15:05.5625363Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.5625870Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 10
2025-12-04T12:15:05.5626329Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T12:15:05.5626790Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.5627348Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.5627891Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.5628496Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.5628989Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:05.5629545Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.5630012Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.5630445Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:05.5631101Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.5631690Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.5632299Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.5632895Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.5633429Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.5633972Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.5634463Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.5634992Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.5635459Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.5636270Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T12:15:05.5636817Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.5637406Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.5638171Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T12:15:05.5638776Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T12:15:05.5639208Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T12:15:05.5639872Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean)
2025-12-04T12:15:05.5640489Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2)
2025-12-04T12:15:05.5641182Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight)
2025-12-04T12:15:05.5641884Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T12:15:05.5642385Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T12:15:05.5642863Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T12:15:05.5643335Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T12:15:05.5643978Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.5644537Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.5645103Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T12:15:05.5645680Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.5646212Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.5646754Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.5647247Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.5647745Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.5648242Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.5649061Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.5649604Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:05.5650100Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T12:15:05.5650579Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T12:15:05.5651109Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T12:15:05.5651592Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T12:15:05.5652096Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T12:15:05.5652668Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T12:15:05.5653178Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T12:15:05.5653696Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T12:15:05.5654312Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.5654896Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T12:15:05.5655483Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20)
2025-12-04T12:15:05.5655992Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T12:15:05.5656536Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T12:15:05.5657129Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T12:15:05.5657627Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T12:15:05.5658201Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T12:15:05.5658762Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T12:15:05.5659394Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask)
2025-12-04T12:15:05.5659984Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T12:15:05.5660529Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, xmask)
2025-12-04T12:15:05.5660893Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.5663265Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.5663811Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.5664884Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.5665514Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.5666418Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.5667144Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.5668037Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.5668804Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.5669425Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.5670515Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.5670881Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.5672032Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.5672168Z ('RERUN', {'yellow': True}) [3.4490s] [  0%]
2025-12-04T12:15:05.5673517Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T12:15:05.5674608Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.5675067Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 10
2025-12-04T12:15:05.5675525Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T12:15:05.5675989Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.5676625Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.5677171Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.5677772Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.5678264Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:05.5678820Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.5679330Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.5679766Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:05.5680380Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.5681009Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.5681633Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.5682215Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.5682747Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.5683291Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.5683780Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.5684273Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.5684738Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.5685593Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T12:15:05.5686132Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.5686726Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.5687461Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T12:15:05.5688059Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T12:15:05.5688472Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T12:15:05.5689129Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean)
2025-12-04T12:15:05.5689772Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2)
2025-12-04T12:15:05.5690456Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight)
2025-12-04T12:15:05.5691158Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T12:15:05.5691650Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T12:15:05.5692128Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T12:15:05.5692634Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T12:15:05.5693283Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.5693804Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.5694393Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T12:15:05.5694975Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.5695505Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.5696043Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.5696593Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.5697085Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.5697552Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.5698383Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.5698948Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:05.5699443Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T12:15:05.5699923Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T12:15:05.5700427Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T12:15:05.5700900Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T12:15:05.5701394Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T12:15:05.5701932Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T12:15:05.5702443Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T12:15:05.5702993Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T12:15:05.5703599Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.5704183Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T12:15:05.5704769Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20)
2025-12-04T12:15:05.5705277Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T12:15:05.5705777Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T12:15:05.5706367Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T12:15:05.5706819Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T12:15:05.5707422Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T12:15:05.5707973Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T12:15:05.5708598Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask)
2025-12-04T12:15:05.5709194Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T12:15:05.5709746Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, xmask)
2025-12-04T12:15:05.5710118Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.5712367Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.5712951Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.5713999Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.5714629Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.5715532Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.5716215Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.5717140Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.5717913Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.5718533Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.5719677Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.5720058Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.5720950Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.5721117Z ('RERUN', {'yellow': True}) [0.4378s] [  0%]
2025-12-04T12:15:05.5722468Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T12:15:05.5723554Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.5724001Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 10
2025-12-04T12:15:05.5724451Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T12:15:05.5724911Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.5725458Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.5725999Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.5726632Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.5727122Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:05.5727687Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.5728136Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.5728564Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:05.5729179Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.5729773Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.5730396Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.5731010Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.5731543Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.5732089Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.5732580Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.5733106Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.5733576Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.5734386Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T12:15:05.5734956Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.5735545Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.5736333Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T12:15:05.5736950Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T12:15:05.5737367Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T12:15:05.5738016Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean)
2025-12-04T12:15:05.5738640Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2)
2025-12-04T12:15:05.5739330Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight)
2025-12-04T12:15:05.5740089Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T12:15:05.5740587Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T12:15:05.5741063Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T12:15:05.5741540Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T12:15:05.5742188Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.5742714Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.5743282Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T12:15:05.5743861Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.5744440Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.5744972Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.5745461Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.5745956Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.5746426Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.5747291Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.5747826Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:05.5748353Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T12:15:05.5748827Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T12:15:05.5749326Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T12:15:05.5749802Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T12:15:05.5750300Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T12:15:05.5750844Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T12:15:05.5751354Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T12:15:05.5751882Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T12:15:05.5752485Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.5753093Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T12:15:05.5753684Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20)
2025-12-04T12:15:05.5754198Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T12:15:05.5754662Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T12:15:05.5755258Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T12:15:05.5755713Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T12:15:05.5756302Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T12:15:05.5756844Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T12:15:05.5757493Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask)
2025-12-04T12:15:05.5758083Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T12:15:05.5758628Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, xmask)
2025-12-04T12:15:05.5759000Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.5761282Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.5761859Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.5762897Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.5763544Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.5764441Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.5765122Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.5766022Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.5766788Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.5767447Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.5768542Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.5768922Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.5769813Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.5769921Z FAILED [0.4316s] [  0%]
2025-12-04T12:15:05.5769927Z 
2025-12-04T12:15:05.5770087Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.5770493Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda _
2025-12-04T12:15:05.5770631Z Traceback (most recent call last):
2025-12-04T12:15:05.5771309Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.5771549Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.5772057Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.5772307Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.5772821Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.5773033Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.5773595Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.5773756Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.5774296Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.5774616Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.5775199Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.5775350Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.5775846Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.5775970Z     return self._compile_to_module()
2025-12-04T12:15:05.5776515Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.5776697Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.5777212Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.5777346Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.5777857Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.5778094Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.5778697Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.5778829Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.5779329Z   File "/tmp/tmp52lu5q0o/hm/chmvvir2m2bfeysrqounc4phn2vjd2go73w5dudbgovks33bt2rm.py", line 65, in <module>
2025-12-04T12:15:05.5779862Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.5779976Z     kernel.precompile(
2025-12-04T12:15:05.5780544Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.5780663Z     self._precompile_worker()
2025-12-04T12:15:05.5781260Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.5781459Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.5782056Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.5782269Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.5782723Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.5782971Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.5783427Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.5783796Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.5784023Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.5784691Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.5784782Z ^
2025-12-04T12:15:05.5785247Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.5785253Z 
2025-12-04T12:15:05.5785964Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.5786004Z 
2025-12-04T12:15:05.5786009Z 
2025-12-04T12:15:05.5786238Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.5786948Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda
2025-12-04T12:15:05.5786953Z 
2025-12-04T12:15:05.5787221Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.5787490Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.5787597Z frames [('total', 1)]
2025-12-04T12:15:05.5787716Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.5788196Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.5788421Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.5788536Z graph_break []
2025-12-04T12:15:05.5788937Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda _
2025-12-04T12:15:05.5789062Z Traceback (most recent call last):
2025-12-04T12:15:05.5789506Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.5789740Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.5790242Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.5790490Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.5790999Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.5791204Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.5791745Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.5791891Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.5792437Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.5792757Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.5793285Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.5793435Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.5793913Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.5794053Z     return self._compile_to_module()
2025-12-04T12:15:05.5794538Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.5794719Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.5795232Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.5795392Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.5795899Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.5796132Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.5796721Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.5796862Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.5797358Z   File "/tmp/tmp7u5s5bo3/tx/ctxhqvm6vjk4z3jw337ib5ariseyq2rg7cjxzy3h6fn5exx6hodp.py", line 65, in <module>
2025-12-04T12:15:05.5797836Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.5797980Z     kernel.precompile(
2025-12-04T12:15:05.5798537Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.5798668Z     self._precompile_worker()
2025-12-04T12:15:05.5799259Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.5799484Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.5800076Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.5800275Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.5800738Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.5800988Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.5801431Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.5801782Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.5802008Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.5802676Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.5802774Z ^
2025-12-04T12:15:05.5803233Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.5803239Z 
2025-12-04T12:15:05.5803967Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.5804009Z 
2025-12-04T12:15:05.5804016Z 
2025-12-04T12:15:05.5804236Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.5804959Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda
2025-12-04T12:15:05.5804965Z 
2025-12-04T12:15:05.5805234Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.5805473Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.5805579Z frames [('total', 1)]
2025-12-04T12:15:05.5805698Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.5806175Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.5806401Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.5806502Z graph_break []
2025-12-04T12:15:05.5806737Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.5806841Z frames [('total', 1)]
2025-12-04T12:15:05.5806969Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.5807219Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.5807684Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.5807800Z graph_break []
2025-12-04T12:15:05.5807948Z =================================== FAILURES ===================================
2025-12-04T12:15:05.5808350Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda _
2025-12-04T12:15:05.5808487Z Traceback (most recent call last):
2025-12-04T12:15:05.5808912Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.5809158Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.5809680Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.5809946Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.5810471Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.5810709Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.5811223Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.5811387Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.5811922Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.5812259Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.5812783Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.5813454Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.5813955Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.5814081Z     return self._compile_to_module()
2025-12-04T12:15:05.5814589Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.5814756Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.5815277Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.5815425Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.5815985Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.5816217Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.5816894Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.5817024Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.5817534Z   File "/tmp/tmps8718jlj/6w/c6whfj672da2swy4iwrciepo4f2th75hpztdfso67ti34msth4rb.py", line 65, in <module>
2025-12-04T12:15:05.5818005Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.5818119Z     kernel.precompile(
2025-12-04T12:15:05.5818693Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.5818816Z     self._precompile_worker()
2025-12-04T12:15:05.5819430Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.5819613Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.5820243Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.5820455Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.5820910Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.5821161Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.5821623Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.5821959Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.5822204Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.5822890Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.5822984Z ^
2025-12-04T12:15:05.5823463Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.5823470Z 
2025-12-04T12:15:05.5824182Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.5824221Z 
2025-12-04T12:15:05.5824226Z 
2025-12-04T12:15:05.5824462Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.5825170Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda
2025-12-04T12:15:05.5825178Z 
2025-12-04T12:15:05.5825469Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.5825708Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.5825865Z frames [('total', 1)]
2025-12-04T12:15:05.5826054Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.5826526Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.5826754Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.5826872Z graph_break []
2025-12-04T12:15:05.5827090Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.5827208Z frames [('total', 1)]
2025-12-04T12:15:05.5827326Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.5827543Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.5828063Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.5828164Z graph_break []
2025-12-04T12:15:05.5828382Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.5828499Z frames [('total', 1)]
2025-12-04T12:15:05.5828616Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.5828834Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.5829301Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.5829402Z graph_break []
2025-12-04T12:15:05.5830067Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1381d94cd6abec18.xml -
2025-12-04T12:15:05.5830242Z =========================== short test summary info ============================
2025-12-04T12:15:05.5831098Z FAILED [0.4316s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.5831797Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.5831888Z ^
2025-12-04T12:15:05.5832359Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.5832368Z 
2025-12-04T12:15:05.5833076Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.5833082Z 
2025-12-04T12:15:05.5833087Z 
2025-12-04T12:15:05.5833316Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.5834022Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda
2025-12-04T12:15:05.5834063Z 
2025-12-04T12:15:05.5834333Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.5834531Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.5834735Z ================== 1 failed, 29 deselected, 2 rerun in 4.36s ===================
2025-12-04T12:15:05.5834848Z Got exit code 1
2025-12-04T12:15:05.5834992Z Retrying single test...
2025-12-04T12:15:05.5835467Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2b9aebe063e8f7ef.xml
2025-12-04T12:15:05.5835649Z ============================= test session starts ==============================
2025-12-04T12:15:05.5836000Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.5836115Z cachedir: .pytest_cache
2025-12-04T12:15:05.5836654Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.5836780Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.5836901Z configfile: pytest.ini
2025-12-04T12:15:05.5837497Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.5837720Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:05.5838523Z stepcurrent: skipping 29 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda
2025-12-04T12:15:05.5838640Z Running 1 items in this shard
2025-12-04T12:15:05.5838645Z 
2025-12-04T12:15:05.5840002Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T12:15:05.5841134Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.5841574Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 10
2025-12-04T12:15:05.5842038Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T12:15:05.5842499Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.5843043Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.5843590Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.5844223Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.5844714Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:05.5845277Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.5845738Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.5846166Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:05.5846813Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.5847401Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.5848009Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.5848639Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.5849170Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.5849709Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.5850204Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.5850685Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.5851166Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.5851981Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T12:15:05.5852521Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.5853110Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.5853893Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T12:15:05.5854501Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T12:15:05.5854900Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T12:15:05.5855572Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean)
2025-12-04T12:15:05.5856191Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2)
2025-12-04T12:15:05.5857244Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight)
2025-12-04T12:15:05.5858018Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T12:15:05.5858502Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T12:15:05.5858997Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T12:15:05.5859469Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T12:15:05.5860116Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.5860675Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.5861243Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T12:15:05.5861821Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.5862400Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.5862937Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.5863427Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.5863925Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.5864391Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.5865209Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.5865752Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:05.5866249Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T12:15:05.5866723Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T12:15:05.5867333Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T12:15:05.5867794Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T12:15:05.5868308Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T12:15:05.5868843Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T12:15:05.5869352Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T12:15:05.5869872Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T12:15:05.5870470Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.5871247Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T12:15:05.5871916Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20)
2025-12-04T12:15:05.5872437Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T12:15:05.5872900Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T12:15:05.5873490Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T12:15:05.5873947Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T12:15:05.5874563Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T12:15:05.5875120Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T12:15:05.5875747Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask)
2025-12-04T12:15:05.5876376Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T12:15:05.5876922Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, xmask)
2025-12-04T12:15:05.5877286Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.5879558Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.5880093Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.5881198Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.5881836Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.5882736Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.5883416Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.5884309Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.5885083Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.5885738Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.5886834Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.5887202Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.5888103Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.5888272Z ('RERUN', {'yellow': True}) [3.4101s] [100%]
2025-12-04T12:15:05.5889642Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T12:15:05.5890752Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.5891201Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 10
2025-12-04T12:15:05.5891655Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T12:15:05.5892120Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.5892671Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.5893211Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.5893808Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.5898990Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:05.5899704Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.5900180Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.5900619Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:05.5901239Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.5901832Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.5902445Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.5903041Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.5903577Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.5904172Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.5904664Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.5905148Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.5905630Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.5906441Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T12:15:05.5907020Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.5907612Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.5908336Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T12:15:05.5908989Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T12:15:05.5909389Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T12:15:05.5910060Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean)
2025-12-04T12:15:05.5910682Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2)
2025-12-04T12:15:05.5911370Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight)
2025-12-04T12:15:05.5912074Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T12:15:05.5912554Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T12:15:05.5913082Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T12:15:05.5913559Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T12:15:05.5914207Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.5914743Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.5915291Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T12:15:05.5915888Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.5916416Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.5916960Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.5917479Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.5917977Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.5918442Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.5919259Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.5919808Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:05.5920362Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T12:15:05.5920840Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T12:15:05.5921342Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T12:15:05.5921834Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T12:15:05.5922342Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T12:15:05.5922876Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T12:15:05.5923386Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T12:15:05.5923909Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T12:15:05.5924501Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.5925094Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T12:15:05.5925677Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20)
2025-12-04T12:15:05.5926187Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T12:15:05.5926692Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T12:15:05.5927269Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T12:15:05.5927737Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T12:15:05.5928311Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T12:15:05.5928866Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T12:15:05.5929488Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask)
2025-12-04T12:15:05.5930083Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T12:15:05.5930662Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, xmask)
2025-12-04T12:15:05.5931025Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.5933310Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.5933845Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.5934908Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.5935565Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.5936552Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.5937242Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.5938135Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.5938912Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.5939524Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.5940624Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.5941032Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.5941944Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.5942081Z ('RERUN', {'yellow': True}) [0.4382s] [100%]
2025-12-04T12:15:05.5943448Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T12:15:05.5944533Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.5945012Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 10
2025-12-04T12:15:05.5945465Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T12:15:05.5945929Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.5946470Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.5947007Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.5947633Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.5948129Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:05.5948684Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.5949147Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.5949608Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:05.5950219Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.5950807Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.5951413Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.5952006Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.5952533Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.5953068Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.5953557Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.5954067Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.5954549Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.5955364Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T12:15:05.5955901Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.5956486Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.5957218Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T12:15:05.5957828Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T12:15:05.5958257Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T12:15:05.5958928Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean)
2025-12-04T12:15:05.5959547Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2)
2025-12-04T12:15:05.5960229Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight)
2025-12-04T12:15:05.5960960Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T12:15:05.5961439Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T12:15:05.5961930Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T12:15:05.5962403Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T12:15:05.5963079Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.5963601Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.5964155Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T12:15:05.5964737Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.5965266Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.5965801Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.5966287Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.5966775Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.5967275Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.5968490Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.5969039Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:05.5969534Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T12:15:05.5970020Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T12:15:05.5970522Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T12:15:05.5971150Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T12:15:05.5971663Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T12:15:05.5972303Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T12:15:05.5972988Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T12:15:05.5973513Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T12:15:05.5974106Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.5974701Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T12:15:05.5975348Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20)
2025-12-04T12:15:05.5975863Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T12:15:05.5976390Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T12:15:05.5977039Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T12:15:05.5977497Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T12:15:05.5978069Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T12:15:05.5978624Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T12:15:05.5979253Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask)
2025-12-04T12:15:05.5979839Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T12:15:05.5980387Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, xmask)
2025-12-04T12:15:05.5980744Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.5983020Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.5983606Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.5984664Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.5985295Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.5986227Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.5986908Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.5987804Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.5988577Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.5989228Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.5990320Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.5990716Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.5991627Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.5991733Z FAILED [0.4366s] [100%]
2025-12-04T12:15:05.5991742Z 
2025-12-04T12:15:05.5991899Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.5992302Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda _
2025-12-04T12:15:05.5992430Z Traceback (most recent call last):
2025-12-04T12:15:05.5992871Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.5993106Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.5993610Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.5993860Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.5994374Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.5994585Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.5995131Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.5995283Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.5995834Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.5996153Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.5996682Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.5996831Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.5997312Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.5997445Z     return self._compile_to_module()
2025-12-04T12:15:05.5997930Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.5998108Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.5998618Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.5998781Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.5999291Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.5999524Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.6000111Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.6000250Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.6000751Z   File "/tmp/tmp04b30vau/b5/cb5uctpdchzyz4l3v67hhpm7kst32xrpmiutmqx7ct3b3ewapdmd.py", line 65, in <module>
2025-12-04T12:15:05.6001224Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.6001374Z     kernel.precompile(
2025-12-04T12:15:05.6001934Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.6002066Z     self._precompile_worker()
2025-12-04T12:15:05.6002664Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.6002891Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.6003487Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.6003685Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.6004150Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.6004403Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.6004849Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.6005199Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.6005429Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.6006093Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.6006188Z ^
2025-12-04T12:15:05.6006648Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.6006658Z 
2025-12-04T12:15:05.6007381Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.6007421Z 
2025-12-04T12:15:05.6007426Z 
2025-12-04T12:15:05.6007647Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.6008374Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda
2025-12-04T12:15:05.6008379Z 
2025-12-04T12:15:05.6008652Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.6008899Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.6009007Z frames [('total', 1)]
2025-12-04T12:15:05.6009127Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.6009607Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.6009835Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.6009935Z graph_break []
2025-12-04T12:15:05.6010354Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda _
2025-12-04T12:15:05.6010480Z Traceback (most recent call last):
2025-12-04T12:15:05.6010948Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.6011184Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.6011671Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.6011936Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.6012451Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.6012644Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.6013173Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.6013348Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.6013896Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.6014219Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.6014739Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.6014930Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.6015410Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.6015547Z     return self._compile_to_module()
2025-12-04T12:15:05.6016031Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.6016203Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.6016807Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.6016937Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.6017453Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.6017683Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.6018275Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.6018416Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.6018909Z   File "/tmp/tmp0ofaig06/ba/cbajx6g5cj6ofn6vfid3h3la7c4yxxeb3l7voh32dnrorf2hallf.py", line 65, in <module>
2025-12-04T12:15:05.6019421Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.6019554Z     kernel.precompile(
2025-12-04T12:15:05.6020103Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.6020236Z     self._precompile_worker()
2025-12-04T12:15:05.6020831Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.6021012Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.6021617Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.6021821Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.6022283Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.6022529Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.6022980Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.6023357Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.6023586Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.6024232Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.6024335Z ^
2025-12-04T12:15:05.6024797Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.6024803Z 
2025-12-04T12:15:05.6025523Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.6025532Z 
2025-12-04T12:15:05.6025537Z 
2025-12-04T12:15:05.6025788Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.6026521Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda
2025-12-04T12:15:05.6026527Z 
2025-12-04T12:15:05.6026796Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.6027052Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.6027178Z frames [('total', 1)]
2025-12-04T12:15:05.6027294Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.6027765Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.6027998Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.6028100Z graph_break []
2025-12-04T12:15:05.6028335Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.6028442Z frames [('total', 1)]
2025-12-04T12:15:05.6028558Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.6028789Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.6029249Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.6029351Z graph_break []
2025-12-04T12:15:05.6029513Z =================================== FAILURES ===================================
2025-12-04T12:15:05.6029911Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda _
2025-12-04T12:15:05.6030049Z Traceback (most recent call last):
2025-12-04T12:15:05.6030470Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.6030735Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.6031237Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.6031487Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.6032010Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.6032204Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.6032715Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.6032875Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.6033406Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.6033729Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.6034264Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.6034411Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.6034937Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.6035061Z     return self._compile_to_module()
2025-12-04T12:15:05.6035544Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.6035726Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.6036239Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.6036382Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.6036880Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.6037144Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.6037741Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.6037871Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.6038362Z   File "/tmp/tmpbiarnudx/d6/cd6z47mesl7pokl4t3s4opldztddi63mkv53jbtnhrypqh7tb67y.py", line 65, in <module>
2025-12-04T12:15:05.6038868Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.6038979Z     kernel.precompile(
2025-12-04T12:15:05.6039542Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.6039658Z     self._precompile_worker()
2025-12-04T12:15:05.6040259Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.6040455Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.6041047Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.6041255Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.6041699Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.6041944Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.6042395Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.6042727Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.6042985Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.6043650Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.6043737Z ^
2025-12-04T12:15:05.6044211Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.6044217Z 
2025-12-04T12:15:05.6044926Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.6044934Z 
2025-12-04T12:15:05.6044939Z 
2025-12-04T12:15:05.6045167Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.6045874Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda
2025-12-04T12:15:05.6045882Z 
2025-12-04T12:15:05.6046148Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.6046382Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.6046485Z frames [('total', 1)]
2025-12-04T12:15:05.6046610Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.6047108Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.6047329Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.6047444Z graph_break []
2025-12-04T12:15:05.6047661Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.6047762Z frames [('total', 1)]
2025-12-04T12:15:05.6047891Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.6048105Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.6048569Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.6048678Z graph_break []
2025-12-04T12:15:05.6048924Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.6049040Z frames [('total', 1)]
2025-12-04T12:15:05.6049155Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.6049374Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.6049841Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.6049970Z graph_break []
2025-12-04T12:15:05.6050627Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2b9aebe063e8f7ef.xml -
2025-12-04T12:15:05.6050812Z =========================== short test summary info ============================
2025-12-04T12:15:05.6051662Z FAILED [0.4366s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.6052318Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.6052409Z ^
2025-12-04T12:15:05.6052867Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.6052887Z 
2025-12-04T12:15:05.6053594Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.6053600Z 
2025-12-04T12:15:05.6053604Z 
2025-12-04T12:15:05.6053819Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.6054536Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda
2025-12-04T12:15:05.6055078Z 
2025-12-04T12:15:05.6055356Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.6055552Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.6055758Z ================== 1 failed, 187 deselected, 2 rerun in 4.33s ==================
2025-12-04T12:15:05.6055860Z Got exit code 1
2025-12-04T12:15:05.6055985Z Retrying single test...
2025-12-04T12:15:05.6056575Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-649ba93d0ac5919c.xml
2025-12-04T12:15:05.6056741Z ============================= test session starts ==============================
2025-12-04T12:15:05.6057112Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.6057222Z cachedir: .pytest_cache
2025-12-04T12:15:05.6057757Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.6057885Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.6057993Z configfile: pytest.ini
2025-12-04T12:15:05.6058646Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.6058868Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:05.6059676Z stepcurrent: skipping 29 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda
2025-12-04T12:15:05.6059795Z Running 1 items in this shard
2025-12-04T12:15:05.6059801Z 
2025-12-04T12:15:05.6061141Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T12:15:05.6062285Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.6062720Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 10
2025-12-04T12:15:05.6063229Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T12:15:05.6063687Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.6064230Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.6064778Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.6065356Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.6065861Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:05.6066414Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.6066876Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.6067308Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:05.6067941Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.6068536Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.6069147Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.6069736Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.6070282Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.6070812Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.6071527Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.6072013Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.6072565Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.6073392Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T12:15:05.6073935Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.6074528Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.6075299Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T12:15:05.6075926Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T12:15:05.6076329Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T12:15:05.6077110Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean)
2025-12-04T12:15:05.6077729Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2)
2025-12-04T12:15:05.6078405Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight)
2025-12-04T12:15:05.6079124Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T12:15:05.6079602Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T12:15:05.6080095Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T12:15:05.6080572Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T12:15:05.6081225Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.6081794Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.6082339Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T12:15:05.6082941Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.6083477Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.6084017Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.6084510Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.6084992Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.6085478Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.6086331Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.6086878Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:05.6087378Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T12:15:05.6087842Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T12:15:05.6088396Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T12:15:05.6088854Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T12:15:05.6089365Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T12:15:05.6089903Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T12:15:05.6090430Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T12:15:05.6090967Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T12:15:05.6091564Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.6092161Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T12:15:05.6092750Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20)
2025-12-04T12:15:05.6093261Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T12:15:05.6093730Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T12:15:05.6094311Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T12:15:05.6094820Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T12:15:05.6095398Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T12:15:05.6095959Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T12:15:05.6096651Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask)
2025-12-04T12:15:05.6097229Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T12:15:05.6097788Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, xmask)
2025-12-04T12:15:05.6098149Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.6100484Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.6101022Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.6102119Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.6102753Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.6103658Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.6104365Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.6105251Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.6106039Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.6106650Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.6107748Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.6108116Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.6109022Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.6109201Z ('RERUN', {'yellow': True}) [3.4173s] [100%]
2025-12-04T12:15:05.6110550Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T12:15:05.6111642Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.6112081Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 10
2025-12-04T12:15:05.6112549Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T12:15:05.6113014Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.6113600Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.6114145Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.6114732Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.6115241Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:05.6115793Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.6116307Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.6116739Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:05.6117339Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.6117976Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.6118579Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.6119165Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.6119704Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.6120244Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.6120733Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.6121217Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.6121695Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.6122507Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T12:15:05.6123085Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.6123677Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.6124397Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T12:15:05.6125011Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T12:15:05.6125413Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T12:15:05.6126087Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean)
2025-12-04T12:15:05.6126707Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2)
2025-12-04T12:15:05.6127429Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight)
2025-12-04T12:15:05.6128134Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T12:15:05.6128616Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T12:15:05.6129105Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T12:15:05.6129616Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T12:15:05.6130269Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.6130791Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.6131373Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T12:15:05.6131965Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.6132500Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.6133043Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.6133530Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.6134010Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.6134486Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.6135303Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.6135845Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:05.6136436Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T12:15:05.6136919Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T12:15:05.6137420Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T12:15:05.6137878Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T12:15:05.6138389Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T12:15:05.6138927Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T12:15:05.6139442Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T12:15:05.6139961Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T12:15:05.6140599Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.6141191Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T12:15:05.6141781Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20)
2025-12-04T12:15:05.6142289Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T12:15:05.6142794Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T12:15:05.6143371Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T12:15:05.6143840Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T12:15:05.6144410Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T12:15:05.6144995Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T12:15:05.6145624Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask)
2025-12-04T12:15:05.6146204Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T12:15:05.6146763Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, xmask)
2025-12-04T12:15:05.6147126Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.6149382Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.6149962Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.6151021Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.6151648Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.6152552Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.6153231Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.6154159Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.6154932Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.6155548Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.6156654Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.6157056Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.6157964Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.6158099Z ('RERUN', {'yellow': True}) [0.4300s] [100%]
2025-12-04T12:15:05.6159480Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T12:15:05.6160565Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.6161001Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 10
2025-12-04T12:15:05.6161465Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T12:15:05.6161923Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.6162470Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.6163008Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.6163631Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.6164142Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:05.6164699Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.6165165Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.6165598Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:05.6166195Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.6166801Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.6167417Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.6168060Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.6168593Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.6169140Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.6169629Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.6170112Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.6170622Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.6171625Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T12:15:05.6172164Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.6172842Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.6173563Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T12:15:05.6174189Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T12:15:05.6174591Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T12:15:05.6175255Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean)
2025-12-04T12:15:05.6175878Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2)
2025-12-04T12:15:05.6176637Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight)
2025-12-04T12:15:05.6177397Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T12:15:05.6177881Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T12:15:05.6178372Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T12:15:05.6178844Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T12:15:05.6179493Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.6180018Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.6180570Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T12:15:05.6181170Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.6181741Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.6182290Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.6182780Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.6183258Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.6183740Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.6184612Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.6185160Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:05.6185660Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T12:15:05.6186175Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T12:15:05.6186678Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T12:15:05.6187133Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T12:15:05.6187650Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T12:15:05.6188193Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T12:15:05.6188702Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T12:15:05.6189228Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T12:15:05.6189823Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.6190417Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T12:15:05.6191045Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20)
2025-12-04T12:15:05.6191557Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T12:15:05.6192022Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T12:15:05.6192606Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T12:15:05.6193081Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T12:15:05.6193662Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T12:15:05.6194219Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T12:15:05.6194879Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask)
2025-12-04T12:15:05.6195469Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T12:15:05.6196020Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, xmask)
2025-12-04T12:15:05.6196383Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.6198699Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.6199260Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.6200318Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.6200947Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.6201859Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.6202535Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.6203430Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.6204208Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.6204850Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.6205961Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.6206329Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.6207240Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.6207413Z FAILED [0.4309s] [100%]
2025-12-04T12:15:05.6207419Z 
2025-12-04T12:15:05.6207584Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.6207991Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda _
2025-12-04T12:15:05.6208119Z Traceback (most recent call last):
2025-12-04T12:15:05.6208589Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.6208827Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.6209316Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.6209581Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.6210092Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.6210297Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.6210808Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.6210983Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.6211531Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.6211853Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.6212385Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.6212573Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.6213051Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.6213187Z     return self._compile_to_module()
2025-12-04T12:15:05.6213670Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.6213838Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.6214369Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.6214503Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.6215010Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.6215242Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.6215830Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.6215973Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.6216541Z   File "/tmp/tmpfgf7f6rd/ni/cnifx2r3rbe5jbjqoggcoruaazjzi2rqlgur3m6o7zrwlznlmmgt.py", line 65, in <module>
2025-12-04T12:15:05.6217078Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.6217193Z     kernel.precompile(
2025-12-04T12:15:05.6217748Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.6217886Z     self._precompile_worker()
2025-12-04T12:15:05.6218482Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.6218665Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.6219273Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.6219473Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.6219938Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.6220186Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.6220630Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.6220979Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.6221237Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.6221900Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.6221993Z ^
2025-12-04T12:15:05.6222450Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.6222456Z 
2025-12-04T12:15:05.6223182Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.6223190Z 
2025-12-04T12:15:05.6223196Z 
2025-12-04T12:15:05.6223444Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.6224168Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda
2025-12-04T12:15:05.6224174Z 
2025-12-04T12:15:05.6224441Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.6224710Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.6224816Z frames [('total', 1)]
2025-12-04T12:15:05.6224935Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.6225412Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.6225635Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.6225738Z graph_break []
2025-12-04T12:15:05.6226157Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda _
2025-12-04T12:15:05.6226283Z Traceback (most recent call last):
2025-12-04T12:15:05.6226704Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.6226956Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.6227444Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.6227707Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.6228217Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.6228409Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.6228929Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.6229113Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.6229661Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.6229984Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.6230504Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.6230670Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.6231151Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.6231287Z     return self._compile_to_module()
2025-12-04T12:15:05.6231772Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.6231939Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.6232471Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.6232601Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.6233127Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.6233374Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.6233961Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.6234103Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.6234600Z   File "/tmp/tmpmync84u0/kk/ckkub6tjslzx3ruzlurj5vywnkw4brxz65lt67c5f6jmknaelh5c.py", line 65, in <module>
2025-12-04T12:15:05.6235059Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.6235187Z     kernel.precompile(
2025-12-04T12:15:05.6235772Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.6235902Z     self._precompile_worker()
2025-12-04T12:15:05.6236499Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.6236680Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.6237319Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.6237517Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.6237970Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.6238232Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.6238676Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.6239021Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.6239252Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.6239899Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.6240006Z ^
2025-12-04T12:15:05.6240462Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.6240467Z 
2025-12-04T12:15:05.6241189Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.6241225Z 
2025-12-04T12:15:05.6241230Z 
2025-12-04T12:15:05.6241451Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.6242154Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda
2025-12-04T12:15:05.6242174Z 
2025-12-04T12:15:05.6242441Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.6242663Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.6242784Z frames [('total', 1)]
2025-12-04T12:15:05.6242903Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.6243368Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.6243603Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.6243705Z graph_break []
2025-12-04T12:15:05.6243938Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.6244045Z frames [('total', 1)]
2025-12-04T12:15:05.6244162Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.6244398Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.6244894Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.6244995Z graph_break []
2025-12-04T12:15:05.6245159Z =================================== FAILURES ===================================
2025-12-04T12:15:05.6245561Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda _
2025-12-04T12:15:05.6245697Z Traceback (most recent call last):
2025-12-04T12:15:05.6246127Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.6246363Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.6246898Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.6247148Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.6247667Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.6247876Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.6248419Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.6248587Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.6249119Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.6249441Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.6249980Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.6250128Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.6250626Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.6250750Z     return self._compile_to_module()
2025-12-04T12:15:05.6251232Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.6251413Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.6251927Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.6252059Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.6252565Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.6252827Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.6253428Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.6253558Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.6254032Z   File "/tmp/tmp8jd5_ty5/r5/cr5wejk7pc2vqewa74ajy7amhts2g4s63a6vitf6ahiuvth5izp7.py", line 65, in <module>
2025-12-04T12:15:05.6254509Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.6254623Z     kernel.precompile(
2025-12-04T12:15:05.6255191Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.6255311Z     self._precompile_worker()
2025-12-04T12:15:05.6255907Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.6256102Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.6256796Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.6257050Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.6257523Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.6257773Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.6258232Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.6258569Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.6258801Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.6259502Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.6259597Z ^
2025-12-04T12:15:05.6260075Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.6260081Z 
2025-12-04T12:15:05.6261365Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.6261438Z 
2025-12-04T12:15:05.6261443Z 
2025-12-04T12:15:05.6261667Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.6262392Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda
2025-12-04T12:15:05.6262401Z 
2025-12-04T12:15:05.6262671Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.6262917Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.6263025Z frames [('total', 1)]
2025-12-04T12:15:05.6263145Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.6263631Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.6263855Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.6263977Z graph_break []
2025-12-04T12:15:05.6264201Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.6264309Z frames [('total', 1)]
2025-12-04T12:15:05.6264442Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.6264662Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.6265127Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.6265279Z graph_break []
2025-12-04T12:15:05.6265503Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.6265625Z frames [('total', 1)]
2025-12-04T12:15:05.6265743Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.6265970Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.6266447Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.6266550Z graph_break []
2025-12-04T12:15:05.6267201Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-649ba93d0ac5919c.xml -
2025-12-04T12:15:05.6267397Z =========================== short test summary info ============================
2025-12-04T12:15:05.6268254Z FAILED [0.4309s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.6268921Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.6269013Z ^
2025-12-04T12:15:05.6269504Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.6269510Z 
2025-12-04T12:15:05.6270234Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.6270242Z 
2025-12-04T12:15:05.6270246Z 
2025-12-04T12:15:05.6270467Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.6271373Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda
2025-12-04T12:15:05.6271383Z 
2025-12-04T12:15:05.6271727Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.6271923Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.6272132Z ================== 1 failed, 187 deselected, 2 rerun in 4.32s ==================
2025-12-04T12:15:05.6272237Z Got exit code 1
2025-12-04T12:15:05.6272882Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda
2025-12-04T12:15:05.6273343Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:15:05.6273955Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-df60bd1ca7e6baab.xml
2025-12-04T12:15:05.6274137Z ============================= test session starts ==============================
2025-12-04T12:15:05.6274496Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.6274627Z cachedir: .pytest_cache
2025-12-04T12:15:05.6275143Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.6275272Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.6275398Z configfile: pytest.ini
2025-12-04T12:15:05.6275990Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.6276230Z collecting ... collected 188 items / 30 deselected / 158 selected
2025-12-04T12:15:05.6276375Z stepcurrent: skipping 30 already run items.
2025-12-04T12:15:05.6276491Z Running 158 items in this shard
2025-12-04T12:15:05.6276497Z 
2025-12-04T12:15:05.6277962Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T12:15:05.6279295Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.6279742Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.6280199Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T12:15:05.6280663Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.6281216Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.6281760Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.6282415Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.6283003Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.6283577Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.6284034Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.6284796Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.6285342Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = tl.load(in_ptr3 + (0))
2025-12-04T12:15:05.6285896Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.broadcast_to(tmp15, [1, 1])
2025-12-04T12:15:05.6286501Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.6287077Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.6287603Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.6288109Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.6288591Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.6289072Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_2 = r0_index
2025-12-04T12:15:05.6289576Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index // 512
2025-12-04T12:15:05.6290344Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.6291053Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.6291786Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.6292330Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.6292820Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3 = tmp1 - tmp2
2025-12-04T12:15:05.6293287Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = 512.0
2025-12-04T12:15:05.6293781Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp6 = (tmp4 / tmp5)
2025-12-04T12:15:05.6294235Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp7 = 1e-05
2025-12-04T12:15:05.6294739Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp6 + tmp7
2025-12-04T12:15:05.6295270Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T12:15:05.6295812Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp3 * tmp9
2025-12-04T12:15:05.6296420Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tl_math.abs(tmp10)
2025-12-04T12:15:05.6297026Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.6297620Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = triton_helpers.maximum(_tmp13, tmp12)
2025-12-04T12:15:05.6298234Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp13 = tl.where(r0_mask, tmp14, _tmp13)
2025-12-04T12:15:05.6298756Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp10 * tmp16
2025-12-04T12:15:05.6299225Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = -448.0
2025-12-04T12:15:05.6299810Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = triton_helpers.maximum(tmp17, tmp18)
2025-12-04T12:15:05.6300315Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp20 = 448.0
2025-12-04T12:15:05.6300899Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.minimum(tmp19, tmp20)
2025-12-04T12:15:05.6301451Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp22 = tmp21.to(tl.float8e4nv)
2025-12-04T12:15:05.6302160Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask)
2025-12-04T12:15:05.6302756Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.max2(_tmp13, 1)[:, None]
2025-12-04T12:15:05.6303274Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp13.to(tl.float32)
2025-12-04T12:15:05.6303984Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None)
2025-12-04T12:15:05.6304703Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.6307388Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.6307946Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.6308999Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.6309682Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.6310585Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.6311281Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.6312164Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.6313008Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.6313630Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.6314880Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.6315298Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.6316195Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.6316341Z ('RERUN', {'yellow': True}) [3.5129s] [  0%]
2025-12-04T12:15:05.6317779Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T12:15:05.6319036Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.6319500Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.6319957Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T12:15:05.6320431Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.6320965Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.6321521Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.6322108Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.6322692Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.6323267Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.6323756Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.6324406Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.6324938Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = tl.load(in_ptr3 + (0))
2025-12-04T12:15:05.6325497Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.broadcast_to(tmp15, [1, 1])
2025-12-04T12:15:05.6326077Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.6326645Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.6327186Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.6327677Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.6328208Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.6328673Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_2 = r0_index
2025-12-04T12:15:05.6329176Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index // 512
2025-12-04T12:15:05.6329959Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.6330650Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.6331349Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.6331876Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.6332366Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3 = tmp1 - tmp2
2025-12-04T12:15:05.6332879Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = 512.0
2025-12-04T12:15:05.6333376Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp6 = (tmp4 / tmp5)
2025-12-04T12:15:05.6333844Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp7 = 1e-05
2025-12-04T12:15:05.6334328Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp6 + tmp7
2025-12-04T12:15:05.6334876Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T12:15:05.6335364Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp3 * tmp9
2025-12-04T12:15:05.6335890Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tl_math.abs(tmp10)
2025-12-04T12:15:05.6336573Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.6337201Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = triton_helpers.maximum(_tmp13, tmp12)
2025-12-04T12:15:05.6337781Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp13 = tl.where(r0_mask, tmp14, _tmp13)
2025-12-04T12:15:05.6338285Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp10 * tmp16
2025-12-04T12:15:05.6338753Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = -448.0
2025-12-04T12:15:05.6339347Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = triton_helpers.maximum(tmp17, tmp18)
2025-12-04T12:15:05.6339850Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp20 = 448.0
2025-12-04T12:15:05.6340443Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.minimum(tmp19, tmp20)
2025-12-04T12:15:05.6340983Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp22 = tmp21.to(tl.float8e4nv)
2025-12-04T12:15:05.6341732Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask)
2025-12-04T12:15:05.6342329Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.max2(_tmp13, 1)[:, None]
2025-12-04T12:15:05.6342845Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp13.to(tl.float32)
2025-12-04T12:15:05.6343569Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None)
2025-12-04T12:15:05.6343937Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.6346563Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.6347149Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.6348205Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.6348833Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.6349735Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.6350415Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.6351349Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.6352131Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.6352745Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.6354047Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.6354417Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.6355320Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.6355486Z ('RERUN', {'yellow': True}) [0.5125s] [  0%]
2025-12-04T12:15:05.6356916Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T12:15:05.6358175Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.6358610Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.6359081Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T12:15:05.6359547Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.6360096Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.6360681Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.6361266Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.6361872Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.6362429Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.6362894Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.6363529Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.6364061Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = tl.load(in_ptr3 + (0))
2025-12-04T12:15:05.6364628Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.broadcast_to(tmp15, [1, 1])
2025-12-04T12:15:05.6365264Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.6365811Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.6366341Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.6366845Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.6367331Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.6367828Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_2 = r0_index
2025-12-04T12:15:05.6368353Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index // 512
2025-12-04T12:15:05.6369122Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.6369858Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.6370543Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.6371288Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.6371796Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3 = tmp1 - tmp2
2025-12-04T12:15:05.6372250Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = 512.0
2025-12-04T12:15:05.6372761Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp6 = (tmp4 / tmp5)
2025-12-04T12:15:05.6373217Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp7 = 1e-05
2025-12-04T12:15:05.6373702Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp6 + tmp7
2025-12-04T12:15:05.6374363Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T12:15:05.6374854Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp3 * tmp9
2025-12-04T12:15:05.6375397Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tl_math.abs(tmp10)
2025-12-04T12:15:05.6375993Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.6376636Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = triton_helpers.maximum(_tmp13, tmp12)
2025-12-04T12:15:05.6377213Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp13 = tl.where(r0_mask, tmp14, _tmp13)
2025-12-04T12:15:05.6377714Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp10 * tmp16
2025-12-04T12:15:05.6378200Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = -448.0
2025-12-04T12:15:05.6378839Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = triton_helpers.maximum(tmp17, tmp18)
2025-12-04T12:15:05.6379314Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp20 = 448.0
2025-12-04T12:15:05.6379896Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.minimum(tmp19, tmp20)
2025-12-04T12:15:05.6380442Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp22 = tmp21.to(tl.float8e4nv)
2025-12-04T12:15:05.6381230Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask)
2025-12-04T12:15:05.6381812Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.max2(_tmp13, 1)[:, None]
2025-12-04T12:15:05.6382348Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp13.to(tl.float32)
2025-12-04T12:15:05.6383060Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None)
2025-12-04T12:15:05.6383468Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.6386121Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.6386681Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.6387721Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.6388394Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.6389302Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.6389978Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.6390871Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.6391641Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.6392264Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.6393546Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.6393930Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.6395304Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.6395414Z FAILED [0.5114s] [  0%]
2025-12-04T12:15:05.6395421Z 
2025-12-04T12:15:05.6395622Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.6396020Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda _
2025-12-04T12:15:05.6396150Z Traceback (most recent call last):
2025-12-04T12:15:05.6396594Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.6396828Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.6397369Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.6397618Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.6398131Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.6398342Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.6398856Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.6399021Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.6399556Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.6399881Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.6400416Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.6400569Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.6401065Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.6401221Z     return self._compile_to_module()
2025-12-04T12:15:05.6401709Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.6401885Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.6402403Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.6402534Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.6403218Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.6403457Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.6404060Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.6404188Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.6404686Z   File "/tmp/tmp4vf3m1r3/7q/c7qv3x5zpj2odey5ro52ulltd6qbs62sm2jkootpe7ii7f2dn25g.py", line 137, in <module>
2025-12-04T12:15:05.6405165Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.6405278Z     kernel.precompile(
2025-12-04T12:15:05.6405893Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.6406014Z     self._precompile_worker()
2025-12-04T12:15:05.6406609Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.6406802Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.6407397Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.6407598Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.6408067Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.6408353Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.6408813Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.6409151Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.6409378Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.6410234Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.6410329Z ^
2025-12-04T12:15:05.6410799Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.6410808Z 
2025-12-04T12:15:05.6411520Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.6411527Z 
2025-12-04T12:15:05.6411532Z 
2025-12-04T12:15:05.6411751Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.6412474Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda
2025-12-04T12:15:05.6412483Z 
2025-12-04T12:15:05.6412752Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.6412986Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.6413092Z frames [('total', 1)]
2025-12-04T12:15:05.6413211Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.6413687Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.6413973Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.6414090Z graph_break []
2025-12-04T12:15:05.6414489Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda _
2025-12-04T12:15:05.6414616Z Traceback (most recent call last):
2025-12-04T12:15:05.6415050Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.6415285Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.6415774Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.6416038Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.6416626Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.6416843Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.6417358Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.6417506Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.6418096Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.6418421Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.6418957Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.6419107Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.6419584Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.6419722Z     return self._compile_to_module()
2025-12-04T12:15:05.6420240Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.6420404Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.6420935Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.6421065Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.6421572Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.6421839Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.6422423Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.6422564Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.6423072Z   File "/tmp/tmpexhm0fx9/i4/ci4wjlqde6fk4tmtwjf4zm6powrm3orm22qjeh5bifomvaihricx.py", line 137, in <module>
2025-12-04T12:15:05.6423546Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.6423659Z     kernel.precompile(
2025-12-04T12:15:05.6424216Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.6424350Z     self._precompile_worker()
2025-12-04T12:15:05.6424946Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.6425128Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.6425735Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.6425934Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.6426430Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.6426677Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.6427119Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.6427467Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.6427693Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.6428513Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.6428604Z ^
2025-12-04T12:15:05.6429065Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.6429072Z 
2025-12-04T12:15:05.6429796Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.6429802Z 
2025-12-04T12:15:05.6429806Z 
2025-12-04T12:15:05.6430024Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.6430774Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda
2025-12-04T12:15:05.6430783Z 
2025-12-04T12:15:05.6431055Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.6431295Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.6431404Z frames [('total', 1)]
2025-12-04T12:15:05.6431522Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.6432007Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.6432264Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.6432367Z graph_break []
2025-12-04T12:15:05.6432607Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.6432712Z frames [('total', 1)]
2025-12-04T12:15:05.6432832Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.6433065Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.6433557Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.6433670Z graph_break []
2025-12-04T12:15:05.6433817Z =================================== FAILURES ===================================
2025-12-04T12:15:05.6434216Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda _
2025-12-04T12:15:05.6434357Z Traceback (most recent call last):
2025-12-04T12:15:05.6434782Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.6435016Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.6435527Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.6435777Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.6436304Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.6436498Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.6437008Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.6437169Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.6437740Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.6438074Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.6438595Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.6438745Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.6439235Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.6439370Z     return self._compile_to_module()
2025-12-04T12:15:05.6439867Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.6440033Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.6440552Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.6440701Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.6441202Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.6441437Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.6442068Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.6442198Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.6442721Z   File "/tmp/tmp68wrdup6/4e/c4eun5lbz42mqcz5denprvz2e7vakonqhzkfvfckcxg4ttfvxhob.py", line 137, in <module>
2025-12-04T12:15:05.6443184Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.6443298Z     kernel.precompile(
2025-12-04T12:15:05.6443872Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.6443994Z     self._precompile_worker()
2025-12-04T12:15:05.6444642Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.6444825Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.6445421Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.6445667Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.6446120Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.6446368Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.6446827Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.6447168Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.6447413Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.6448221Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.6448314Z ^
2025-12-04T12:15:05.6448786Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.6448794Z 
2025-12-04T12:15:05.6449504Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.6449510Z 
2025-12-04T12:15:05.6449515Z 
2025-12-04T12:15:05.6449749Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.6450495Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda
2025-12-04T12:15:05.6450500Z 
2025-12-04T12:15:05.6450785Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.6451013Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.6451121Z frames [('total', 1)]
2025-12-04T12:15:05.6451255Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.6451726Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.6451950Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.6452070Z graph_break []
2025-12-04T12:15:05.6452293Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.6452416Z frames [('total', 1)]
2025-12-04T12:15:05.6452537Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.6452757Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.6453237Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.6453341Z graph_break []
2025-12-04T12:15:05.6453597Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.6453718Z frames [('total', 1)]
2025-12-04T12:15:05.6453835Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.6454055Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.6454521Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.6454620Z graph_break []
2025-12-04T12:15:05.6455291Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-df60bd1ca7e6baab.xml -
2025-12-04T12:15:05.6455472Z =========================== short test summary info ============================
2025-12-04T12:15:05.6456423Z FAILED [0.5114s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.6457248Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.6457390Z ^
2025-12-04T12:15:05.6457864Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.6457869Z 
2025-12-04T12:15:05.6458576Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.6458584Z 
2025-12-04T12:15:05.6458588Z 
2025-12-04T12:15:05.6458822Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.6459529Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda
2025-12-04T12:15:05.6459535Z 
2025-12-04T12:15:05.6459806Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.6460002Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.6460205Z ================== 1 failed, 30 deselected, 2 rerun in 4.58s ===================
2025-12-04T12:15:05.6460319Z Got exit code 1
2025-12-04T12:15:05.6460429Z Retrying single test...
2025-12-04T12:15:05.6460900Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-927fdf8f8ff6280c.xml
2025-12-04T12:15:05.6461077Z ============================= test session starts ==============================
2025-12-04T12:15:05.6461468Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.6461581Z cachedir: .pytest_cache
2025-12-04T12:15:05.6462114Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.6462243Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.6462366Z configfile: pytest.ini
2025-12-04T12:15:05.6462960Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.6463186Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:05.6463989Z stepcurrent: skipping 30 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda
2025-12-04T12:15:05.6464109Z Running 1 items in this shard
2025-12-04T12:15:05.6464114Z 
2025-12-04T12:15:05.6465601Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T12:15:05.6466845Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.6467290Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.6467744Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T12:15:05.6468211Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.6468788Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.6469331Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.6469926Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.6470569Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.6471360Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.6471827Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.6472468Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.6473007Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = tl.load(in_ptr3 + (0))
2025-12-04T12:15:05.6473554Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.broadcast_to(tmp15, [1, 1])
2025-12-04T12:15:05.6474136Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.6474677Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.6475309Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.6475814Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.6476299Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.6476769Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_2 = r0_index
2025-12-04T12:15:05.6477289Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index // 512
2025-12-04T12:15:05.6478051Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.6478762Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.6479504Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.6480045Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.6480537Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3 = tmp1 - tmp2
2025-12-04T12:15:05.6480990Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = 512.0
2025-12-04T12:15:05.6481495Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp6 = (tmp4 / tmp5)
2025-12-04T12:15:05.6481997Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp7 = 1e-05
2025-12-04T12:15:05.6482497Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp6 + tmp7
2025-12-04T12:15:05.6483032Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T12:15:05.6483521Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp3 * tmp9
2025-12-04T12:15:05.6484101Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tl_math.abs(tmp10)
2025-12-04T12:15:05.6484696Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.6485290Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = triton_helpers.maximum(_tmp13, tmp12)
2025-12-04T12:15:05.6485847Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp13 = tl.where(r0_mask, tmp14, _tmp13)
2025-12-04T12:15:05.6486348Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp10 * tmp16
2025-12-04T12:15:05.6486824Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = -448.0
2025-12-04T12:15:05.6487399Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = triton_helpers.maximum(tmp17, tmp18)
2025-12-04T12:15:05.6487869Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp20 = 448.0
2025-12-04T12:15:05.6488477Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.minimum(tmp19, tmp20)
2025-12-04T12:15:05.6489031Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp22 = tmp21.to(tl.float8e4nv)
2025-12-04T12:15:05.6489737Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask)
2025-12-04T12:15:05.6490314Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.max2(_tmp13, 1)[:, None]
2025-12-04T12:15:05.6490841Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp13.to(tl.float32)
2025-12-04T12:15:05.6491552Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None)
2025-12-04T12:15:05.6491933Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.6494658Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.6495216Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.6496348Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.6497006Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.6497938Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.6498618Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.6499514Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.6500282Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.6500901Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.6502145Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.6502568Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.6503459Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.6503608Z ('RERUN', {'yellow': True}) [3.5163s] [100%]
2025-12-04T12:15:05.6505035Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T12:15:05.6506280Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.6506726Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.6507209Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T12:15:05.6507684Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.6508222Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.6508777Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.6509358Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.6509996Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.6510566Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.6511013Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.6511690Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.6512214Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = tl.load(in_ptr3 + (0))
2025-12-04T12:15:05.6512763Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.broadcast_to(tmp15, [1, 1])
2025-12-04T12:15:05.6513359Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.6513887Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.6514429Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.6514921Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.6515400Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.6515879Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_2 = r0_index
2025-12-04T12:15:05.6516416Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index // 512
2025-12-04T12:15:05.6517190Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.6517875Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.6518579Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.6519104Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.6519592Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3 = tmp1 - tmp2
2025-12-04T12:15:05.6520057Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = 512.0
2025-12-04T12:15:05.6520580Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp6 = (tmp4 / tmp5)
2025-12-04T12:15:05.6521046Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp7 = 1e-05
2025-12-04T12:15:05.6521531Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp6 + tmp7
2025-12-04T12:15:05.6522062Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T12:15:05.6522560Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp3 * tmp9
2025-12-04T12:15:05.6523117Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tl_math.abs(tmp10)
2025-12-04T12:15:05.6523726Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.6524305Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = triton_helpers.maximum(_tmp13, tmp12)
2025-12-04T12:15:05.6524897Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp13 = tl.where(r0_mask, tmp14, _tmp13)
2025-12-04T12:15:05.6525404Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp10 * tmp16
2025-12-04T12:15:05.6525866Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = -448.0
2025-12-04T12:15:05.6526456Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = triton_helpers.maximum(tmp17, tmp18)
2025-12-04T12:15:05.6526911Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp20 = 448.0
2025-12-04T12:15:05.6527487Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.minimum(tmp19, tmp20)
2025-12-04T12:15:05.6528042Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp22 = tmp21.to(tl.float8e4nv)
2025-12-04T12:15:05.6528746Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask)
2025-12-04T12:15:05.6529333Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.max2(_tmp13, 1)[:, None]
2025-12-04T12:15:05.6529883Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp13.to(tl.float32)
2025-12-04T12:15:05.6530607Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None)
2025-12-04T12:15:05.6530970Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.6533645Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.6534184Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.6535240Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.6535869Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.6536862Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.6537567Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.6538457Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.6539272Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.6539882Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.6541153Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.6541525Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.6542417Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.6542566Z ('RERUN', {'yellow': True}) [0.5129s] [100%]
2025-12-04T12:15:05.6544001Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T12:15:05.6545293Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.6545731Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.6546197Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T12:15:05.6546661Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.6547202Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.6547762Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.6548374Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.6548972Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.6549533Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.6550002Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.6550643Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.6551199Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = tl.load(in_ptr3 + (0))
2025-12-04T12:15:05.6551764Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.broadcast_to(tmp15, [1, 1])
2025-12-04T12:15:05.6552345Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.6552918Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.6553449Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.6553940Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.6554438Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.6554906Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_2 = r0_index
2025-12-04T12:15:05.6555424Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index // 512
2025-12-04T12:15:05.6556187Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.6556875Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.6557608Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.6558136Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.6558637Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3 = tmp1 - tmp2
2025-12-04T12:15:05.6559092Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = 512.0
2025-12-04T12:15:05.6559601Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp6 = (tmp4 / tmp5)
2025-12-04T12:15:05.6560057Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp7 = 1e-05
2025-12-04T12:15:05.6560544Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp6 + tmp7
2025-12-04T12:15:05.6561096Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T12:15:05.6561635Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp3 * tmp9
2025-12-04T12:15:05.6562170Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tl_math.abs(tmp10)
2025-12-04T12:15:05.6562767Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.6563349Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = triton_helpers.maximum(_tmp13, tmp12)
2025-12-04T12:15:05.6563923Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp13 = tl.where(r0_mask, tmp14, _tmp13)
2025-12-04T12:15:05.6564451Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp10 * tmp16
2025-12-04T12:15:05.6564929Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = -448.0
2025-12-04T12:15:05.6565504Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = triton_helpers.maximum(tmp17, tmp18)
2025-12-04T12:15:05.6565989Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp20 = 448.0
2025-12-04T12:15:05.6566580Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.minimum(tmp19, tmp20)
2025-12-04T12:15:05.6567120Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp22 = tmp21.to(tl.float8e4nv)
2025-12-04T12:15:05.6567840Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask)
2025-12-04T12:15:05.6568418Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.max2(_tmp13, 1)[:, None]
2025-12-04T12:15:05.6568932Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp13.to(tl.float32)
2025-12-04T12:15:05.6569655Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None)
2025-12-04T12:15:05.6570018Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.6572885Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.6573420Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.6574487Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.6575199Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.6576106Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.6576866Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.6577761Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.6578598Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.6579214Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.6580476Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.6580885Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.6581790Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.6581899Z FAILED [0.5139s] [100%]
2025-12-04T12:15:05.6581906Z 
2025-12-04T12:15:05.6582070Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.6582472Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda _
2025-12-04T12:15:05.6582600Z Traceback (most recent call last):
2025-12-04T12:15:05.6583040Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.6583275Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.6583778Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.6584026Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.6584598Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.6584808Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.6585320Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.6585468Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.6586015Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.6586338Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.6586870Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.6587023Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.6587506Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.6587647Z     return self._compile_to_module()
2025-12-04T12:15:05.6588134Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.6588346Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.6588865Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.6588999Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.6589507Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.6589742Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.6590323Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.6590467Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.6591006Z   File "/tmp/tmprvz50br1/ij/cijjsrboulgkoazpletm4enpys4jzah2fb6wlkk3jfo756sifzzx.py", line 137, in <module>
2025-12-04T12:15:05.6591486Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.6591600Z     kernel.precompile(
2025-12-04T12:15:05.6592155Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.6592319Z     self._precompile_worker()
2025-12-04T12:15:05.6592911Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.6593105Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.6593699Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.6593899Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.6594365Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.6594612Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.6595052Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.6595401Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.6595627Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.6596443Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.6596570Z ^
2025-12-04T12:15:05.6597029Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.6597049Z 
2025-12-04T12:15:05.6597759Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.6597769Z 
2025-12-04T12:15:05.6597774Z 
2025-12-04T12:15:05.6597991Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.6598715Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda
2025-12-04T12:15:05.6598721Z 
2025-12-04T12:15:05.6598992Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.6599228Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.6599337Z frames [('total', 1)]
2025-12-04T12:15:05.6599455Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.6599937Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.6600160Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.6600260Z graph_break []
2025-12-04T12:15:05.6600701Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda _
2025-12-04T12:15:05.6600827Z Traceback (most recent call last):
2025-12-04T12:15:05.6601264Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.6601496Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.6601985Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.6602246Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.6602790Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.6602997Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.6603509Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.6603656Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.6604200Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.6604554Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.6605071Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.6605231Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.6605712Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.6605850Z     return self._compile_to_module()
2025-12-04T12:15:05.6606332Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.6606497Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.6607023Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.6607158Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.6607664Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.6607896Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.6608481Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.6608673Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.6609180Z   File "/tmp/tmplm6oh1qc/ly/clygrn34bxe55c4lxlsrh6vy73lawtnmx34zbauu5y2wbgfti2ss.py", line 137, in <module>
2025-12-04T12:15:05.6609645Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.6609773Z     kernel.precompile(
2025-12-04T12:15:05.6610323Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.6610456Z     self._precompile_worker()
2025-12-04T12:15:05.6611055Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.6611233Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.6611845Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.6612047Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.6612515Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.6612814Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.6613259Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.6613612Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.6613846Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.6614655Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.6614764Z ^
2025-12-04T12:15:05.6615223Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.6615268Z 
2025-12-04T12:15:05.6616005Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.6616011Z 
2025-12-04T12:15:05.6616016Z 
2025-12-04T12:15:05.6616234Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.6617052Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda
2025-12-04T12:15:05.6617059Z 
2025-12-04T12:15:05.6617331Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.6617554Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.6617676Z frames [('total', 1)]
2025-12-04T12:15:05.6617795Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.6618271Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.6621377Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.6621483Z graph_break []
2025-12-04T12:15:05.6621711Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.6621831Z frames [('total', 1)]
2025-12-04T12:15:05.6621954Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.6622188Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.6622652Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.6622754Z graph_break []
2025-12-04T12:15:05.6622932Z =================================== FAILURES ===================================
2025-12-04T12:15:05.6623331Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda _
2025-12-04T12:15:05.6623459Z Traceback (most recent call last):
2025-12-04T12:15:05.6623903Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.6624175Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.6624668Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.6624939Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.6625457Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.6625667Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.6626181Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.6626329Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.6626880Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.6627263Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.6627800Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.6627954Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.6628437Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.6628575Z     return self._compile_to_module()
2025-12-04T12:15:05.6629064Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.6629232Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.6629799Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.6629932Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.6630446Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.6630679Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.6631268Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.6631445Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.6631956Z   File "/tmp/tmpf2u9p3qs/gk/cgkwof2xrehpz5xuibdydb6xxjntk2ldzm2llo7mjfg3vmfvszhi.py", line 137, in <module>
2025-12-04T12:15:05.6632436Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.6632551Z     kernel.precompile(
2025-12-04T12:15:05.6633108Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.6633245Z     self._precompile_worker()
2025-12-04T12:15:05.6633924Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.6634107Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.6634720Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.6634922Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.6635385Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.6635631Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.6636079Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.6636430Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.6636662Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.6637491Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.6637587Z ^
2025-12-04T12:15:05.6638042Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.6638048Z 
2025-12-04T12:15:05.6638773Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.6638779Z 
2025-12-04T12:15:05.6638784Z 
2025-12-04T12:15:05.6639006Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.6639728Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda
2025-12-04T12:15:05.6639768Z 
2025-12-04T12:15:05.6640039Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.6640271Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.6640379Z frames [('total', 1)]
2025-12-04T12:15:05.6640499Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.6640977Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.6641198Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.6641299Z graph_break []
2025-12-04T12:15:05.6641527Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.6641633Z frames [('total', 1)]
2025-12-04T12:15:05.6641780Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.6642012Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.6642473Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.6642586Z graph_break []
2025-12-04T12:15:05.6642803Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.6642940Z frames [('total', 1)]
2025-12-04T12:15:05.6643067Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.6643285Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.6643740Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.6643853Z graph_break []
2025-12-04T12:15:05.6644507Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-927fdf8f8ff6280c.xml -
2025-12-04T12:15:05.6644697Z =========================== short test summary info ============================
2025-12-04T12:15:05.6645592Z FAILED [0.5139s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.6646398Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.6646504Z ^
2025-12-04T12:15:05.6646959Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.6646965Z 
2025-12-04T12:15:05.6647687Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.6647695Z 
2025-12-04T12:15:05.6647700Z 
2025-12-04T12:15:05.6647919Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.6648641Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda
2025-12-04T12:15:05.6648647Z 
2025-12-04T12:15:05.6648920Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.6649103Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.6649322Z ================== 1 failed, 187 deselected, 2 rerun in 4.59s ==================
2025-12-04T12:15:05.6649424Z Got exit code 1
2025-12-04T12:15:05.6649532Z Retrying single test...
2025-12-04T12:15:05.6650014Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2f380c761dc75570.xml
2025-12-04T12:15:05.6650182Z ============================= test session starts ==============================
2025-12-04T12:15:05.6650547Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.6650660Z cachedir: .pytest_cache
2025-12-04T12:15:05.6651213Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.6651356Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.6651467Z configfile: pytest.ini
2025-12-04T12:15:05.6652057Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.6652293Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:05.6653080Z stepcurrent: skipping 30 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda
2025-12-04T12:15:05.6653244Z Running 1 items in this shard
2025-12-04T12:15:05.6653250Z 
2025-12-04T12:15:05.6654689Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T12:15:05.6656013Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.6656576Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.6657032Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T12:15:05.6657511Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.6658097Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.6658653Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.6659243Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.6659845Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.6660401Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.6660852Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.6661506Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.6662028Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = tl.load(in_ptr3 + (0))
2025-12-04T12:15:05.6662588Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.broadcast_to(tmp15, [1, 1])
2025-12-04T12:15:05.6663165Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.6663695Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.6664234Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.6664756Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.6665248Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.6665711Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_2 = r0_index
2025-12-04T12:15:05.6666210Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index // 512
2025-12-04T12:15:05.6667012Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.6667707Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.6668413Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.6668965Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.6669465Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3 = tmp1 - tmp2
2025-12-04T12:15:05.6669921Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = 512.0
2025-12-04T12:15:05.6670417Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp6 = (tmp4 / tmp5)
2025-12-04T12:15:05.6670881Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp7 = 1e-05
2025-12-04T12:15:05.6671607Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp6 + tmp7
2025-12-04T12:15:05.6672160Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T12:15:05.6672652Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp3 * tmp9
2025-12-04T12:15:05.6673175Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tl_math.abs(tmp10)
2025-12-04T12:15:05.6673782Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.6674362Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = triton_helpers.maximum(_tmp13, tmp12)
2025-12-04T12:15:05.6674939Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp13 = tl.where(r0_mask, tmp14, _tmp13)
2025-12-04T12:15:05.6675441Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp10 * tmp16
2025-12-04T12:15:05.6675907Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = -448.0
2025-12-04T12:15:05.6676496Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = triton_helpers.maximum(tmp17, tmp18)
2025-12-04T12:15:05.6676953Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp20 = 448.0
2025-12-04T12:15:05.6677539Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.minimum(tmp19, tmp20)
2025-12-04T12:15:05.6678147Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp22 = tmp21.to(tl.float8e4nv)
2025-12-04T12:15:05.6678865Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask)
2025-12-04T12:15:05.6679442Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.max2(_tmp13, 1)[:, None]
2025-12-04T12:15:05.6679958Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp13.to(tl.float32)
2025-12-04T12:15:05.6680731Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None)
2025-12-04T12:15:05.6681100Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.6683734Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.6684316Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.6685423Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.6686053Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.6686951Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.6687628Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.6688516Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.6689883Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.6690499Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.6691761Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.6692129Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.6693086Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.6693226Z ('RERUN', {'yellow': True}) [3.5349s] [100%]
2025-12-04T12:15:05.6694655Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T12:15:05.6695940Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.6696443Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.6696913Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T12:15:05.6697414Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.6697964Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.6698507Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.6699090Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.6699693Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.6700372Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.6700839Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.6701469Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.6702142Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = tl.load(in_ptr3 + (0))
2025-12-04T12:15:05.6702697Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.broadcast_to(tmp15, [1, 1])
2025-12-04T12:15:05.6703280Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.6703825Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.6704354Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.6704857Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.6705334Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.6705801Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_2 = r0_index
2025-12-04T12:15:05.6711232Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index // 512
2025-12-04T12:15:05.6712165Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.6712885Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.6713574Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.6714106Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.6714684Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3 = tmp1 - tmp2
2025-12-04T12:15:05.6715148Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = 512.0
2025-12-04T12:15:05.6715662Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp6 = (tmp4 / tmp5)
2025-12-04T12:15:05.6716158Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp7 = 1e-05
2025-12-04T12:15:05.6716642Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp6 + tmp7
2025-12-04T12:15:05.6717191Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T12:15:05.6717686Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp3 * tmp9
2025-12-04T12:15:05.6718220Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tl_math.abs(tmp10)
2025-12-04T12:15:05.6718869Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.6719461Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = triton_helpers.maximum(_tmp13, tmp12)
2025-12-04T12:15:05.6720040Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp13 = tl.where(r0_mask, tmp14, _tmp13)
2025-12-04T12:15:05.6720539Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp10 * tmp16
2025-12-04T12:15:05.6721023Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = -448.0
2025-12-04T12:15:05.6721603Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = triton_helpers.maximum(tmp17, tmp18)
2025-12-04T12:15:05.6722078Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp20 = 448.0
2025-12-04T12:15:05.6722661Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.minimum(tmp19, tmp20)
2025-12-04T12:15:05.6723201Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp22 = tmp21.to(tl.float8e4nv)
2025-12-04T12:15:05.6723929Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask)
2025-12-04T12:15:05.6724511Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.max2(_tmp13, 1)[:, None]
2025-12-04T12:15:05.6725096Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp13.to(tl.float32)
2025-12-04T12:15:05.6725806Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None)
2025-12-04T12:15:05.6726171Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.6728856Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.6729442Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.6730495Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.6731564Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.6732482Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.6733226Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.6734127Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.6734897Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.6735526Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.6736857Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.6737231Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.6738139Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.6738275Z ('RERUN', {'yellow': True}) [0.5152s] [100%]
2025-12-04T12:15:05.6739730Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T12:15:05.6741024Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.6741474Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.6741922Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T12:15:05.6742380Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.6742963Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.6743509Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.6744104Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.6744715Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.6745276Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.6745725Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.6746362Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.6746937Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = tl.load(in_ptr3 + (0))
2025-12-04T12:15:05.6747480Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.broadcast_to(tmp15, [1, 1])
2025-12-04T12:15:05.6748072Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.6748600Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.6749124Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.6749628Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.6750108Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.6750585Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_2 = r0_index
2025-12-04T12:15:05.6751085Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index // 512
2025-12-04T12:15:05.6751848Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.6752553Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.6753269Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.6753816Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.6754306Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3 = tmp1 - tmp2
2025-12-04T12:15:05.6754771Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = 512.0
2025-12-04T12:15:05.6755259Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp6 = (tmp4 / tmp5)
2025-12-04T12:15:05.6755714Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp7 = 1e-05
2025-12-04T12:15:05.6756240Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp6 + tmp7
2025-12-04T12:15:05.6756781Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T12:15:05.6757283Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp3 * tmp9
2025-12-04T12:15:05.6757839Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tl_math.abs(tmp10)
2025-12-04T12:15:05.6758432Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.6759030Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = triton_helpers.maximum(_tmp13, tmp12)
2025-12-04T12:15:05.6759594Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp13 = tl.where(r0_mask, tmp14, _tmp13)
2025-12-04T12:15:05.6760139Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp10 * tmp16
2025-12-04T12:15:05.6760602Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = -448.0
2025-12-04T12:15:05.6761181Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = triton_helpers.maximum(tmp17, tmp18)
2025-12-04T12:15:05.6761650Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp20 = 448.0
2025-12-04T12:15:05.6762220Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.minimum(tmp19, tmp20)
2025-12-04T12:15:05.6762772Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp22 = tmp21.to(tl.float8e4nv)
2025-12-04T12:15:05.6763482Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask)
2025-12-04T12:15:05.6764074Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.max2(_tmp13, 1)[:, None]
2025-12-04T12:15:05.6764591Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp13.to(tl.float32)
2025-12-04T12:15:05.6765300Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None)
2025-12-04T12:15:05.6765675Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.6768340Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.6768901Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.6769978Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.6770626Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.6771762Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.6772463Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.6773350Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.6774144Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.6774858Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.6776109Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.6776559Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.6777523Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.6777647Z FAILED [0.5182s] [100%]
2025-12-04T12:15:05.6777657Z 
2025-12-04T12:15:05.6777806Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.6778208Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda _
2025-12-04T12:15:05.6778353Z Traceback (most recent call last):
2025-12-04T12:15:05.6778782Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.6779034Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.6779525Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.6779777Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.6780307Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.6780575Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.6781098Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.6781251Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.6781785Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.6782123Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.6782646Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.6782796Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.6783340Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.6783467Z     return self._compile_to_module()
2025-12-04T12:15:05.6783969Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.6784133Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.6784697Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.6784843Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.6785341Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.6785587Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.6786175Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.6786304Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.6786851Z   File "/tmp/tmp7yaj2ngs/hn/chnd5zeoc6rawi6vgrzw53ix6p54jmik4pp2r6spf4q2h6x272gd.py", line 137, in <module>
2025-12-04T12:15:05.6787314Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.6787428Z     kernel.precompile(
2025-12-04T12:15:05.6787995Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.6788116Z     self._precompile_worker()
2025-12-04T12:15:05.6788724Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.6788904Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.6789498Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.6789710Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.6790165Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.6790423Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.6790867Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.6791200Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.6791439Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.6792245Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.6792353Z ^
2025-12-04T12:15:05.6792813Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.6792822Z 
2025-12-04T12:15:05.6793576Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.6793586Z 
2025-12-04T12:15:05.6793590Z 
2025-12-04T12:15:05.6793821Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.6794524Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda
2025-12-04T12:15:05.6794530Z 
2025-12-04T12:15:05.6794812Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.6795039Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.6795184Z frames [('total', 1)]
2025-12-04T12:15:05.6795322Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.6795797Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.6796040Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.6796142Z graph_break []
2025-12-04T12:15:05.6796542Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda _
2025-12-04T12:15:05.6796718Z Traceback (most recent call last):
2025-12-04T12:15:05.6797143Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.6797377Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.6797887Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.6798139Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.6798666Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.6798907Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.6799416Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.6799580Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.6800117Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.6800448Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.6800971Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.6801121Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.6801616Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.6801740Z     return self._compile_to_module()
2025-12-04T12:15:05.6802225Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.6802402Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.6802919Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.6803061Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.6803558Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.6803790Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.6804392Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.6804522Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.6805085Z   File "/tmp/tmpfnxbtl3o/ts/ctsuz7uvnj6sfg3btq44tonxqbdbfymagb3i25w7q2d57gsazfst.py", line 137, in <module>
2025-12-04T12:15:05.6805553Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.6805668Z     kernel.precompile(
2025-12-04T12:15:05.6806234Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.6806352Z     self._precompile_worker()
2025-12-04T12:15:05.6806945Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.6807141Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.6807773Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.6807988Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.6808447Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.6808695Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.6809192Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.6809530Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.6809769Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.6810577Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.6810673Z ^
2025-12-04T12:15:05.6811146Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.6811220Z 
2025-12-04T12:15:05.6811934Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.6811943Z 
2025-12-04T12:15:05.6811947Z 
2025-12-04T12:15:05.6812179Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.6812887Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda
2025-12-04T12:15:05.6812893Z 
2025-12-04T12:15:05.6813165Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.6813404Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.6813515Z frames [('total', 1)]
2025-12-04T12:15:05.6813650Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.6814125Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.6814349Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.6814467Z graph_break []
2025-12-04T12:15:05.6814691Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.6814798Z frames [('total', 1)]
2025-12-04T12:15:05.6814932Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.6815155Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.6815635Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.6815735Z graph_break []
2025-12-04T12:15:05.6815882Z =================================== FAILURES ===================================
2025-12-04T12:15:05.6816368Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda _
2025-12-04T12:15:05.6816500Z Traceback (most recent call last):
2025-12-04T12:15:05.6816975Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.6817226Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.6817718Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.6817984Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.6818500Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.6818698Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.6819256Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.6819409Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.6819959Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.6820282Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.6820834Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.6820999Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.6821481Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.6821606Z     return self._compile_to_module()
2025-12-04T12:15:05.6822100Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.6822267Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.6822797Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.6822970Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.6823467Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.6823719Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.6824305Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.6824447Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.6824959Z   File "/tmp/tmp3q37ahr4/bs/cbsbpp7pjvsphuatpn2c2cok6zsa7zbbrfej5uieczdhghgxnl6w.py", line 137, in <module>
2025-12-04T12:15:05.6825426Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.6825556Z     kernel.precompile(
2025-12-04T12:15:05.6826114Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.6826232Z     self._precompile_worker()
2025-12-04T12:15:05.6826840Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.6827023Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.6827627Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.6827829Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.6828282Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.6828541Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.6828986Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.6829373Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.6829603Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.6830413Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.6830516Z ^
2025-12-04T12:15:05.6830973Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.6830979Z 
2025-12-04T12:15:05.6831736Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.6831743Z 
2025-12-04T12:15:05.6831748Z 
2025-12-04T12:15:05.6831971Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.6832679Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda
2025-12-04T12:15:05.6832700Z 
2025-12-04T12:15:05.6833009Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.6833232Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.6833353Z frames [('total', 1)]
2025-12-04T12:15:05.6833474Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.6833940Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.6834178Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.6834279Z graph_break []
2025-12-04T12:15:05.6834498Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.6834660Z frames [('total', 1)]
2025-12-04T12:15:05.6834777Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.6835009Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.6835471Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.6835575Z graph_break []
2025-12-04T12:15:05.6835806Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.6835911Z frames [('total', 1)]
2025-12-04T12:15:05.6836026Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.6836254Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.6836707Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.6836821Z graph_break []
2025-12-04T12:15:05.6837469Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2f380c761dc75570.xml -
2025-12-04T12:15:05.6837651Z =========================== short test summary info ============================
2025-12-04T12:15:05.6838509Z FAILED [0.5182s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.6839315Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.6839420Z ^
2025-12-04T12:15:05.6839878Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.6839884Z 
2025-12-04T12:15:05.6840593Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.6840614Z 
2025-12-04T12:15:05.6840618Z 
2025-12-04T12:15:05.6840874Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.6842131Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda
2025-12-04T12:15:05.6842141Z 
2025-12-04T12:15:05.6842433Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.6842621Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.6842843Z ================== 1 failed, 187 deselected, 2 rerun in 4.61s ==================
2025-12-04T12:15:05.6842948Z Got exit code 1
2025-12-04T12:15:05.6843626Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda
2025-12-04T12:15:05.6844054Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:15:05.6844533Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-db3aa4c2f1c0f2c1.xml
2025-12-04T12:15:05.6844698Z ============================= test session starts ==============================
2025-12-04T12:15:05.6845104Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.6845218Z cachedir: .pytest_cache
2025-12-04T12:15:05.6845749Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.6845874Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.6845983Z configfile: pytest.ini
2025-12-04T12:15:05.6846771Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.6847005Z collecting ... collected 188 items / 31 deselected / 157 selected
2025-12-04T12:15:05.6847227Z stepcurrent: skipping 31 already run items.
2025-12-04T12:15:05.6847361Z Running 157 items in this shard
2025-12-04T12:15:05.6847367Z 
2025-12-04T12:15:05.6848738Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T12:15:05.6849850Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.6850304Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 8192
2025-12-04T12:15:05.6850777Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T12:15:05.6851242Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.6851780Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.6852341Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.6852930Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.6853538Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.6854135Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.6854590Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.6855040Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:05.6855643Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.6856241Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.6856973Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.6857572Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.6858112Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.6858704Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.6859207Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.6859688Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.6860167Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.6860955Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T12:15:05.6861521Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.6862124Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.6862843Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T12:15:05.6863456Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T12:15:05.6863858Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T12:15:05.6864482Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean)
2025-12-04T12:15:05.6865079Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2)
2025-12-04T12:15:05.6865725Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight)
2025-12-04T12:15:05.6866444Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T12:15:05.6866924Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T12:15:05.6867411Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T12:15:05.6867928Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T12:15:05.6868564Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.6869100Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.6869646Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T12:15:05.6870275Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.6870809Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.6871528Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.6872036Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.6872604Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.6873084Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.6873877Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.6874424Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:05.6874977Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T12:15:05.6875444Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T12:15:05.6875962Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T12:15:05.6876421Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T12:15:05.6876931Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T12:15:05.6877473Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T12:15:05.6877971Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T12:15:05.6878508Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T12:15:05.6879104Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.6879695Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T12:15:05.6880256Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask, tmp21, _tmp20)
2025-12-04T12:15:05.6880752Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T12:15:05.6881292Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T12:15:05.6881870Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T12:15:05.6882340Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T12:15:05.6882916Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T12:15:05.6883454Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T12:15:05.6884115Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask)
2025-12-04T12:15:05.6884700Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T12:15:05.6885257Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, None)
2025-12-04T12:15:05.6885669Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.6888042Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.6888611Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.6889674Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.6890306Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.6891210Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.6891909Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.6892801Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.6893592Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.6894202Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.6895346Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.6895718Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.6896701Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.6896839Z ('RERUN', {'yellow': True}) [3.4364s] [  0%]
2025-12-04T12:15:05.6898226Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T12:15:05.6899331Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.6899779Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 8192
2025-12-04T12:15:05.6900279Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T12:15:05.6900743Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.6901277Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.6901837Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.6902465Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.6903071Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.6903635Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.6904099Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.6904533Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:05.6905136Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.6905745Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.6906353Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.6906946Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.6907478Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.6908002Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.6908510Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.6909041Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.6909523Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.6910315Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T12:15:05.6910841Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.6911444Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.6912219Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T12:15:05.6912845Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T12:15:05.6913362Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T12:15:05.6913992Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean)
2025-12-04T12:15:05.6914582Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2)
2025-12-04T12:15:05.6915231Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight)
2025-12-04T12:15:05.6915987Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T12:15:05.6916467Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T12:15:05.6916957Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T12:15:05.6917432Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T12:15:05.6918064Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.6918604Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.6919154Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T12:15:05.6919745Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.6920276Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.6920815Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.6921303Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.6921784Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.6922260Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.6923085Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.6923627Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:05.6924122Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T12:15:05.6924582Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T12:15:05.6925128Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T12:15:05.6925592Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T12:15:05.6926105Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T12:15:05.6926644Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T12:15:05.6927179Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T12:15:05.6927713Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T12:15:05.6928312Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.6928901Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T12:15:05.6929499Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask, tmp21, _tmp20)
2025-12-04T12:15:05.6929999Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T12:15:05.6930478Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T12:15:05.6931053Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T12:15:05.6931530Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T12:15:05.6932104Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T12:15:05.6932660Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T12:15:05.6933254Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask)
2025-12-04T12:15:05.6933830Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T12:15:05.6934391Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, None)
2025-12-04T12:15:05.6934751Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.6937232Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.6937776Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.6938855Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.6939487Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.6940397Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.6941107Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.6941988Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.6942778Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.6943420Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.6944523Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.6944892Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.6945803Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.6945940Z ('RERUN', {'yellow': True}) [0.4364s] [  0%]
2025-12-04T12:15:05.6947301Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T12:15:05.6948400Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.6948849Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 8192
2025-12-04T12:15:05.6949316Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T12:15:05.6949776Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.6950354Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.6950901Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.6951484Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.6952080Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.6952638Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.6953128Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.6953563Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:05.6954160Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.6954787Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.6955392Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.6955992Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.6956527Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.6957091Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.6957592Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.6958073Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.6958551Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.6959334Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T12:15:05.6959866Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.6960457Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.6961175Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T12:15:05.6961792Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T12:15:05.6962195Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T12:15:05.6962827Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean)
2025-12-04T12:15:05.6963472Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2)
2025-12-04T12:15:05.6964122Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight)
2025-12-04T12:15:05.6964842Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T12:15:05.6965320Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T12:15:05.6965809Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T12:15:05.6966316Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T12:15:05.6966968Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.6967490Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.6968069Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T12:15:05.6968658Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.6969187Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.6969728Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.6970249Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.6970726Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.6971433Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.6972220Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.6972767Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:05.6973265Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T12:15:05.6973732Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T12:15:05.6974250Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T12:15:05.6974708Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T12:15:05.6975217Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T12:15:05.6975753Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T12:15:05.6976249Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T12:15:05.6976847Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T12:15:05.6977543Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.6978140Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T12:15:05.6978704Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask, tmp21, _tmp20)
2025-12-04T12:15:05.6979210Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T12:15:05.6979724Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T12:15:05.6980310Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T12:15:05.6980783Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T12:15:05.6981405Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T12:15:05.6981957Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T12:15:05.6982547Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask)
2025-12-04T12:15:05.6983127Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T12:15:05.6983730Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, None)
2025-12-04T12:15:05.6984089Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.6986439Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.6986979Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.6988031Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.6988661Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.6989566Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.6990251Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.6991164Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.6991962Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.6992572Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.6993706Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.6994074Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.6994985Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.6995127Z FAILED [0.4424s] [  0%]
2025-12-04T12:15:05.6995133Z 
2025-12-04T12:15:05.6995280Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.6995703Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda _
2025-12-04T12:15:05.6995832Z Traceback (most recent call last):
2025-12-04T12:15:05.6996266Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.6996506Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.6996996Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.6997289Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.6997804Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.6998013Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.6998522Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.6998672Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.6999222Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.6999545Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.7000071Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.7000241Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.7000727Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.7000869Z     return self._compile_to_module()
2025-12-04T12:15:05.7001357Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.7001523Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.7002058Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.7002192Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.7002704Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.7002940Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.7003563Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.7003709Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.7004219Z   File "/tmp/tmp76glkdzi/po/cpoyyo3lz5ssavri25muw6nyqrtinsgmmqbsiyzihwijrvuhg4kw.py", line 65, in <module>
2025-12-04T12:15:05.7004686Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.7004815Z     kernel.precompile(
2025-12-04T12:15:05.7005373Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.7005511Z     self._precompile_worker()
2025-12-04T12:15:05.7006139Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.7006322Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.7006941Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.7007142Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.7007640Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.7007888Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.7008332Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.7008684Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.7008913Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.7009570Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.7009727Z ^
2025-12-04T12:15:05.7010183Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.7010189Z 
2025-12-04T12:15:05.7010916Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.7010923Z 
2025-12-04T12:15:05.7010928Z 
2025-12-04T12:15:05.7011147Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.7011879Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda
2025-12-04T12:15:05.7011885Z 
2025-12-04T12:15:05.7012159Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.7012388Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.7012517Z frames [('total', 1)]
2025-12-04T12:15:05.7012860Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.7013561Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.7014409Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.7014878Z graph_break []
2025-12-04T12:15:05.7015434Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda _
2025-12-04T12:15:05.7016121Z Traceback (most recent call last):
2025-12-04T12:15:05.7016872Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.7017682Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.7018540Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.7019481Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.7020401Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.7021241Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.7022160Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.7022980Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.7023802Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.7024808Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.7025829Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.7026659Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.7027431Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.7028180Z     return self._compile_to_module()
2025-12-04T12:15:05.7028935Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.7029745Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.7030572Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.7031354Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.7032118Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.7032999Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.7034012Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.7034858Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.7035625Z   File "/tmp/tmp46vfc6fa/vf/cvffpdm2qdkks4ztfr6e2tkrjm5ndpf4p2ef4jszkjrm4xkjivxj.py", line 65, in <module>
2025-12-04T12:15:05.7036745Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.7037470Z     kernel.precompile(
2025-12-04T12:15:05.7038204Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.7039026Z     self._precompile_worker()
2025-12-04T12:15:05.7039855Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.7040766Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.7041677Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.7042620Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.7043416Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.7044253Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.7045091Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.7046020Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.7046733Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.7047754Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.7048645Z ^
2025-12-04T12:15:05.7049287Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.7049884Z 
2025-12-04T12:15:05.7050614Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.7051464Z 
2025-12-04T12:15:05.7051468Z 
2025-12-04T12:15:05.7051692Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.7052783Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda
2025-12-04T12:15:05.7053656Z 
2025-12-04T12:15:05.7053977Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.7054639Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.7055126Z frames [('total', 1)]
2025-12-04T12:15:05.7055436Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.7056130Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.7057073Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.7057554Z graph_break []
2025-12-04T12:15:05.7057944Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.7058428Z frames [('total', 1)]
2025-12-04T12:15:05.7058726Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.7059175Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.7060011Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.7060712Z graph_break []
2025-12-04T12:15:05.7061034Z =================================== FAILURES ===================================
2025-12-04T12:15:05.7061804Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda _
2025-12-04T12:15:05.7062472Z Traceback (most recent call last):
2025-12-04T12:15:05.7063151Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.7063961Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.7064837Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.7065708Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.7066626Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.7067490Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.7068343Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.7069140Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.7069965Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.7071177Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.7072181Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.7072989Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.7073773Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.7074531Z     return self._compile_to_module()
2025-12-04T12:15:05.7075258Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.7076160Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.7076991Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.7077793Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.7078543Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.7079415Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.7080378Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.7081236Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.7082054Z   File "/tmp/tmpuxr1bpil/jz/cjz4vkh4bhcgpe7ew3ykmveyaajtmh6bftewjpr4ecfje5u4d4ee.py", line 65, in <module>
2025-12-04T12:15:05.7083177Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.7083914Z     kernel.precompile(
2025-12-04T12:15:05.7084655Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.7085531Z     self._precompile_worker()
2025-12-04T12:15:05.7086355Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.7087283Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.7088182Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.7089128Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.7089929Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.7090817Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.7091644Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.7092569Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.7093280Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.7094287Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.7095176Z ^
2025-12-04T12:15:05.7095767Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.7096431Z 
2025-12-04T12:15:05.7097165Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.7098022Z 
2025-12-04T12:15:05.7098026Z 
2025-12-04T12:15:05.7098265Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.7099328Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda
2025-12-04T12:15:05.7100195Z 
2025-12-04T12:15:05.7100468Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.7101108Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.7101584Z frames [('total', 1)]
2025-12-04T12:15:05.7101874Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.7102561Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.7103400Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.7103857Z graph_break []
2025-12-04T12:15:05.7104298Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.7104779Z frames [('total', 1)]
2025-12-04T12:15:05.7105070Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.7105513Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.7106346Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.7107071Z graph_break []
2025-12-04T12:15:05.7107438Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.7107909Z frames [('total', 1)]
2025-12-04T12:15:05.7108208Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.7108633Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.7109512Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.7110226Z graph_break []
2025-12-04T12:15:05.7111053Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-db3aa4c2f1c0f2c1.xml -
2025-12-04T12:15:05.7112010Z =========================== short test summary info ============================
2025-12-04T12:15:05.7113270Z FAILED [0.4424s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.7114910Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.7115800Z ^
2025-12-04T12:15:05.7116381Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.7116994Z 
2025-12-04T12:15:05.7117714Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.7118643Z 
2025-12-04T12:15:05.7118648Z 
2025-12-04T12:15:05.7118867Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.7119948Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda
2025-12-04T12:15:05.7120792Z 
2025-12-04T12:15:05.7121066Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.7121670Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.7122200Z ================== 1 failed, 31 deselected, 2 rerun in 4.36s ===================
2025-12-04T12:15:05.7122651Z Got exit code 1
2025-12-04T12:15:05.7122918Z Retrying single test...
2025-12-04T12:15:05.7123589Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7c20e7902388541e.xml
2025-12-04T12:15:05.7124378Z ============================= test session starts ==============================
2025-12-04T12:15:05.7125035Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.7125646Z cachedir: .pytest_cache
2025-12-04T12:15:05.7126360Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.7127148Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.7127497Z configfile: pytest.ini
2025-12-04T12:15:05.7128286Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.7129252Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:05.7130410Z stepcurrent: skipping 31 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda
2025-12-04T12:15:05.7131514Z Running 1 items in this shard
2025-12-04T12:15:05.7131745Z 
2025-12-04T12:15:05.7133105Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T12:15:05.7135713Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.7137524Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 8192
2025-12-04T12:15:05.7138564Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T12:15:05.7139630Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.7140812Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.7142036Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.7143305Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.7144608Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.7145888Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.7147088Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.7148117Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:05.7149283Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.7150603Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.7151946Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.7153277Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.7154526Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.7155726Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.7156881Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.7157998Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.7159072Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.7160510Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T12:15:05.7161958Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.7163210Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.7164651Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T12:15:05.7166103Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T12:15:05.7167803Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T12:15:05.7168990Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean)
2025-12-04T12:15:05.7170342Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2)
2025-12-04T12:15:05.7171950Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight)
2025-12-04T12:15:05.7173451Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T12:15:05.7174780Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T12:15:05.7175884Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T12:15:05.7177141Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T12:15:05.7178380Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.7179689Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.7180903Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T12:15:05.7182170Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.7183410Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.7184621Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.7185782Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.7186900Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.7187978Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.7189385Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.7190841Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:05.7192081Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T12:15:05.7193190Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T12:15:05.7194299Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T12:15:05.7195411Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T12:15:05.7196515Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T12:15:05.7197763Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T12:15:05.7198936Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T12:15:05.7200096Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T12:15:05.7201413Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.7202728Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T12:15:05.7204000Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask, tmp21, _tmp20)
2025-12-04T12:15:05.7205210Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T12:15:05.7206354Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T12:15:05.7207537Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T12:15:05.7208699Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T12:15:05.7209872Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T12:15:05.7211129Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T12:15:05.7212407Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask)
2025-12-04T12:15:05.7213726Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T12:15:05.7214972Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, None)
2025-12-04T12:15:05.7216027Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.7218986Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.7222004Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.7223736Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.7225544Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.7227264Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.7228989Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.7230687Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.7232523Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.7234037Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.7235878Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.7237507Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.7238913Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.7240081Z ('RERUN', {'yellow': True}) [3.4421s] [100%]
2025-12-04T12:15:05.7241691Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T12:15:05.7244258Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.7245936Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 8192
2025-12-04T12:15:05.7246982Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T12:15:05.7248039Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.7249160Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.7250394Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.7251706Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.7253020Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.7254302Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.7255436Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.7256578Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:05.7257885Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.7259224Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.7260547Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.7261921Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.7263183Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.7264392Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.7265546Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.7266718Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.7267826Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.7269226Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T12:15:05.7270656Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.7272098Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.7273551Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T12:15:05.7275025Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T12:15:05.7276178Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T12:15:05.7277336Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean)
2025-12-04T12:15:05.7278686Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2)
2025-12-04T12:15:05.7280068Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight)
2025-12-04T12:15:05.7281635Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T12:15:05.7282952Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T12:15:05.7284051Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T12:15:05.7285150Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T12:15:05.7286405Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.7287751Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.7288962Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T12:15:05.7290240Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.7291536Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.7292742Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.7293889Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.7295008Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.7296100Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.7297618Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.7299061Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:05.7300229Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T12:15:05.7301332Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T12:15:05.7302446Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T12:15:05.7303537Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T12:15:05.7304644Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T12:15:05.7305823Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T12:15:05.7307007Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T12:15:05.7308172Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T12:15:05.7309417Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.7310746Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T12:15:05.7312084Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask, tmp21, _tmp20)
2025-12-04T12:15:05.7313288Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T12:15:05.7314387Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T12:15:05.7315571Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T12:15:05.7316753Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T12:15:05.7317966Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T12:15:05.7319215Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T12:15:05.7320490Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask)
2025-12-04T12:15:05.7321832Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T12:15:05.7323097Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, None)
2025-12-04T12:15:05.7324131Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.7326979Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.7330024Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.7331745Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.7333554Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.7335213Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.7336975Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.7339241Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.7341049Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.7342650Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.7344489Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.7346095Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.7347527Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.7348701Z ('RERUN', {'yellow': True}) [0.4662s] [100%]
2025-12-04T12:15:05.7350337Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T12:15:05.7353116Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.7354793Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 8192
2025-12-04T12:15:05.7355814Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T12:15:05.7356877Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.7358091Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.7359316Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.7360573Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.7361889Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.7363171Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.7364321Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.7365341Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:05.7366528Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.7367857Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.7369186Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.7370513Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.7371941Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.7373277Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.7374443Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.7375574Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.7376720Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.7378194Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T12:15:05.7379638Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.7380906Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.7382343Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T12:15:05.7383856Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T12:15:05.7385014Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T12:15:05.7386188Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean)
2025-12-04T12:15:05.7387597Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2)
2025-12-04T12:15:05.7388975Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight)
2025-12-04T12:15:05.7390478Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T12:15:05.7391808Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T12:15:05.7392913Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T12:15:05.7394010Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T12:15:05.7395277Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.7396925Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.7398149Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T12:15:05.7399423Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.7400669Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.7401879Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.7403116Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.7404237Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.7405316Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.7406724Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.7408177Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:05.7409391Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T12:15:05.7410491Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T12:15:05.7411602Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T12:15:05.7412746Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T12:15:05.7413868Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T12:15:05.7415034Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T12:15:05.7416218Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T12:15:05.7417453Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T12:15:05.7418766Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.7420096Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T12:15:05.7421366Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask, tmp21, _tmp20)
2025-12-04T12:15:05.7422567Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T12:15:05.7423682Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T12:15:05.7424864Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T12:15:05.7426045Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T12:15:05.7427228Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T12:15:05.7428487Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T12:15:05.7429765Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask)
2025-12-04T12:15:05.7431067Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T12:15:05.7432378Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, None)
2025-12-04T12:15:05.7433433Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.7436316Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.7439333Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.7441056Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.7442892Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.7444563Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.7446283Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.7448055Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.7449845Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.7451358Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.7453202Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.7454804Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.7456211Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.7457444Z FAILED [0.4526s] [100%]
2025-12-04T12:15:05.7457625Z 
2025-12-04T12:15:05.7457772Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.7458480Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda _
2025-12-04T12:15:05.7459163Z Traceback (most recent call last):
2025-12-04T12:15:05.7459819Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.7460618Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.7461482Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.7462407Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.7463299Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.7464152Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.7464991Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.7465792Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.7466599Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.7467597Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.7468629Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.7469444Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.7470201Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.7471141Z     return self._compile_to_module()
2025-12-04T12:15:05.7471971Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.7472752Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.7473575Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.7474367Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.7475133Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.7475991Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.7477007Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.7477861Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.7478594Z   File "/tmp/tmpgrziu__4/6u/c6uemlisc3hsoq7726bsp3fmh5ylwaujqxfeqxscwcgdf5lwfxby.py", line 65, in <module>
2025-12-04T12:15:05.7479675Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.7480400Z     kernel.precompile(
2025-12-04T12:15:05.7481149Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.7481954Z     self._precompile_worker()
2025-12-04T12:15:05.7483186Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.7484111Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.7485047Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.7485976Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.7486777Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.7487625Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.7488466Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.7489381Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.7490092Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.7491112Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.7491986Z ^
2025-12-04T12:15:05.7492667Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.7493277Z 
2025-12-04T12:15:05.7494000Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.7494845Z 
2025-12-04T12:15:05.7494850Z 
2025-12-04T12:15:05.7495089Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.7496172Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda
2025-12-04T12:15:05.7497090Z 
2025-12-04T12:15:05.7497421Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.7498067Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.7498547Z frames [('total', 1)]
2025-12-04T12:15:05.7499069Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.7499759Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.7500649Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.7501123Z graph_break []
2025-12-04T12:15:05.7501682Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda _
2025-12-04T12:15:05.7502367Z Traceback (most recent call last):
2025-12-04T12:15:05.7503038Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.7503825Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.7504686Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.7505604Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.7506509Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.7507345Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.7508183Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.7508982Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.7509809Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.7510796Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.7511798Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.7512615Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.7513378Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.7514110Z     return self._compile_to_module()
2025-12-04T12:15:05.7514850Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.7515646Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.7516449Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.7517245Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.7517998Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.7518869Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.7519860Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.7520720Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.7521486Z   File "/tmp/tmpkckkfsyj/es/ces7zyrn5mfa4cjuiroujifwbbngjz4ts3ahgghgeak7bx3qq2at.py", line 65, in <module>
2025-12-04T12:15:05.7522600Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.7523310Z     kernel.precompile(
2025-12-04T12:15:05.7524061Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.7524877Z     self._precompile_worker()
2025-12-04T12:15:05.7525726Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.7526651Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.7527567Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.7528510Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.7529285Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.7530167Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.7530998Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.7531926Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.7532617Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.7533635Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.7534561Z ^
2025-12-04T12:15:05.7535137Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.7535744Z 
2025-12-04T12:15:05.7536534Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.7537394Z 
2025-12-04T12:15:05.7537399Z 
2025-12-04T12:15:05.7537619Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.7538695Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda
2025-12-04T12:15:05.7539544Z 
2025-12-04T12:15:05.7539837Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.7540461Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.7540935Z frames [('total', 1)]
2025-12-04T12:15:05.7541249Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.7541921Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.7542754Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.7543224Z graph_break []
2025-12-04T12:15:05.7543607Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.7544061Z frames [('total', 1)]
2025-12-04T12:15:05.7544360Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.7544796Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.7545607Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.7546320Z graph_break []
2025-12-04T12:15:05.7546631Z =================================== FAILURES ===================================
2025-12-04T12:15:05.7547390Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda _
2025-12-04T12:15:05.7548056Z Traceback (most recent call last):
2025-12-04T12:15:05.7548727Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.7549528Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.7550021Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.7550268Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.7550797Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.7551032Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.7551553Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.7551706Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.7552242Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.7552635Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.7553155Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.7553314Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.7553795Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.7553918Z     return self._compile_to_module()
2025-12-04T12:15:05.7561487Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.7561843Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.7562396Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.7562550Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.7563057Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.7563311Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.7563900Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.7564032Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.7564548Z   File "/tmp/tmpj3600sez/lb/clbiv5vsuuc2fiqaak2ofncdtzl3q7rgj245pgjl43nlxhfgzwpt.py", line 65, in <module>
2025-12-04T12:15:05.7565014Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.7565134Z     kernel.precompile(
2025-12-04T12:15:05.7565705Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.7565831Z     self._precompile_worker()
2025-12-04T12:15:05.7566439Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.7566621Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.7567213Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.7567430Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.7567885Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.7568152Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.7568642Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.7568982Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.7569231Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.7569888Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.7569994Z ^
2025-12-04T12:15:05.7570456Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.7570464Z 
2025-12-04T12:15:05.7571500Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.7571511Z 
2025-12-04T12:15:05.7571516Z 
2025-12-04T12:15:05.7571753Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.7572472Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda
2025-12-04T12:15:05.7572534Z 
2025-12-04T12:15:05.7572822Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.7573052Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.7573159Z frames [('total', 1)]
2025-12-04T12:15:05.7573295Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.7573768Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.7574009Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.7574187Z graph_break []
2025-12-04T12:15:05.7574416Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.7574531Z frames [('total', 1)]
2025-12-04T12:15:05.7574647Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.7574866Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.7575346Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.7575446Z graph_break []
2025-12-04T12:15:05.7575664Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.7575783Z frames [('total', 1)]
2025-12-04T12:15:05.7575900Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.7576131Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.7576663Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.7576766Z graph_break []
2025-12-04T12:15:05.7577434Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7c20e7902388541e.xml -
2025-12-04T12:15:05.7577609Z =========================== short test summary info ============================
2025-12-04T12:15:05.7578496Z FAILED [0.4526s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.7579141Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.7579231Z ^
2025-12-04T12:15:05.7579707Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.7579713Z 
2025-12-04T12:15:05.7580484Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.7580493Z 
2025-12-04T12:15:05.7580498Z 
2025-12-04T12:15:05.7580731Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.7581451Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda
2025-12-04T12:15:05.7581457Z 
2025-12-04T12:15:05.7581736Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.7581918Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.7582121Z ================== 1 failed, 187 deselected, 2 rerun in 4.41s ==================
2025-12-04T12:15:05.7582237Z Got exit code 1
2025-12-04T12:15:05.7582379Z Retrying single test...
2025-12-04T12:15:05.7582846Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-43cf13c151388d8e.xml
2025-12-04T12:15:05.7583032Z ============================= test session starts ==============================
2025-12-04T12:15:05.7583385Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.7583528Z cachedir: .pytest_cache
2025-12-04T12:15:05.7584059Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.7584186Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.7584307Z configfile: pytest.ini
2025-12-04T12:15:05.7584899Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.7585118Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:05.7585934Z stepcurrent: skipping 31 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda
2025-12-04T12:15:05.7586085Z Running 1 items in this shard
2025-12-04T12:15:05.7586091Z 
2025-12-04T12:15:05.7587454Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T12:15:05.7588543Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.7589002Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 8192
2025-12-04T12:15:05.7589455Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T12:15:05.7589924Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.7590473Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.7591011Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.7591604Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.7592187Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.7592737Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.7593240Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.7593673Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:05.7594281Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.7594863Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.7595500Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.7596093Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.7596631Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.7597202Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.7597697Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.7598190Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.7598656Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.7599445Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T12:15:05.7600014Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.7600607Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.7601342Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T12:15:05.7601945Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T12:15:05.7602349Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T12:15:05.7602982Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean)
2025-12-04T12:15:05.7603572Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2)
2025-12-04T12:15:05.7604232Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight)
2025-12-04T12:15:05.7604936Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T12:15:05.7605428Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T12:15:05.7605904Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T12:15:05.7606410Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T12:15:05.7607058Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.7607583Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.7608140Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T12:15:05.7608767Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.7609297Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.7609838Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.7610326Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.7610850Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.7611315Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.7612102Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.7612645Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:05.7613179Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T12:15:05.7613658Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T12:15:05.7614161Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T12:15:05.7614630Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T12:15:05.7615124Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T12:15:05.7615660Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T12:15:05.7616170Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T12:15:05.7616756Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T12:15:05.7617365Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.7617947Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T12:15:05.7618508Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask, tmp21, _tmp20)
2025-12-04T12:15:05.7619018Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T12:15:05.7619519Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T12:15:05.7620109Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T12:15:05.7620572Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T12:15:05.7621145Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T12:15:05.7621696Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T12:15:05.7622328Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask)
2025-12-04T12:15:05.7622924Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T12:15:05.7623469Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, None)
2025-12-04T12:15:05.7623895Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.7626253Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.7626831Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.7627876Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.7628498Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.7629407Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.7630090Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.7630986Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.7631758Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.7632377Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.7633583Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.7633952Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.7634860Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.7634995Z ('RERUN', {'yellow': True}) [3.4453s] [100%]
2025-12-04T12:15:05.7636389Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T12:15:05.7637474Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.7637938Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 8192
2025-12-04T12:15:05.7638417Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T12:15:05.7638876Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.7639423Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.7639964Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.7640569Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.7641182Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.7641735Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.7642199Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.7642629Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:05.7643243Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.7643825Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.7644451Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.7645032Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.7645562Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.7646099Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.7646593Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.7647120Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.7647589Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.7648370Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T12:15:05.7648906Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.7649490Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.7650247Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T12:15:05.7650855Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T12:15:05.7651267Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T12:15:05.7651918Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean)
2025-12-04T12:15:05.7652511Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2)
2025-12-04T12:15:05.7653168Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight)
2025-12-04T12:15:05.7653880Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T12:15:05.7654400Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T12:15:05.7654878Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T12:15:05.7655351Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T12:15:05.7655999Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.7656594Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.7657157Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T12:15:05.7657741Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.7658272Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.7658814Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.7659305Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.7659802Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.7660268Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.7661130Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.7661666Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:05.7662162Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T12:15:05.7662640Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T12:15:05.7663171Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T12:15:05.7663646Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T12:15:05.7664148Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T12:15:05.7664684Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T12:15:05.7665225Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T12:15:05.7665749Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T12:15:05.7666358Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.7666940Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T12:15:05.7667538Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask, tmp21, _tmp20)
2025-12-04T12:15:05.7668048Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T12:15:05.7668516Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T12:15:05.7669109Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T12:15:05.7669568Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T12:15:05.7670147Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T12:15:05.7670706Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T12:15:05.7671531Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask)
2025-12-04T12:15:05.7672128Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T12:15:05.7672672Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, None)
2025-12-04T12:15:05.7673049Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.7675474Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.7676030Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.7677121Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.7677759Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.7678667Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.7679389Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.7680284Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.7681055Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.7681728Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.7682816Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.7683192Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.7684081Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.7684216Z ('RERUN', {'yellow': True}) [0.4550s] [100%]
2025-12-04T12:15:05.7685581Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T12:15:05.7686666Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.7687123Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 8192
2025-12-04T12:15:05.7687570Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T12:15:05.7688032Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.7688619Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.7689161Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.7689757Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.7690338Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.7690906Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.7691390Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.7691826Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:05.7692437Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.7693054Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.7693669Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.7694244Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.7694774Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.7695346Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.7695833Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.7696392Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.7696863Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.7697642Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T12:15:05.7698183Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.7698776Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.7699504Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T12:15:05.7700107Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T12:15:05.7700520Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T12:15:05.7701142Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean)
2025-12-04T12:15:05.7701777Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2)
2025-12-04T12:15:05.7702438Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight)
2025-12-04T12:15:05.7703144Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T12:15:05.7703634Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T12:15:05.7704111Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T12:15:05.7704613Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T12:15:05.7705267Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.7705791Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.7706380Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T12:15:05.7706960Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.7707501Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.7708033Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.7709171Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.7709669Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.7710141Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.7710951Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.7711482Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:05.7711986Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T12:15:05.7712468Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T12:15:05.7712971Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T12:15:05.7713456Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T12:15:05.7713953Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T12:15:05.7714491Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T12:15:05.7715002Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T12:15:05.7715523Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T12:15:05.7716172Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.7716750Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T12:15:05.7717310Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask, tmp21, _tmp20)
2025-12-04T12:15:05.7717818Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T12:15:05.7718314Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T12:15:05.7718898Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T12:15:05.7719359Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T12:15:05.7719946Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T12:15:05.7720515Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T12:15:05.7721157Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask)
2025-12-04T12:15:05.7721835Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T12:15:05.7722385Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, None)
2025-12-04T12:15:05.7722798Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.7725148Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.7725697Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.7726744Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.7727384Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.7728283Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.7728963Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.7729888Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.7730657Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.7731286Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.7732402Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.7732781Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.7733678Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.7733816Z FAILED [0.4530s] [100%]
2025-12-04T12:15:05.7733823Z 
2025-12-04T12:15:05.7733980Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.7734387Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda _
2025-12-04T12:15:05.7734522Z Traceback (most recent call last):
2025-12-04T12:15:05.7734947Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.7735181Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.7735684Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.7735967Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.7736565Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.7736761Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.7737272Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.7737433Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.7737969Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.7738288Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.7738824Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.7738974Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.7739471Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.7739597Z     return self._compile_to_module()
2025-12-04T12:15:05.7740086Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.7740265Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.7740780Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.7740926Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.7741421Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.7741658Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.7742301Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.7742430Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.7742933Z   File "/tmp/tmp5ggmxwz2/ek/cekg2gokxegjc25gieptllhpsmseog5b543tjrgeetz65iuizaik.py", line 65, in <module>
2025-12-04T12:15:05.7743411Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.7743524Z     kernel.precompile(
2025-12-04T12:15:05.7744090Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.7744209Z     self._precompile_worker()
2025-12-04T12:15:05.7744834Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.7745029Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.7745631Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.7745848Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.7746332Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.7746578Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.7747035Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.7747370Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.7747599Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.7748260Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.7748383Z ^
2025-12-04T12:15:05.7748855Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.7748862Z 
2025-12-04T12:15:05.7749570Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.7749580Z 
2025-12-04T12:15:05.7749585Z 
2025-12-04T12:15:05.7749816Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.7750534Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda
2025-12-04T12:15:05.7750540Z 
2025-12-04T12:15:05.7750810Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.7751048Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.7751156Z frames [('total', 1)]
2025-12-04T12:15:05.7751289Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.7751758Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.7751983Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.7752096Z graph_break []
2025-12-04T12:15:05.7752516Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda _
2025-12-04T12:15:05.7752643Z Traceback (most recent call last):
2025-12-04T12:15:05.7753081Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.7753612Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.7754124Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.7754379Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.7754945Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.7755158Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.7755675Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.7755825Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.7756379Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.7756703Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.7757268Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.7757423Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.7757907Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.7758050Z     return self._compile_to_module()
2025-12-04T12:15:05.7758585Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.7758762Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.7759277Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.7759409Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.7759922Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.7760159Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.7760783Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.7760932Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.7761431Z   File "/tmp/tmpssk9n14d/i7/ci7ig3lx3xyndr2ivt262fdzrmbukp6ilf3mdund7p2x3k6uj7r5.py", line 65, in <module>
2025-12-04T12:15:05.7761921Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.7762037Z     kernel.precompile(
2025-12-04T12:15:05.7762593Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.7762728Z     self._precompile_worker()
2025-12-04T12:15:05.7763330Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.7763527Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.7764130Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.7764331Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.7764792Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.7765045Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.7765489Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.7765841Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.7766070Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.7766741Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.7766836Z ^
2025-12-04T12:15:05.7767328Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.7767334Z 
2025-12-04T12:15:05.7768060Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.7768069Z 
2025-12-04T12:15:05.7768074Z 
2025-12-04T12:15:05.7768292Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.7769025Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda
2025-12-04T12:15:05.7769030Z 
2025-12-04T12:15:05.7769351Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.7769598Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.7769707Z frames [('total', 1)]
2025-12-04T12:15:05.7769828Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.7770308Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.7770564Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.7770664Z graph_break []
2025-12-04T12:15:05.7770898Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.7771209Z frames [('total', 1)]
2025-12-04T12:15:05.7771344Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.7771565Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.7772027Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.7772144Z graph_break []
2025-12-04T12:15:05.7772296Z =================================== FAILURES ===================================
2025-12-04T12:15:05.7772798Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda _
2025-12-04T12:15:05.7772938Z Traceback (most recent call last):
2025-12-04T12:15:05.7773365Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.7773615Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.7774105Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.7774356Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.7774886Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.7775088Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.7775604Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.7775773Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.7776369Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.7776713Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.7777236Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.7777385Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.7777884Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.7778009Z     return self._compile_to_module()
2025-12-04T12:15:05.7778508Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.7778678Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.7779254Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.7779400Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.7779896Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.7780128Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.7780725Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.7780852Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.7781382Z   File "/tmp/tmp938xof_7/g2/cg246gs4mive6nya6r23nldbjauykvbeqccxb75i2ywxwkjrs263.py", line 65, in <module>
2025-12-04T12:15:05.7781846Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.7781969Z     kernel.precompile(
2025-12-04T12:15:05.7782536Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.7782706Z     self._precompile_worker()
2025-12-04T12:15:05.7783314Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.7783496Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.7784087Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.7784304Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.7784760Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.7785086Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.7785534Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.7785874Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.7786120Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.7786775Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.7786868Z ^
2025-12-04T12:15:05.7787343Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.7787349Z 
2025-12-04T12:15:05.7788062Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.7788070Z 
2025-12-04T12:15:05.7788075Z 
2025-12-04T12:15:05.7788308Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.7789027Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda
2025-12-04T12:15:05.7789035Z 
2025-12-04T12:15:05.7789320Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.7789546Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.7789654Z frames [('total', 1)]
2025-12-04T12:15:05.7789791Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.7790254Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.7790482Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.7790597Z graph_break []
2025-12-04T12:15:05.7790865Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.7790987Z frames [('total', 1)]
2025-12-04T12:15:05.7791105Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.7791323Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.7791800Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.7791905Z graph_break []
2025-12-04T12:15:05.7792122Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.7792242Z frames [('total', 1)]
2025-12-04T12:15:05.7792357Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.7792588Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.7793076Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.7793179Z graph_break []
2025-12-04T12:15:05.7793839Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-43cf13c151388d8e.xml -
2025-12-04T12:15:05.7794015Z =========================== short test summary info ============================
2025-12-04T12:15:05.7794908Z FAILED [0.4530s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.7795568Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.7795660Z ^
2025-12-04T12:15:05.7796133Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.7796139Z 
2025-12-04T12:15:05.7796849Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.7796888Z 
2025-12-04T12:15:05.7796893Z 
2025-12-04T12:15:05.7797123Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.7797844Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda
2025-12-04T12:15:05.7797850Z 
2025-12-04T12:15:05.7798116Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.7798317Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.7798522Z ================== 1 failed, 187 deselected, 2 rerun in 4.40s ==================
2025-12-04T12:15:05.7798640Z Got exit code 1
2025-12-04T12:15:05.7799276Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda
2025-12-04T12:15:05.7799697Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:15:05.7800177Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-27661fe34019a4f8.xml
2025-12-04T12:15:05.7800346Z ============================= test session starts ==============================
2025-12-04T12:15:05.7800711Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.7800822Z cachedir: .pytest_cache
2025-12-04T12:15:05.7801342Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.7801479Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.7801589Z configfile: pytest.ini
2025-12-04T12:15:05.7802180Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.7802455Z collecting ... collected 188 items / 32 deselected / 156 selected
2025-12-04T12:15:05.7802601Z stepcurrent: skipping 32 already run items.
2025-12-04T12:15:05.7802730Z Running 156 items in this shard
2025-12-04T12:15:05.7802737Z 
2025-12-04T12:15:05.7804160Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0
2025-12-04T12:15:05.7805308Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.7805755Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.7806200Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T12:15:05.7806728Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T12:15:05.7807221Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.7807773Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.7808314Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.7808901Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.7809535Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.7810091Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.7810551Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.7811065Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.7811533Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.7812006Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.7812458Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.7813125Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.7813656Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp30 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.7814198Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp31 = tl.broadcast_to(tmp30, [1, 1])
2025-12-04T12:15:05.7814718Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.7815306Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.7815899Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, 0)
2025-12-04T12:15:05.7816554Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.7817094Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.where(r0_mask, tmp5, 0)
2025-12-04T12:15:05.7817673Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.7818206Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tl.full([1, 1], 15, tl.int32)
2025-12-04T12:15:05.7818774Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:05.7819264Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = (tmp8 / tmp10)
2025-12-04T12:15:05.7819759Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tmp2 - tmp11
2025-12-04T12:15:05.7820281Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12 * tmp12
2025-12-04T12:15:05.7820872Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.7821425Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.where(r0_mask, tmp14, 0)
2025-12-04T12:15:05.7822011Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.7822502Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp1 - tmp11
2025-12-04T12:15:05.7822984Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = 15.0
2025-12-04T12:15:05.7823479Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = (tmp17 / tmp19)
2025-12-04T12:15:05.7823935Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 1e-05
2025-12-04T12:15:05.7824419Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tmp20 + tmp21
2025-12-04T12:15:05.7824959Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = libdevice.rsqrt(tmp22)
2025-12-04T12:15:05.7825444Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp18 * tmp23
2025-12-04T12:15:05.7825953Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp25 = tl_math.abs(tmp24)
2025-12-04T12:15:05.7826556Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.7827135Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp28 = tl.where(r0_mask, tmp26, float("-inf"))
2025-12-04T12:15:05.7827789Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.7828272Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp32 = tmp24 * tmp31
2025-12-04T12:15:05.7828718Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp33 = -448.0
2025-12-04T12:15:05.7829361Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp34 = triton_helpers.maximum(tmp32, tmp33)
2025-12-04T12:15:05.7829805Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp35 = 448.0
2025-12-04T12:15:05.7830398Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp36 = triton_helpers.minimum(tmp34, tmp35)
2025-12-04T12:15:05.7830935Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp37 = tmp36.to(tl.float8e4nv)
2025-12-04T12:15:05.7831451Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp38 = tmp29.to(tl.float32)
2025-12-04T12:15:05.7832211Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask)
2025-12-04T12:15:05.7832930Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None)
2025-12-04T12:15:05.7833315Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.7835458Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.7836012Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.7837173Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.7837821Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.7838715Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.7839411Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.7840301Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.7841072Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.7841698Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.7842793Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.7843178Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.7844121Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.7844275Z ('RERUN', {'yellow': True}) [3.3658s] [  0%]
2025-12-04T12:15:05.7845694Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0
2025-12-04T12:15:05.7846810Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.7847264Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.7847704Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T12:15:05.7848262Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T12:15:05.7848724Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.7849258Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.7849820Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.7850402Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.7851032Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.7851587Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.7852049Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.7852563Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.7853033Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.7853504Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.7853956Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.7854615Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.7855144Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp30 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.7855690Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp31 = tl.broadcast_to(tmp30, [1, 1])
2025-12-04T12:15:05.7856205Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.7856885Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.7857492Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, 0)
2025-12-04T12:15:05.7858072Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.7858610Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.where(r0_mask, tmp5, 0)
2025-12-04T12:15:05.7859196Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.7859729Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tl.full([1, 1], 15, tl.int32)
2025-12-04T12:15:05.7860287Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:05.7860781Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = (tmp8 / tmp10)
2025-12-04T12:15:05.7861257Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tmp2 - tmp11
2025-12-04T12:15:05.7861786Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12 * tmp12
2025-12-04T12:15:05.7862375Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.7862925Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.where(r0_mask, tmp14, 0)
2025-12-04T12:15:05.7863502Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.7864037Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp1 - tmp11
2025-12-04T12:15:05.7864475Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = 15.0
2025-12-04T12:15:05.7864968Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = (tmp17 / tmp19)
2025-12-04T12:15:05.7865423Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 1e-05
2025-12-04T12:15:05.7865907Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tmp20 + tmp21
2025-12-04T12:15:05.7866445Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = libdevice.rsqrt(tmp22)
2025-12-04T12:15:05.7866926Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp18 * tmp23
2025-12-04T12:15:05.7867438Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp25 = tl_math.abs(tmp24)
2025-12-04T12:15:05.7868042Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.7868620Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp28 = tl.where(r0_mask, tmp26, float("-inf"))
2025-12-04T12:15:05.7869268Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.7869749Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp32 = tmp24 * tmp31
2025-12-04T12:15:05.7870194Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp33 = -448.0
2025-12-04T12:15:05.7870839Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp34 = triton_helpers.maximum(tmp32, tmp33)
2025-12-04T12:15:05.7871926Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp35 = 448.0
2025-12-04T12:15:05.7872518Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp36 = triton_helpers.minimum(tmp34, tmp35)
2025-12-04T12:15:05.7873048Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp37 = tmp36.to(tl.float8e4nv)
2025-12-04T12:15:05.7873560Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp38 = tmp29.to(tl.float32)
2025-12-04T12:15:05.7874385Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask)
2025-12-04T12:15:05.7875106Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None)
2025-12-04T12:15:05.7875483Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.7877779Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.7878400Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.7879442Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.7880087Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.7880980Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.7881662Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.7882566Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.7883340Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.7883966Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.7885061Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.7885442Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.7886391Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.7886530Z ('RERUN', {'yellow': True}) [0.4038s] [  0%]
2025-12-04T12:15:05.7887968Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0
2025-12-04T12:15:05.7889090Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.7889546Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.7889990Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T12:15:05.7890552Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T12:15:05.7891015Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.7891547Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.7892103Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.7892688Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.7893311Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.7893868Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.7894324Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.7894840Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.7895312Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.7895782Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.7896229Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.7896952Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.7897480Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp30 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.7898028Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp31 = tl.broadcast_to(tmp30, [1, 1])
2025-12-04T12:15:05.7898555Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.7899142Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.7899729Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, 0)
2025-12-04T12:15:05.7900307Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.7900840Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.where(r0_mask, tmp5, 0)
2025-12-04T12:15:05.7901425Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.7901994Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tl.full([1, 1], 15, tl.int32)
2025-12-04T12:15:05.7902522Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:05.7903016Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = (tmp8 / tmp10)
2025-12-04T12:15:05.7903493Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tmp2 - tmp11
2025-12-04T12:15:05.7904017Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12 * tmp12
2025-12-04T12:15:05.7904603Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.7905151Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.where(r0_mask, tmp14, 0)
2025-12-04T12:15:05.7905731Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.7906273Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp1 - tmp11
2025-12-04T12:15:05.7906711Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = 15.0
2025-12-04T12:15:05.7907201Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = (tmp17 / tmp19)
2025-12-04T12:15:05.7907658Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 1e-05
2025-12-04T12:15:05.7908136Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tmp20 + tmp21
2025-12-04T12:15:05.7908675Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = libdevice.rsqrt(tmp22)
2025-12-04T12:15:05.7909152Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp18 * tmp23
2025-12-04T12:15:05.7909659Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp25 = tl_math.abs(tmp24)
2025-12-04T12:15:05.7910255Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.7910828Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp28 = tl.where(r0_mask, tmp26, float("-inf"))
2025-12-04T12:15:05.7911475Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.7911957Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp32 = tmp24 * tmp31
2025-12-04T12:15:05.7912400Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp33 = -448.0
2025-12-04T12:15:05.7913026Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp34 = triton_helpers.maximum(tmp32, tmp33)
2025-12-04T12:15:05.7913464Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp35 = 448.0
2025-12-04T12:15:05.7914047Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp36 = triton_helpers.minimum(tmp34, tmp35)
2025-12-04T12:15:05.7914581Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp37 = tmp36.to(tl.float8e4nv)
2025-12-04T12:15:05.7915094Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp38 = tmp29.to(tl.float32)
2025-12-04T12:15:05.7915843Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask)
2025-12-04T12:15:05.7916557Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None)
2025-12-04T12:15:05.7916966Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.7919058Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.7919640Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.7920684Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.7921332Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.7922221Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.7922911Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.7923813Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.7924587Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.7925210Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.7926305Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.7926727Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.7927622Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.7927730Z FAILED [0.4035s] [  0%]
2025-12-04T12:15:05.7927752Z 
2025-12-04T12:15:05.7927901Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.7928283Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda _
2025-12-04T12:15:05.7928423Z Traceback (most recent call last):
2025-12-04T12:15:05.7928848Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.7929115Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.7929623Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.7929875Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.7930400Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.7930628Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.7931136Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.7931296Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.7931828Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.7932151Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.7932685Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.7932872Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.7933378Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.7933504Z     return self._compile_to_module()
2025-12-04T12:15:05.7933989Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.7934168Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.7934687Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.7934835Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.7935336Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.7935571Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.7936173Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.7936364Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.7936878Z   File "/tmp/tmp6qkidtfc/qd/cqdvosxiuun73ceezczqcvaxeq2b2mbglfras55yapfjpp5bt4sc.py", line 74, in <module>
2025-12-04T12:15:05.7937361Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.7937476Z     kernel.precompile(
2025-12-04T12:15:05.7938047Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.7938170Z     self._precompile_worker()
2025-12-04T12:15:05.7938771Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.7938972Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.7939613Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.7939830Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.7940286Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.7940533Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.7940996Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.7941333Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.7941605Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.7942265Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.7942361Z ^
2025-12-04T12:15:05.7942839Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.7942876Z 
2025-12-04T12:15:05.7943593Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.7943599Z 
2025-12-04T12:15:05.7943604Z 
2025-12-04T12:15:05.7943840Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.7944529Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda
2025-12-04T12:15:05.7944538Z 
2025-12-04T12:15:05.7944810Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.7945086Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.7945195Z frames [('total', 1)]
2025-12-04T12:15:05.7945329Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.7945797Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.7946019Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.7946138Z graph_break []
2025-12-04T12:15:05.7946519Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda _
2025-12-04T12:15:05.7946645Z Traceback (most recent call last):
2025-12-04T12:15:05.7947085Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.7947321Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.7947828Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.7948082Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.7948596Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.7948809Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.7949318Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.7949464Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.7950011Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.7950332Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.7950861Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.7951042Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.7951521Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.7951660Z     return self._compile_to_module()
2025-12-04T12:15:05.7952145Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.7952323Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.7952837Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.7952966Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.7953504Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.7953738Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.7954341Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.7954470Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.7954992Z   File "/tmp/tmp6_pgr861/bq/cbq2v2em5kgujuwm57jdnkrros4hjaz72skgv6ptkwvq772rugcf.py", line 74, in <module>
2025-12-04T12:15:05.7955462Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.7955575Z     kernel.precompile(
2025-12-04T12:15:05.7956125Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.7956255Z     self._precompile_worker()
2025-12-04T12:15:05.7956853Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.7957081Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.7957675Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.7957876Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.7958343Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.7958592Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.7959047Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.7959381Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.7959611Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.7960274Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.7960370Z ^
2025-12-04T12:15:05.7960827Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.7960847Z 
2025-12-04T12:15:05.7961557Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.7961563Z 
2025-12-04T12:15:05.7961568Z 
2025-12-04T12:15:05.7961785Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.7962489Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda
2025-12-04T12:15:05.7962494Z 
2025-12-04T12:15:05.7962766Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.7963001Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.7963148Z frames [('total', 1)]
2025-12-04T12:15:05.7963267Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.7963750Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.7963976Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.7964077Z graph_break []
2025-12-04T12:15:05.7964309Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.7964415Z frames [('total', 1)]
2025-12-04T12:15:05.7964541Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.7964759Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.7965251Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.7965367Z graph_break []
2025-12-04T12:15:05.7965516Z =================================== FAILURES ===================================
2025-12-04T12:15:05.7965898Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda _
2025-12-04T12:15:05.7966037Z Traceback (most recent call last):
2025-12-04T12:15:05.7966501Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.7966746Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.7967234Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.7967482Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.7968010Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.7968202Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.7968761Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.7968909Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.7969441Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.7969778Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.7970296Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.7970443Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.7971136Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.7971266Z     return self._compile_to_module()
2025-12-04T12:15:05.7971767Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.7971938Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.7972452Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.7972601Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.7973097Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.7973344Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.7973934Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.7974061Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.7974573Z   File "/tmp/tmplvqn329b/p5/cp5t2f53rpk7o5z6kw7x4uydjclzdisjyg3s3cy22fswwkis34a6.py", line 74, in <module>
2025-12-04T12:15:05.7975115Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.7975229Z     kernel.precompile(
2025-12-04T12:15:05.7975800Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.7975921Z     self._precompile_worker()
2025-12-04T12:15:05.7976605Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.7976787Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.7977381Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.7977653Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.7978104Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.7978367Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.7978808Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.7979184Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.7979425Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.7980075Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.7980166Z ^
2025-12-04T12:15:05.7980636Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.7980642Z 
2025-12-04T12:15:05.7981351Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.7981403Z 
2025-12-04T12:15:05.7981410Z 
2025-12-04T12:15:05.7981640Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.7982322Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda
2025-12-04T12:15:05.7982331Z 
2025-12-04T12:15:05.7982612Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.7982833Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.7982938Z frames [('total', 1)]
2025-12-04T12:15:05.7983067Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.7983534Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.7983756Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.7983871Z graph_break []
2025-12-04T12:15:05.7984090Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.7984209Z frames [('total', 1)]
2025-12-04T12:15:05.7984327Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.7984549Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.7985022Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.7985124Z graph_break []
2025-12-04T12:15:05.7985340Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.7985456Z frames [('total', 1)]
2025-12-04T12:15:05.7985571Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.7985806Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.7986266Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.7986369Z graph_break []
2025-12-04T12:15:05.7987067Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-27661fe34019a4f8.xml -
2025-12-04T12:15:05.7987246Z =========================== short test summary info ============================
2025-12-04T12:15:05.7988085Z FAILED [0.4035s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.7988745Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.7988837Z ^
2025-12-04T12:15:05.7989339Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.7989346Z 
2025-12-04T12:15:05.7990063Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.7990070Z 
2025-12-04T12:15:05.7990074Z 
2025-12-04T12:15:05.7990306Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.7991028Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda
2025-12-04T12:15:05.7991033Z 
2025-12-04T12:15:05.7991303Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.7991498Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.7991698Z ================== 1 failed, 32 deselected, 2 rerun in 4.22s ===================
2025-12-04T12:15:05.7991818Z Got exit code 1
2025-12-04T12:15:05.7991927Z Retrying single test...
2025-12-04T12:15:05.7992401Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-63ef36c446edecf7.xml
2025-12-04T12:15:05.7992638Z ============================= test session starts ==============================
2025-12-04T12:15:05.7992991Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.7993105Z cachedir: .pytest_cache
2025-12-04T12:15:05.7993638Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.7993766Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.7993889Z configfile: pytest.ini
2025-12-04T12:15:05.7994482Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.7994707Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:05.7995489Z stepcurrent: skipping 32 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda
2025-12-04T12:15:05.7995607Z Running 1 items in this shard
2025-12-04T12:15:05.7995612Z 
2025-12-04T12:15:05.7997039Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0
2025-12-04T12:15:05.7998141Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.7998588Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.7999064Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T12:15:05.7999575Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T12:15:05.8000054Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.8000590Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.8001144Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.8001777Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.8002365Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.8002936Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.8003410Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.8003940Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.8004414Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.8004870Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.8005335Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.8006017Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.8006557Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp30 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.8007109Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp31 = tl.broadcast_to(tmp30, [1, 1])
2025-12-04T12:15:05.8007617Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.8008221Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.8008759Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, 0)
2025-12-04T12:15:05.8009364Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.8009904Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.where(r0_mask, tmp5, 0)
2025-12-04T12:15:05.8010491Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.8011031Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tl.full([1, 1], 15, tl.int32)
2025-12-04T12:15:05.8011542Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:05.8012049Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = (tmp8 / tmp10)
2025-12-04T12:15:05.8012560Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tmp2 - tmp11
2025-12-04T12:15:05.8013055Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12 * tmp12
2025-12-04T12:15:05.8013648Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.8014189Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.where(r0_mask, tmp14, 0)
2025-12-04T12:15:05.8014783Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.8015295Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp1 - tmp11
2025-12-04T12:15:05.8015753Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = 15.0
2025-12-04T12:15:05.8016245Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = (tmp17 / tmp19)
2025-12-04T12:15:05.8016783Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 1e-05
2025-12-04T12:15:05.8017283Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tmp20 + tmp21
2025-12-04T12:15:05.8017812Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = libdevice.rsqrt(tmp22)
2025-12-04T12:15:05.8018311Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp18 * tmp23
2025-12-04T12:15:05.8018816Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp25 = tl_math.abs(tmp24)
2025-12-04T12:15:05.8019443Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.8020034Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp28 = tl.where(r0_mask, tmp26, float("-inf"))
2025-12-04T12:15:05.8020677Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.8021173Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp32 = tmp24 * tmp31
2025-12-04T12:15:05.8021617Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp33 = -448.0
2025-12-04T12:15:05.8022192Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp34 = triton_helpers.maximum(tmp32, tmp33)
2025-12-04T12:15:05.8022650Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp35 = 448.0
2025-12-04T12:15:05.8023220Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp36 = triton_helpers.minimum(tmp34, tmp35)
2025-12-04T12:15:05.8023765Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp37 = tmp36.to(tl.float8e4nv)
2025-12-04T12:15:05.8024278Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp38 = tmp29.to(tl.float32)
2025-12-04T12:15:05.8025000Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask)
2025-12-04T12:15:05.8025743Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None)
2025-12-04T12:15:05.8026115Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.8028209Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.8028775Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.8029832Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.8030495Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.8031406Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.8032090Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.8032990Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.8033790Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.8034398Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.8035506Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.8035873Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.8036779Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.8036916Z ('RERUN', {'yellow': True}) [3.3944s] [100%]
2025-12-04T12:15:05.8038351Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0
2025-12-04T12:15:05.8039434Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.8039875Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.8040354Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T12:15:05.8040875Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T12:15:05.8041356Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.8041891Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.8042442Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.8043060Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.8043649Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.8044219Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.8044775Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.8045301Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.8045772Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.8046236Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.8046695Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.8047370Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.8047908Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp30 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.8048453Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp31 = tl.broadcast_to(tmp30, [1, 1])
2025-12-04T12:15:05.8048960Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.8049555Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.8050084Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, 0)
2025-12-04T12:15:05.8050679Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.8051218Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.where(r0_mask, tmp5, 0)
2025-12-04T12:15:05.8051782Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.8052329Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tl.full([1, 1], 15, tl.int32)
2025-12-04T12:15:05.8052843Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:05.8053343Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = (tmp8 / tmp10)
2025-12-04T12:15:05.8053874Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tmp2 - tmp11
2025-12-04T12:15:05.8054369Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12 * tmp12
2025-12-04T12:15:05.8054958Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.8055495Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.where(r0_mask, tmp14, 0)
2025-12-04T12:15:05.8056122Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.8056682Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp1 - tmp11
2025-12-04T12:15:05.8057139Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = 15.0
2025-12-04T12:15:05.8057629Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = (tmp17 / tmp19)
2025-12-04T12:15:05.8058109Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 1e-05
2025-12-04T12:15:05.8058604Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tmp20 + tmp21
2025-12-04T12:15:05.8059132Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = libdevice.rsqrt(tmp22)
2025-12-04T12:15:05.8059628Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp18 * tmp23
2025-12-04T12:15:05.8060135Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp25 = tl_math.abs(tmp24)
2025-12-04T12:15:05.8060755Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.8061342Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp28 = tl.where(r0_mask, tmp26, float("-inf"))
2025-12-04T12:15:05.8061979Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.8062471Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp32 = tmp24 * tmp31
2025-12-04T12:15:05.8062913Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp33 = -448.0
2025-12-04T12:15:05.8063479Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp34 = triton_helpers.maximum(tmp32, tmp33)
2025-12-04T12:15:05.8063934Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp35 = 448.0
2025-12-04T12:15:05.8064505Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp36 = triton_helpers.minimum(tmp34, tmp35)
2025-12-04T12:15:05.8065047Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp37 = tmp36.to(tl.float8e4nv)
2025-12-04T12:15:05.8065560Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp38 = tmp29.to(tl.float32)
2025-12-04T12:15:05.8066264Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask)
2025-12-04T12:15:05.8067011Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None)
2025-12-04T12:15:05.8067380Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.8069526Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.8070065Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.8071317Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.8072024Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.8072938Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.8073626Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.8074521Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.8075335Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.8075943Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.8077046Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.8077416Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.8078320Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.8078455Z ('RERUN', {'yellow': True}) [0.4155s] [100%]
2025-12-04T12:15:05.8079884Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0
2025-12-04T12:15:05.8080968Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.8081397Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.8081899Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T12:15:05.8082414Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T12:15:05.8082886Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.8083416Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.8083954Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.8084584Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.8085173Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.8085740Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.8086208Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.8086737Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.8087208Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.8087671Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.8088173Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.8088817Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.8089358Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp30 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.8089903Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp31 = tl.broadcast_to(tmp30, [1, 1])
2025-12-04T12:15:05.8090408Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.8091003Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.8091539Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, 0)
2025-12-04T12:15:05.8092131Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.8092664Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.where(r0_mask, tmp5, 0)
2025-12-04T12:15:05.8093238Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.8093785Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tl.full([1, 1], 15, tl.int32)
2025-12-04T12:15:05.8094294Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:05.8094824Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = (tmp8 / tmp10)
2025-12-04T12:15:05.8095306Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tmp2 - tmp11
2025-12-04T12:15:05.8095789Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12 * tmp12
2025-12-04T12:15:05.8096452Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.8096995Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.where(r0_mask, tmp14, 0)
2025-12-04T12:15:05.8097616Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.8098097Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp1 - tmp11
2025-12-04T12:15:05.8098549Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = 15.0
2025-12-04T12:15:05.8099043Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = (tmp17 / tmp19)
2025-12-04T12:15:05.8099515Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 1e-05
2025-12-04T12:15:05.8100007Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tmp20 + tmp21
2025-12-04T12:15:05.8100534Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = libdevice.rsqrt(tmp22)
2025-12-04T12:15:05.8101027Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp18 * tmp23
2025-12-04T12:15:05.8101584Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp25 = tl_math.abs(tmp24)
2025-12-04T12:15:05.8102170Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.8102760Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp28 = tl.where(r0_mask, tmp26, float("-inf"))
2025-12-04T12:15:05.8103397Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.8103890Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp32 = tmp24 * tmp31
2025-12-04T12:15:05.8104335Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp33 = -448.0
2025-12-04T12:15:05.8104907Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp34 = triton_helpers.maximum(tmp32, tmp33)
2025-12-04T12:15:05.8105371Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp35 = 448.0
2025-12-04T12:15:05.8105947Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp36 = triton_helpers.minimum(tmp34, tmp35)
2025-12-04T12:15:05.8106499Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp37 = tmp36.to(tl.float8e4nv)
2025-12-04T12:15:05.8107015Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp38 = tmp29.to(tl.float32)
2025-12-04T12:15:05.8107722Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask)
2025-12-04T12:15:05.8108476Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None)
2025-12-04T12:15:05.8108844Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.8110986Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.8111533Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.8112590Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.8113249Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.8114150Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.8114836Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.8115768Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.8116539Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.8117153Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.8118264Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.8118635Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.8119552Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.8119665Z FAILED [0.4246s] [100%]
2025-12-04T12:15:05.8119670Z 
2025-12-04T12:15:05.8119819Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.8120217Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda _
2025-12-04T12:15:05.8120346Z Traceback (most recent call last):
2025-12-04T12:15:05.8120793Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.8121034Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.8121526Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.8121831Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.8122348Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.8122560Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.8123071Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.8123219Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.8123766Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.8124089Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.8124645Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.8124816Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.8125297Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.8125435Z     return self._compile_to_module()
2025-12-04T12:15:05.8125954Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.8126124Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.8126656Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.8126791Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.8127300Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.8127536Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.8128152Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.8128294Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.8128794Z   File "/tmp/tmpovscfvm8/2t/c2tvd4waojawcq2jd6vt4hx66yjsr73vag7aljgtsa2xwwxxq476.py", line 74, in <module>
2025-12-04T12:15:05.8129271Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.8129383Z     kernel.precompile(
2025-12-04T12:15:05.8129939Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.8130075Z     self._precompile_worker()
2025-12-04T12:15:05.8130669Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.8130848Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.8131458Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.8131654Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.8132115Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.8132361Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.8132801Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.8133150Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.8133376Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.8134041Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.8134134Z ^
2025-12-04T12:15:05.8134623Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.8134629Z 
2025-12-04T12:15:05.8135351Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.8135360Z 
2025-12-04T12:15:05.8135365Z 
2025-12-04T12:15:05.8135580Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.8136350Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda
2025-12-04T12:15:05.8136357Z 
2025-12-04T12:15:05.8136670Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.8136897Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.8137021Z frames [('total', 1)]
2025-12-04T12:15:05.8137142Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.8137626Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.8137879Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.8137979Z graph_break []
2025-12-04T12:15:05.8138373Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda _
2025-12-04T12:15:05.8138499Z Traceback (most recent call last):
2025-12-04T12:15:05.8138924Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.8139171Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.8139659Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.8139950Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.8140466Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.8140659Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.8141188Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.8141335Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.8141868Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.8142203Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.8142726Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.8142889Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.8143371Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.8143493Z     return self._compile_to_module()
2025-12-04T12:15:05.8143994Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.8144157Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.8144682Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.8144818Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.8145313Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.8145562Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.8146181Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.8146325Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.8146825Z   File "/tmp/tmpnhz4w3ix/ef/cefcji5ilwaclwro5i25bancohnqhnyhfzwxnybhn7chngw53xcd.py", line 74, in <module>
2025-12-04T12:15:05.8147289Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.8147414Z     kernel.precompile(
2025-12-04T12:15:05.8147966Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.8148084Z     self._precompile_worker()
2025-12-04T12:15:05.8148723Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.8148902Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.8149512Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.8149710Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.8150159Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.8150466Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.8150909Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.8151255Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.8151481Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.8152132Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.8152268Z ^
2025-12-04T12:15:05.8152729Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.8152735Z 
2025-12-04T12:15:05.8153455Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.8153463Z 
2025-12-04T12:15:05.8153468Z 
2025-12-04T12:15:05.8153685Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.8154372Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda
2025-12-04T12:15:05.8154377Z 
2025-12-04T12:15:05.8154657Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.8154881Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.8155002Z frames [('total', 1)]
2025-12-04T12:15:05.8155119Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.8155584Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.8155821Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.8155922Z graph_break []
2025-12-04T12:15:05.8156140Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.8156258Z frames [('total', 1)]
2025-12-04T12:15:05.8156374Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.8156606Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.8157065Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.8157168Z graph_break []
2025-12-04T12:15:05.8157327Z =================================== FAILURES ===================================
2025-12-04T12:15:05.8157742Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda _
2025-12-04T12:15:05.8157868Z Traceback (most recent call last):
2025-12-04T12:15:05.8158305Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.8158540Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.8159040Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.8159288Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.8159801Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.8160037Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.8160546Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.8160709Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.8161239Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.8161591Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.8162122Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.8162272Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.8162750Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.8162886Z     return self._compile_to_module()
2025-12-04T12:15:05.8163373Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.8163579Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.8164098Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.8164228Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.8164736Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.8164968Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.8165565Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.8165694Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.8166194Z   File "/tmp/tmpnsm7xs1c/lp/clprq5564qirf3ierb2tnnvy6ickadup4jypktvrti6ssgf45oyd.py", line 74, in <module>
2025-12-04T12:15:05.8166670Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.8166785Z     kernel.precompile(
2025-12-04T12:15:05.8167344Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.8167478Z     self._precompile_worker()
2025-12-04T12:15:05.8168078Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.8168269Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.8168864Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.8169060Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.8169525Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.8169770Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.8170258Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.8170598Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.8170825Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.8171692Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.8171786Z ^
2025-12-04T12:15:05.8172243Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.8172264Z 
2025-12-04T12:15:05.8173053Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.8173062Z 
2025-12-04T12:15:05.8173067Z 
2025-12-04T12:15:05.8173285Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.8173990Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda
2025-12-04T12:15:05.8174039Z 
2025-12-04T12:15:05.8174306Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.8174542Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.8174648Z frames [('total', 1)]
2025-12-04T12:15:05.8174765Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.8175248Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.8175475Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.8175576Z graph_break []
2025-12-04T12:15:05.8175855Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.8175962Z frames [('total', 1)]
2025-12-04T12:15:05.8176090Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.8176372Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.8176839Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.8176956Z graph_break []
2025-12-04T12:15:05.8177174Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.8177280Z frames [('total', 1)]
2025-12-04T12:15:05.8177414Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.8177633Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.8178111Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.8178213Z graph_break []
2025-12-04T12:15:05.8178872Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-63ef36c446edecf7.xml -
2025-12-04T12:15:05.8179062Z =========================== short test summary info ============================
2025-12-04T12:15:05.8179899Z FAILED [0.4246s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.8180559Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.8180654Z ^
2025-12-04T12:15:05.8181115Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.8181121Z 
2025-12-04T12:15:05.8181849Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.8181908Z 
2025-12-04T12:15:05.8181914Z 
2025-12-04T12:15:05.8182135Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.8182840Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda
2025-12-04T12:15:05.8182848Z 
2025-12-04T12:15:05.8183117Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.8183301Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.8183524Z ================== 1 failed, 187 deselected, 2 rerun in 4.28s ==================
2025-12-04T12:15:05.8183628Z Got exit code 1
2025-12-04T12:15:05.8183752Z Retrying single test...
2025-12-04T12:15:05.8184254Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-818cc5e6f257d295.xml
2025-12-04T12:15:05.8184428Z ============================= test session starts ==============================
2025-12-04T12:15:05.8184795Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.8184908Z cachedir: .pytest_cache
2025-12-04T12:15:05.8185464Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.8185610Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.8185722Z configfile: pytest.ini
2025-12-04T12:15:05.8186333Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.8186554Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:05.8187326Z stepcurrent: skipping 32 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda
2025-12-04T12:15:05.8187493Z Running 1 items in this shard
2025-12-04T12:15:05.8187502Z 
2025-12-04T12:15:05.8188921Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0
2025-12-04T12:15:05.8190030Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.8190466Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.8190921Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T12:15:05.8191443Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T12:15:05.8191907Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.8192456Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.8193000Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.8193604Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.8194192Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.8194782Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.8195247Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.8195765Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.8196245Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.8196704Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.8197195Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.8197857Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.8198384Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp30 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.8198971Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp31 = tl.broadcast_to(tmp30, [1, 1])
2025-12-04T12:15:05.8199477Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.8200056Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.8200606Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, 0)
2025-12-04T12:15:05.8201189Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.8201780Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.where(r0_mask, tmp5, 0)
2025-12-04T12:15:05.8202354Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.8202898Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tl.full([1, 1], 15, tl.int32)
2025-12-04T12:15:05.8203408Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:05.8203895Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = (tmp8 / tmp10)
2025-12-04T12:15:05.8204384Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tmp2 - tmp11
2025-12-04T12:15:05.8204866Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12 * tmp12
2025-12-04T12:15:05.8205466Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.8206008Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.where(r0_mask, tmp14, 0)
2025-12-04T12:15:05.8206585Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.8207077Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp1 - tmp11
2025-12-04T12:15:05.8207515Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = 15.0
2025-12-04T12:15:05.8208053Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = (tmp17 / tmp19)
2025-12-04T12:15:05.8208494Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 1e-05
2025-12-04T12:15:05.8208978Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tmp20 + tmp21
2025-12-04T12:15:05.8209521Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = libdevice.rsqrt(tmp22)
2025-12-04T12:15:05.8209999Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp18 * tmp23
2025-12-04T12:15:05.8210550Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp25 = tl_math.abs(tmp24)
2025-12-04T12:15:05.8211146Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.8211724Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp28 = tl.where(r0_mask, tmp26, float("-inf"))
2025-12-04T12:15:05.8212402Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.8212882Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp32 = tmp24 * tmp31
2025-12-04T12:15:05.8213336Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp33 = -448.0
2025-12-04T12:15:05.8213912Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp34 = triton_helpers.maximum(tmp32, tmp33)
2025-12-04T12:15:05.8214385Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp35 = 448.0
2025-12-04T12:15:05.8214967Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp36 = triton_helpers.minimum(tmp34, tmp35)
2025-12-04T12:15:05.8215502Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp37 = tmp36.to(tl.float8e4nv)
2025-12-04T12:15:05.8216031Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp38 = tmp29.to(tl.float32)
2025-12-04T12:15:05.8216802Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask)
2025-12-04T12:15:05.8217523Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None)
2025-12-04T12:15:05.8217892Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.8219997Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.8220553Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.8221647Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.8222293Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.8223187Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.8223878Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.8224793Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.8225579Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.8226217Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.8227324Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.8227693Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.8228589Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.8228775Z ('RERUN', {'yellow': True}) [3.3705s] [100%]
2025-12-04T12:15:05.8230194Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0
2025-12-04T12:15:05.8231297Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.8231727Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.8232180Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T12:15:05.8232699Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T12:15:05.8233160Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.8233707Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.8234245Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.8234840Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.8235422Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.8236006Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.8236466Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.8236982Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.8237466Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.8237925Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.8238399Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.8239059Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.8239582Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp30 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.8240167Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp31 = tl.broadcast_to(tmp30, [1, 1])
2025-12-04T12:15:05.8240673Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.8241254Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.8241805Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, 0)
2025-12-04T12:15:05.8242415Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.8242959Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.where(r0_mask, tmp5, 0)
2025-12-04T12:15:05.8243535Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.8244084Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tl.full([1, 1], 15, tl.int32)
2025-12-04T12:15:05.8245205Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:05.8245698Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = (tmp8 / tmp10)
2025-12-04T12:15:05.8246197Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tmp2 - tmp11
2025-12-04T12:15:05.8246677Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12 * tmp12
2025-12-04T12:15:05.8247283Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.8247822Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.where(r0_mask, tmp14, 0)
2025-12-04T12:15:05.8248400Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.8248892Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp1 - tmp11
2025-12-04T12:15:05.8249328Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = 15.0
2025-12-04T12:15:05.8249899Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = (tmp17 / tmp19)
2025-12-04T12:15:05.8250345Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 1e-05
2025-12-04T12:15:05.8250829Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tmp20 + tmp21
2025-12-04T12:15:05.8251370Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = libdevice.rsqrt(tmp22)
2025-12-04T12:15:05.8251847Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp18 * tmp23
2025-12-04T12:15:05.8252479Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp25 = tl_math.abs(tmp24)
2025-12-04T12:15:05.8253071Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.8253649Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp28 = tl.where(r0_mask, tmp26, float("-inf"))
2025-12-04T12:15:05.8254334Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.8254815Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp32 = tmp24 * tmp31
2025-12-04T12:15:05.8255274Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp33 = -448.0
2025-12-04T12:15:05.8255846Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp34 = triton_helpers.maximum(tmp32, tmp33)
2025-12-04T12:15:05.8256397Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp35 = 448.0
2025-12-04T12:15:05.8256984Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp36 = triton_helpers.minimum(tmp34, tmp35)
2025-12-04T12:15:05.8257531Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp37 = tmp36.to(tl.float8e4nv)
2025-12-04T12:15:05.8258179Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp38 = tmp29.to(tl.float32)
2025-12-04T12:15:05.8258886Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask)
2025-12-04T12:15:05.8259611Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None)
2025-12-04T12:15:05.8259980Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.8262069Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.8262625Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.8263744Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.8264392Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.8265285Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.8265983Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.8266897Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.8267683Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.8268319Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.8269412Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.8269793Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.8270687Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.8270866Z ('RERUN', {'yellow': True}) [0.4110s] [100%]
2025-12-04T12:15:05.8272483Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0
2025-12-04T12:15:05.8273588Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.8274022Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.8274469Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T12:15:05.8274999Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T12:15:05.8275461Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.8276007Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.8276548Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.8277153Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.8277824Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.8278391Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.8278851Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.8279383Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.8279871Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.8280380Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.8280830Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T12:15:05.8281495Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.8282022Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp30 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.8282628Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp31 = tl.broadcast_to(tmp30, [1, 1])
2025-12-04T12:15:05.8283142Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.8283727Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.8284275Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, 0)
2025-12-04T12:15:05.8284901Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.8285445Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.where(r0_mask, tmp5, 0)
2025-12-04T12:15:05.8286022Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.8286557Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tl.full([1, 1], 15, tl.int32)
2025-12-04T12:15:05.8287391Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:05.8287890Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = (tmp8 / tmp10)
2025-12-04T12:15:05.8288390Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tmp2 - tmp11
2025-12-04T12:15:05.8288869Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12 * tmp12
2025-12-04T12:15:05.8289473Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.8290013Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.where(r0_mask, tmp14, 0)
2025-12-04T12:15:05.8290593Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.8291091Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp1 - tmp11
2025-12-04T12:15:05.8291587Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = 15.0
2025-12-04T12:15:05.8292092Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = (tmp17 / tmp19)
2025-12-04T12:15:05.8292539Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 1e-05
2025-12-04T12:15:05.8293021Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tmp20 + tmp21
2025-12-04T12:15:05.8293566Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = libdevice.rsqrt(tmp22)
2025-12-04T12:15:05.8294106Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp18 * tmp23
2025-12-04T12:15:05.8294635Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp25 = tl_math.abs(tmp24)
2025-12-04T12:15:05.8295226Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.8295805Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp28 = tl.where(r0_mask, tmp26, float("-inf"))
2025-12-04T12:15:05.8296568Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.8297056Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp32 = tmp24 * tmp31
2025-12-04T12:15:05.8297524Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp33 = -448.0
2025-12-04T12:15:05.8298096Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp34 = triton_helpers.maximum(tmp32, tmp33)
2025-12-04T12:15:05.8298605Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp35 = 448.0
2025-12-04T12:15:05.8299200Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp36 = triton_helpers.minimum(tmp34, tmp35)
2025-12-04T12:15:05.8299738Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp37 = tmp36.to(tl.float8e4nv)
2025-12-04T12:15:05.8300268Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp38 = tmp29.to(tl.float32)
2025-12-04T12:15:05.8300980Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask)
2025-12-04T12:15:05.8301699Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None)
2025-12-04T12:15:05.8302064Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.8304159Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.8304715Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.8305808Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.8306452Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.8307339Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.8308058Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.8308945Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.8309725Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.8310367Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.8311449Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.8311833Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.8312756Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.8312879Z FAILED [0.4143s] [100%]
2025-12-04T12:15:05.8312885Z 
2025-12-04T12:15:05.8313032Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.8313411Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda _
2025-12-04T12:15:05.8313553Z Traceback (most recent call last):
2025-12-04T12:15:05.8313979Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.8314229Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.8314721Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.8314971Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.8315502Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.8315695Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.8316223Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.8316371Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.8316901Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.8317238Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.8317760Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.8317911Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.8318439Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.8318565Z     return self._compile_to_module()
2025-12-04T12:15:05.8319061Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.8319229Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.8319743Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.8319890Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.8320385Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.8320661Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.8321252Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.8321383Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.8321904Z   File "/tmp/tmpbdqisg2q/sv/csvvymowwh2iuybyivtqdf7qhfhnalutevq7yvxg3yh52agrfyfp.py", line 74, in <module>
2025-12-04T12:15:05.8322397Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.8322510Z     kernel.precompile(
2025-12-04T12:15:05.8323082Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.8323201Z     self._precompile_worker()
2025-12-04T12:15:05.8323805Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.8323990Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.8324584Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.8324832Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.8325284Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.8325544Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.8325989Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.8326326Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.8326565Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.8327221Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.8327329Z ^
2025-12-04T12:15:05.8327791Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.8327797Z 
2025-12-04T12:15:05.8328507Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.8328517Z 
2025-12-04T12:15:05.8328522Z 
2025-12-04T12:15:05.8328753Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.8329440Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda
2025-12-04T12:15:05.8329446Z 
2025-12-04T12:15:05.8329731Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.8329959Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.8330065Z frames [('total', 1)]
2025-12-04T12:15:05.8330199Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.8330696Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.8330933Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.8331038Z graph_break []
2025-12-04T12:15:05.8331418Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda _
2025-12-04T12:15:05.8331556Z Traceback (most recent call last):
2025-12-04T12:15:05.8331979Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.8332212Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.8332741Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.8332991Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.8333518Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.8333709Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.8334218Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.8334409Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.8334940Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.8335273Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.8335790Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.8335941Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.8336538Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.8336662Z     return self._compile_to_module()
2025-12-04T12:15:05.8337146Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.8337329Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.8337846Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.8337992Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.8338488Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.8338720Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.8339325Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.8339454Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.8339962Z   File "/tmp/tmpmk786yzz/66/c66zje36jroboq56lns2fp755z7atndlwfgrbo7xtw4jylsgr5tj.py", line 74, in <module>
2025-12-04T12:15:05.8340430Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.8340545Z     kernel.precompile(
2025-12-04T12:15:05.8341109Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.8341227Z     self._precompile_worker()
2025-12-04T12:15:05.8341827Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.8342020Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.8342617Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.8342875Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.8343327Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.8343573Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.8344029Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.8344365Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.8344609Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.8345292Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.8345384Z ^
2025-12-04T12:15:05.8345854Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.8345864Z 
2025-12-04T12:15:05.8346576Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.8346629Z 
2025-12-04T12:15:05.8346634Z 
2025-12-04T12:15:05.8346864Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.8347550Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda
2025-12-04T12:15:05.8347556Z 
2025-12-04T12:15:05.8347827Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.8348064Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.8348172Z frames [('total', 1)]
2025-12-04T12:15:05.8348307Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.8348809Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.8349032Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.8349148Z graph_break []
2025-12-04T12:15:05.8349368Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.8349473Z frames [('total', 1)]
2025-12-04T12:15:05.8349607Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.8349827Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.8350301Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.8350401Z graph_break []
2025-12-04T12:15:05.8350552Z =================================== FAILURES ===================================
2025-12-04T12:15:05.8350945Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda _
2025-12-04T12:15:05.8351076Z Traceback (most recent call last):
2025-12-04T12:15:05.8351501Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.8351748Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.8352238Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.8352500Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.8358006Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.8358278Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.8358838Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.8358997Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.8359623Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.8359963Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.8360493Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.8360659Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.8361142Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.8361265Z     return self._compile_to_module()
2025-12-04T12:15:05.8361809Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.8361979Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.8362498Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.8362644Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.8363139Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.8363429Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.8364018Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.8364147Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.8364667Z   File "/tmp/tmp4jptl5lp/gx/cgxw5jl3tp2wk3lcynnubkv2md7le22vidarqvy4v3kovdwcbw26.py", line 74, in <module>
2025-12-04T12:15:05.8365133Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.8365261Z     kernel.precompile(
2025-12-04T12:15:05.8365860Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.8365982Z     self._precompile_worker()
2025-12-04T12:15:05.8366596Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.8366783Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.8367378Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.8367592Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.8368062Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.8368325Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.8368771Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.8369113Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.8369359Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.8370017Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.8370126Z ^
2025-12-04T12:15:05.8370584Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.8370591Z 
2025-12-04T12:15:05.8371534Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.8371555Z 
2025-12-04T12:15:05.8371560Z 
2025-12-04T12:15:05.8371785Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.8372572Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda
2025-12-04T12:15:05.8372579Z 
2025-12-04T12:15:05.8372868Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.8373096Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.8373203Z frames [('total', 1)]
2025-12-04T12:15:05.8373339Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.8373808Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.8374046Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.8374192Z graph_break []
2025-12-04T12:15:05.8374415Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.8374538Z frames [('total', 1)]
2025-12-04T12:15:05.8374653Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.8374874Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.8375348Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.8375489Z graph_break []
2025-12-04T12:15:05.8375721Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.8375823Z frames [('total', 1)]
2025-12-04T12:15:05.8375940Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.8376172Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.8376708Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.8376808Z graph_break []
2025-12-04T12:15:05.8377469Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-818cc5e6f257d295.xml -
2025-12-04T12:15:05.8377697Z =========================== short test summary info ============================
2025-12-04T12:15:05.8378545Z FAILED [0.4143s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.8379198Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.8379289Z ^
2025-12-04T12:15:05.8379764Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.8379770Z 
2025-12-04T12:15:05.8380479Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.8380487Z 
2025-12-04T12:15:05.8380491Z 
2025-12-04T12:15:05.8380723Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.8381416Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda
2025-12-04T12:15:05.8381423Z 
2025-12-04T12:15:05.8381702Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.8381884Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.8382086Z ================== 1 failed, 187 deselected, 2 rerun in 4.24s ==================
2025-12-04T12:15:05.8382206Z Got exit code 1
2025-12-04T12:15:05.8382816Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda
2025-12-04T12:15:05.8383224Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:15:05.8383749Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b552d5ebf2a766dc.xml
2025-12-04T12:15:05.8383918Z ============================= test session starts ==============================
2025-12-04T12:15:05.8384284Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.8384396Z cachedir: .pytest_cache
2025-12-04T12:15:05.8384916Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.8385059Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.8385168Z configfile: pytest.ini
2025-12-04T12:15:05.8385804Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.8386034Z collecting ... collected 188 items / 33 deselected / 155 selected
2025-12-04T12:15:05.8386182Z stepcurrent: skipping 33 already run items.
2025-12-04T12:15:05.8386311Z Running 155 items in this shard
2025-12-04T12:15:05.8386316Z 
2025-12-04T12:15:05.8388217Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T12:15:05.8389428Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.8389862Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.8390338Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T12:15:05.8390874Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T12:15:05.8391337Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.8392030Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.8392576Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.8393166Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.8393766Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.8394329Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.8394789Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.8395307Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.8395795Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.8396255Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.8396710Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_2 = r0_index
2025-12-04T12:15:05.8397263Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_1 = r0_index // 15
2025-12-04T12:15:05.8397912Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.8398628Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.8399314Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.8399901Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.load(in_ptr3 + (0))
2025-12-04T12:15:05.8400465Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.broadcast_to(tmp16, [1, 1])
2025-12-04T12:15:05.8400982Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.8401465Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp1 - tmp2
2025-12-04T12:15:05.8401932Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = 15.0
2025-12-04T12:15:05.8402410Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = (tmp4 / tmp5)
2025-12-04T12:15:05.8402859Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 1e-05
2025-12-04T12:15:05.8403320Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6 + tmp7
2025-12-04T12:15:05.8404369Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T12:15:05.8404838Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp3 * tmp9
2025-12-04T12:15:05.8405345Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = tl_math.abs(tmp10)
2025-12-04T12:15:05.8405950Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.8406525Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.where(r0_mask, tmp12, float("-inf"))
2025-12-04T12:15:05.8407177Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.8407663Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp10 * tmp17
2025-12-04T12:15:05.8408122Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = -448.0
2025-12-04T12:15:05.8408701Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.maximum(tmp18, tmp19)
2025-12-04T12:15:05.8409137Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 448.0
2025-12-04T12:15:05.8409723Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = triton_helpers.minimum(tmp20, tmp21)
2025-12-04T12:15:05.8410257Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp22.to(tl.float8e4nv)
2025-12-04T12:15:05.8410783Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp15.to(tl.float32)
2025-12-04T12:15:05.8411530Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask)
2025-12-04T12:15:05.8412241Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None)
2025-12-04T12:15:05.8412617Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.8415033Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.8415615Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.8416734Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.8417385Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.8418281Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.8419042Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.8419924Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.8420703Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.8421313Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.8422464Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.8422842Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.8423732Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.8423878Z ('RERUN', {'yellow': True}) [3.6064s] [  0%]
2025-12-04T12:15:05.8425336Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T12:15:05.8426499Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.8426931Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.8427378Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T12:15:05.8427908Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T12:15:05.8428408Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.8428959Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.8429497Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.8430111Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.8430707Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.8431263Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.8431719Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.8432272Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.8432741Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.8433215Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.8433661Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_2 = r0_index
2025-12-04T12:15:05.8434160Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_1 = r0_index // 15
2025-12-04T12:15:05.8434807Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.8435518Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.8436205Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.8436732Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.load(in_ptr3 + (0))
2025-12-04T12:15:05.8437292Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.broadcast_to(tmp16, [1, 1])
2025-12-04T12:15:05.8437806Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.8438289Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp1 - tmp2
2025-12-04T12:15:05.8438750Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = 15.0
2025-12-04T12:15:05.8439230Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = (tmp4 / tmp5)
2025-12-04T12:15:05.8439682Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 1e-05
2025-12-04T12:15:05.8440147Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6 + tmp7
2025-12-04T12:15:05.8440676Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T12:15:05.8441143Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp3 * tmp9
2025-12-04T12:15:05.8441684Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = tl_math.abs(tmp10)
2025-12-04T12:15:05.8442289Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.8442863Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.where(r0_mask, tmp12, float("-inf"))
2025-12-04T12:15:05.8443545Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.8444030Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp10 * tmp17
2025-12-04T12:15:05.8444475Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = -448.0
2025-12-04T12:15:05.8445060Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.maximum(tmp18, tmp19)
2025-12-04T12:15:05.8445538Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 448.0
2025-12-04T12:15:05.8446120Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = triton_helpers.minimum(tmp20, tmp21)
2025-12-04T12:15:05.8446654Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp22.to(tl.float8e4nv)
2025-12-04T12:15:05.8447178Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp15.to(tl.float32)
2025-12-04T12:15:05.8447880Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask)
2025-12-04T12:15:05.8448586Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None)
2025-12-04T12:15:05.8448968Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.8451374Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.8451981Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.8453027Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.8453672Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.8454566Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.8455293Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.8456181Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.8457050Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.8457715Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.8458867Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.8459249Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.8460179Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.8460333Z ('RERUN', {'yellow': True}) [0.6085s] [  0%]
2025-12-04T12:15:05.8461754Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T12:15:05.8462915Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.8463352Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.8463798Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T12:15:05.8464333Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T12:15:05.8464795Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.8465343Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.8465888Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.8466581Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.8467186Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.8467743Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.8468200Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.8468721Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.8469225Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.8469700Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.8470150Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_2 = r0_index
2025-12-04T12:15:05.8470645Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_1 = r0_index // 15
2025-12-04T12:15:05.8471540Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.8472247Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.8472940Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.8473552Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.load(in_ptr3 + (0))
2025-12-04T12:15:05.8474118Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.broadcast_to(tmp16, [1, 1])
2025-12-04T12:15:05.8474627Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.8475112Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp1 - tmp2
2025-12-04T12:15:05.8475545Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = 15.0
2025-12-04T12:15:05.8476025Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = (tmp4 / tmp5)
2025-12-04T12:15:05.8476481Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 1e-05
2025-12-04T12:15:05.8476952Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6 + tmp7
2025-12-04T12:15:05.8477484Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T12:15:05.8477954Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp3 * tmp9
2025-12-04T12:15:05.8478458Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = tl_math.abs(tmp10)
2025-12-04T12:15:05.8479058Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.8479635Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.where(r0_mask, tmp12, float("-inf"))
2025-12-04T12:15:05.8480328Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.8480816Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp10 * tmp17
2025-12-04T12:15:05.8481261Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = -448.0
2025-12-04T12:15:05.8481848Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.maximum(tmp18, tmp19)
2025-12-04T12:15:05.8482285Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 448.0
2025-12-04T12:15:05.8482910Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = triton_helpers.minimum(tmp20, tmp21)
2025-12-04T12:15:05.8483448Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp22.to(tl.float8e4nv)
2025-12-04T12:15:05.8483958Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp15.to(tl.float32)
2025-12-04T12:15:05.8484713Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask)
2025-12-04T12:15:05.8485416Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None)
2025-12-04T12:15:05.8485791Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.8488182Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.8488763Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.8489805Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.8490451Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.8491343Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.8492040Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.8492922Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.8493691Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.8494348Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.8495495Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.8495879Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.8496872Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.8496995Z FAILED [0.6109s] [  0%]
2025-12-04T12:15:05.8497002Z 
2025-12-04T12:15:05.8497153Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.8497542Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda _
2025-12-04T12:15:05.8497683Z Traceback (most recent call last):
2025-12-04T12:15:05.8498139Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.8498384Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.8498875Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.8499126Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.8499650Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.8499845Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.8500403Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.8500567Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.8501101Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.8501436Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.8501954Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.8502105Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.8502595Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.8502721Z     return self._compile_to_module()
2025-12-04T12:15:05.8503214Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.8503385Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.8503907Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.8504057Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.8504557Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.8504791Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.8505391Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.8505520Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.8506043Z   File "/tmp/tmp0b8nceha/j7/cj7h6jumgz2ritrdd52emqour3wzuolivg46ljwykgtzzbmutrvg.py", line 137, in <module>
2025-12-04T12:15:05.8506537Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.8506651Z     kernel.precompile(
2025-12-04T12:15:05.8507216Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.8507340Z     self._precompile_worker()
2025-12-04T12:15:05.8507946Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.8508125Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.8508715Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.8508927Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.8509407Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.8509658Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.8510109Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.8510474Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.8510710Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.8511422Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.8511513Z ^
2025-12-04T12:15:05.8511986Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.8511992Z 
2025-12-04T12:15:05.8512707Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.8512747Z 
2025-12-04T12:15:05.8512752Z 
2025-12-04T12:15:05.8512983Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.8513674Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda
2025-12-04T12:15:05.8513682Z 
2025-12-04T12:15:05.8513963Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.8514187Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.8514292Z frames [('total', 1)]
2025-12-04T12:15:05.8514425Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.8514891Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.8515112Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.8515229Z graph_break []
2025-12-04T12:15:05.8515617Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda _
2025-12-04T12:15:05.8515751Z Traceback (most recent call last):
2025-12-04T12:15:05.8516174Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.8516406Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.8516905Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.8517153Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.8517663Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.8517869Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.8518376Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.8518573Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.8519107Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.8519426Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.8519957Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.8520106Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.8520597Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.8520719Z     return self._compile_to_module()
2025-12-04T12:15:05.8521233Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.8521414Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.8521929Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.8522090Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.8522596Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.8522828Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.8523426Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.8523551Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.8524057Z   File "/tmp/tmp7kkspfeo/te/cteizhh2wyjrb26zcfaw7rteg7eje6ljy5zq2k2b7yt4q4nxrnkq.py", line 137, in <module>
2025-12-04T12:15:05.8524531Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.8524677Z     kernel.precompile(
2025-12-04T12:15:05.8525243Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.8525370Z     self._precompile_worker()
2025-12-04T12:15:05.8525968Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.8526161Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.8526757Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.8526970Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.8527426Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.8527675Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.8528135Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.8528468Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.8528700Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.8529421Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.8529511Z ^
2025-12-04T12:15:05.8529982Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.8529988Z 
2025-12-04T12:15:05.8530699Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.8530707Z 
2025-12-04T12:15:05.8530746Z 
2025-12-04T12:15:05.8530978Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.8531671Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda
2025-12-04T12:15:05.8531679Z 
2025-12-04T12:15:05.8531951Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.8532184Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.8532289Z frames [('total', 1)]
2025-12-04T12:15:05.8532403Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.8532912Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.8533132Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.8533247Z graph_break []
2025-12-04T12:15:05.8533468Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.8533570Z frames [('total', 1)]
2025-12-04T12:15:05.8533698Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.8533947Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.8534407Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.8534520Z graph_break []
2025-12-04T12:15:05.8534666Z =================================== FAILURES ===================================
2025-12-04T12:15:05.8535078Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda _
2025-12-04T12:15:05.8535204Z Traceback (most recent call last):
2025-12-04T12:15:05.8535633Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.8535911Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.8536478Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.8536745Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.8537264Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.8537464Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.8537987Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.8538137Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.8538680Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.8539021Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.8539544Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.8539709Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.8540197Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.8540320Z     return self._compile_to_module()
2025-12-04T12:15:05.8540823Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.8540989Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.8541522Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.8541660Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.8542157Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.8542454Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.8543045Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.8543177Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.8543704Z   File "/tmp/tmpcvqnmz83/ne/cneh6ojjwd4az6gygqhbjjxqv2oaheohx5nbs5a2oiwt4kopttgx.py", line 137, in <module>
2025-12-04T12:15:05.8544172Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.8544305Z     kernel.precompile(
2025-12-04T12:15:05.8544892Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.8545014Z     self._precompile_worker()
2025-12-04T12:15:05.8545634Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.8545815Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.8546428Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.8546679Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.8547133Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.8547393Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.8547839Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.8548178Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.8548451Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.8549168Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.8549285Z ^
2025-12-04T12:15:05.8549746Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.8549751Z 
2025-12-04T12:15:05.8550462Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.8550482Z 
2025-12-04T12:15:05.8550486Z 
2025-12-04T12:15:05.8550705Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.8551398Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda
2025-12-04T12:15:05.8551406Z 
2025-12-04T12:15:05.8551698Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.8551921Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.8552040Z frames [('total', 1)]
2025-12-04T12:15:05.8552159Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.8552624Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.8552860Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.8552959Z graph_break []
2025-12-04T12:15:05.8553176Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.8553292Z frames [('total', 1)]
2025-12-04T12:15:05.8553408Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.8553629Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.8554138Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.8554238Z graph_break []
2025-12-04T12:15:05.8554473Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.8554582Z frames [('total', 1)]
2025-12-04T12:15:05.8554697Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.8554935Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.8555391Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.8555490Z graph_break []
2025-12-04T12:15:05.8556159Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b552d5ebf2a766dc.xml -
2025-12-04T12:15:05.8556365Z =========================== short test summary info ============================
2025-12-04T12:15:05.8557210Z FAILED [0.6109s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.8557918Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.8558038Z ^
2025-12-04T12:15:05.8558513Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.8558519Z 
2025-12-04T12:15:05.8559224Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.8559230Z 
2025-12-04T12:15:05.8559235Z 
2025-12-04T12:15:05.8559466Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.8560166Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda
2025-12-04T12:15:05.8560201Z 
2025-12-04T12:15:05.8560484Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.8560671Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.8560874Z ================== 1 failed, 33 deselected, 2 rerun in 4.87s ===================
2025-12-04T12:15:05.8560990Z Got exit code 1
2025-12-04T12:15:05.8561100Z Retrying single test...
2025-12-04T12:15:05.8561572Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-08c28ac73e77007a.xml
2025-12-04T12:15:05.8561750Z ============================= test session starts ==============================
2025-12-04T12:15:05.8562104Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.8562229Z cachedir: .pytest_cache
2025-12-04T12:15:05.8562753Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.8562880Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.8563006Z configfile: pytest.ini
2025-12-04T12:15:05.8563601Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.8563837Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:05.8564612Z stepcurrent: skipping 33 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda
2025-12-04T12:15:05.8564729Z Running 1 items in this shard
2025-12-04T12:15:05.8564734Z 
2025-12-04T12:15:05.8566192Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T12:15:05.8567349Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.8567797Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.8568243Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T12:15:05.8568806Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T12:15:05.8569270Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.8569807Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.8570363Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.8571177Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.8571781Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.8572342Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.8572784Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.8573386Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.8573860Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.8574337Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.8574784Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_2 = r0_index
2025-12-04T12:15:05.8575268Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_1 = r0_index // 15
2025-12-04T12:15:05.8575933Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.8576695Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.8577402Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.8577931Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.load(in_ptr3 + (0))
2025-12-04T12:15:05.8578492Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.broadcast_to(tmp16, [1, 1])
2025-12-04T12:15:05.8579005Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.8579476Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp1 - tmp2
2025-12-04T12:15:05.8579975Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = 15.0
2025-12-04T12:15:05.8580453Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = (tmp4 / tmp5)
2025-12-04T12:15:05.8580903Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 1e-05
2025-12-04T12:15:05.8581369Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6 + tmp7
2025-12-04T12:15:05.8582371Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T12:15:05.8582920Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp3 * tmp9
2025-12-04T12:15:05.8583430Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = tl_math.abs(tmp10)
2025-12-04T12:15:05.8584036Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.8584661Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.where(r0_mask, tmp12, float("-inf"))
2025-12-04T12:15:05.8585301Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.8585800Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp10 * tmp17
2025-12-04T12:15:05.8586246Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = -448.0
2025-12-04T12:15:05.8586840Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.maximum(tmp18, tmp19)
2025-12-04T12:15:05.8587314Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 448.0
2025-12-04T12:15:05.8587884Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = triton_helpers.minimum(tmp20, tmp21)
2025-12-04T12:15:05.8588432Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp22.to(tl.float8e4nv)
2025-12-04T12:15:05.8588943Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp15.to(tl.float32)
2025-12-04T12:15:05.8589659Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask)
2025-12-04T12:15:05.8590366Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None)
2025-12-04T12:15:05.8590732Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.8593127Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.8593712Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.8594892Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.8595525Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.8596431Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.8597176Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.8598079Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.8598888Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.8599512Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.8600675Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.8601095Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.8601993Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.8602131Z ('RERUN', {'yellow': True}) [3.6041s] [100%]
2025-12-04T12:15:05.8603564Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T12:15:05.8604709Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.8605152Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.8605605Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T12:15:05.8606146Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T12:15:05.8606609Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.8607139Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.8607696Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.8608320Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.8608920Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.8609483Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.8609924Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.8610456Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.8610973Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.8611451Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.8611900Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_2 = r0_index
2025-12-04T12:15:05.8612418Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_1 = r0_index // 15
2025-12-04T12:15:05.8613080Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.8613771Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.8614475Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.8615042Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.load(in_ptr3 + (0))
2025-12-04T12:15:05.8615593Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.broadcast_to(tmp16, [1, 1])
2025-12-04T12:15:05.8616118Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.8616666Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp1 - tmp2
2025-12-04T12:15:05.8617112Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = 15.0
2025-12-04T12:15:05.8617595Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = (tmp4 / tmp5)
2025-12-04T12:15:05.8618029Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 1e-05
2025-12-04T12:15:05.8618512Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6 + tmp7
2025-12-04T12:15:05.8619033Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T12:15:05.8619517Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp3 * tmp9
2025-12-04T12:15:05.8620021Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = tl_math.abs(tmp10)
2025-12-04T12:15:05.8620624Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.8621206Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.where(r0_mask, tmp12, float("-inf"))
2025-12-04T12:15:05.8621898Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.8622392Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp10 * tmp17
2025-12-04T12:15:05.8622839Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = -448.0
2025-12-04T12:15:05.8623424Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.maximum(tmp18, tmp19)
2025-12-04T12:15:05.8623867Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 448.0
2025-12-04T12:15:05.8624759Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = triton_helpers.minimum(tmp20, tmp21)
2025-12-04T12:15:05.8625319Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp22.to(tl.float8e4nv)
2025-12-04T12:15:05.8625836Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp15.to(tl.float32)
2025-12-04T12:15:05.8626595Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask)
2025-12-04T12:15:05.8627305Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None)
2025-12-04T12:15:05.8627671Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.8630075Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.8630694Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.8631742Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.8632378Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.8633284Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.8633970Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.8634876Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.8635646Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.8636310Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.8637472Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.8637835Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.8638779Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.8638919Z ('RERUN', {'yellow': True}) [0.6096s] [100%]
2025-12-04T12:15:05.8640359Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T12:15:05.8641542Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.8641993Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.8642443Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T12:15:05.8643002Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T12:15:05.8643478Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.8644018Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.8644573Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.8645157Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.8645756Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.8646326Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.8646771Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.8647303Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.8647776Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.8648253Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.8648702Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_2 = r0_index
2025-12-04T12:15:05.8649190Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_1 = r0_index // 15
2025-12-04T12:15:05.8649905Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.8650598Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.8651302Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.8651831Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.load(in_ptr3 + (0))
2025-12-04T12:15:05.8652412Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.broadcast_to(tmp16, [1, 1])
2025-12-04T12:15:05.8652939Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.8653415Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp1 - tmp2
2025-12-04T12:15:05.8653898Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = 15.0
2025-12-04T12:15:05.8654374Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = (tmp4 / tmp5)
2025-12-04T12:15:05.8654809Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 1e-05
2025-12-04T12:15:05.8655289Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6 + tmp7
2025-12-04T12:15:05.8655811Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T12:15:05.8656413Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp3 * tmp9
2025-12-04T12:15:05.8656921Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = tl_math.abs(tmp10)
2025-12-04T12:15:05.8657510Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.8658098Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.where(r0_mask, tmp12, float("-inf"))
2025-12-04T12:15:05.8658733Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.8659234Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp10 * tmp17
2025-12-04T12:15:05.8659681Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = -448.0
2025-12-04T12:15:05.8660266Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.maximum(tmp18, tmp19)
2025-12-04T12:15:05.8660710Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 448.0
2025-12-04T12:15:05.8661281Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = triton_helpers.minimum(tmp20, tmp21)
2025-12-04T12:15:05.8661832Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp22.to(tl.float8e4nv)
2025-12-04T12:15:05.8662349Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp15.to(tl.float32)
2025-12-04T12:15:05.8663111Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask)
2025-12-04T12:15:05.8663824Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None)
2025-12-04T12:15:05.8664194Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.8666608Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.8667158Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.8668231Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.8668860Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.8669768Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.8670563Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.8671748Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.8672520Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.8673141Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.8674298Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.8674667Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.8675573Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.8675678Z FAILED [0.6176s] [100%]
2025-12-04T12:15:05.8675685Z 
2025-12-04T12:15:05.8675844Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.8676230Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda _
2025-12-04T12:15:05.8676357Z Traceback (most recent call last):
2025-12-04T12:15:05.8676794Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.8677145Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.8677656Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.8677910Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.8678423Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.8678634Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.8679143Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.8679309Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.8679895Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.8680224Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.8680764Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.8680969Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.8681448Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.8681587Z     return self._compile_to_module()
2025-12-04T12:15:05.8682070Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.8682248Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.8682770Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.8682901Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.8683466Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.8683700Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.8684305Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.8684435Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.8684948Z   File "/tmp/tmp6l90mpjy/y7/cy7w4pppwhstmgbhedapboukma6vmkcfiktelziyofw2uon22cbx.py", line 137, in <module>
2025-12-04T12:15:05.8685430Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.8685542Z     kernel.precompile(
2025-12-04T12:15:05.8686108Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.8686243Z     self._precompile_worker()
2025-12-04T12:15:05.8686841Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.8687033Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.8687628Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.8687824Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.8688288Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.8688533Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.8688990Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.8689323Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.8689593Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.8690318Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.8690411Z ^
2025-12-04T12:15:05.8690870Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.8690887Z 
2025-12-04T12:15:05.8691600Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.8691607Z 
2025-12-04T12:15:05.8691612Z 
2025-12-04T12:15:05.8691830Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.8692570Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda
2025-12-04T12:15:05.8692581Z 
2025-12-04T12:15:05.8692852Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.8693091Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.8693232Z frames [('total', 1)]
2025-12-04T12:15:05.8693351Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.8693831Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.8694054Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.8694155Z graph_break []
2025-12-04T12:15:05.8694556Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda _
2025-12-04T12:15:05.8694685Z Traceback (most recent call last):
2025-12-04T12:15:05.8695125Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.8695418Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.8695908Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.8696172Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.8696774Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.8696984Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.8697493Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.8697640Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.8698190Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.8698515Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.8699033Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.8699200Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.8699679Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.8699817Z     return self._compile_to_module()
2025-12-04T12:15:05.8700301Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.8700465Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.8700995Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.8701127Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.8701680Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.8701915Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.8702502Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.8702647Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.8703125Z   File "/tmp/tmp7m_z2z_n/bf/cbfyztuyfzxs6gg5z7c564uo3zygh2evvep6wqyycfvbmim5tzre.py", line 137, in <module>
2025-12-04T12:15:05.8703607Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.8703721Z     kernel.precompile(
2025-12-04T12:15:05.8704315Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.8704452Z     self._precompile_worker()
2025-12-04T12:15:05.8705050Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.8705230Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.8705873Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.8706074Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.8706542Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.8706786Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.8707232Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.8707582Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.8707843Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.8708574Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.8708667Z ^
2025-12-04T12:15:05.8709126Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.8709132Z 
2025-12-04T12:15:05.8709850Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.8709856Z 
2025-12-04T12:15:05.8709861Z 
2025-12-04T12:15:05.8710082Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.8710791Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda
2025-12-04T12:15:05.8710799Z 
2025-12-04T12:15:05.8711074Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.8711297Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.8711419Z frames [('total', 1)]
2025-12-04T12:15:05.8711538Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.8712010Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.8712233Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.8712335Z graph_break []
2025-12-04T12:15:05.8712576Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.8712683Z frames [('total', 1)]
2025-12-04T12:15:05.8712803Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.8713039Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.8713557Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.8713674Z graph_break []
2025-12-04T12:15:05.8713823Z =================================== FAILURES ===================================
2025-12-04T12:15:05.8714215Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda _
2025-12-04T12:15:05.8714357Z Traceback (most recent call last):
2025-12-04T12:15:05.8714785Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.8715018Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.8715523Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.8715808Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.8716340Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.8716535Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.8717047Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.8717249Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.8717786Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.8718121Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.8718644Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.8718796Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.8719293Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.8719453Z     return self._compile_to_module()
2025-12-04T12:15:05.8719935Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.8720121Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.8720640Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.8720786Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.8721284Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.8721518Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.8722125Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.8722258Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.8722783Z   File "/tmp/tmpv9l2bewc/xf/cxfljslkrx2pu46ribcqseu4ictygoxwuddw2h7xousjqc5wg66z.py", line 137, in <module>
2025-12-04T12:15:05.8723249Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.8723368Z     kernel.precompile(
2025-12-04T12:15:05.8723936Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.8724062Z     self._precompile_worker()
2025-12-04T12:15:05.8724661Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.8724856Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.8725454Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.8725710Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.8726566Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.8726820Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.8727281Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.8727614Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.8727860Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.8728631Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.8728724Z ^
2025-12-04T12:15:05.8729194Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.8729205Z 
2025-12-04T12:15:05.8729915Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.8729951Z 
2025-12-04T12:15:05.8729956Z 
2025-12-04T12:15:05.8730183Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.8730877Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda
2025-12-04T12:15:05.8730885Z 
2025-12-04T12:15:05.8731151Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.8731389Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.8731496Z frames [('total', 1)]
2025-12-04T12:15:05.8731629Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.8732280Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.8732506Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.8732626Z graph_break []
2025-12-04T12:15:05.8732847Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.8732953Z frames [('total', 1)]
2025-12-04T12:15:05.8733089Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.8733306Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.8733780Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.8733879Z graph_break []
2025-12-04T12:15:05.8734097Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.8734216Z frames [('total', 1)]
2025-12-04T12:15:05.8734336Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.8734557Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.8735026Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.8735129Z graph_break []
2025-12-04T12:15:05.8735793Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-08c28ac73e77007a.xml -
2025-12-04T12:15:05.8735968Z =========================== short test summary info ============================
2025-12-04T12:15:05.8736887Z FAILED [0.6176s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.8737611Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.8737707Z ^
2025-12-04T12:15:05.8738234Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.8738242Z 
2025-12-04T12:15:05.8738952Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.8738961Z 
2025-12-04T12:15:05.8738965Z 
2025-12-04T12:15:05.8739183Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.8739885Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda
2025-12-04T12:15:05.8739891Z 
2025-12-04T12:15:05.8740194Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.8740389Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.8740597Z ================== 1 failed, 187 deselected, 2 rerun in 4.87s ==================
2025-12-04T12:15:05.8740700Z Got exit code 1
2025-12-04T12:15:05.8740825Z Retrying single test...
2025-12-04T12:15:05.8741301Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-df1b42bf8f6cd06e.xml
2025-12-04T12:15:05.8741532Z ============================= test session starts ==============================
2025-12-04T12:15:05.8741886Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.8741999Z cachedir: .pytest_cache
2025-12-04T12:15:05.8742531Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.8742657Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.8742770Z configfile: pytest.ini
2025-12-04T12:15:05.8743373Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.8743636Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:05.8744433Z stepcurrent: skipping 33 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda
2025-12-04T12:15:05.8744556Z Running 1 items in this shard
2025-12-04T12:15:05.8744561Z 
2025-12-04T12:15:05.8745986Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T12:15:05.8747158Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.8747595Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.8748056Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T12:15:05.8748576Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T12:15:05.8749049Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.8749585Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.8750127Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.8750763Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.8751346Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.8751916Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.8752357Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.8752877Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.8753393Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.8753860Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.8754315Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_2 = r0_index
2025-12-04T12:15:05.8754831Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_1 = r0_index // 15
2025-12-04T12:15:05.8755488Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.8756179Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.8756867Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.8757445Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.load(in_ptr3 + (0))
2025-12-04T12:15:05.8757996Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.broadcast_to(tmp16, [1, 1])
2025-12-04T12:15:05.8758520Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.8758990Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp1 - tmp2
2025-12-04T12:15:05.8759422Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = 15.0
2025-12-04T12:15:05.8759911Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = (tmp4 / tmp5)
2025-12-04T12:15:05.8760349Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 1e-05
2025-12-04T12:15:05.8760826Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6 + tmp7
2025-12-04T12:15:05.8761346Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T12:15:05.8761815Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp3 * tmp9
2025-12-04T12:15:05.8762331Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = tl_math.abs(tmp10)
2025-12-04T12:15:05.8762925Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.8763511Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.where(r0_mask, tmp12, float("-inf"))
2025-12-04T12:15:05.8764184Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.8764675Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp10 * tmp17
2025-12-04T12:15:05.8765131Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = -448.0
2025-12-04T12:15:05.8765698Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.maximum(tmp18, tmp19)
2025-12-04T12:15:05.8766181Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 448.0
2025-12-04T12:15:05.8766757Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = triton_helpers.minimum(tmp20, tmp21)
2025-12-04T12:15:05.8767306Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp22.to(tl.float8e4nv)
2025-12-04T12:15:05.8767817Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp15.to(tl.float32)
2025-12-04T12:15:05.8768550Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask)
2025-12-04T12:15:05.8769267Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None)
2025-12-04T12:15:05.8769632Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.8772194Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.8772826Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.8773885Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.8774516Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.8775417Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.8776096Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.8777042Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.8777877Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.8778487Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.8779651Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.8780017Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.8780963Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.8781104Z ('RERUN', {'yellow': True}) [3.6095s] [100%]
2025-12-04T12:15:05.8782527Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T12:15:05.8783731Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.8784162Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.8784621Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T12:15:05.8785184Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T12:15:05.8785658Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.8786197Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.8786738Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.8787333Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.8787914Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.8788489Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.8788930Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.8789447Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.8789933Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.8790391Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.8790855Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_2 = r0_index
2025-12-04T12:15:05.8791335Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_1 = r0_index // 15
2025-12-04T12:15:05.8792012Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.8792715Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.8793400Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.8793937Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.load(in_ptr3 + (0))
2025-12-04T12:15:05.8794535Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.broadcast_to(tmp16, [1, 1])
2025-12-04T12:15:05.8795063Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.8795530Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp1 - tmp2
2025-12-04T12:15:05.8795994Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = 15.0
2025-12-04T12:15:05.8796487Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = (tmp4 / tmp5)
2025-12-04T12:15:05.8796927Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 1e-05
2025-12-04T12:15:05.8797401Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6 + tmp7
2025-12-04T12:15:05.8797921Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T12:15:05.8798428Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp3 * tmp9
2025-12-04T12:15:05.8798947Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = tl_math.abs(tmp10)
2025-12-04T12:15:05.8799540Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.8800130Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.where(r0_mask, tmp12, float("-inf"))
2025-12-04T12:15:05.8800768Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.8801256Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp10 * tmp17
2025-12-04T12:15:05.8801721Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = -448.0
2025-12-04T12:15:05.8802295Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.maximum(tmp18, tmp19)
2025-12-04T12:15:05.8802755Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 448.0
2025-12-04T12:15:05.8803329Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = triton_helpers.minimum(tmp20, tmp21)
2025-12-04T12:15:05.8803863Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp22.to(tl.float8e4nv)
2025-12-04T12:15:05.8804391Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp15.to(tl.float32)
2025-12-04T12:15:05.8805137Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask)
2025-12-04T12:15:05.8805859Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None)
2025-12-04T12:15:05.8806228Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.8808683Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.8809252Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.8810312Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.8810942Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.8811855Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.8812572Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.8813466Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.8814251Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.8814859Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.8816025Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.8816467Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.8817375Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.8817512Z ('RERUN', {'yellow': True}) [0.6020s] [100%]
2025-12-04T12:15:05.8818934Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T12:15:05.8820146Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.8820582Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.8821045Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T12:15:05.8821563Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T12:15:05.8822068Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.8822605Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.8823148Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.8823773Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.8824355Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.8824924Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.8825369Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T12:15:05.8825891Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.8826412Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T12:15:05.8826877Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T12:15:05.8827341Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_2 = r0_index
2025-12-04T12:15:05.8827820Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_1 = r0_index // 15
2025-12-04T12:15:05.8828463Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32)
2025-12-04T12:15:05.8829164Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.8829849Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.8830388Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.load(in_ptr3 + (0))
2025-12-04T12:15:05.8830942Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.broadcast_to(tmp16, [1, 1])
2025-12-04T12:15:05.8831459Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.8831928Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp1 - tmp2
2025-12-04T12:15:05.8832355Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = 15.0
2025-12-04T12:15:05.8832880Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = (tmp4 / tmp5)
2025-12-04T12:15:05.8833317Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 1e-05
2025-12-04T12:15:05.8833797Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6 + tmp7
2025-12-04T12:15:05.8834316Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T12:15:05.8834785Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp3 * tmp9
2025-12-04T12:15:05.8835331Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = tl_math.abs(tmp10)
2025-12-04T12:15:05.8835931Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.8836518Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.where(r0_mask, tmp12, float("-inf"))
2025-12-04T12:15:05.8837189Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32)
2025-12-04T12:15:05.8837669Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp10 * tmp17
2025-12-04T12:15:05.8838122Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = -448.0
2025-12-04T12:15:05.8838696Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.maximum(tmp18, tmp19)
2025-12-04T12:15:05.8839184Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 448.0
2025-12-04T12:15:05.8839757Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = triton_helpers.minimum(tmp20, tmp21)
2025-12-04T12:15:05.8840294Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp22.to(tl.float8e4nv)
2025-12-04T12:15:05.8840820Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp15.to(tl.float32)
2025-12-04T12:15:05.8841530Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask)
2025-12-04T12:15:05.8842251Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None)
2025-12-04T12:15:05.8842617Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.8844990Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.8845531Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.8846633Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.8847264Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.8848166Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.8848876Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.8849755Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.8850535Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.8851174Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.8852334Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.8852705Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.8853638Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.8853745Z FAILED [0.6038s] [100%]
2025-12-04T12:15:05.8853754Z 
2025-12-04T12:15:05.8853900Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.8854298Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda _
2025-12-04T12:15:05.8854425Z Traceback (most recent call last):
2025-12-04T12:15:05.8854847Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.8855095Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.8855589Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.8855852Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.8856431Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.8856632Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.8857162Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.8857311Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.8857860Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.8858183Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.8858705Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.8858873Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.8859393Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.8859531Z     return self._compile_to_module()
2025-12-04T12:15:05.8860016Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.8860184Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.8860712Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.8860842Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.8861335Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.8861610Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.8862196Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.8862340Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.8862829Z   File "/tmp/tmpjwoagh_5/7b/c7bkwe7w4rxkl3zpge2awqaihytaac57lg4juzzcfrkd7e4jfgcn.py", line 137, in <module>
2025-12-04T12:15:05.8863328Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.8863454Z     kernel.precompile(
2025-12-04T12:15:05.8864008Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.8864139Z     self._precompile_worker()
2025-12-04T12:15:05.8864732Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.8864919Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.8865525Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.8865760Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.8866215Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.8866474Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.8866917Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.8867265Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.8867491Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.8868201Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.8868307Z ^
2025-12-04T12:15:05.8868768Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.8868775Z 
2025-12-04T12:15:05.8869496Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.8869505Z 
2025-12-04T12:15:05.8869510Z 
2025-12-04T12:15:05.8869728Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.8870422Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda
2025-12-04T12:15:05.8870440Z 
2025-12-04T12:15:05.8870708Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.8871159Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.8871284Z frames [('total', 1)]
2025-12-04T12:15:05.8871409Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.8871949Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.8872187Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.8872295Z graph_break []
2025-12-04T12:15:05.8872693Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda _
2025-12-04T12:15:05.8872818Z Traceback (most recent call last):
2025-12-04T12:15:05.8873247Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.8873494Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.8874033Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.8874283Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.8874817Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.8875010Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.8875658Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.8875808Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.8876340Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.8876674Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.8877194Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.8877357Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.8877883Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.8878006Z     return self._compile_to_module()
2025-12-04T12:15:05.8878503Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.8878671Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.8879188Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.8879331Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.8879827Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.8880082Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.8880669Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.8880801Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.8881329Z   File "/tmp/tmp1uu6riyx/rp/crpvzmvk2nslzhw4ccpeejwn2sicjksy4yichzaagajbkggpjegs.py", line 137, in <module>
2025-12-04T12:15:05.8881796Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.8881928Z     kernel.precompile(
2025-12-04T12:15:05.8882484Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.8882601Z     self._precompile_worker()
2025-12-04T12:15:05.8883210Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.8883394Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.8883987Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.8884231Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.8884685Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.8884945Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.8885388Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.8885720Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.8885959Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.8886698Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.8886805Z ^
2025-12-04T12:15:05.8887266Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.8887272Z 
2025-12-04T12:15:05.8887981Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.8888031Z 
2025-12-04T12:15:05.8888036Z 
2025-12-04T12:15:05.8888254Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.8888947Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda
2025-12-04T12:15:05.8888952Z 
2025-12-04T12:15:05.8889236Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.8889460Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.8889622Z frames [('total', 1)]
2025-12-04T12:15:05.8889754Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.8890222Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.8890461Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.8890567Z graph_break []
2025-12-04T12:15:05.8890788Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.8890909Z frames [('total', 1)]
2025-12-04T12:15:05.8891027Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.8891249Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.8891726Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.8891826Z graph_break []
2025-12-04T12:15:05.8891991Z =================================== FAILURES ===================================
2025-12-04T12:15:05.8892381Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda _
2025-12-04T12:15:05.8892508Z Traceback (most recent call last):
2025-12-04T12:15:05.8892947Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.8893186Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.8893677Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.8893945Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.8894460Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.8894668Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.8895180Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.8895336Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.8895914Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.8896241Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.8896850Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.8897005Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.8897486Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.8897626Z     return self._compile_to_module()
2025-12-04T12:15:05.8898153Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.8898321Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.8898858Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.8898991Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.8899531Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.8899765Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.8900348Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.8900489Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.8900994Z   File "/tmp/tmpll822hcz/dm/cdm4qxpijru6fav7joz4qry3i522rghlxy7haujqu6yeqbv2yt6o.py", line 137, in <module>
2025-12-04T12:15:05.8901475Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.8901615Z     kernel.precompile(
2025-12-04T12:15:05.8902171Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.8902303Z     self._precompile_worker()
2025-12-04T12:15:05.8902904Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.8903095Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.8903686Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.8903884Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.8904351Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.8904596Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.8905043Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.8905388Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.8905618Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.8906333Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.8906423Z ^
2025-12-04T12:15:05.8906882Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.8906888Z 
2025-12-04T12:15:05.8907611Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.8907620Z 
2025-12-04T12:15:05.8907625Z 
2025-12-04T12:15:05.8907880Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.8908589Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda
2025-12-04T12:15:05.8908597Z 
2025-12-04T12:15:05.8908867Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.8909103Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.8909208Z frames [('total', 1)]
2025-12-04T12:15:05.8909325Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.8909804Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.8910057Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.8910160Z graph_break []
2025-12-04T12:15:05.8910394Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.8910504Z frames [('total', 1)]
2025-12-04T12:15:05.8910622Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.8910856Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.8911346Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.8911458Z graph_break []
2025-12-04T12:15:05.8911676Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.8911780Z frames [('total', 1)]
2025-12-04T12:15:05.8911910Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.8912129Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.8912584Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.8912697Z graph_break []
2025-12-04T12:15:05.8913384Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-df1b42bf8f6cd06e.xml -
2025-12-04T12:15:05.8913575Z =========================== short test summary info ============================
2025-12-04T12:15:05.8914408Z FAILED [0.6038s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.8915118Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T12:15:05.8915222Z ^
2025-12-04T12:15:05.8915682Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.8915690Z 
2025-12-04T12:15:05.8916407Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.8916417Z 
2025-12-04T12:15:05.8916422Z 
2025-12-04T12:15:05.8916638Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.8917344Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda
2025-12-04T12:15:05.8917352Z 
2025-12-04T12:15:05.8917619Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.8917803Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.8918021Z ================== 1 failed, 187 deselected, 2 rerun in 4.86s ==================
2025-12-04T12:15:05.8918123Z Got exit code 1
2025-12-04T12:15:05.8918748Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda
2025-12-04T12:15:05.8919190Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:15:05.8919662Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-97d6c66aee44b097.xml
2025-12-04T12:15:05.8919842Z ============================= test session starts ==============================
2025-12-04T12:15:05.8920195Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.8920307Z cachedir: .pytest_cache
2025-12-04T12:15:05.8920838Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.8920963Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.8921088Z configfile: pytest.ini
2025-12-04T12:15:05.8921709Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.8921943Z collecting ... collected 188 items / 34 deselected / 154 selected
2025-12-04T12:15:05.8922104Z stepcurrent: skipping 34 already run items.
2025-12-04T12:15:05.8922219Z Running 154 items in this shard
2025-12-04T12:15:05.8922224Z 
2025-12-04T12:15:05.8923619Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T12:15:05.8924710Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.8925150Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 10
2025-12-04T12:15:05.8925650Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T12:15:05.8926109Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.8926660Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.8927199Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.8927781Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.8928287Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:05.8928844Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.8929305Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.8929738Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:05.8930348Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.8930934Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.8931543Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.8932170Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.8932705Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.8933244Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.8933733Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.8934213Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.8934724Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.8935544Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T12:15:05.8936083Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.8936796Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.8937517Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T12:15:05.8938137Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T12:15:05.8938542Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T12:15:05.8939244Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean)
2025-12-04T12:15:05.8939862Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2)
2025-12-04T12:15:05.8940555Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight)
2025-12-04T12:15:05.8941260Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T12:15:05.8941740Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T12:15:05.8942233Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T12:15:05.8942707Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T12:15:05.8943358Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.8943883Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.8944433Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T12:15:05.8945027Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.8945600Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.8946139Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.8946628Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.8947123Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.8947588Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.8948433Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.8948981Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:05.8949475Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T12:15:05.8949979Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T12:15:05.8950482Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T12:15:05.8950940Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T12:15:05.8951454Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T12:15:05.8951995Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T12:15:05.8952539Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T12:15:05.8953060Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T12:15:05.8953658Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.8954253Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T12:15:05.8954841Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20)
2025-12-04T12:15:05.8955352Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T12:15:05.8955817Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T12:15:05.8956401Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T12:15:05.8956875Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T12:15:05.8957450Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T12:15:05.8958586Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T12:15:05.8959218Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask)
2025-12-04T12:15:05.8959858Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T12:15:05.8960409Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, xmask)
2025-12-04T12:15:05.8960772Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.8963065Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.8963604Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.8964686Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.8965319Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.8966229Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.8966944Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.8967838Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.8968603Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.8969209Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.8970317Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.8970741Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.8971879Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.8972017Z ('RERUN', {'yellow': True}) [3.4136s] [  0%]
2025-12-04T12:15:05.8973373Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T12:15:05.8974560Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.8975015Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 10
2025-12-04T12:15:05.8975468Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T12:15:05.8975929Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.8976540Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.8977138Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.8977741Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.8978236Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:05.8978839Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.8979305Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.8979746Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:05.8980362Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.8981002Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.8981606Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.8982200Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.8982732Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.8983272Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.8983765Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.8984255Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.8984742Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.8985563Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T12:15:05.8986108Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.8986700Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.8987438Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T12:15:05.8988094Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T12:15:05.8988506Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T12:15:05.8989176Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean)
2025-12-04T12:15:05.8989797Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2)
2025-12-04T12:15:05.8990543Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight)
2025-12-04T12:15:05.8991259Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T12:15:05.8991739Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T12:15:05.8992277Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T12:15:05.8992747Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T12:15:05.8993403Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.8993936Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.8994539Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T12:15:05.8995124Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.8995658Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.8996203Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.8996696Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.8997193Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.8997666Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.8998482Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.8999029Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:05.8999529Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T12:15:05.9000009Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T12:15:05.9000826Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T12:15:05.9001342Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T12:15:05.9001854Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T12:15:05.9002396Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T12:15:05.9002904Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T12:15:05.9003425Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T12:15:05.9004059Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.9004653Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T12:15:05.9005241Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20)
2025-12-04T12:15:05.9005786Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T12:15:05.9006247Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T12:15:05.9006835Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T12:15:05.9007294Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T12:15:05.9007873Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T12:15:05.9008459Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T12:15:05.9009086Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask)
2025-12-04T12:15:05.9009675Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T12:15:05.9010224Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, xmask)
2025-12-04T12:15:05.9010585Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.9012849Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.9013389Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.9014464Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.9015134Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.9016046Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.9016795Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.9017690Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.9018505Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.9019135Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.9020255Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.9020625Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.9021536Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.9021671Z ('RERUN', {'yellow': True}) [0.4359s] [  0%]
2025-12-04T12:15:05.9023056Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T12:15:05.9024143Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.9024591Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 10
2025-12-04T12:15:05.9025043Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T12:15:05.9025501Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.9026058Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.9026596Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.9027201Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.9027694Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:05.9028250Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.9028709Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.9029202Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:05.9029815Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.9030404Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.9031012Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.9031644Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.9032176Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.9032720Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.9033211Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.9033739Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.9034205Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.9035021Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T12:15:05.9035559Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.9036176Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.9036910Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T12:15:05.9037512Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T12:15:05.9037912Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T12:15:05.9038582Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean)
2025-12-04T12:15:05.9039205Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2)
2025-12-04T12:15:05.9039887Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight)
2025-12-04T12:15:05.9040594Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T12:15:05.9041087Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T12:15:05.9041565Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T12:15:05.9042039Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T12:15:05.9042749Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.9043274Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.9043832Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T12:15:05.9044408Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.9044973Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.9045514Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.9046007Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.9046495Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.9046990Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.9047809Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.9048357Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:05.9048853Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T12:15:05.9049365Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T12:15:05.9049865Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T12:15:05.9050337Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T12:15:05.9050833Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T12:15:05.9051368Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T12:15:05.9051880Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T12:15:05.9052403Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T12:15:05.9053005Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.9053584Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T12:15:05.9054170Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20)
2025-12-04T12:15:05.9054679Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T12:15:05.9055142Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T12:15:05.9055764Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T12:15:05.9056222Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T12:15:05.9056893Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T12:15:05.9057447Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T12:15:05.9058073Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask)
2025-12-04T12:15:05.9058698Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T12:15:05.9059250Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, xmask)
2025-12-04T12:15:05.9059611Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.9061900Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.9062477Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.9063520Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.9064152Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.9065064Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.9065749Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.9066647Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.9067422Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.9068044Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.9069138Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.9069506Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.9070453Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.9070565Z FAILED [0.4407s] [  0%]
2025-12-04T12:15:05.9070572Z 
2025-12-04T12:15:05.9070736Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.9071322Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda _
2025-12-04T12:15:05.9071450Z Traceback (most recent call last):
2025-12-04T12:15:05.9071889Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.9072218Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.9072723Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.9072980Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.9073494Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.9073749Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.9074259Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.9074421Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.9074954Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.9075282Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.9075817Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.9076011Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.9076491Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.9076630Z     return self._compile_to_module()
2025-12-04T12:15:05.9077115Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.9077291Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.9077807Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.9077938Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.9078449Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.9078682Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.9079284Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.9079429Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.9079928Z   File "/tmp/tmpw7y32ewp/66/c66hsm3vmwnt3wy77ar7w7thboulp5wcyjt6mhqsautxju4d2lnc.py", line 65, in <module>
2025-12-04T12:15:05.9080404Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.9080519Z     kernel.precompile(
2025-12-04T12:15:05.9081073Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.9081210Z     self._precompile_worker()
2025-12-04T12:15:05.9081811Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.9082013Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.9082648Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.9082851Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.9083322Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.9083571Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.9084028Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.9084366Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.9084718Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.9085389Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.9085488Z ^
2025-12-04T12:15:05.9085946Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.9085999Z 
2025-12-04T12:15:05.9086711Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.9086718Z 
2025-12-04T12:15:05.9086723Z 
2025-12-04T12:15:05.9086943Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.9087668Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda
2025-12-04T12:15:05.9087673Z 
2025-12-04T12:15:05.9087947Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.9088240Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.9088352Z frames [('total', 1)]
2025-12-04T12:15:05.9088476Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.9088965Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.9089195Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.9089311Z graph_break []
2025-12-04T12:15:05.9089711Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda _
2025-12-04T12:15:05.9089838Z Traceback (most recent call last):
2025-12-04T12:15:05.9090274Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.9090508Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.9091001Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.9091270Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.9091786Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.9091998Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.9092511Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.9092660Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.9093217Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.9093540Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.9094084Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.9094239Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.9094755Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.9094893Z     return self._compile_to_module()
2025-12-04T12:15:05.9095379Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.9095542Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.9096072Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.9096201Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.9096815Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.9097049Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.9097637Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.9097779Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.9098257Z   File "/tmp/tmp_wmyafcy/zc/czcx4l4pcssfzep4tga4tuzmlabsywmuqvcdzztv74d4wc3zuzud.py", line 65, in <module>
2025-12-04T12:15:05.9098766Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.9098877Z     kernel.precompile(
2025-12-04T12:15:05.9099431Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.9099561Z     self._precompile_worker()
2025-12-04T12:15:05.9100160Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.9100339Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.9100978Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.9101176Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.9101644Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.9101889Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.9102330Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.9102678Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.9102906Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.9103570Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.9103663Z ^
2025-12-04T12:15:05.9104123Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.9104129Z 
2025-12-04T12:15:05.9104856Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.9104863Z 
2025-12-04T12:15:05.9104867Z 
2025-12-04T12:15:05.9105085Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.9105807Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda
2025-12-04T12:15:05.9105812Z 
2025-12-04T12:15:05.9106086Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.9106308Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.9106432Z frames [('total', 1)]
2025-12-04T12:15:05.9106580Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.9107059Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.9107284Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.9107384Z graph_break []
2025-12-04T12:15:05.9107615Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.9107722Z frames [('total', 1)]
2025-12-04T12:15:05.9107838Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.9108070Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.9108563Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.9108678Z graph_break []
2025-12-04T12:15:05.9108828Z =================================== FAILURES ===================================
2025-12-04T12:15:05.9109228Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda _
2025-12-04T12:15:05.9109367Z Traceback (most recent call last):
2025-12-04T12:15:05.9109825Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.9110056Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.9110556Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.9110808Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.9111332Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.9111528Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.9112073Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.9112235Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.9112768Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.9113106Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.9113624Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.9113774Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.9114270Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.9114395Z     return self._compile_to_module()
2025-12-04T12:15:05.9114879Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.9115060Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.9115575Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.9115720Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.9116216Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.9116449Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.9117052Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.9117179Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.9117709Z   File "/tmp/tmpl4vojzx9/pz/cpzghq2dotxh6p6q74wpwp5f23tqx5ymabvlkqjlncl4jfmyqfhi.py", line 65, in <module>
2025-12-04T12:15:05.9118170Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.9118316Z     kernel.precompile(
2025-12-04T12:15:05.9118883Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.9119003Z     self._precompile_worker()
2025-12-04T12:15:05.9119597Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.9119791Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.9120383Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.9120596Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.9121077Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.9121325Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.9121784Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.9122118Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.9122392Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.9123040Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.9123131Z ^
2025-12-04T12:15:05.9123602Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.9123607Z 
2025-12-04T12:15:05.9124322Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.9124363Z 
2025-12-04T12:15:05.9124370Z 
2025-12-04T12:15:05.9124600Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.9125305Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda
2025-12-04T12:15:05.9125313Z 
2025-12-04T12:15:05.9125586Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.9125819Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.9125926Z frames [('total', 1)]
2025-12-04T12:15:05.9126059Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.9126528Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.9126749Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.9126869Z graph_break []
2025-12-04T12:15:05.9127089Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.9127194Z frames [('total', 1)]
2025-12-04T12:15:05.9127324Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.9127545Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.9128017Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.9128117Z graph_break []
2025-12-04T12:15:05.9128331Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.9128447Z frames [('total', 1)]
2025-12-04T12:15:05.9128563Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.9128781Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.9129259Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.9129360Z graph_break []
2025-12-04T12:15:05.9130058Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-97d6c66aee44b097.xml -
2025-12-04T12:15:05.9130236Z =========================== short test summary info ============================
2025-12-04T12:15:05.9131083Z FAILED [0.4407s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.9131745Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.9131838Z ^
2025-12-04T12:15:05.9132344Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.9132350Z 
2025-12-04T12:15:05.9133060Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.9133068Z 
2025-12-04T12:15:05.9133072Z 
2025-12-04T12:15:05.9133291Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.9134052Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda
2025-12-04T12:15:05.9134057Z 
2025-12-04T12:15:05.9134330Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.9134530Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.9134731Z ================== 1 failed, 34 deselected, 2 rerun in 4.33s ===================
2025-12-04T12:15:05.9134835Z Got exit code 1
2025-12-04T12:15:05.9134962Z Retrying single test...
2025-12-04T12:15:05.9135433Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-232f2d4b09cdec77.xml
2025-12-04T12:15:05.9135660Z ============================= test session starts ==============================
2025-12-04T12:15:05.9136562Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.9136682Z cachedir: .pytest_cache
2025-12-04T12:15:05.9137224Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.9137350Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.9137460Z configfile: pytest.ini
2025-12-04T12:15:05.9138063Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.9138288Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:05.9139079Z stepcurrent: skipping 34 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda
2025-12-04T12:15:05.9145592Z Running 1 items in this shard
2025-12-04T12:15:05.9145599Z 
2025-12-04T12:15:05.9146969Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T12:15:05.9148085Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.9148526Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 10
2025-12-04T12:15:05.9148994Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T12:15:05.9149570Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.9150112Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.9150669Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.9151250Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.9151807Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:05.9152363Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.9152831Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.9153273Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:05.9153911Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.9154517Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.9155129Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.9155720Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.9156295Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.9156823Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.9157331Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.9157815Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.9158294Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.9159113Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T12:15:05.9159647Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.9160254Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.9160974Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T12:15:05.9161591Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T12:15:05.9161996Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T12:15:05.9162693Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean)
2025-12-04T12:15:05.9163316Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2)
2025-12-04T12:15:05.9164024Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight)
2025-12-04T12:15:05.9164742Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T12:15:05.9165253Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T12:15:05.9165747Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T12:15:05.9166227Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T12:15:05.9166859Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.9167428Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.9167970Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T12:15:05.9168569Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.9169103Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.9169672Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.9170161Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.9170642Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.9171331Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.9172152Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.9172693Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:05.9173193Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T12:15:05.9173657Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T12:15:05.9174169Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T12:15:05.9174625Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T12:15:05.9175133Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T12:15:05.9175674Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T12:15:05.9176265Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T12:15:05.9176877Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T12:15:05.9177482Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.9178076Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T12:15:05.9178661Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20)
2025-12-04T12:15:05.9179208Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T12:15:05.9179697Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T12:15:05.9180273Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T12:15:05.9180788Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T12:15:05.9181362Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T12:15:05.9181914Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T12:15:05.9182545Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask)
2025-12-04T12:15:05.9183170Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T12:15:05.9183735Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, xmask)
2025-12-04T12:15:05.9184101Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.9186387Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.9186923Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.9187983Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.9188614Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.9189527Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.9190248Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.9191125Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.9191910Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.9192513Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.9193668Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.9194041Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.9194977Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.9195110Z ('RERUN', {'yellow': True}) [3.4310s] [100%]
2025-12-04T12:15:05.9196444Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T12:15:05.9197549Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.9198017Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 10
2025-12-04T12:15:05.9198481Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T12:15:05.9198942Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.9199490Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.9200037Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.9200619Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.9201126Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:05.9201685Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.9202147Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.9202580Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:05.9203179Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.9203776Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.9204413Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.9205008Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.9205541Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.9206069Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.9206605Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.9207088Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.9207571Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.9208384Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T12:15:05.9208957Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.9209543Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.9210263Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T12:15:05.9210919Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T12:15:05.9211326Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T12:15:05.9211989Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean)
2025-12-04T12:15:05.9212605Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2)
2025-12-04T12:15:05.9213275Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight)
2025-12-04T12:15:05.9213987Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T12:15:05.9214466Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T12:15:05.9214960Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T12:15:05.9215432Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T12:15:05.9216075Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.9216663Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.9217258Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T12:15:05.9217858Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.9218389Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.9218925Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.9219418Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.9219928Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.9220407Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.9221227Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.9221801Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:05.9222296Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T12:15:05.9222757Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T12:15:05.9223270Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T12:15:05.9223726Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T12:15:05.9224270Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T12:15:05.9224807Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T12:15:05.9225305Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T12:15:05.9225836Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T12:15:05.9226428Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.9227020Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T12:15:05.9227608Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20)
2025-12-04T12:15:05.9228115Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T12:15:05.9228579Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T12:15:05.9229154Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T12:15:05.9229626Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T12:15:05.9230200Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T12:15:05.9230784Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T12:15:05.9231408Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask)
2025-12-04T12:15:05.9231987Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T12:15:05.9232548Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, xmask)
2025-12-04T12:15:05.9232911Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.9235211Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.9235781Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.9236835Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.9237502Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.9238406Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.9239092Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.9239969Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.9240752Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.9241365Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.9242465Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.9242834Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.9243737Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.9243874Z ('RERUN', {'yellow': True}) [0.4436s] [100%]
2025-12-04T12:15:05.9245249Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T12:15:05.9246354Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.9246790Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 10
2025-12-04T12:15:05.9247250Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T12:15:05.9247741Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.9248299Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.9248837Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.9249451Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.9249950Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:05.9250502Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.9250962Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.9251428Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:05.9252026Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.9252630Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.9253234Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.9253824Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.9254354Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.9254904Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.9255393Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.9255872Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.9256428Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.9257240Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T12:15:05.9257783Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.9258421Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.9259142Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T12:15:05.9259764Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T12:15:05.9260165Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T12:15:05.9260866Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean)
2025-12-04T12:15:05.9261491Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2)
2025-12-04T12:15:05.9262181Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight)
2025-12-04T12:15:05.9262918Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T12:15:05.9263396Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T12:15:05.9263886Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T12:15:05.9264363Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T12:15:05.9265046Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.9265573Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.9266125Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T12:15:05.9266721Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.9267256Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.9267797Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.9268294Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.9268776Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.9269259Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.9270077Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.9270623Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:05.9271349Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T12:15:05.9271939Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T12:15:05.9272445Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T12:15:05.9272910Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T12:15:05.9273423Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T12:15:05.9273961Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T12:15:05.9274533Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T12:15:05.9275060Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T12:15:05.9275660Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.9276316Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T12:15:05.9276905Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20)
2025-12-04T12:15:05.9277416Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T12:15:05.9277884Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T12:15:05.9278468Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T12:15:05.9278994Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T12:15:05.9279575Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T12:15:05.9280123Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T12:15:05.9280748Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask)
2025-12-04T12:15:05.9281325Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T12:15:05.9281891Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, xmask)
2025-12-04T12:15:05.9282252Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.9284511Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.9285042Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.9286138Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.9286768Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.9287673Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.9288388Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.9289288Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.9290057Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.9290699Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.9291807Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.9292174Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.9293128Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.9293239Z FAILED [0.4454s] [100%]
2025-12-04T12:15:05.9293246Z 
2025-12-04T12:15:05.9293395Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.9293806Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda _
2025-12-04T12:15:05.9293934Z Traceback (most recent call last):
2025-12-04T12:15:05.9294369Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.9294607Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.9295097Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.9295363Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.9295876Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.9296085Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.9296666Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.9296816Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.9297368Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.9297692Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.9298229Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.9298382Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.9298995Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.9299134Z     return self._compile_to_module()
2025-12-04T12:15:05.9299621Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.9299790Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.9300321Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.9300453Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.9300964Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.9301233Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.9301824Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.9301966Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.9302467Z   File "/tmp/tmpdomg4n05/iz/cizmrzz6mryyxk6k74zpidzy2u2zzkaqvmcsyw3wh5zw5relgn42.py", line 65, in <module>
2025-12-04T12:15:05.9302974Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.9303088Z     kernel.precompile(
2025-12-04T12:15:05.9303638Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.9303770Z     self._precompile_worker()
2025-12-04T12:15:05.9304365Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.9304545Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.9305187Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.9305387Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.9305852Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.9306102Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.9306547Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.9306895Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.9307127Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.9307788Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.9307883Z ^
2025-12-04T12:15:05.9308344Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.9308351Z 
2025-12-04T12:15:05.9309073Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.9309082Z 
2025-12-04T12:15:05.9309086Z 
2025-12-04T12:15:05.9309305Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.9310020Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda
2025-12-04T12:15:05.9310026Z 
2025-12-04T12:15:05.9310299Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.9310525Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.9310647Z frames [('total', 1)]
2025-12-04T12:15:05.9310796Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.9311280Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.9311504Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.9311606Z graph_break []
2025-12-04T12:15:05.9312014Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda _
2025-12-04T12:15:05.9312137Z Traceback (most recent call last):
2025-12-04T12:15:05.9312560Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.9312806Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.9313326Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.9313592Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.9314107Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.9314300Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.9314878Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.9315025Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.9315573Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.9315893Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.9316412Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.9316603Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.9317088Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.9317211Z     return self._compile_to_module()
2025-12-04T12:15:05.9317710Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.9317878Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.9318406Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.9318533Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.9319026Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.9319271Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.9319853Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.9319997Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.9320470Z   File "/tmp/tmphc7i6l_a/xm/cxmjr4g4b2zqtk32544qtcdij4vwxgnynrm5c4tfhepuwribwsbf.py", line 65, in <module>
2025-12-04T12:15:05.9320935Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.9321061Z     kernel.precompile(
2025-12-04T12:15:05.9321613Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.9321730Z     self._precompile_worker()
2025-12-04T12:15:05.9322337Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.9322519Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.9323159Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.9323357Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.9323806Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.9324064Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.9324507Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.9324878Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.9325106Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.9325783Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.9325890Z ^
2025-12-04T12:15:05.9326351Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.9326356Z 
2025-12-04T12:15:05.9327077Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.9327117Z 
2025-12-04T12:15:05.9327122Z 
2025-12-04T12:15:05.9327343Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.9328047Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda
2025-12-04T12:15:05.9328066Z 
2025-12-04T12:15:05.9328336Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.9328563Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.9328716Z frames [('total', 1)]
2025-12-04T12:15:05.9328835Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.9329310Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.9329552Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.9329655Z graph_break []
2025-12-04T12:15:05.9329875Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.9329993Z frames [('total', 1)]
2025-12-04T12:15:05.9330109Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.9330342Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.9330804Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.9330907Z graph_break []
2025-12-04T12:15:05.9331067Z =================================== FAILURES ===================================
2025-12-04T12:15:05.9331468Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda _
2025-12-04T12:15:05.9331592Z Traceback (most recent call last):
2025-12-04T12:15:05.9332030Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.9332265Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.9332768Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.9333018Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.9333530Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.9333736Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.9334244Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.9334436Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.9334971Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.9335294Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.9335824Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.9335973Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.9336545Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.9336688Z     return self._compile_to_module()
2025-12-04T12:15:05.9337214Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.9337400Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.9337918Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.9338051Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.9338618Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.9338851Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.9339455Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.9339584Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.9340082Z   File "/tmp/tmpwb9d0956/sk/cskhnk4q7cufhboa4xv5riz7djeb56xwe3v5cf4b32sw7f6tkcwy.py", line 65, in <module>
2025-12-04T12:15:05.9340562Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.9340707Z     kernel.precompile(
2025-12-04T12:15:05.9341268Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.9341403Z     self._precompile_worker()
2025-12-04T12:15:05.9342004Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.9342200Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.9342793Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.9342995Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.9343467Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.9343712Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.9344172Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.9344509Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.9344740Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.9345407Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.9345500Z ^
2025-12-04T12:15:05.9345957Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.9345977Z 
2025-12-04T12:15:05.9346694Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.9346702Z 
2025-12-04T12:15:05.9346706Z 
2025-12-04T12:15:05.9346957Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.9347678Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda
2025-12-04T12:15:05.9347686Z 
2025-12-04T12:15:05.9347951Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.9348189Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.9348299Z frames [('total', 1)]
2025-12-04T12:15:05.9348416Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.9348898Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.9349152Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.9349273Z graph_break []
2025-12-04T12:15:05.9349494Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.9349603Z frames [('total', 1)]
2025-12-04T12:15:05.9349737Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.9349959Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.9350448Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.9350562Z graph_break []
2025-12-04T12:15:05.9350777Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.9350879Z frames [('total', 1)]
2025-12-04T12:15:05.9351012Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.9351231Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.9351704Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.9351832Z graph_break []
2025-12-04T12:15:05.9352486Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-232f2d4b09cdec77.xml -
2025-12-04T12:15:05.9352675Z =========================== short test summary info ============================
2025-12-04T12:15:05.9353522Z FAILED [0.4454s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.9354828Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.9354919Z ^
2025-12-04T12:15:05.9355379Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.9355388Z 
2025-12-04T12:15:05.9356110Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.9356120Z 
2025-12-04T12:15:05.9356125Z 
2025-12-04T12:15:05.9356342Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.9357055Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda
2025-12-04T12:15:05.9357063Z 
2025-12-04T12:15:05.9357334Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.9357516Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.9357735Z ================== 1 failed, 187 deselected, 2 rerun in 4.36s ==================
2025-12-04T12:15:05.9357837Z Got exit code 1
2025-12-04T12:15:05.9357958Z Retrying single test...
2025-12-04T12:15:05.9358435Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6add3d31a0a55a66.xml
2025-12-04T12:15:05.9358651Z ============================= test session starts ==============================
2025-12-04T12:15:05.9359020Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.9359133Z cachedir: .pytest_cache
2025-12-04T12:15:05.9359658Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.9359798Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.9359908Z configfile: pytest.ini
2025-12-04T12:15:05.9360514Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.9360738Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:05.9361563Z stepcurrent: skipping 34 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda
2025-12-04T12:15:05.9361702Z Running 1 items in this shard
2025-12-04T12:15:05.9361708Z 
2025-12-04T12:15:05.9363040Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T12:15:05.9364171Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.9364611Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 10
2025-12-04T12:15:05.9365071Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T12:15:05.9365566Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.9366103Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.9366654Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.9367309Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.9367884Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:05.9368444Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.9368896Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.9369344Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:05.9369944Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.9370539Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.9371333Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.9371933Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.9372565Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.9373097Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.9373605Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.9374086Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.9374566Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.9375441Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T12:15:05.9375975Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.9376648Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.9377423Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T12:15:05.9378038Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T12:15:05.9378445Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T12:15:05.9379097Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean)
2025-12-04T12:15:05.9379786Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2)
2025-12-04T12:15:05.9380462Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight)
2025-12-04T12:15:05.9381178Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T12:15:05.9381658Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T12:15:05.9382152Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T12:15:05.9382631Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T12:15:05.9383263Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.9383802Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.9384348Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T12:15:05.9384942Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.9385473Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.9386042Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.9386547Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.9387027Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.9387502Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.9388319Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.9388917Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:05.9389419Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T12:15:05.9389881Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T12:15:05.9390432Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T12:15:05.9390891Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T12:15:05.9391400Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T12:15:05.9391944Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T12:15:05.9392479Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T12:15:05.9393013Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T12:15:05.9393608Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.9394200Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T12:15:05.9394789Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20)
2025-12-04T12:15:05.9395290Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T12:15:05.9396090Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T12:15:05.9396668Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T12:15:05.9397141Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T12:15:05.9397719Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T12:15:05.9398257Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T12:15:05.9398897Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask)
2025-12-04T12:15:05.9399525Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T12:15:05.9400090Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, xmask)
2025-12-04T12:15:05.9400454Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.9402751Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.9403288Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.9404377Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.9405007Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.9405901Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.9406623Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.9407504Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.9408288Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.9408898Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.9409997Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.9410371Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.9411282Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.9411421Z ('RERUN', {'yellow': True}) [3.4387s] [100%]
2025-12-04T12:15:05.9412755Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T12:15:05.9413900Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.9414344Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 10
2025-12-04T12:15:05.9414811Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T12:15:05.9415269Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.9415805Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.9416470Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.9417060Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.9417573Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:05.9418126Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.9418620Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.9419052Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:05.9419651Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.9420249Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.9420896Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.9421487Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.9422021Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.9422553Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.9423053Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.9423534Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.9424015Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.9424831Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T12:15:05.9425356Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.9425960Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.9426675Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T12:15:05.9427343Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T12:15:05.9427746Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T12:15:05.9428411Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean)
2025-12-04T12:15:05.9429030Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2)
2025-12-04T12:15:05.9429704Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight)
2025-12-04T12:15:05.9430884Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T12:15:05.9431382Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T12:15:05.9431871Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T12:15:05.9432384Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T12:15:05.9433021Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.9433567Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.9434117Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T12:15:05.9434746Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.9435277Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.9435821Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.9436313Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.9436793Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.9437279Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.9438103Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.9438652Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:05.9439152Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T12:15:05.9439617Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T12:15:05.9440137Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T12:15:05.9440599Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T12:15:05.9441175Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T12:15:05.9441715Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T12:15:05.9442215Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T12:15:05.9442753Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T12:15:05.9443346Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.9443975Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T12:15:05.9444573Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20)
2025-12-04T12:15:05.9445072Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T12:15:05.9445586Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T12:15:05.9446168Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T12:15:05.9446641Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T12:15:05.9447225Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T12:15:05.9447815Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T12:15:05.9448437Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask)
2025-12-04T12:15:05.9449014Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T12:15:05.9449575Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, xmask)
2025-12-04T12:15:05.9449936Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.9452202Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.9452745Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.9453803Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.9454437Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.9455379Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.9456061Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.9457015Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.9457875Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.9458487Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.9459590Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.9459989Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.9460893Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.9461032Z ('RERUN', {'yellow': True}) [0.4432s] [100%]
2025-12-04T12:15:05.9462366Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T12:15:05.9463493Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.9463930Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 10
2025-12-04T12:15:05.9464396Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T12:15:05.9464858Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.9465409Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.9465955Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.9466539Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.9467056Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:05.9467608Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.9468069Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.9468500Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:05.9469135Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.9469733Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.9470343Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:05.9471143Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.9471761Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.9472293Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.9472799Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.9473327Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.9473808Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.9474619Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T12:15:05.9475164Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.9475812Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.9476530Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T12:15:05.9477147Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T12:15:05.9477547Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T12:15:05.9478214Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean)
2025-12-04T12:15:05.9478834Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2)
2025-12-04T12:15:05.9479505Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight)
2025-12-04T12:15:05.9480228Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T12:15:05.9480706Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T12:15:05.9481197Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T12:15:05.9481671Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T12:15:05.9482371Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.9482904Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:05.9483451Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T12:15:05.9484043Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.9484570Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.9485144Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.9485637Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.9486119Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.9486599Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:05.9487451Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.9487994Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:05.9488893Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T12:15:05.9489366Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T12:15:05.9489951Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T12:15:05.9490414Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T12:15:05.9490925Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T12:15:05.9491464Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T12:15:05.9491956Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T12:15:05.9492494Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T12:15:05.9493092Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.9493837Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T12:15:05.9494430Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20)
2025-12-04T12:15:05.9494941Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T12:15:05.9495409Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T12:15:05.9495990Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T12:15:05.9496590Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T12:15:05.9497169Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T12:15:05.9497725Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T12:15:05.9498351Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask)
2025-12-04T12:15:05.9498959Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T12:15:05.9499520Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, xmask)
2025-12-04T12:15:05.9499885Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.9502154Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.9502725Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.9503906Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.9504542Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.9505443Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.9506128Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.9507016Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.9507806Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.9508418Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.9509522Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.9509891Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.9510844Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.9510954Z FAILED [0.4448s] [100%]
2025-12-04T12:15:05.9510963Z 
2025-12-04T12:15:05.9511112Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.9511522Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda _
2025-12-04T12:15:05.9511648Z Traceback (most recent call last):
2025-12-04T12:15:05.9512085Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.9512321Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.9512844Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.9513109Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.9513624Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.9513818Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.9514373Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.9514521Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.9515066Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.9515388Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.9515908Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.9516069Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.9516586Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.9516723Z     return self._compile_to_module()
2025-12-04T12:15:05.9517206Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.9517376Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.9517909Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.9518041Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.9518549Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.9518784Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.9519370Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.9519514Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.9520026Z   File "/tmp/tmpchk618lu/rx/crxneosyy2h3poby4nucsuxwszzvopnfwjh2zfomce626zhrqqn5.py", line 65, in <module>
2025-12-04T12:15:05.9520493Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.9520622Z     kernel.precompile(
2025-12-04T12:15:05.9521174Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.9521307Z     self._precompile_worker()
2025-12-04T12:15:05.9521905Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.9522085Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.9522699Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.9522937Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.9523405Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.9523657Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.9524099Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.9524445Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.9524670Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.9525354Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.9525462Z ^
2025-12-04T12:15:05.9525918Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.9525924Z 
2025-12-04T12:15:05.9526650Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.9526688Z 
2025-12-04T12:15:05.9526693Z 
2025-12-04T12:15:05.9526910Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.9527623Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda
2025-12-04T12:15:05.9527629Z 
2025-12-04T12:15:05.9527899Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.9528127Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.9528281Z frames [('total', 1)]
2025-12-04T12:15:05.9528403Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.9528872Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.9529120Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.9529225Z graph_break []
2025-12-04T12:15:05.9529634Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda _
2025-12-04T12:15:05.9529760Z Traceback (most recent call last):
2025-12-04T12:15:05.9530185Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.9530435Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.9530925Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.9531189Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.9531707Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.9531905Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.9532434Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.9532582Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.9533119Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.9533455Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.9533976Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.9534140Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.9534657Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.9534784Z     return self._compile_to_module()
2025-12-04T12:15:05.9535283Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.9535450Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.9535981Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.9536115Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.9536678Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.9536930Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.9537553Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.9537687Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.9538205Z   File "/tmp/tmps62fmfbq/7a/c7aoc4bxcv6wpgvpfv3xndonc7ddlvev4izlx62jhz7ign57wyva.py", line 65, in <module>
2025-12-04T12:15:05.9538723Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.9538853Z     kernel.precompile(
2025-12-04T12:15:05.9539406Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.9539526Z     self._precompile_worker()
2025-12-04T12:15:05.9540138Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.9540322Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.9540933Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.9541167Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.9541620Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.9541884Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.9542328Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.9542663Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.9542909Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.9543562Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.9543672Z ^
2025-12-04T12:15:05.9544134Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.9544140Z 
2025-12-04T12:15:05.9544852Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.9544872Z 
2025-12-04T12:15:05.9544877Z 
2025-12-04T12:15:05.9545094Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.9545792Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda
2025-12-04T12:15:05.9545798Z 
2025-12-04T12:15:05.9546080Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.9546305Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.9546425Z frames [('total', 1)]
2025-12-04T12:15:05.9546548Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.9547045Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.9547279Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.9547380Z graph_break []
2025-12-04T12:15:05.9547598Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.9547715Z frames [('total', 1)]
2025-12-04T12:15:05.9547832Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.9548049Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.9548517Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.9548617Z graph_break []
2025-12-04T12:15:05.9548806Z =================================== FAILURES ===================================
2025-12-04T12:15:05.9549204Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda _
2025-12-04T12:15:05.9549333Z Traceback (most recent call last):
2025-12-04T12:15:05.9549771Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.9550035Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.9550537Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.9550786Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.9551300Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.9551507Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.9552020Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.9552202Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.9552749Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.9553071Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.9553608Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.9553759Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.9554240Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.9554375Z     return self._compile_to_module()
2025-12-04T12:15:05.9554861Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.9555038Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.9555563Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.9555692Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.9556203Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.9556436Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.9557018Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.9557159Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.9557657Z   File "/tmp/tmptfemlvnp/px/cpxba3dcg7g2emdld3folai36dg5l5232bau7arm43ijoiqyzepi.py", line 65, in <module>
2025-12-04T12:15:05.9558132Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.9558246Z     kernel.precompile(
2025-12-04T12:15:05.9558845Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.9558980Z     self._precompile_worker()
2025-12-04T12:15:05.9559579Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.9559772Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.9560368Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.9560565Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.9561029Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.9561309Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.9561756Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.9562110Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.9562341Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.9563036Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.9563128Z ^
2025-12-04T12:15:05.9563587Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.9563592Z 
2025-12-04T12:15:05.9564317Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.9564322Z 
2025-12-04T12:15:05.9564357Z 
2025-12-04T12:15:05.9564574Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.9565298Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda
2025-12-04T12:15:05.9565306Z 
2025-12-04T12:15:05.9565575Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.9565813Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.9565918Z frames [('total', 1)]
2025-12-04T12:15:05.9566035Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.9566514Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.9566735Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.9566837Z graph_break []
2025-12-04T12:15:05.9567068Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.9567174Z frames [('total', 1)]
2025-12-04T12:15:05.9567307Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.9567524Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.9567980Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.9568096Z graph_break []
2025-12-04T12:15:05.9568311Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.9568413Z frames [('total', 1)]
2025-12-04T12:15:05.9568541Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.9568760Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.9569232Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:05.9569330Z graph_break []
2025-12-04T12:15:05.9570020Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6add3d31a0a55a66.xml -
2025-12-04T12:15:05.9570210Z =========================== short test summary info ============================
2025-12-04T12:15:05.9571250Z FAILED [0.4448s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.9571922Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.9572013Z ^
2025-12-04T12:15:05.9572469Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.9572474Z 
2025-12-04T12:15:05.9573283Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.9573292Z 
2025-12-04T12:15:05.9573296Z 
2025-12-04T12:15:05.9573517Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.9574232Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda
2025-12-04T12:15:05.9574278Z 
2025-12-04T12:15:05.9574549Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.9574732Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.9574955Z ================== 1 failed, 187 deselected, 2 rerun in 4.37s ==================
2025-12-04T12:15:05.9575057Z Got exit code 1
2025-12-04T12:15:05.9575704Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda
2025-12-04T12:15:05.9576112Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:15:05.9576697Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-fa52f41f0c0be4e5.xml
2025-12-04T12:15:05.9576885Z ============================= test session starts ==============================
2025-12-04T12:15:05.9577240Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.9577352Z cachedir: .pytest_cache
2025-12-04T12:15:05.9577888Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.9578013Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.9578139Z configfile: pytest.ini
2025-12-04T12:15:05.9578733Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.9578961Z collecting ... collected 188 items / 35 deselected / 153 selected
2025-12-04T12:15:05.9579129Z stepcurrent: skipping 35 already run items.
2025-12-04T12:15:05.9579245Z Running 153 items in this shard
2025-12-04T12:15:05.9579250Z 
2025-12-04T12:15:05.9580711Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T12:15:05.9581962Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.9582410Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.9582910Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T12:15:05.9583372Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.9583922Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.9584462Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.9585054Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.9585684Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.9586246Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.9586712Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.9587375Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.9587913Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = tl.load(in_ptr3 + (0))
2025-12-04T12:15:05.9588460Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.broadcast_to(tmp15, [1, 1])
2025-12-04T12:15:05.9589040Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.9589616Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.9590144Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.9590652Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.9591130Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.9591593Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_2 = r0_index
2025-12-04T12:15:05.9592108Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index // 512
2025-12-04T12:15:05.9592872Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.9593577Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.9594262Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.9594800Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.9595291Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3 = tmp1 - tmp2
2025-12-04T12:15:05.9595743Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = 512.0
2025-12-04T12:15:05.9596286Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp6 = (tmp4 / tmp5)
2025-12-04T12:15:05.9596741Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp7 = 1e-05
2025-12-04T12:15:05.9597240Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp6 + tmp7
2025-12-04T12:15:05.9597771Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T12:15:05.9598260Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp3 * tmp9
2025-12-04T12:15:05.9598825Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tl_math.abs(tmp10)
2025-12-04T12:15:05.9599423Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.9600015Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = triton_helpers.maximum(_tmp13, tmp12)
2025-12-04T12:15:05.9600605Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp13 = tl.where(r0_mask, tmp14, _tmp13)
2025-12-04T12:15:05.9601106Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp10 * tmp16
2025-12-04T12:15:05.9601584Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = -448.0
2025-12-04T12:15:05.9602169Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = triton_helpers.maximum(tmp17, tmp18)
2025-12-04T12:15:05.9602676Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp20 = 448.0
2025-12-04T12:15:05.9603253Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.minimum(tmp19, tmp20)
2025-12-04T12:15:05.9603808Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp22 = tmp21.to(tl.float8e4nv)
2025-12-04T12:15:05.9604515Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask)
2025-12-04T12:15:05.9605094Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.max2(_tmp13, 1)[:, None]
2025-12-04T12:15:05.9605627Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp13.to(tl.float32)
2025-12-04T12:15:05.9606342Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None)
2025-12-04T12:15:05.9606720Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.9609378Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.9609933Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.9610984Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.9611628Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.9612555Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.9613240Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.9614135Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.9614936Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.9615560Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.9616879Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.9617301Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.9618201Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.9618352Z ('RERUN', {'yellow': True}) [3.5258s] [  0%]
2025-12-04T12:15:05.9619787Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T12:15:05.9621030Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.9621482Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.9621938Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T12:15:05.9622417Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.9622952Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.9623494Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.9624128Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.9624715Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.9625289Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.9625760Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.9626444Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.9626975Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = tl.load(in_ptr3 + (0))
2025-12-04T12:15:05.9627526Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.broadcast_to(tmp15, [1, 1])
2025-12-04T12:15:05.9628126Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.9628690Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.9629234Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.9629723Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.9630206Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.9630724Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_2 = r0_index
2025-12-04T12:15:05.9631223Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index // 512
2025-12-04T12:15:05.9631998Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.9632687Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.9633384Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.9633909Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.9634395Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3 = tmp1 - tmp2
2025-12-04T12:15:05.9634863Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = 512.0
2025-12-04T12:15:05.9635351Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp6 = (tmp4 / tmp5)
2025-12-04T12:15:05.9635817Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp7 = 1e-05
2025-12-04T12:15:05.9636299Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp6 + tmp7
2025-12-04T12:15:05.9636833Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T12:15:05.9637382Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp3 * tmp9
2025-12-04T12:15:05.9637902Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tl_math.abs(tmp10)
2025-12-04T12:15:05.9638514Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.9639094Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = triton_helpers.maximum(_tmp13, tmp12)
2025-12-04T12:15:05.9639690Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp13 = tl.where(r0_mask, tmp14, _tmp13)
2025-12-04T12:15:05.9640205Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp10 * tmp16
2025-12-04T12:15:05.9640678Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = -448.0
2025-12-04T12:15:05.9641266Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = triton_helpers.maximum(tmp17, tmp18)
2025-12-04T12:15:05.9641757Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp20 = 448.0
2025-12-04T12:15:05.9642331Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.minimum(tmp19, tmp20)
2025-12-04T12:15:05.9642884Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp22 = tmp21.to(tl.float8e4nv)
2025-12-04T12:15:05.9643591Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask)
2025-12-04T12:15:05.9644218Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.max2(_tmp13, 1)[:, None]
2025-12-04T12:15:05.9644733Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp13.to(tl.float32)
2025-12-04T12:15:05.9645454Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None)
2025-12-04T12:15:05.9645815Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.9648439Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.9648986Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.9650042Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.9650698Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.9651591Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.9652280Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.9653164Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.9653973Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.9654585Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.9655845Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.9656244Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.9657234Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.9657385Z ('RERUN', {'yellow': True}) [0.5049s] [  0%]
2025-12-04T12:15:05.9658851Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T12:15:05.9660108Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.9660542Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.9661010Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T12:15:05.9661474Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.9662006Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.9662562Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.9663145Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.9663741Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.9664297Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.9664748Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.9665428Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.9665956Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = tl.load(in_ptr3 + (0))
2025-12-04T12:15:05.9666514Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.broadcast_to(tmp15, [1, 1])
2025-12-04T12:15:05.9667090Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.9667663Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.9668194Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.9668685Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.9669212Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.9669677Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_2 = r0_index
2025-12-04T12:15:05.9670188Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index // 512
2025-12-04T12:15:05.9671128Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.9671895Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.9672605Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.9673131Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.9673635Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3 = tmp1 - tmp2
2025-12-04T12:15:05.9674085Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = 512.0
2025-12-04T12:15:05.9674594Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp6 = (tmp4 / tmp5)
2025-12-04T12:15:05.9675050Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp7 = 1e-05
2025-12-04T12:15:05.9675536Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp6 + tmp7
2025-12-04T12:15:05.9676088Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T12:15:05.9676578Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp3 * tmp9
2025-12-04T12:15:05.9677113Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tl_math.abs(tmp10)
2025-12-04T12:15:05.9677709Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.9678845Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = triton_helpers.maximum(_tmp13, tmp12)
2025-12-04T12:15:05.9679430Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp13 = tl.where(r0_mask, tmp14, _tmp13)
2025-12-04T12:15:05.9679934Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp10 * tmp16
2025-12-04T12:15:05.9680418Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = -448.0
2025-12-04T12:15:05.9680998Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = triton_helpers.maximum(tmp17, tmp18)
2025-12-04T12:15:05.9681506Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp20 = 448.0
2025-12-04T12:15:05.9682101Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.minimum(tmp19, tmp20)
2025-12-04T12:15:05.9682645Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp22 = tmp21.to(tl.float8e4nv)
2025-12-04T12:15:05.9683408Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask)
2025-12-04T12:15:05.9683986Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.max2(_tmp13, 1)[:, None]
2025-12-04T12:15:05.9684504Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp13.to(tl.float32)
2025-12-04T12:15:05.9685228Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None)
2025-12-04T12:15:05.9685649Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.9688258Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.9688799Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.9689854Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.9690561Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.9691523Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.9692205Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.9693140Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.9693915Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.9694528Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.9695828Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.9696196Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.9697163Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.9697306Z FAILED [0.5101s] [  0%]
2025-12-04T12:15:05.9697313Z 
2025-12-04T12:15:05.9697478Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.9697868Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda _
2025-12-04T12:15:05.9697993Z Traceback (most recent call last):
2025-12-04T12:15:05.9698435Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.9698676Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.9699176Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.9699465Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.9699981Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.9700193Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.9700705Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.9700853Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.9701405Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.9701729Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.9702265Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.9702421Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.9702903Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.9703046Z     return self._compile_to_module()
2025-12-04T12:15:05.9703531Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.9703709Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.9704225Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.9704357Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.9704895Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.9705132Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.9705767Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.9705911Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.9706422Z   File "/tmp/tmpgv72wtly/d2/cd2bhfqkfzk3ytih4l5jpwrrqngdoz4q257rrrhwldmopxdpfcqa.py", line 137, in <module>
2025-12-04T12:15:05.9706897Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.9707012Z     kernel.precompile(
2025-12-04T12:15:05.9707562Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.9707694Z     self._precompile_worker()
2025-12-04T12:15:05.9708326Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.9708526Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.9709124Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.9709326Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.9709919Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.9710169Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.9710613Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.9710966Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.9711194Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.9712026Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.9712157Z ^
2025-12-04T12:15:05.9712617Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.9712641Z 
2025-12-04T12:15:05.9713357Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.9713363Z 
2025-12-04T12:15:05.9713368Z 
2025-12-04T12:15:05.9713588Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.9714303Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda
2025-12-04T12:15:05.9714312Z 
2025-12-04T12:15:05.9714584Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.9714828Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.9714936Z frames [('total', 1)]
2025-12-04T12:15:05.9715059Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.9715541Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.9715769Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.9715871Z graph_break []
2025-12-04T12:15:05.9716278Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda _
2025-12-04T12:15:05.9716406Z Traceback (most recent call last):
2025-12-04T12:15:05.9716843Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.9717083Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.9717573Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.9717881Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.9718683Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.9718899Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.9719414Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.9719566Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.9720113Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.9720485Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.9721008Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.9721179Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.9721661Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.9721834Z     return self._compile_to_module()
2025-12-04T12:15:05.9722321Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.9722488Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.9723026Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.9723159Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.9723680Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.9723919Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.9724547Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.9724691Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.9725206Z   File "/tmp/tmpz6kanr3z/2k/c2kscxxt7ys57ljxoamro5pgno6dvi66nhweineppnb6f3qpuar7.py", line 137, in <module>
2025-12-04T12:15:05.9725669Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.9725797Z     kernel.precompile(
2025-12-04T12:15:05.9726354Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.9726488Z     self._precompile_worker()
2025-12-04T12:15:05.9727085Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.9727265Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.9727880Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.9728078Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.9728542Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.9728788Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.9729233Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.9729576Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.9729807Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.9730653Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.9730758Z ^
2025-12-04T12:15:05.9731220Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.9731228Z 
2025-12-04T12:15:05.9731950Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.9731958Z 
2025-12-04T12:15:05.9731962Z 
2025-12-04T12:15:05.9732180Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.9732888Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda
2025-12-04T12:15:05.9732928Z 
2025-12-04T12:15:05.9733197Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.9733426Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.9733543Z frames [('total', 1)]
2025-12-04T12:15:05.9733661Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.9734137Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.9734409Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.9734510Z graph_break []
2025-12-04T12:15:05.9734743Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.9734848Z frames [('total', 1)]
2025-12-04T12:15:05.9734966Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.9735200Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.9735662Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.9735795Z graph_break []
2025-12-04T12:15:05.9735963Z =================================== FAILURES ===================================
2025-12-04T12:15:05.9736431Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda _
2025-12-04T12:15:05.9736577Z Traceback (most recent call last):
2025-12-04T12:15:05.9737008Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.9737241Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.9737749Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.9737997Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.9738526Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.9738719Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.9739235Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.9739398Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.9739935Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.9740258Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.9740793Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.9740941Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.9741436Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.9741562Z     return self._compile_to_module()
2025-12-04T12:15:05.9742098Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.9742278Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.9742796Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.9742943Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.9743444Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.9743680Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.9744279Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.9744440Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.9744943Z   File "/tmp/tmpfahfx886/7c/c7cfetq46btaezjz4qzq4ubbk7h2uh4qdnt5yqj5q6hm3ku4wm37.py", line 137, in <module>
2025-12-04T12:15:05.9745426Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.9745542Z     kernel.precompile(
2025-12-04T12:15:05.9746109Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.9746268Z     self._precompile_worker()
2025-12-04T12:15:05.9746865Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.9747059Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.9747656Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.9747869Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.9748321Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.9748603Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.9749063Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.9749400Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.9749627Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.9750453Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.9750543Z ^
2025-12-04T12:15:05.9751017Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.9751023Z 
2025-12-04T12:15:05.9751737Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.9751743Z 
2025-12-04T12:15:05.9751747Z 
2025-12-04T12:15:05.9751978Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.9752680Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda
2025-12-04T12:15:05.9752686Z 
2025-12-04T12:15:05.9752955Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.9753188Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.9753293Z frames [('total', 1)]
2025-12-04T12:15:05.9753422Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.9753889Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.9754113Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.9754266Z graph_break []
2025-12-04T12:15:05.9754485Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.9754590Z frames [('total', 1)]
2025-12-04T12:15:05.9754725Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.9754944Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.9755415Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.9755516Z graph_break []
2025-12-04T12:15:05.9755733Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.9755849Z frames [('total', 1)]
2025-12-04T12:15:05.9755965Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.9756219Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.9756694Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.9756793Z graph_break []
2025-12-04T12:15:05.9757463Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-fa52f41f0c0be4e5.xml -
2025-12-04T12:15:05.9757671Z =========================== short test summary info ============================
2025-12-04T12:15:05.9758507Z FAILED [0.5101s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.9759329Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.9759419Z ^
2025-12-04T12:15:05.9759889Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.9759928Z 
2025-12-04T12:15:05.9760642Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.9760650Z 
2025-12-04T12:15:05.9760655Z 
2025-12-04T12:15:05.9760874Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.9761581Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda
2025-12-04T12:15:05.9761588Z 
2025-12-04T12:15:05.9761859Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.9762056Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.9762259Z ================== 1 failed, 35 deselected, 2 rerun in 4.58s ===================
2025-12-04T12:15:05.9762364Z Got exit code 1
2025-12-04T12:15:05.9762487Z Retrying single test...
2025-12-04T12:15:05.9762959Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-38b24c1b21208356.xml
2025-12-04T12:15:05.9763138Z ============================= test session starts ==============================
2025-12-04T12:15:05.9763492Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.9763604Z cachedir: .pytest_cache
2025-12-04T12:15:05.9764139Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.9764268Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.9764379Z configfile: pytest.ini
2025-12-04T12:15:05.9764984Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.9765211Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:05.9766040Z stepcurrent: skipping 35 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda
2025-12-04T12:15:05.9766162Z Running 1 items in this shard
2025-12-04T12:15:05.9766167Z 
2025-12-04T12:15:05.9767602Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T12:15:05.9768892Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.9769331Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.9769797Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T12:15:05.9770291Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.9770838Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.9771564Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.9772150Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.9772844Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.9773399Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.9773864Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.9774501Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.9775038Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = tl.load(in_ptr3 + (0))
2025-12-04T12:15:05.9775588Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.broadcast_to(tmp15, [1, 1])
2025-12-04T12:15:05.9776176Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.9776791Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.9777326Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.9777830Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.9778312Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.9778783Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_2 = r0_index
2025-12-04T12:15:05.9779302Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index // 512
2025-12-04T12:15:05.9780118Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.9780833Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.9781520Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.9782113Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.9782619Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3 = tmp1 - tmp2
2025-12-04T12:15:05.9783081Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = 512.0
2025-12-04T12:15:05.9783589Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp6 = (tmp4 / tmp5)
2025-12-04T12:15:05.9784088Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp7 = 1e-05
2025-12-04T12:15:05.9784571Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp6 + tmp7
2025-12-04T12:15:05.9785123Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T12:15:05.9785614Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp3 * tmp9
2025-12-04T12:15:05.9786184Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tl_math.abs(tmp10)
2025-12-04T12:15:05.9786776Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.9787377Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = triton_helpers.maximum(_tmp13, tmp12)
2025-12-04T12:15:05.9787943Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp13 = tl.where(r0_mask, tmp14, _tmp13)
2025-12-04T12:15:05.9788445Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp10 * tmp16
2025-12-04T12:15:05.9788931Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = -448.0
2025-12-04T12:15:05.9789512Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = triton_helpers.maximum(tmp17, tmp18)
2025-12-04T12:15:05.9789987Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp20 = 448.0
2025-12-04T12:15:05.9790569Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.minimum(tmp19, tmp20)
2025-12-04T12:15:05.9791111Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp22 = tmp21.to(tl.float8e4nv)
2025-12-04T12:15:05.9791831Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask)
2025-12-04T12:15:05.9792414Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.max2(_tmp13, 1)[:, None]
2025-12-04T12:15:05.9792983Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp13.to(tl.float32)
2025-12-04T12:15:05.9793696Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None)
2025-12-04T12:15:05.9794065Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.9796729Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.9797313Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.9798355Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.9798987Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.9799899Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.9800612Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.9801512Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.9802287Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.9802907Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.9804159Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.9804544Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.9805439Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.9805576Z ('RERUN', {'yellow': True}) [3.5480s] [100%]
2025-12-04T12:15:05.9807017Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T12:15:05.9808295Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.9808748Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.9809198Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T12:15:05.9809670Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.9810239Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.9810787Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.9811389Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.9812009Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.9812576Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.9813026Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.9813665Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.9814237Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = tl.load(in_ptr3 + (0))
2025-12-04T12:15:05.9814785Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.broadcast_to(tmp15, [1, 1])
2025-12-04T12:15:05.9815378Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.9815907Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.9816882Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.9817392Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.9817876Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.9818354Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_2 = r0_index
2025-12-04T12:15:05.9818858Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index // 512
2025-12-04T12:15:05.9819634Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.9820330Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.9821067Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.9821778Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.9822274Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3 = tmp1 - tmp2
2025-12-04T12:15:05.9822740Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = 512.0
2025-12-04T12:15:05.9823236Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp6 = (tmp4 / tmp5)
2025-12-04T12:15:05.9823732Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp7 = 1e-05
2025-12-04T12:15:05.9824233Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp6 + tmp7
2025-12-04T12:15:05.9824772Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T12:15:05.9825274Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp3 * tmp9
2025-12-04T12:15:05.9825835Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tl_math.abs(tmp10)
2025-12-04T12:15:05.9826430Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.9827032Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = triton_helpers.maximum(_tmp13, tmp12)
2025-12-04T12:15:05.9827594Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp13 = tl.where(r0_mask, tmp14, _tmp13)
2025-12-04T12:15:05.9828145Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp10 * tmp16
2025-12-04T12:15:05.9828609Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = -448.0
2025-12-04T12:15:05.9829198Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = triton_helpers.maximum(tmp17, tmp18)
2025-12-04T12:15:05.9829654Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp20 = 448.0
2025-12-04T12:15:05.9830229Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.minimum(tmp19, tmp20)
2025-12-04T12:15:05.9830782Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp22 = tmp21.to(tl.float8e4nv)
2025-12-04T12:15:05.9831487Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask)
2025-12-04T12:15:05.9832077Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.max2(_tmp13, 1)[:, None]
2025-12-04T12:15:05.9832595Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp13.to(tl.float32)
2025-12-04T12:15:05.9833301Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None)
2025-12-04T12:15:05.9833680Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.9836344Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.9836897Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.9837971Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.9838619Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.9839544Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.9840240Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.9841121Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.9841937Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.9842546Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.9843791Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.9844172Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.9845063Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.9845217Z ('RERUN', {'yellow': True}) [0.5240s] [100%]
2025-12-04T12:15:05.9846641Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T12:15:05.9847897Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.9848328Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.9848824Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T12:15:05.9849289Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.9849826Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.9850382Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.9850963Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.9851585Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.9852138Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.9852593Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.9853274Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.9853803Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = tl.load(in_ptr3 + (0))
2025-12-04T12:15:05.9854360Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.broadcast_to(tmp15, [1, 1])
2025-12-04T12:15:05.9854945Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.9855475Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.9856048Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.9856621Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.9857115Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.9857580Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_2 = r0_index
2025-12-04T12:15:05.9858077Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index // 512
2025-12-04T12:15:05.9858857Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.9859550Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.9860253Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.9860779Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.9861285Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3 = tmp1 - tmp2
2025-12-04T12:15:05.9861740Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = 512.0
2025-12-04T12:15:05.9862276Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp6 = (tmp4 / tmp5)
2025-12-04T12:15:05.9862745Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp7 = 1e-05
2025-12-04T12:15:05.9863226Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp6 + tmp7
2025-12-04T12:15:05.9863773Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T12:15:05.9864262Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp3 * tmp9
2025-12-04T12:15:05.9864815Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tl_math.abs(tmp10)
2025-12-04T12:15:05.9865421Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.9866002Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = triton_helpers.maximum(_tmp13, tmp12)
2025-12-04T12:15:05.9866609Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp13 = tl.where(r0_mask, tmp14, _tmp13)
2025-12-04T12:15:05.9867108Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp10 * tmp16
2025-12-04T12:15:05.9867574Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = -448.0
2025-12-04T12:15:05.9868165Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = triton_helpers.maximum(tmp17, tmp18)
2025-12-04T12:15:05.9868623Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp20 = 448.0
2025-12-04T12:15:05.9869246Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.minimum(tmp19, tmp20)
2025-12-04T12:15:05.9869787Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp22 = tmp21.to(tl.float8e4nv)
2025-12-04T12:15:05.9870512Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask)
2025-12-04T12:15:05.9871319Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.max2(_tmp13, 1)[:, None]
2025-12-04T12:15:05.9871840Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp13.to(tl.float32)
2025-12-04T12:15:05.9872568Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None)
2025-12-04T12:15:05.9872928Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.9875556Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.9876164Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.9877216Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.9877845Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.9878799Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.9879480Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.9880377Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.9881199Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.9881808Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.9883074Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.9883512Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.9884417Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.9884524Z FAILED [0.5093s] [100%]
2025-12-04T12:15:05.9884530Z 
2025-12-04T12:15:05.9884676Z ==================================== RERUNS ====================================
2025-12-04T12:15:05.9885087Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda _
2025-12-04T12:15:05.9885215Z Traceback (most recent call last):
2025-12-04T12:15:05.9885655Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.9885891Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.9886382Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.9886644Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.9887162Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.9887371Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.9887882Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.9888030Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.9888578Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.9888899Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.9889468Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.9889620Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.9890101Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.9890236Z     return self._compile_to_module()
2025-12-04T12:15:05.9890720Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.9890885Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.9891435Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.9891596Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.9892105Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.9892345Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.9892931Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.9893112Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.9893624Z   File "/tmp/tmpuio9ao7y/3k/c3kj2kiyfpwgkww7duyhamkmysena3itstpvte6z3exyizeie6zu.py", line 137, in <module>
2025-12-04T12:15:05.9894101Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.9894218Z     kernel.precompile(
2025-12-04T12:15:05.9894781Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.9894916Z     self._precompile_worker()
2025-12-04T12:15:05.9895518Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.9895731Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.9896400Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.9896603Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.9897074Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.9897320Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.9897766Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.9898121Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.9898352Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.9899179Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.9899275Z ^
2025-12-04T12:15:05.9899731Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.9899737Z 
2025-12-04T12:15:05.9900466Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.9900473Z 
2025-12-04T12:15:05.9900477Z 
2025-12-04T12:15:05.9900696Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.9901421Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda
2025-12-04T12:15:05.9901429Z 
2025-12-04T12:15:05.9901738Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.9901974Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.9902103Z frames [('total', 1)]
2025-12-04T12:15:05.9902224Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.9902706Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.9902930Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.9903034Z graph_break []
2025-12-04T12:15:05.9903437Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda _
2025-12-04T12:15:05.9903567Z Traceback (most recent call last):
2025-12-04T12:15:05.9904022Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.9904275Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.9904764Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.9905033Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.9905579Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.9905771Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.9906299Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.9906451Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.9907001Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.9907322Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.9907880Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.9908047Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.9908535Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.9908656Z     return self._compile_to_module()
2025-12-04T12:15:05.9909157Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.9909321Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.9909845Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.9909977Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.9910472Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.9910724Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.9911311Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.9911449Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.9911950Z   File "/tmp/tmp8rtpeauw/cc/ccctdkq4glp3ayx2tg773vcu44cl24rp45fxhxbasyfsf3qg5w7r.py", line 137, in <module>
2025-12-04T12:15:05.9912412Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.9912537Z     kernel.precompile(
2025-12-04T12:15:05.9913096Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.9913212Z     self._precompile_worker()
2025-12-04T12:15:05.9913857Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.9914040Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.9914643Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.9914845Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.9915295Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.9915552Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.9915998Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.9916456Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.9916685Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.9917500Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.9917636Z ^
2025-12-04T12:15:05.9918093Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.9918100Z 
2025-12-04T12:15:05.9918825Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.9918830Z 
2025-12-04T12:15:05.9918835Z 
2025-12-04T12:15:05.9919054Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.9919758Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda
2025-12-04T12:15:05.9919808Z 
2025-12-04T12:15:05.9920083Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.9920306Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.9920426Z frames [('total', 1)]
2025-12-04T12:15:05.9920545Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.9921008Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.9921241Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.9921343Z graph_break []
2025-12-04T12:15:05.9921576Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.9921681Z frames [('total', 1)]
2025-12-04T12:15:05.9921799Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.9922032Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.9922498Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.9922596Z graph_break []
2025-12-04T12:15:05.9922756Z =================================== FAILURES ===================================
2025-12-04T12:15:05.9923147Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda _
2025-12-04T12:15:05.9923272Z Traceback (most recent call last):
2025-12-04T12:15:05.9923712Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:05.9923947Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:05.9924448Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:05.9924701Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:05.9925217Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:05.9925457Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:05.9925967Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:05.9926131Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:05.9926663Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:05.9926983Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:05.9927516Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:05.9927696Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:05.9928190Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:05.9928318Z     return self._compile_to_module()
2025-12-04T12:15:05.9928800Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:05.9929028Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:05.9929543Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:05.9929672Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:05.9930179Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:05.9930407Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:05.9931007Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:05.9931163Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:05.9931676Z   File "/tmp/tmp70m2r3ad/ey/ceycqjnmpvwnxuzpllna34bpdo66bnzsjlxrkdyo2klbckgshgp4.py", line 137, in <module>
2025-12-04T12:15:05.9932150Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.9932267Z     kernel.precompile(
2025-12-04T12:15:05.9932834Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:05.9932953Z     self._precompile_worker()
2025-12-04T12:15:05.9933547Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:05.9933741Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:05.9934338Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.9934540Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.9935004Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.9935247Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.9935704Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.9936039Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.9936264Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.9937165Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.9937258Z ^
2025-12-04T12:15:05.9937733Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.9937786Z 
2025-12-04T12:15:05.9938503Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.9938512Z 
2025-12-04T12:15:05.9938516Z 
2025-12-04T12:15:05.9938733Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.9939447Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda
2025-12-04T12:15:05.9939453Z 
2025-12-04T12:15:05.9939726Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.9939994Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.9940102Z frames [('total', 1)]
2025-12-04T12:15:05.9940219Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.9940698Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.9940923Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.9941067Z graph_break []
2025-12-04T12:15:05.9941285Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.9941389Z frames [('total', 1)]
2025-12-04T12:15:05.9941521Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.9941742Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.9942203Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.9942316Z graph_break []
2025-12-04T12:15:05.9942534Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:05.9942651Z frames [('total', 1)]
2025-12-04T12:15:05.9942803Z stats [('calls_captured', 10)]
2025-12-04T12:15:05.9943024Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:05.9943492Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:05.9943597Z graph_break []
2025-12-04T12:15:05.9944247Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-38b24c1b21208356.xml -
2025-12-04T12:15:05.9944436Z =========================== short test summary info ============================
2025-12-04T12:15:05.9945264Z FAILED [0.5093s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:05.9946084Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.9946177Z ^
2025-12-04T12:15:05.9946635Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.9946641Z 
2025-12-04T12:15:05.9947362Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:05.9947368Z 
2025-12-04T12:15:05.9947372Z 
2025-12-04T12:15:05.9947593Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:05.9948303Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda
2025-12-04T12:15:05.9948309Z 
2025-12-04T12:15:05.9948580Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:05.9948776Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:05.9949019Z ================== 1 failed, 187 deselected, 2 rerun in 4.62s ==================
2025-12-04T12:15:05.9949122Z Got exit code 1
2025-12-04T12:15:05.9949245Z Retrying single test...
2025-12-04T12:15:05.9949714Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b1ae24833396f782.xml
2025-12-04T12:15:05.9949883Z ============================= test session starts ==============================
2025-12-04T12:15:05.9950253Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:05.9950364Z cachedir: .pytest_cache
2025-12-04T12:15:05.9950883Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:05.9955697Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:05.9955911Z configfile: pytest.ini
2025-12-04T12:15:05.9956538Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:05.9956775Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:05.9957568Z stepcurrent: skipping 35 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda
2025-12-04T12:15:05.9957743Z Running 1 items in this shard
2025-12-04T12:15:05.9957750Z 
2025-12-04T12:15:05.9959193Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T12:15:05.9960466Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.9960944Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:05.9961409Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T12:15:05.9961871Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:05.9962407Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:05.9962960Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:05.9963548Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:05.9964149Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:05.9964704Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:05.9965157Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:05.9965806Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:05.9966335Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = tl.load(in_ptr3 + (0))
2025-12-04T12:15:05.9966931Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.broadcast_to(tmp15, [1, 1])
2025-12-04T12:15:05.9967518Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:05.9968049Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:05.9968589Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:05.9969079Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:05.9969600Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:05.9970067Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_2 = r0_index
2025-12-04T12:15:05.9970589Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index // 512
2025-12-04T12:15:05.9971564Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:05.9972338Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.9973043Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:05.9973575Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:05.9974129Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3 = tmp1 - tmp2
2025-12-04T12:15:05.9974581Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = 512.0
2025-12-04T12:15:05.9975079Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp6 = (tmp4 / tmp5)
2025-12-04T12:15:05.9975545Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp7 = 1e-05
2025-12-04T12:15:05.9976027Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp6 + tmp7
2025-12-04T12:15:05.9976646Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T12:15:05.9977142Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp3 * tmp9
2025-12-04T12:15:05.9977666Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tl_math.abs(tmp10)
2025-12-04T12:15:05.9978274Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:05.9978855Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = triton_helpers.maximum(_tmp13, tmp12)
2025-12-04T12:15:05.9979436Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp13 = tl.where(r0_mask, tmp14, _tmp13)
2025-12-04T12:15:05.9979935Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp10 * tmp16
2025-12-04T12:15:05.9980408Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = -448.0
2025-12-04T12:15:05.9981031Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = triton_helpers.maximum(tmp17, tmp18)
2025-12-04T12:15:05.9981488Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp20 = 448.0
2025-12-04T12:15:05.9982076Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.minimum(tmp19, tmp20)
2025-12-04T12:15:05.9982611Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp22 = tmp21.to(tl.float8e4nv)
2025-12-04T12:15:05.9983394Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask)
2025-12-04T12:15:05.9983976Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.max2(_tmp13, 1)[:, None]
2025-12-04T12:15:05.9984487Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp13.to(tl.float32)
2025-12-04T12:15:05.9985235Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None)
2025-12-04T12:15:05.9985597Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:05.9988220Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:05.9988793Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:05.9989850Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:05.9990482Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:05.9991387Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:05.9992068Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:05.9992957Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:05.9993726Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:05.9994368Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:05.9995641Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:05.9996009Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:05.9997454Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:05.9997594Z ('RERUN', {'yellow': True}) [3.5338s] [100%]
2025-12-04T12:15:05.9999074Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T12:15:06.0000323Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:06.0000801Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:06.0001251Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T12:15:06.0001712Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:06.0002301Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:06.0002842Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.0003438Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:06.0004022Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:06.0004575Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:06.0005038Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:06.0005673Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:06.0006209Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = tl.load(in_ptr3 + (0))
2025-12-04T12:15:06.0006761Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.broadcast_to(tmp15, [1, 1])
2025-12-04T12:15:06.0007341Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:06.0007884Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:06.0008412Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:06.0008946Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:06.0009562Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:06.0010034Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_2 = r0_index
2025-12-04T12:15:06.0010549Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index // 512
2025-12-04T12:15:06.0011318Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:06.0012071Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:06.0012767Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:06.0013344Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:06.0013832Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3 = tmp1 - tmp2
2025-12-04T12:15:06.0014285Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = 512.0
2025-12-04T12:15:06.0014797Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp6 = (tmp4 / tmp5)
2025-12-04T12:15:06.0015250Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp7 = 1e-05
2025-12-04T12:15:06.0015787Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp6 + tmp7
2025-12-04T12:15:06.0016384Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T12:15:06.0016879Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp3 * tmp9
2025-12-04T12:15:06.0017415Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tl_math.abs(tmp10)
2025-12-04T12:15:06.0018013Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:06.0018611Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = triton_helpers.maximum(_tmp13, tmp12)
2025-12-04T12:15:06.0019174Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp13 = tl.where(r0_mask, tmp14, _tmp13)
2025-12-04T12:15:06.0019670Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp10 * tmp16
2025-12-04T12:15:06.0020159Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = -448.0
2025-12-04T12:15:06.0020736Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = triton_helpers.maximum(tmp17, tmp18)
2025-12-04T12:15:06.0021208Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp20 = 448.0
2025-12-04T12:15:06.0021784Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.minimum(tmp19, tmp20)
2025-12-04T12:15:06.0022386Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp22 = tmp21.to(tl.float8e4nv)
2025-12-04T12:15:06.0023094Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask)
2025-12-04T12:15:06.0023673Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.max2(_tmp13, 1)[:, None]
2025-12-04T12:15:06.0024199Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp13.to(tl.float32)
2025-12-04T12:15:06.0024943Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None)
2025-12-04T12:15:06.0025330Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.0027969Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.0028551Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.0029625Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.0030263Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.0031154Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.0031833Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.0032730Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.0033504Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.0034123Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.0035364Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:06.0035745Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:06.0036715Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.0037132Z ('RERUN', {'yellow': True}) [0.5303s] [100%]
2025-12-04T12:15:06.0038578Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T12:15:06.0039886Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:06.0040339Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:06.0040790Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T12:15:06.0041300Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:06.0041838Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:06.0042378Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.0042974Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:06.0043555Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:06.0044158Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:06.0044612Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:06.0045255Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:06.0045782Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = tl.load(in_ptr3 + (0))
2025-12-04T12:15:06.0046329Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.broadcast_to(tmp15, [1, 1])
2025-12-04T12:15:06.0046925Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:06.0047455Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:06.0047994Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:06.0048482Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:06.0048961Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:06.0049440Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_2 = r0_index
2025-12-04T12:15:06.0049942Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index // 512
2025-12-04T12:15:06.0050762Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:06.0051461Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:06.0052161Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T12:15:06.0052686Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:06.0053214Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3 = tmp1 - tmp2
2025-12-04T12:15:06.0053692Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = 512.0
2025-12-04T12:15:06.0054187Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp6 = (tmp4 / tmp5)
2025-12-04T12:15:06.0054686Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp7 = 1e-05
2025-12-04T12:15:06.0055169Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp6 + tmp7
2025-12-04T12:15:06.0055698Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T12:15:06.0056202Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp3 * tmp9
2025-12-04T12:15:06.0056800Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tl_math.abs(tmp10)
2025-12-04T12:15:06.0057464Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:06.0058046Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = triton_helpers.maximum(_tmp13, tmp12)
2025-12-04T12:15:06.0058607Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp13 = tl.where(r0_mask, tmp14, _tmp13)
2025-12-04T12:15:06.0059118Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp10 * tmp16
2025-12-04T12:15:06.0059586Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = -448.0
2025-12-04T12:15:06.0060172Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = triton_helpers.maximum(tmp17, tmp18)
2025-12-04T12:15:06.0060632Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp20 = 448.0
2025-12-04T12:15:06.0061206Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.minimum(tmp19, tmp20)
2025-12-04T12:15:06.0061761Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp22 = tmp21.to(tl.float8e4nv)
2025-12-04T12:15:06.0062469Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask)
2025-12-04T12:15:06.0063056Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.max2(_tmp13, 1)[:, None]
2025-12-04T12:15:06.0063571Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp13.to(tl.float32)
2025-12-04T12:15:06.0064344Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None)
2025-12-04T12:15:06.0064715Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.0067392Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.0067936Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.0069023Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.0069654Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.0070547Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.0071497Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.0072387Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.0073176Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.0073792Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.0075054Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:06.0075426Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:06.0076327Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.0076449Z FAILED [0.5134s] [100%]
2025-12-04T12:15:06.0076456Z 
2025-12-04T12:15:06.0076604Z ==================================== RERUNS ====================================
2025-12-04T12:15:06.0077014Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda _
2025-12-04T12:15:06.0077140Z Traceback (most recent call last):
2025-12-04T12:15:06.0077667Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:06.0077917Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:06.0078411Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.0078679Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.0079192Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.0079388Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.0079916Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.0080111Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.0080647Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.0080986Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.0081511Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.0081722Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.0082202Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.0082327Z     return self._compile_to_module()
2025-12-04T12:15:06.0082825Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.0082993Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.0083527Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.0083721Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.0084223Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.0084472Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.0085060Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.0085191Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.0085707Z   File "/tmp/tmp00sk6nt5/c3/cc3ka5ut3tzoxycnduhrx2jw53imj4r72i5wqfxcnm66srw64d2i.py", line 137, in <module>
2025-12-04T12:15:06.0086169Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.0086297Z     kernel.precompile(
2025-12-04T12:15:06.0086857Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.0086978Z     self._precompile_worker()
2025-12-04T12:15:06.0087594Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.0087777Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.0088388Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.0088589Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.0089042Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.0089303Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.0089748Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.0090095Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.0090353Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.0091160Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:06.0091265Z ^
2025-12-04T12:15:06.0091724Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.0091730Z 
2025-12-04T12:15:06.0092453Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.0092459Z 
2025-12-04T12:15:06.0092464Z 
2025-12-04T12:15:06.0092716Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.0093422Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda
2025-12-04T12:15:06.0093428Z 
2025-12-04T12:15:06.0093707Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.0093967Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.0094087Z frames [('total', 1)]
2025-12-04T12:15:06.0094203Z stats [('calls_captured', 10)]
2025-12-04T12:15:06.0094667Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:06.0094900Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.0095002Z graph_break []
2025-12-04T12:15:06.0095401Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda _
2025-12-04T12:15:06.0095540Z Traceback (most recent call last):
2025-12-04T12:15:06.0096002Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:06.0096248Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:06.0096798Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.0097049Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.0097578Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.0097774Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.0098296Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.0098449Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.0098983Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.0099320Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.0099845Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.0099998Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.0100491Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.0100612Z     return self._compile_to_module()
2025-12-04T12:15:06.0101107Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.0101268Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.0101787Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.0101929Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.0102469Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.0102714Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.0103306Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.0103433Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.0103955Z   File "/tmp/tmpyn5psnqo/3g/c3ga3bog5mrqjsrpzn6bpss6y4kpevtk35kjgnyxu2lktonmrbpp.py", line 137, in <module>
2025-12-04T12:15:06.0104415Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.0104529Z     kernel.precompile(
2025-12-04T12:15:06.0105129Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.0105251Z     self._precompile_worker()
2025-12-04T12:15:06.0105858Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.0106071Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.0106668Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.0106881Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.0107336Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.0107595Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.0108043Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.0108410Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.0108656Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.0109462Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:06.0109567Z ^
2025-12-04T12:15:06.0110027Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.0110032Z 
2025-12-04T12:15:06.0110751Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.0110757Z 
2025-12-04T12:15:06.0110764Z 
2025-12-04T12:15:06.0110993Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.0111692Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda
2025-12-04T12:15:06.0111698Z 
2025-12-04T12:15:06.0111979Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.0112205Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.0112310Z frames [('total', 1)]
2025-12-04T12:15:06.0112441Z stats [('calls_captured', 10)]
2025-12-04T12:15:06.0112912Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:06.0113146Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.0113248Z graph_break []
2025-12-04T12:15:06.0113467Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.0113582Z frames [('total', 1)]
2025-12-04T12:15:06.0113705Z stats [('calls_captured', 10)]
2025-12-04T12:15:06.0113958Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.0114434Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:06.0114535Z graph_break []
2025-12-04T12:15:06.0114687Z =================================== FAILURES ===================================
2025-12-04T12:15:06.0115095Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda _
2025-12-04T12:15:06.0115218Z Traceback (most recent call last):
2025-12-04T12:15:06.0115653Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:06.0115885Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:06.0116412Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.0116674Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.0117188Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.0117397Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.0117934Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.0118083Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.0118630Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.0118951Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.0119473Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.0119663Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.0120143Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.0120274Z     return self._compile_to_module()
2025-12-04T12:15:06.0120763Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.0120928Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.0121455Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.0121584Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.0122090Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.0122325Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.0122917Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.0123056Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.0123566Z   File "/tmp/tmp4lr11uw0/xg/cxgekqvj2prddxnzlrmwnkmuggs6vyoojttp6nduyhega5v2gj3z.py", line 137, in <module>
2025-12-04T12:15:06.0124028Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.0124152Z     kernel.precompile(
2025-12-04T12:15:06.0124707Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.0124839Z     self._precompile_worker()
2025-12-04T12:15:06.0125433Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.0125617Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.0126335Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.0126536Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.0127000Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.0127246Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.0127688Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.0128035Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.0128260Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.0129126Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:06.0129222Z ^
2025-12-04T12:15:06.0129682Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.0129688Z 
2025-12-04T12:15:06.0130457Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.0130463Z 
2025-12-04T12:15:06.0130468Z 
2025-12-04T12:15:06.0130685Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.0131397Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda
2025-12-04T12:15:06.0131404Z 
2025-12-04T12:15:06.0131676Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.0131899Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.0132048Z frames [('total', 1)]
2025-12-04T12:15:06.0132169Z stats [('calls_captured', 10)]
2025-12-04T12:15:06.0132644Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:06.0132870Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.0132972Z graph_break []
2025-12-04T12:15:06.0133204Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.0133309Z frames [('total', 1)]
2025-12-04T12:15:06.0133426Z stats [('calls_captured', 10)]
2025-12-04T12:15:06.0134074Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.0134540Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:06.0134654Z graph_break []
2025-12-04T12:15:06.0134871Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.0134981Z frames [('total', 1)]
2025-12-04T12:15:06.0135110Z stats [('calls_captured', 10)]
2025-12-04T12:15:06.0135329Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.0135794Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:15:06.0135906Z graph_break []
2025-12-04T12:15:06.0136641Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b1ae24833396f782.xml -
2025-12-04T12:15:06.0136835Z =========================== short test summary info ============================
2025-12-04T12:15:06.0137689Z FAILED [0.5134s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.0138725Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:06.0138836Z ^
2025-12-04T12:15:06.0139297Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.0139306Z 
2025-12-04T12:15:06.0140031Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.0140037Z 
2025-12-04T12:15:06.0140041Z 
2025-12-04T12:15:06.0140260Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.0140993Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda
2025-12-04T12:15:06.0141014Z 
2025-12-04T12:15:06.0141289Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.0141477Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:06.0141700Z ================== 1 failed, 187 deselected, 2 rerun in 4.62s ==================
2025-12-04T12:15:06.0141836Z Got exit code 1
2025-12-04T12:15:06.0142455Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda
2025-12-04T12:15:06.0142881Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:15:06.0143354Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-80996ba6b8c32f81.xml
2025-12-04T12:15:06.0143537Z ============================= test session starts ==============================
2025-12-04T12:15:06.0143894Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:06.0144041Z cachedir: .pytest_cache
2025-12-04T12:15:06.0144578Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:06.0144708Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:06.0144826Z configfile: pytest.ini
2025-12-04T12:15:06.0145431Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:06.0145662Z collecting ... collected 188 items / 36 deselected / 152 selected
2025-12-04T12:15:06.0145823Z stepcurrent: skipping 36 already run items.
2025-12-04T12:15:06.0145942Z Running 152 items in this shard
2025-12-04T12:15:06.0145947Z 
2025-12-04T12:15:06.0147302Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T12:15:06.0148414Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:06.0148868Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 8192
2025-12-04T12:15:06.0149336Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T12:15:06.0149799Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:06.0150351Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:06.0150927Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.0151513Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:06.0152109Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:06.0152663Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:06.0153124Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:06.0153584Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.0154189Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:06.0154792Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:06.0155470Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:06.0156065Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:06.0156599Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:06.0157143Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:06.0157667Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:06.0158146Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:06.0158631Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:06.0159420Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T12:15:06.0159957Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:06.0160544Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:06.0161268Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T12:15:06.0161886Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T12:15:06.0162288Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T12:15:06.0162918Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean)
2025-12-04T12:15:06.0163506Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2)
2025-12-04T12:15:06.0164198Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight)
2025-12-04T12:15:06.0164906Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T12:15:06.0165388Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T12:15:06.0165878Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T12:15:06.0166347Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T12:15:06.0167028Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:06.0167558Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:06.0168107Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T12:15:06.0168727Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:06.0169256Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:06.0169793Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:06.0170284Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:06.0170763Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:06.0171451Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:06.0172242Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:06.0172786Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:06.0173285Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T12:15:06.0173766Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T12:15:06.0174273Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T12:15:06.0174739Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T12:15:06.0175250Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T12:15:06.0175796Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T12:15:06.0176368Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T12:15:06.0176892Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T12:15:06.0177486Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:06.0178179Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T12:15:06.0178742Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask, tmp21, _tmp20)
2025-12-04T12:15:06.0179252Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T12:15:06.0179719Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T12:15:06.0180360Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T12:15:06.0180835Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T12:15:06.0181417Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T12:15:06.0181969Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T12:15:06.0182612Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask)
2025-12-04T12:15:06.0183185Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T12:15:06.0183746Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, None)
2025-12-04T12:15:06.0184109Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.0186508Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.0187049Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.0188110Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.0188743Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.0189648Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.0190328Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.0191226Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.0192044Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.0192659Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.0193768Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:06.0194135Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:06.0195070Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.0195210Z ('RERUN', {'yellow': True}) [3.4498s] [  0%]
2025-12-04T12:15:06.0196565Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T12:15:06.0197686Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:06.0198133Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 8192
2025-12-04T12:15:06.0198595Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T12:15:06.0199092Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:06.0199637Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:06.0200183Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.0200765Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:06.0201358Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:06.0201910Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:06.0202373Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:06.0202801Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.0203413Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:06.0203998Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:06.0204602Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:06.0205197Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:06.0205761Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:06.0206305Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:06.0206795Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:06.0207270Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:06.0207747Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:06.0208560Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T12:15:06.0209102Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:06.0209688Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:06.0210434Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T12:15:06.0211050Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T12:15:06.0211450Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T12:15:06.0212080Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean)
2025-12-04T12:15:06.0212702Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2)
2025-12-04T12:15:06.0213359Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight)
2025-12-04T12:15:06.0214061Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T12:15:06.0214541Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T12:15:06.0215028Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T12:15:06.0215506Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T12:15:06.0216147Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:06.0216735Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:06.0217284Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T12:15:06.0217878Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:06.0218413Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:06.0218999Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:06.0219490Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:06.0219987Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:06.0220452Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:06.0221238Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:06.0221814Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:06.0222316Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T12:15:06.0222790Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T12:15:06.0223324Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T12:15:06.0223786Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T12:15:06.0224298Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T12:15:06.0224838Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T12:15:06.0225347Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T12:15:06.0225903Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T12:15:06.0226504Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:06.0227096Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T12:15:06.0227656Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask, tmp21, _tmp20)
2025-12-04T12:15:06.0228166Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T12:15:06.0228630Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T12:15:06.0229210Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T12:15:06.0229684Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T12:15:06.0230260Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T12:15:06.0230811Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T12:15:06.0231409Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask)
2025-12-04T12:15:06.0232049Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T12:15:06.0232601Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, None)
2025-12-04T12:15:06.0232963Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.0235356Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.0235896Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.0236984Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.0237610Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.0238519Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.0239229Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.0240119Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.0240896Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.0241501Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.0242614Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:06.0242983Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:06.0243884Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.0244023Z ('RERUN', {'yellow': True}) [0.4440s] [  0%]
2025-12-04T12:15:06.0245377Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T12:15:06.0246497Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:06.0246965Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 8192
2025-12-04T12:15:06.0247421Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T12:15:06.0247888Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:06.0248437Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:06.0249012Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.0249611Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:06.0250199Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:06.0250784Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:06.0251249Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:06.0251686Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.0252300Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:06.0252888Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:06.0253530Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:06.0254126Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:06.0254658Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:06.0255199Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:06.0255690Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:06.0256169Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:06.0256756Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:06.0257542Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T12:15:06.0258089Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:06.0258678Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:06.0259414Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T12:15:06.0260064Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T12:15:06.0260470Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T12:15:06.0261103Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean)
2025-12-04T12:15:06.0261693Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2)
2025-12-04T12:15:06.0262388Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight)
2025-12-04T12:15:06.0263095Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T12:15:06.0263582Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T12:15:06.0264110Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T12:15:06.0264581Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T12:15:06.0265232Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:06.0265761Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:06.0266325Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T12:15:06.0266937Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:06.0267468Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:06.0268015Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:06.0268508Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:06.0269006Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:06.0269470Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:06.0270267Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:06.0270813Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:06.0271903Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T12:15:06.0272416Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T12:15:06.0272921Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T12:15:06.0273389Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T12:15:06.0274046Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T12:15:06.0274586Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T12:15:06.0275097Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T12:15:06.0275621Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T12:15:06.0276216Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:06.0276863Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T12:15:06.0277433Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask, tmp21, _tmp20)
2025-12-04T12:15:06.0277940Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T12:15:06.0278453Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T12:15:06.0279044Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T12:15:06.0279500Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T12:15:06.0280077Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T12:15:06.0280680Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T12:15:06.0281274Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask)
2025-12-04T12:15:06.0281866Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T12:15:06.0282417Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, None)
2025-12-04T12:15:06.0282777Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.0285138Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.0285678Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.0286738Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.0287401Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.0288310Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.0288993Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.0289885Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.0290684Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.0291306Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.0292399Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:06.0292796Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:06.0293698Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.0293806Z FAILED [0.4455s] [  0%]
2025-12-04T12:15:06.0293814Z 
2025-12-04T12:15:06.0293971Z ==================================== RERUNS ====================================
2025-12-04T12:15:06.0294410Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda _
2025-12-04T12:15:06.0294536Z Traceback (most recent call last):
2025-12-04T12:15:06.0294973Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:06.0295214Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:06.0295717Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.0295970Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.0296556Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.0296771Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.0297285Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.0297439Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.0297990Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.0298316Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.0298849Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.0299001Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.0299481Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.0299623Z     return self._compile_to_module()
2025-12-04T12:15:06.0300110Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.0300293Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.0300852Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.0300983Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.0301499Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.0301730Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.0302313Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.0302453Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.0303019Z   File "/tmp/tmp68dn1xf6/xe/cxeeedhdiosib6g6mpxnfz46ldep3ho7an5w23pkg733m7i4v5qx.py", line 65, in <module>
2025-12-04T12:15:06.0303497Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.0303615Z     kernel.precompile(
2025-12-04T12:15:06.0304166Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.0304332Z     self._precompile_worker()
2025-12-04T12:15:06.0304924Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.0305115Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.0305706Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.0305904Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.0306369Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.0306648Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.0307105Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.0307442Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.0307669Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.0308332Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:06.0308424Z ^
2025-12-04T12:15:06.0308882Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.0308888Z 
2025-12-04T12:15:06.0309616Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.0309625Z 
2025-12-04T12:15:06.0309629Z 
2025-12-04T12:15:06.0309847Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.0310573Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda
2025-12-04T12:15:06.0310582Z 
2025-12-04T12:15:06.0310850Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.0311089Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.0311194Z frames [('total', 1)]
2025-12-04T12:15:06.0311315Z stats [('calls_captured', 10)]
2025-12-04T12:15:06.0311794Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.0312019Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.0312121Z graph_break []
2025-12-04T12:15:06.0312593Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda _
2025-12-04T12:15:06.0312718Z Traceback (most recent call last):
2025-12-04T12:15:06.0313155Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:06.0313392Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:06.0313883Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.0314142Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.0314655Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.0314893Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.0315404Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.0315556Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.0316102Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.0316456Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.0316976Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.0317136Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.0317616Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.0317750Z     return self._compile_to_module()
2025-12-04T12:15:06.0318240Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.0318450Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.0318984Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.0319115Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.0319623Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.0319857Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.0320446Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.0320588Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.0321085Z   File "/tmp/tmpjl8roieu/wp/cwpr3t4i73ta32nonbipacqxobdkww3ahzw52d56nxg27hzemnre.py", line 65, in <module>
2025-12-04T12:15:06.0321548Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.0321679Z     kernel.precompile(
2025-12-04T12:15:06.0322236Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.0322369Z     self._precompile_worker()
2025-12-04T12:15:06.0322967Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.0323147Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.0323753Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.0323953Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.0324418Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.0324662Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.0325152Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.0325501Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.0325731Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.0326377Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:06.0326481Z ^
2025-12-04T12:15:06.0326939Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.0326944Z 
2025-12-04T12:15:06.0327699Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.0327708Z 
2025-12-04T12:15:06.0327713Z 
2025-12-04T12:15:06.0327935Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.0328659Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda
2025-12-04T12:15:06.0328697Z 
2025-12-04T12:15:06.0328966Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.0329191Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.0329311Z frames [('total', 1)]
2025-12-04T12:15:06.0329430Z stats [('calls_captured', 10)]
2025-12-04T12:15:06.0329892Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.0330148Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.0330250Z graph_break []
2025-12-04T12:15:06.0330515Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.0330624Z frames [('total', 1)]
2025-12-04T12:15:06.0330743Z stats [('calls_captured', 10)]
2025-12-04T12:15:06.0330976Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.0331439Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.0331540Z graph_break []
2025-12-04T12:15:06.0331704Z =================================== FAILURES ===================================
2025-12-04T12:15:06.0332111Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda _
2025-12-04T12:15:06.0332251Z Traceback (most recent call last):
2025-12-04T12:15:06.0332685Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:06.0332919Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:06.0333430Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.0333682Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.0334215Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.0334410Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.0334922Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.0335084Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.0335619Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.0335945Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.0336817Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.0336979Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.0337487Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.0337616Z     return self._compile_to_module()
2025-12-04T12:15:06.0338103Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.0338287Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.0338808Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.0338954Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.0339484Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.0339727Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.0340330Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.0340491Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.0340993Z   File "/tmp/tmp4wo547af/3w/c3wzmawjcbjsrcfyxmclgbfwvt3io5i75yxrhyqhu7sqxxsqbdd3.py", line 65, in <module>
2025-12-04T12:15:06.0341476Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.0341588Z     kernel.precompile(
2025-12-04T12:15:06.0342154Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.0342276Z     self._precompile_worker()
2025-12-04T12:15:06.0342874Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.0343107Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.0343704Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.0343919Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.0344369Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.0344615Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.0345072Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.0345406Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.0345641Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.0346307Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:06.0346397Z ^
2025-12-04T12:15:06.0346869Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.0346878Z 
2025-12-04T12:15:06.0347592Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.0347599Z 
2025-12-04T12:15:06.0347603Z 
2025-12-04T12:15:06.0347831Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.0348545Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda
2025-12-04T12:15:06.0348551Z 
2025-12-04T12:15:06.0348822Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.0349110Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.0349217Z frames [('total', 1)]
2025-12-04T12:15:06.0349344Z stats [('calls_captured', 10)]
2025-12-04T12:15:06.0349816Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.0350039Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.0350153Z graph_break []
2025-12-04T12:15:06.0350371Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.0350473Z frames [('total', 1)]
2025-12-04T12:15:06.0350601Z stats [('calls_captured', 10)]
2025-12-04T12:15:06.0350820Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.0351322Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.0351439Z graph_break []
2025-12-04T12:15:06.0351660Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.0351776Z frames [('total', 1)]
2025-12-04T12:15:06.0351891Z stats [('calls_captured', 10)]
2025-12-04T12:15:06.0352144Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.0352619Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.0352721Z graph_break []
2025-12-04T12:15:06.0353374Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-80996ba6b8c32f81.xml -
2025-12-04T12:15:06.0353564Z =========================== short test summary info ============================
2025-12-04T12:15:06.0354428Z FAILED [0.4455s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.0355143Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:06.0355239Z ^
2025-12-04T12:15:06.0355700Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.0355718Z 
2025-12-04T12:15:06.0356423Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.0356429Z 
2025-12-04T12:15:06.0356433Z 
2025-12-04T12:15:06.0356650Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.0357375Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda
2025-12-04T12:15:06.0357383Z 
2025-12-04T12:15:06.0357655Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.0357851Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:06.0358054Z ================== 1 failed, 36 deselected, 2 rerun in 4.38s ===================
2025-12-04T12:15:06.0358157Z Got exit code 1
2025-12-04T12:15:06.0358280Z Retrying single test...
2025-12-04T12:15:06.0358750Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8b26ba548538abde.xml
2025-12-04T12:15:06.0358915Z ============================= test session starts ==============================
2025-12-04T12:15:06.0359280Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:06.0359395Z cachedir: .pytest_cache
2025-12-04T12:15:06.0359928Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:06.0360104Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:06.0360219Z configfile: pytest.ini
2025-12-04T12:15:06.0360827Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:06.0361054Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:06.0361861Z stepcurrent: skipping 36 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda
2025-12-04T12:15:06.0361981Z Running 1 items in this shard
2025-12-04T12:15:06.0361986Z 
2025-12-04T12:15:06.0363381Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T12:15:06.0364497Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:06.0364982Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 8192
2025-12-04T12:15:06.0365445Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T12:15:06.0365910Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:06.0366460Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:06.0367049Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.0367634Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:06.0368233Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:06.0368793Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:06.0369254Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:06.0369686Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.0370289Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:06.0370893Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:06.0371740Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:06.0372340Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:06.0372870Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:06.0373407Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:06.0374026Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:06.0374512Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:06.0374996Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:06.0375778Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T12:15:06.0376380Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:06.0377042Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:06.0377772Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T12:15:06.0378489Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T12:15:06.0378967Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T12:15:06.0379602Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean)
2025-12-04T12:15:06.0380196Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2)
2025-12-04T12:15:06.0380845Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight)
2025-12-04T12:15:06.0381624Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T12:15:06.0382109Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T12:15:06.0382602Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T12:15:06.0383074Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T12:15:06.0383728Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:06.0384259Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:06.0384807Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T12:15:06.0385401Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:06.0385932Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:06.0386471Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:06.0386960Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:06.0387441Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:06.0387962Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:06.0388753Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:06.0389296Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:06.0389794Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T12:15:06.0390287Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T12:15:06.0390804Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T12:15:06.0391270Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T12:15:06.0391781Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T12:15:06.0392352Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T12:15:06.0392847Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T12:15:06.0393380Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T12:15:06.0393975Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:06.0394602Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T12:15:06.0395161Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask, tmp21, _tmp20)
2025-12-04T12:15:06.0395672Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T12:15:06.0396137Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T12:15:06.0396718Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T12:15:06.0397189Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T12:15:06.0397770Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T12:15:06.0398324Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T12:15:06.0398920Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask)
2025-12-04T12:15:06.0399500Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T12:15:06.0400069Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, None)
2025-12-04T12:15:06.0400430Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.0402850Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.0403393Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.0404491Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.0405126Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.0406081Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.0406759Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.0407644Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.0408463Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.0409070Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.0410174Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:06.0410541Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:06.0411454Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.0411597Z ('RERUN', {'yellow': True}) [3.4500s] [100%]
2025-12-04T12:15:06.0412950Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T12:15:06.0414056Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:06.0414504Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 8192
2025-12-04T12:15:06.0414963Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T12:15:06.0415470Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:06.0416014Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:06.0416631Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.0417216Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:06.0417816Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:06.0418414Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:06.0418888Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:06.0419324Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.0419958Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:06.0420554Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:06.0421168Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:06.0421764Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:06.0422333Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:06.0422875Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:06.0423371Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:06.0423852Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:06.0424331Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:06.0425117Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T12:15:06.0425661Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:06.0426251Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:06.0426974Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T12:15:06.0427596Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T12:15:06.0428005Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T12:15:06.0428675Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean)
2025-12-04T12:15:06.0429270Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2)
2025-12-04T12:15:06.0429934Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight)
2025-12-04T12:15:06.0430644Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T12:15:06.0431124Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T12:15:06.0431655Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T12:15:06.0432132Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T12:15:06.0432788Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:06.0433345Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:06.0433896Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T12:15:06.0434493Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:06.0435028Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:06.0435602Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:06.0436091Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:06.0436571Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:06.0437050Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:06.0440671Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:06.0441234Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:06.0441760Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T12:15:06.0442224Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T12:15:06.0442750Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T12:15:06.0443205Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T12:15:06.0443720Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T12:15:06.0444293Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T12:15:06.0444785Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T12:15:06.0445325Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T12:15:06.0445918Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:06.0446502Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T12:15:06.0447077Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask, tmp21, _tmp20)
2025-12-04T12:15:06.0447636Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T12:15:06.0448117Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T12:15:06.0448696Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T12:15:06.0449184Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T12:15:06.0449772Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T12:15:06.0450312Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T12:15:06.0450926Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask)
2025-12-04T12:15:06.0451502Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T12:15:06.0452102Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, None)
2025-12-04T12:15:06.0452465Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.0454889Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.0455441Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.0456629Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.0457278Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.0458179Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.0458872Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.0459757Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.0460542Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.0461151Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.0462326Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:06.0462698Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:06.0463591Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.0463773Z ('RERUN', {'yellow': True}) [0.4527s] [100%]
2025-12-04T12:15:06.0465123Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T12:15:06.0466219Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:06.0466697Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 8192
2025-12-04T12:15:06.0467162Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T12:15:06.0467622Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:06.0468154Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:06.0468761Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.0469347Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:06.0469940Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:06.0470493Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:06.0471166Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:06.0471695Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.0472388Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:06.0472996Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:06.0473608Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:06.0474193Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:06.0474745Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:06.0475276Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:06.0475949Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:06.0476434Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:06.0476904Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:06.0477770Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T12:15:06.0478366Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:06.0478968Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:06.0479691Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T12:15:06.0480356Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T12:15:06.0480758Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T12:15:06.0481374Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean)
2025-12-04T12:15:06.0481976Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2)
2025-12-04T12:15:06.0482672Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight)
2025-12-04T12:15:06.0483391Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T12:15:06.0483869Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T12:15:06.0484344Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T12:15:06.0484831Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T12:15:06.0485461Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:06.0486001Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:06.0486549Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T12:15:06.0487143Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:06.0487674Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:06.0488204Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:06.0488711Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:06.0489191Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:06.0489708Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:06.0490503Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:06.0491031Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:06.0491572Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T12:15:06.0492033Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T12:15:06.0492547Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T12:15:06.0493004Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T12:15:06.0493530Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T12:15:06.0494079Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T12:15:06.0494573Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T12:15:06.0495109Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T12:15:06.0495702Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:06.0496408Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T12:15:06.0496974Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask, tmp21, _tmp20)
2025-12-04T12:15:06.0497471Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T12:15:06.0497951Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T12:15:06.0498527Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T12:15:06.0498996Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T12:15:06.0499574Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T12:15:06.0500115Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T12:15:06.0500722Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask)
2025-12-04T12:15:06.0501302Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T12:15:06.0501860Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, None)
2025-12-04T12:15:06.0502223Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.0504623Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.0505190Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.0506247Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.0506873Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.0507804Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.0508503Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.0509389Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.0510224Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.0510840Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.0511944Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:06.0512314Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:06.0513211Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.0513333Z FAILED [0.4487s] [100%]
2025-12-04T12:15:06.0513342Z 
2025-12-04T12:15:06.0513489Z ==================================== RERUNS ====================================
2025-12-04T12:15:06.0513909Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda _
2025-12-04T12:15:06.0514037Z Traceback (most recent call last):
2025-12-04T12:15:06.0514467Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:06.0514721Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:06.0515213Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.0515479Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.0515997Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.0516227Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.0516754Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.0516906Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.0517437Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.0517802Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.0518324Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.0518486Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.0518967Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.0519090Z     return self._compile_to_module()
2025-12-04T12:15:06.0519592Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.0519793Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.0520322Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.0520455Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.0520955Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.0521200Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.0521787Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.0521914Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.0522466Z   File "/tmp/tmprm7t157q/ry/cryfvhdvbdirhsvo7sox7acx2ecvuu3dc4h2kzejuuvfgwhl77kp.py", line 65, in <module>
2025-12-04T12:15:06.0522931Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.0523076Z     kernel.precompile(
2025-12-04T12:15:06.0523632Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.0523756Z     self._precompile_worker()
2025-12-04T12:15:06.0524369Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.0524555Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.0525167Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.0525368Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.0525825Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.0526088Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.0526536Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.0526887Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.0527119Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.0527777Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:06.0527884Z ^
2025-12-04T12:15:06.0528342Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.0528348Z 
2025-12-04T12:15:06.0529106Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.0529129Z 
2025-12-04T12:15:06.0529134Z 
2025-12-04T12:15:06.0529353Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.0530067Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda
2025-12-04T12:15:06.0530104Z 
2025-12-04T12:15:06.0530395Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.0530626Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.0530749Z frames [('total', 1)]
2025-12-04T12:15:06.0530869Z stats [('calls_captured', 10)]
2025-12-04T12:15:06.0531339Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.0531584Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.0531688Z graph_break []
2025-12-04T12:15:06.0532123Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda _
2025-12-04T12:15:06.0532262Z Traceback (most recent call last):
2025-12-04T12:15:06.0532687Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:06.0532941Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:06.0533433Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.0533680Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.0534209Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.0534441Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.0534968Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.0535117Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.0535650Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.0535994Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.0536593Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.0536744Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.0537244Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.0537369Z     return self._compile_to_module()
2025-12-04T12:15:06.0537875Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.0538042Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.0538560Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.0538707Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.0539206Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.0539453Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.0540040Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.0540167Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.0540682Z   File "/tmp/tmp4i4cd1io/cx/ccxp6zwyisw4fbmmmicno2erqws77zkvu5jbkgbicpk3vb7bdqp3.py", line 65, in <module>
2025-12-04T12:15:06.0541189Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.0541305Z     kernel.precompile(
2025-12-04T12:15:06.0541874Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.0541993Z     self._precompile_worker()
2025-12-04T12:15:06.0542638Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.0542821Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.0543414Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.0543625Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.0544080Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.0544343Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.0544825Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.0545162Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.0545404Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.0546055Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:06.0546149Z ^
2025-12-04T12:15:06.0546623Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.0546628Z 
2025-12-04T12:15:06.0547383Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.0547391Z 
2025-12-04T12:15:06.0547396Z 
2025-12-04T12:15:06.0547629Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.0548345Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda
2025-12-04T12:15:06.0548353Z 
2025-12-04T12:15:06.0548637Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.0548861Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.0548968Z frames [('total', 1)]
2025-12-04T12:15:06.0549100Z stats [('calls_captured', 10)]
2025-12-04T12:15:06.0549569Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.0549797Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.0549914Z graph_break []
2025-12-04T12:15:06.0550134Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.0550253Z frames [('total', 1)]
2025-12-04T12:15:06.0550369Z stats [('calls_captured', 10)]
2025-12-04T12:15:06.0550587Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.0551063Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.0551163Z graph_break []
2025-12-04T12:15:06.0551312Z =================================== FAILURES ===================================
2025-12-04T12:15:06.0551725Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda _
2025-12-04T12:15:06.0551850Z Traceback (most recent call last):
2025-12-04T12:15:06.0552328Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:06.0552565Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:06.0553057Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.0553318Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.0553829Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.0554067Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.0554579Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.0554728Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.0555274Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.0555600Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.0556176Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.0556338Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.0556819Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.0556957Z     return self._compile_to_module()
2025-12-04T12:15:06.0557444Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.0557610Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.0558142Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.0558273Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.0558825Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.0559064Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.0559655Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.0559797Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.0560295Z   File "/tmp/tmpgn88xo6k/x7/cx7cgqmedpm36qrxnz56ka5amiatgv4cwg3j3zdaadc6olyqssht.py", line 65, in <module>
2025-12-04T12:15:06.0560755Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.0560879Z     kernel.precompile(
2025-12-04T12:15:06.0561432Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.0561565Z     self._precompile_worker()
2025-12-04T12:15:06.0562164Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.0562346Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.0562957Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.0563159Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.0563621Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.0563866Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.0564308Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.0564659Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.0564917Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.0565569Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:06.0565675Z ^
2025-12-04T12:15:06.0566134Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.0566170Z 
2025-12-04T12:15:06.0566894Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.0566900Z 
2025-12-04T12:15:06.0566906Z 
2025-12-04T12:15:06.0567125Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.0567851Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda
2025-12-04T12:15:06.0567857Z 
2025-12-04T12:15:06.0568126Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.0568380Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.0568502Z frames [('total', 1)]
2025-12-04T12:15:06.0568620Z stats [('calls_captured', 10)]
2025-12-04T12:15:06.0569101Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.0569324Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.0569424Z graph_break []
2025-12-04T12:15:06.0569654Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.0569758Z frames [('total', 1)]
2025-12-04T12:15:06.0569875Z stats [('calls_captured', 10)]
2025-12-04T12:15:06.0570143Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.0570611Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.0570711Z graph_break []
2025-12-04T12:15:06.0571213Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.0571325Z frames [('total', 1)]
2025-12-04T12:15:06.0571457Z stats [('calls_captured', 10)]
2025-12-04T12:15:06.0571680Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.0572144Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.0572256Z graph_break []
2025-12-04T12:15:06.0572906Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8b26ba548538abde.xml -
2025-12-04T12:15:06.0573080Z =========================== short test summary info ============================
2025-12-04T12:15:06.0573958Z FAILED [0.4487s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.0574608Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:06.0574717Z ^
2025-12-04T12:15:06.0575177Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.0575183Z 
2025-12-04T12:15:06.0575902Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.0575908Z 
2025-12-04T12:15:06.0575913Z 
2025-12-04T12:15:06.0576133Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.0576997Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda
2025-12-04T12:15:06.0577020Z 
2025-12-04T12:15:06.0577289Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.0577470Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:06.0577733Z ================== 1 failed, 187 deselected, 2 rerun in 4.39s ==================
2025-12-04T12:15:06.0577838Z Got exit code 1
2025-12-04T12:15:06.0577952Z Retrying single test...
2025-12-04T12:15:06.0578441Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d73817a3e5f02a06.xml
2025-12-04T12:15:06.0578607Z ============================= test session starts ==============================
2025-12-04T12:15:06.0578965Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:06.0579097Z cachedir: .pytest_cache
2025-12-04T12:15:06.0579622Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:06.0579805Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:06.0579917Z configfile: pytest.ini
2025-12-04T12:15:06.0580509Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:06.0580748Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:06.0581542Z stepcurrent: skipping 36 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda
2025-12-04T12:15:06.0581670Z Running 1 items in this shard
2025-12-04T12:15:06.0581675Z 
2025-12-04T12:15:06.0583070Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T12:15:06.0584174Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:06.0584624Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 8192
2025-12-04T12:15:06.0585077Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T12:15:06.0585549Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:06.0586087Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:06.0586641Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.0587229Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:06.0587813Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:06.0588381Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:06.0588827Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:06.0589305Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.0589904Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:06.0590490Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:06.0591136Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:06.0591714Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:06.0592255Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:06.0592789Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:06.0593277Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:06.0593817Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:06.0594288Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:06.0595085Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T12:15:06.0595614Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:06.0596262Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:06.0596987Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T12:15:06.0597594Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T12:15:06.0598015Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T12:15:06.0598633Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean)
2025-12-04T12:15:06.0599239Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2)
2025-12-04T12:15:06.0599889Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight)
2025-12-04T12:15:06.0600612Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T12:15:06.0601096Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T12:15:06.0601572Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T12:15:06.0602061Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T12:15:06.0602730Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:06.0603272Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:06.0603829Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T12:15:06.0604462Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:06.0605013Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:06.0605541Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:06.0606054Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:06.0606535Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:06.0607034Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:06.0607835Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:06.0608366Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:06.0608881Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T12:15:06.0609385Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T12:15:06.0609889Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T12:15:06.0610364Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T12:15:06.0610861Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T12:15:06.0611418Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T12:15:06.0611914Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T12:15:06.0612439Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T12:15:06.0613053Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:06.0613629Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T12:15:06.0614210Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask, tmp21, _tmp20)
2025-12-04T12:15:06.0614710Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T12:15:06.0615190Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T12:15:06.0615767Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T12:15:06.0616259Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T12:15:06.0616938Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T12:15:06.0617478Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T12:15:06.0618125Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask)
2025-12-04T12:15:06.0618697Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T12:15:06.0619244Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, None)
2025-12-04T12:15:06.0619620Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.0621991Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.0622574Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.0623622Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.0624267Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.0625156Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.0625850Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.0626736Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.0627517Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.0628129Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.0629230Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:06.0629614Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:06.0630541Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.0630694Z ('RERUN', {'yellow': True}) [3.4181s] [100%]
2025-12-04T12:15:06.0632046Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T12:15:06.0633175Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:06.0633622Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 8192
2025-12-04T12:15:06.0634076Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T12:15:06.0634587Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:06.0635121Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:06.0635678Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.0636260Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:06.0636843Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:06.0637452Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:06.0637907Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:06.0638353Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.0638955Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:06.0639539Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:06.0640158Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:06.0640742Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:06.0641288Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:06.0641811Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:06.0642314Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:06.0642794Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:06.0643260Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:06.0644086Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T12:15:06.0644614Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:06.0645218Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:06.0645967Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T12:15:06.0646571Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T12:15:06.0646996Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T12:15:06.0647613Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean)
2025-12-04T12:15:06.0648247Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2)
2025-12-04T12:15:06.0648899Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight)
2025-12-04T12:15:06.0649616Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T12:15:06.0650098Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T12:15:06.0650608Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T12:15:06.0651098Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T12:15:06.0651731Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:06.0652267Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:06.0652819Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T12:15:06.0653400Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:06.0653947Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:06.0654476Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:06.0654981Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:06.0655463Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:06.0655929Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:06.0656794Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:06.0657388Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:06.0657901Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T12:15:06.0658361Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T12:15:06.0658906Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T12:15:06.0659365Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T12:15:06.0659863Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T12:15:06.0660417Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T12:15:06.0660910Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T12:15:06.0661479Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T12:15:06.0662073Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:06.0662653Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T12:15:06.0663226Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask, tmp21, _tmp20)
2025-12-04T12:15:06.0663780Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T12:15:06.0664262Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T12:15:06.0664841Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T12:15:06.0665297Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T12:15:06.0665889Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T12:15:06.0666426Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T12:15:06.0667039Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask)
2025-12-04T12:15:06.0667614Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T12:15:06.0668158Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, None)
2025-12-04T12:15:06.0668536Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.0670920Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.0671686Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.0672803Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.0673445Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.0674339Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.0675038Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.0675975Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.0676761Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.0677368Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.0678584Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:06.0678970Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:06.0679857Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.0680005Z ('RERUN', {'yellow': True}) [0.4365s] [100%]
2025-12-04T12:15:06.0681540Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T12:15:06.0682643Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:06.0683092Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 8192
2025-12-04T12:15:06.0683543Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T12:15:06.0684018Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T12:15:06.0684558Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T12:15:06.0685117Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.0685757Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T12:15:06.0686359Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T12:15:06.0686912Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T12:15:06.0687391Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T12:15:06.0687836Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.0688442Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:06.0689045Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:06.0689682Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T12:15:06.0690260Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:06.0690810Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:06.0691336Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:06.0691872Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:06.0692358Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:06.0692824Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:06.0693619Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T12:15:06.0694148Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:06.0694753Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:06.0695476Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T12:15:06.0696091Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T12:15:06.0696560Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T12:15:06.0697181Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean)
2025-12-04T12:15:06.0697785Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2)
2025-12-04T12:15:06.0698433Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight)
2025-12-04T12:15:06.0699196Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T12:15:06.0699684Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T12:15:06.0700157Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T12:15:06.0700675Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T12:15:06.0701327Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T12:15:06.0701864Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T12:15:06.0702414Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T12:15:06.0703032Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T12:15:06.0703582Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T12:15:06.0704114Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T12:15:06.0704617Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T12:15:06.0705101Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T12:15:06.0705630Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T12:15:06.0706420Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T12:15:06.0706951Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T12:15:06.0707469Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T12:15:06.0707933Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T12:15:06.0708449Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T12:15:06.0708911Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T12:15:06.0709413Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T12:15:06.0709968Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T12:15:06.0710471Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T12:15:06.0711008Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T12:15:06.0711607Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T12:15:06.0712237Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T12:15:06.0712816Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask, tmp21, _tmp20)
2025-12-04T12:15:06.0713317Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T12:15:06.0713832Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T12:15:06.0714412Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T12:15:06.0714872Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T12:15:06.0715470Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T12:15:06.0716012Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T12:15:06.0716660Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask)
2025-12-04T12:15:06.0717239Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T12:15:06.0717802Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, None)
2025-12-04T12:15:06.0718167Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.0720539Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.0721094Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.0722143Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.0722782Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.0723690Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.0724388Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.0725275Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.0726103Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.0726717Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.0727815Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:06.0728226Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:06.0729164Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.0729329Z FAILED [0.4437s] [100%]
2025-12-04T12:15:06.0729340Z 
2025-12-04T12:15:06.0729511Z ==================================== RERUNS ====================================
2025-12-04T12:15:06.0729972Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda _
2025-12-04T12:15:06.0730097Z Traceback (most recent call last):
2025-12-04T12:15:06.0730521Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:06.0730771Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:06.0731256Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.0731507Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.0732032Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.0732258Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.0732782Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.0732931Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.0733468Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.0733806Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.0734327Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.0734488Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.0734970Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.0735094Z     return self._compile_to_module()
2025-12-04T12:15:06.0735599Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.0735767Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.0736349Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.0736497Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.0736996Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.0737244Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.0737834Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.0737962Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.0738482Z   File "/tmp/tmpv46aehkk/wb/cwbdu5clk27grsjparnqcqxu533u4tall67yccoeu3w7liyrugrq.py", line 65, in <module>
2025-12-04T12:15:06.0738996Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.0739128Z     kernel.precompile(
2025-12-04T12:15:06.0739688Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.0739809Z     self._precompile_worker()
2025-12-04T12:15:06.0740453Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.0740636Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.0741248Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.0741449Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.0741905Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.0742161Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.0742637Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.0742973Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.0743216Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.0743870Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:06.0743976Z ^
2025-12-04T12:15:06.0744433Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.0744439Z 
2025-12-04T12:15:06.0745190Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.0745212Z 
2025-12-04T12:15:06.0745218Z 
2025-12-04T12:15:06.0745438Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.0746153Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda
2025-12-04T12:15:06.0746161Z 
2025-12-04T12:15:06.0746444Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.0746675Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.0746796Z frames [('total', 1)]
2025-12-04T12:15:06.0746915Z stats [('calls_captured', 10)]
2025-12-04T12:15:06.0747382Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.0747625Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.0747727Z graph_break []
2025-12-04T12:15:06.0748132Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda _
2025-12-04T12:15:06.0748272Z Traceback (most recent call last):
2025-12-04T12:15:06.0748698Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:06.0748948Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:06.0749440Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.0749689Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.0750215Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.0750411Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.0750960Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.0751124Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.0751656Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.0751990Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.0752542Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.0752690Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.0753184Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.0753308Z     return self._compile_to_module()
2025-12-04T12:15:06.0753812Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.0753977Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.0754545Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.0754689Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.0755184Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.0755420Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.0756023Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.0756149Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.0756628Z   File "/tmp/tmpjikr7_i1/z2/cz2ugkaiwh4il4c4mrl27mf257qzxt4nf5ka7czk3gk6jk5l5ypd.py", line 65, in <module>
2025-12-04T12:15:06.0757124Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.0757240Z     kernel.precompile(
2025-12-04T12:15:06.0757812Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.0757931Z     self._precompile_worker()
2025-12-04T12:15:06.0758544Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.0758725Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.0759320Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.0759533Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.0759988Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.0760234Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.0760692Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.0761027Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.0761268Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.0761920Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:06.0762010Z ^
2025-12-04T12:15:06.0762481Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.0762488Z 
2025-12-04T12:15:06.0763236Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.0763242Z 
2025-12-04T12:15:06.0763249Z 
2025-12-04T12:15:06.0763480Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.0764191Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda
2025-12-04T12:15:06.0764231Z 
2025-12-04T12:15:06.0764515Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.0764737Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.0764843Z frames [('total', 1)]
2025-12-04T12:15:06.0764972Z stats [('calls_captured', 10)]
2025-12-04T12:15:06.0765439Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.0765667Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.0765779Z graph_break []
2025-12-04T12:15:06.0765999Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.0766149Z frames [('total', 1)]
2025-12-04T12:15:06.0766267Z stats [('calls_captured', 10)]
2025-12-04T12:15:06.0766487Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.0766962Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.0767065Z graph_break []
2025-12-04T12:15:06.0767218Z =================================== FAILURES ===================================
2025-12-04T12:15:06.0767639Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda _
2025-12-04T12:15:06.0767765Z Traceback (most recent call last):
2025-12-04T12:15:06.0768248Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T12:15:06.0768487Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T12:15:06.0768978Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.0769243Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.0769757Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.0769956Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.0770480Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.0770631Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.0771395Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.0771727Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.0772247Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.0772415Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.0772898Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.0773039Z     return self._compile_to_module()
2025-12-04T12:15:06.0778760Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.0778985Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.0779539Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.0779675Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.0780340Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.0780606Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.0781196Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.0781345Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.0781899Z   File "/tmp/tmp8xe0k353/x4/cx4q76mhbm4gdp4jsi72uakzjarpab5hz6blpceocira7y3nyci7.py", line 65, in <module>
2025-12-04T12:15:06.0782368Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.0782501Z     kernel.precompile(
2025-12-04T12:15:06.0783057Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.0783178Z     self._precompile_worker()
2025-12-04T12:15:06.0783796Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.0784031Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.0784636Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.0784841Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.0785295Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.0785560Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.0786003Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.0786352Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.0786637Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.0787295Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:06.0787408Z ^
2025-12-04T12:15:06.0787867Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.0787876Z 
2025-12-04T12:15:06.0788602Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.0788608Z 
2025-12-04T12:15:06.0788613Z 
2025-12-04T12:15:06.0788832Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.0789546Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda
2025-12-04T12:15:06.0789569Z 
2025-12-04T12:15:06.0789837Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.0790066Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.0790187Z frames [('total', 1)]
2025-12-04T12:15:06.0790306Z stats [('calls_captured', 10)]
2025-12-04T12:15:06.0790773Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.0791012Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.0791115Z graph_break []
2025-12-04T12:15:06.0791341Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.0791458Z frames [('total', 1)]
2025-12-04T12:15:06.0791576Z stats [('calls_captured', 10)]
2025-12-04T12:15:06.0791809Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.0792309Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.0792415Z graph_break []
2025-12-04T12:15:06.0792649Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.0792760Z frames [('total', 1)]
2025-12-04T12:15:06.0792874Z stats [('calls_captured', 10)]
2025-12-04T12:15:06.0793136Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.0793593Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.0793708Z graph_break []
2025-12-04T12:15:06.0794357Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d73817a3e5f02a06.xml -
2025-12-04T12:15:06.0794532Z =========================== short test summary info ============================
2025-12-04T12:15:06.0795416Z FAILED [0.4437s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.0796100Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T12:15:06.0796204Z ^
2025-12-04T12:15:06.0796668Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.0796674Z 
2025-12-04T12:15:06.0797380Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.0797386Z 
2025-12-04T12:15:06.0797405Z 
2025-12-04T12:15:06.0797626Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.0798374Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda
2025-12-04T12:15:06.0798382Z 
2025-12-04T12:15:06.0798670Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.0798856Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:06.0799058Z ================== 1 failed, 187 deselected, 2 rerun in 4.34s ==================
2025-12-04T12:15:06.0799175Z Got exit code 1
2025-12-04T12:15:06.0799808Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda
2025-12-04T12:15:06.0800230Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:15:06.0800697Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-90e37d7f0968dad1.xml
2025-12-04T12:15:06.0800865Z ============================= test session starts ==============================
2025-12-04T12:15:06.0801235Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:06.0801350Z cachedir: .pytest_cache
2025-12-04T12:15:06.0801882Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:06.0802013Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:06.0802123Z configfile: pytest.ini
2025-12-04T12:15:06.0802729Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:06.0802954Z collecting ... collected 188 items / 37 deselected / 151 selected
2025-12-04T12:15:06.0803099Z stepcurrent: skipping 37 already run items.
2025-12-04T12:15:06.0803229Z Running 151 items in this shard
2025-12-04T12:15:06.0803234Z 
2025-12-04T12:15:06.0803836Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,1,15_cuda PASSED [3.5371s] [  0%]
2025-12-04T12:15:06.0804418Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,15_cuda PASSED [0.6738s] [  1%]
2025-12-04T12:15:06.0804985Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,4096_cuda PASSED [0.8417s] [  1%]
2025-12-04T12:15:06.0805601Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,512_cuda PASSED [0.5862s] [  2%]
2025-12-04T12:15:06.0806185Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_4,2048,4096_cuda PASSED [0.9603s] [  3%]
2025-12-04T12:15:06.0806731Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,1,15_cuda PASSED [0.7028s] [  3%]
2025-12-04T12:15:06.0807297Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,15_cuda PASSED [0.7668s] [  4%]
2025-12-04T12:15:06.0807900Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,4096_cuda PASSED [1.0653s] [  5%]
2025-12-04T12:15:06.0808458Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,512_cuda PASSED [0.7732s] [  5%]
2025-12-04T12:15:06.0809041Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_4,2048,4096_cuda PASSED [1.1238s] [  6%]
2025-12-04T12:15:06.0809554Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda PASSED [0.4479s] [  7%]
2025-12-04T12:15:06.0810092Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda PASSED [0.4350s] [  7%]
2025-12-04T12:15:06.0810629Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e5m2_shape_16,16,16_cuda PASSED [0.3957s] [  8%]
2025-12-04T12:15:06.0811158Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e5m2_shape_4,2048,4096_cuda PASSED [0.4104s] [  9%]
2025-12-04T12:15:06.0811740Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda ('RERUN', {'yellow': True}) [1.3575s] [  9%]
2025-12-04T12:15:06.0812318Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda ('RERUN', {'yellow': True}) [1.1440s] [  9%]
2025-12-04T12:15:06.0812836Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda FAILED [1.0731s] [  9%]
2025-12-04T12:15:06.0812842Z 
2025-12-04T12:15:06.0812986Z ==================================== RERUNS ====================================
2025-12-04T12:15:06.0813317Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda _
2025-12-04T12:15:06.0813448Z Traceback (most recent call last):
2025-12-04T12:15:06.0813859Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T12:15:06.0814027Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T12:15:06.0814520Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.0814770Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.0815297Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.0815493Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.0816016Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.0816164Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.0816847Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.0817190Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.0817710Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.0817871Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.0818385Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.0818508Z     return self._compile_to_module()
2025-12-04T12:15:06.0819006Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.0819168Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.0819684Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.0819825Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.0820354Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.0820600Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.0821184Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.0821313Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.0821827Z   File "/tmp/tmpw4l6efgy/rm/crmgjskr4wu2ah2xky35exzl3u4jvb7w5dsvy63hh3bu3hqsieaq.py", line 84, in <module>
2025-12-04T12:15:06.0822279Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait
2025-12-04T12:15:06.0822408Z     self._wait_futures(scope)
2025-12-04T12:15:06.0822941Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures
2025-12-04T12:15:06.0823061Z     kernel = result.result()
2025-12-04T12:15:06.0823519Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result
2025-12-04T12:15:06.0823635Z     return self.result_fn()
2025-12-04T12:15:06.0824113Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result
2025-12-04T12:15:06.0824256Z     raise e.with_name(kernel_name) from e
2025-12-04T12:15:06.0824636Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T12:15:06.0824643Z 
2025-12-04T12:15:06.0824786Z Name=triton_poi_fused__to_copy_clamp_0
2025-12-04T12:15:06.0824910Z Traceback (most recent call last):
2025-12-04T12:15:06.0825449Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T12:15:06.0825565Z     result = job()
2025-12-04T12:15:06.0826158Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T12:15:06.0826314Z     kernel.precompile(warm_cache_only=True)
2025-12-04T12:15:06.0826872Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T12:15:06.0826993Z     self._precompile_worker()
2025-12-04T12:15:06.0827603Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.0827785Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.0828375Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.0828583Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.0829068Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.0829326Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.0829773Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.0830107Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.0830338Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.0830649Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.0830736Z ^
2025-12-04T12:15:06.0831207Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.0831212Z 
2025-12-04T12:15:06.0831217Z 
2025-12-04T12:15:06.0831936Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.0831942Z 
2025-12-04T12:15:06.0831981Z 
2025-12-04T12:15:06.0832212Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.0832961Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T12:15:06.0832973Z 
2025-12-04T12:15:06.0833319Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.0833552Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.0833658Z frames [('total', 1)]
2025-12-04T12:15:06.0833789Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.0834011Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.0834653Z inductor [('pattern_matcher_nodes', 2), ('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)]
2025-12-04T12:15:06.0834759Z graph_break []
2025-12-04T12:15:06.0835078Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda _
2025-12-04T12:15:06.0835222Z Traceback (most recent call last):
2025-12-04T12:15:06.0835629Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T12:15:06.0835778Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T12:15:06.0836286Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.0836537Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.0837063Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.0837259Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.0837774Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.0837937Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.0838470Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.0838806Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.0839332Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.0839483Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.0839973Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.0840095Z     return self._compile_to_module()
2025-12-04T12:15:06.0840582Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.0840795Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.0841311Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.0841457Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.0841954Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.0842221Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.0842824Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.0842952Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.0843457Z   File "/tmp/tmpsws2xeka/3y/c3ykxdx5kb37kftbduqqa2c4ga6td42ejfr5fep3ellkp6vbyla4.py", line 84, in <module>
2025-12-04T12:15:06.0844015Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait
2025-12-04T12:15:06.0844134Z     self._wait_futures(scope)
2025-12-04T12:15:06.0844684Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures
2025-12-04T12:15:06.0844800Z     kernel = result.result()
2025-12-04T12:15:06.0845246Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result
2025-12-04T12:15:06.0845375Z     return self.result_fn()
2025-12-04T12:15:06.0845853Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result
2025-12-04T12:15:06.0845996Z     raise e.with_name(kernel_name) from e
2025-12-04T12:15:06.0846381Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T12:15:06.0846387Z 
2025-12-04T12:15:06.0846518Z Name=triton_poi_fused__to_copy_clamp_0
2025-12-04T12:15:06.0846689Z Traceback (most recent call last):
2025-12-04T12:15:06.0847231Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T12:15:06.0847334Z     result = job()
2025-12-04T12:15:06.0847937Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T12:15:06.0848081Z     kernel.precompile(warm_cache_only=True)
2025-12-04T12:15:06.0848651Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T12:15:06.0848770Z     self._precompile_worker()
2025-12-04T12:15:06.0849363Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.0849560Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.0850162Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.0850377Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.0850835Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.0851085Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.0851548Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.0851885Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.0852070Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.0852394Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.0852486Z ^
2025-12-04T12:15:06.0853011Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.0853017Z 
2025-12-04T12:15:06.0853022Z 
2025-12-04T12:15:06.0853737Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.0853745Z 
2025-12-04T12:15:06.0853750Z 
2025-12-04T12:15:06.0853979Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.0854642Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T12:15:06.0854648Z 
2025-12-04T12:15:06.0854915Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.0855156Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.0855262Z frames [('total', 1)]
2025-12-04T12:15:06.0855393Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.0855622Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.0856201Z inductor [('pattern_matcher_nodes', 2), ('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)]
2025-12-04T12:15:06.0856441Z graph_break []
2025-12-04T12:15:06.0856668Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.0856776Z frames [('total', 1)]
2025-12-04T12:15:06.0856909Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.0857131Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.0857725Z inductor [('pattern_matcher_nodes', 2), ('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)]
2025-12-04T12:15:06.0857828Z graph_break []
2025-12-04T12:15:06.0857977Z =================================== FAILURES ===================================
2025-12-04T12:15:06.0858350Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda _
2025-12-04T12:15:06.0858476Z Traceback (most recent call last):
2025-12-04T12:15:06.0858887Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T12:15:06.0859048Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T12:15:06.0859533Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.0859794Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.0860306Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.0860498Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.0861021Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.0861170Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.0861709Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.0862040Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.0862556Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.0862723Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.0863203Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.0863328Z     return self._compile_to_module()
2025-12-04T12:15:06.0863826Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.0863991Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.0864562Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.0864697Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.0865195Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.0865441Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.0866057Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.0866182Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.0866670Z   File "/tmp/tmp9m_mh11q/i2/ci26xsjmz6ze2lfn72pqyotyrk5vbfpz32vup26viakrnept7s6z.py", line 84, in <module>
2025-12-04T12:15:06.0867121Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait
2025-12-04T12:15:06.0867252Z     self._wait_futures(scope)
2025-12-04T12:15:06.0867752Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures
2025-12-04T12:15:06.0867905Z     kernel = result.result()
2025-12-04T12:15:06.0868358Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result
2025-12-04T12:15:06.0868473Z     return self.result_fn()
2025-12-04T12:15:06.0868965Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result
2025-12-04T12:15:06.0869094Z     raise e.with_name(kernel_name) from e
2025-12-04T12:15:06.0869476Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T12:15:06.0869482Z 
2025-12-04T12:15:06.0869629Z Name=triton_poi_fused__to_copy_clamp_0
2025-12-04T12:15:06.0869750Z Traceback (most recent call last):
2025-12-04T12:15:06.0870338Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T12:15:06.0870453Z     result = job()
2025-12-04T12:15:06.0871283Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T12:15:06.0871445Z     kernel.precompile(warm_cache_only=True)
2025-12-04T12:15:06.0872002Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T12:15:06.0872122Z     self._precompile_worker()
2025-12-04T12:15:06.0872731Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.0872909Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.0873515Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.0873719Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.0874174Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.0874436Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.0874879Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.0875218Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.0875418Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.0875730Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.0875838Z ^
2025-12-04T12:15:06.0876296Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.0876302Z 
2025-12-04T12:15:06.0876307Z 
2025-12-04T12:15:06.0877123Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.0877145Z 
2025-12-04T12:15:06.0877149Z 
2025-12-04T12:15:06.0877370Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.0878002Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T12:15:06.0878062Z 
2025-12-04T12:15:06.0878349Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.0878574Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.0878695Z frames [('total', 1)]
2025-12-04T12:15:06.0878815Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.0879040Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.0879639Z inductor [('pattern_matcher_nodes', 2), ('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)]
2025-12-04T12:15:06.0879870Z graph_break []
2025-12-04T12:15:06.0880091Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.0880210Z frames [('total', 1)]
2025-12-04T12:15:06.0880326Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.0880549Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.0881143Z inductor [('pattern_matcher_nodes', 2), ('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)]
2025-12-04T12:15:06.0881243Z graph_break []
2025-12-04T12:15:06.0881473Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.0881579Z frames [('total', 1)]
2025-12-04T12:15:06.0881696Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.0881975Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.0882554Z inductor [('pattern_matcher_nodes', 2), ('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)]
2025-12-04T12:15:06.0882655Z graph_break []
2025-12-04T12:15:06.0883321Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-90e37d7f0968dad1.xml -
2025-12-04T12:15:06.0883497Z =========================== short test summary info ============================
2025-12-04T12:15:06.0884445Z FAILED [1.0731s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda - torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T12:15:06.0884452Z 
2025-12-04T12:15:06.0884582Z Name=triton_poi_fused__to_copy_clamp_0
2025-12-04T12:15:06.0884705Z Traceback (most recent call last):
2025-12-04T12:15:06.0885273Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T12:15:06.0885371Z     result = job()
2025-12-04T12:15:06.0885975Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T12:15:06.0886115Z     kernel.precompile(warm_cache_only=True)
2025-12-04T12:15:06.0886666Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T12:15:06.0886800Z     self._precompile_worker()
2025-12-04T12:15:06.0887391Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.0887582Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.0888175Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.0888408Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.0888871Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.0889118Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.0889560Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.0889938Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.0890121Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.0890442Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.0890531Z ^
2025-12-04T12:15:06.0890986Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.0890992Z 
2025-12-04T12:15:06.0890996Z 
2025-12-04T12:15:06.0891724Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.0891759Z 
2025-12-04T12:15:06.0891764Z 
2025-12-04T12:15:06.0891981Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.0892627Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T12:15:06.0892634Z 
2025-12-04T12:15:06.0892902Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.0893095Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:06.0893318Z ============ 1 failed, 14 passed, 37 deselected, 2 rerun in 16.37s =============
2025-12-04T12:15:06.0893420Z Got exit code 1
2025-12-04T12:15:06.0893541Z Retrying single test...
2025-12-04T12:15:06.0894059Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6ba281452d587f38.xml
2025-12-04T12:15:06.0894227Z ============================= test session starts ==============================
2025-12-04T12:15:06.0894593Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:06.0894705Z cachedir: .pytest_cache
2025-12-04T12:15:06.0895241Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:06.0895374Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:06.0895485Z configfile: pytest.ini
2025-12-04T12:15:06.0896090Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:06.0896387Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:06.0897098Z stepcurrent: skipping 51 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T12:15:06.0897232Z Running 1 items in this shard
2025-12-04T12:15:06.0897237Z 
2025-12-04T12:15:06.0898372Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T12:15:06.0899141Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.0899688Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.0900314Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.0900821Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T12:15:06.0901261Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T12:15:06.0901881Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.0902429Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:06.0902891Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = -448.0
2025-12-04T12:15:06.0903459Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T12:15:06.0903906Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = 448.0
2025-12-04T12:15:06.0904517Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T12:15:06.0905030Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp6 = tmp5.to(tl.float32)
2025-12-04T12:15:06.0905572Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T12:15:06.0906122Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T12:15:06.0906483Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T12:15:06.0908204Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.0908748Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T12:15:06.0909804Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.0910436Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.0911352Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.0912037Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.0912941Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.0913735Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.0914392Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.0915150Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.0915524Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T12:15:06.0916467Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.0916604Z ('RERUN', {'yellow': True}) [3.9740s] [100%]
2025-12-04T12:15:06.0917748Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T12:15:06.0918503Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.0919087Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.0919673Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.0920176Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T12:15:06.0920628Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T12:15:06.0921260Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.0921789Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:06.0922245Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = -448.0
2025-12-04T12:15:06.0922816Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T12:15:06.0923278Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = 448.0
2025-12-04T12:15:06.0923844Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T12:15:06.0924376Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp6 = tmp5.to(tl.float32)
2025-12-04T12:15:06.0924913Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T12:15:06.0925469Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T12:15:06.0925851Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T12:15:06.0927517Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.0928113Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T12:15:06.0929168Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.0929818Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.0930747Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.0931433Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.0932338Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.0933143Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.0933772Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.0934529Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.0934912Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T12:15:06.0935839Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.0935979Z ('RERUN', {'yellow': True}) [0.7456s] [100%]
2025-12-04T12:15:06.0937178Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T12:15:06.0937935Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.0938503Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.0939078Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.0939598Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T12:15:06.0940041Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T12:15:06.0940640Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.0941172Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:06.0941619Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = -448.0
2025-12-04T12:15:06.0942243Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T12:15:06.0942691Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = 448.0
2025-12-04T12:15:06.0943260Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T12:15:06.0943786Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp6 = tmp5.to(tl.float32)
2025-12-04T12:15:06.0944353Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T12:15:06.0944919Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T12:15:06.0945285Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T12:15:06.0946956Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.0947547Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T12:15:06.0948591Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.0949270Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.0950165Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.0950862Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.0951750Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.0952539Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.0953155Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.0953911Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.0954300Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T12:15:06.0955205Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.0955323Z FAILED [0.7389s] [100%]
2025-12-04T12:15:06.0955329Z 
2025-12-04T12:15:06.0955475Z ==================================== RERUNS ====================================
2025-12-04T12:15:06.0955841Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda _
2025-12-04T12:15:06.0955970Z Traceback (most recent call last):
2025-12-04T12:15:06.0956380Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T12:15:06.0956545Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T12:15:06.0957034Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.0957317Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.0957842Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.0958038Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.0958557Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.0958706Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.0959243Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.0959607Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.0960126Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.0960293Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.0960778Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.0960901Z     return self._compile_to_module()
2025-12-04T12:15:06.0961397Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.0961561Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.0962109Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.0962259Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.0962754Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.0963006Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.0963593Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.0963720Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.0964228Z   File "/tmp/tmpkk2kc1li/x4/cx4yo2xtzpj37rqe6m7qsvlhsokh4mckkizs3c3i4wz7m4xrpilx.py", line 50, in <module>
2025-12-04T12:15:06.0964694Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.0964820Z     kernel.precompile(
2025-12-04T12:15:06.0965379Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.0965500Z     self._precompile_worker()
2025-12-04T12:15:06.0966104Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.0966285Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.0966883Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.0967099Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.0967551Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.0967809Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.0968289Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.0968623Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.0968865Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.0969174Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.0969300Z ^
2025-12-04T12:15:06.0969772Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.0969777Z 
2025-12-04T12:15:06.0970492Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.0970498Z 
2025-12-04T12:15:06.0970502Z 
2025-12-04T12:15:06.0970732Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.0971627Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T12:15:06.0971723Z 
2025-12-04T12:15:06.0972013Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.0972238Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.0972345Z frames [('total', 1)]
2025-12-04T12:15:06.0972483Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.0972949Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.0973187Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.0973288Z graph_break []
2025-12-04T12:15:06.0973603Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda _
2025-12-04T12:15:06.0973742Z Traceback (most recent call last):
2025-12-04T12:15:06.0974200Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T12:15:06.0974351Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T12:15:06.0974855Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.0975103Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.0975627Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.0975822Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.0976390Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.0976555Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.0977088Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.0977412Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.0977949Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.0978097Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.0978588Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.0978714Z     return self._compile_to_module()
2025-12-04T12:15:06.0979201Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.0979381Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.0979897Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.0980042Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.0980595Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.0980832Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.0981428Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.0981596Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.0982092Z   File "/tmp/tmp3v20ssgk/t4/ct4xw3u2u3v3vzzrrtwonytmh6n3mgbjmegdscqsl3oawb4ptn4q.py", line 50, in <module>
2025-12-04T12:15:06.0982567Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.0982679Z     kernel.precompile(
2025-12-04T12:15:06.0983245Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.0983367Z     self._precompile_worker()
2025-12-04T12:15:06.0983968Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.0984201Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.0984795Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.0985013Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.0985461Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.0985707Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.0986162Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.0986532Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.0986766Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.0987094Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.0987189Z ^
2025-12-04T12:15:06.0987661Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.0987669Z 
2025-12-04T12:15:06.0988381Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.0988387Z 
2025-12-04T12:15:06.0988392Z 
2025-12-04T12:15:06.0988627Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.0989259Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T12:15:06.0989265Z 
2025-12-04T12:15:06.0989537Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.0989773Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.0989880Z frames [('total', 1)]
2025-12-04T12:15:06.0990000Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.0990478Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.0990702Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.0990819Z graph_break []
2025-12-04T12:15:06.0991038Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.0991142Z frames [('total', 1)]
2025-12-04T12:15:06.0991270Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.0991489Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.0991985Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.0992101Z graph_break []
2025-12-04T12:15:06.0992251Z =================================== FAILURES ===================================
2025-12-04T12:15:06.0992580Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda _
2025-12-04T12:15:06.0992706Z Traceback (most recent call last):
2025-12-04T12:15:06.0993189Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T12:15:06.0993353Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T12:15:06.0993845Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.0994096Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.0994622Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.0994823Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.0995349Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.0995540Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.0996076Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.0996417Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.0996940Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.0997104Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.0997589Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.0997744Z     return self._compile_to_module()
2025-12-04T12:15:06.0998248Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.0998420Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.0998938Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.0999084Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.0999582Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.0999828Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.1000415Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.1000543Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.1001073Z   File "/tmp/tmpm6mlzava/in/cinos7dxtkcjfqeuab6hdpxejcycroispq5vl6d5ovoehvkr2qwa.py", line 50, in <module>
2025-12-04T12:15:06.1001539Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.1001672Z     kernel.precompile(
2025-12-04T12:15:06.1002228Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.1002350Z     self._precompile_worker()
2025-12-04T12:15:06.1002958Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.1003146Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.1003741Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.1003958Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.1004444Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.1004706Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.1005149Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.1005486Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.1005771Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.1006080Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1006184Z ^
2025-12-04T12:15:06.1006642Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1006647Z 
2025-12-04T12:15:06.1007364Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.1007371Z 
2025-12-04T12:15:06.1007404Z 
2025-12-04T12:15:06.1007634Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.1008259Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T12:15:06.1008267Z 
2025-12-04T12:15:06.1008549Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.1008769Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.1008877Z frames [('total', 1)]
2025-12-04T12:15:06.1009010Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.1009474Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.1009749Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.1009853Z graph_break []
2025-12-04T12:15:06.1010073Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.1010199Z frames [('total', 1)]
2025-12-04T12:15:06.1010314Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.1010532Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.1011009Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.1011112Z graph_break []
2025-12-04T12:15:06.1011340Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.1011446Z frames [('total', 1)]
2025-12-04T12:15:06.1011560Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.1011789Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.1012248Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.1012349Z graph_break []
2025-12-04T12:15:06.1013009Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6ba281452d587f38.xml -
2025-12-04T12:15:06.1013186Z =========================== short test summary info ============================
2025-12-04T12:15:06.1013981Z FAILED [0.7389s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.1014300Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1014390Z ^
2025-12-04T12:15:06.1014858Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1014864Z 
2025-12-04T12:15:06.1015610Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.1015616Z 
2025-12-04T12:15:06.1015621Z 
2025-12-04T12:15:06.1015854Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.1016560Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T12:15:06.1016605Z 
2025-12-04T12:15:06.1016876Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.1017072Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:06.1017274Z ================== 1 failed, 187 deselected, 2 rerun in 5.50s ==================
2025-12-04T12:15:06.1017390Z Got exit code 1
2025-12-04T12:15:06.1017499Z Retrying single test...
2025-12-04T12:15:06.1017972Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-85d1d6e9267cc116.xml
2025-12-04T12:15:06.1018156Z ============================= test session starts ==============================
2025-12-04T12:15:06.1018509Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:06.1018652Z cachedir: .pytest_cache
2025-12-04T12:15:06.1019184Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:06.1019312Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:06.1019436Z configfile: pytest.ini
2025-12-04T12:15:06.1020025Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:06.1020250Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:06.1020967Z stepcurrent: skipping 51 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T12:15:06.1021117Z Running 1 items in this shard
2025-12-04T12:15:06.1021122Z 
2025-12-04T12:15:06.1022275Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T12:15:06.1023030Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1023584Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.1024162Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.1024668Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T12:15:06.1025120Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T12:15:06.1025725Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.1026252Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:06.1026705Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = -448.0
2025-12-04T12:15:06.1027276Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T12:15:06.1027731Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = 448.0
2025-12-04T12:15:06.1028338Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T12:15:06.1028873Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp6 = tmp5.to(tl.float32)
2025-12-04T12:15:06.1029402Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T12:15:06.1029984Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T12:15:06.1030364Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T12:15:06.1032041Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.1032631Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T12:15:06.1033884Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.1034532Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.1035472Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.1036160Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.1037058Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.1037832Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.1038453Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.1039214Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1039602Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T12:15:06.1040505Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1040642Z ('RERUN', {'yellow': True}) [3.9679s] [100%]
2025-12-04T12:15:06.1041782Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T12:15:06.1042579Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1043143Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.1043710Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.1044250Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T12:15:06.1044690Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T12:15:06.1045292Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.1045822Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:06.1046269Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = -448.0
2025-12-04T12:15:06.1046892Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T12:15:06.1047332Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = 448.0
2025-12-04T12:15:06.1047900Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T12:15:06.1048422Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp6 = tmp5.to(tl.float32)
2025-12-04T12:15:06.1048985Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T12:15:06.1049546Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T12:15:06.1049914Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T12:15:06.1051605Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.1052145Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T12:15:06.1053193Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.1053837Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.1054728Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.1055428Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.1056459Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.1057256Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.1057867Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.1058652Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1059037Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T12:15:06.1059939Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1060087Z ('RERUN', {'yellow': True}) [0.7474s] [100%]
2025-12-04T12:15:06.1061247Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T12:15:06.1062013Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1062564Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.1063163Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.1063678Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T12:15:06.1064117Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T12:15:06.1064729Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.1065248Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:06.1065702Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = -448.0
2025-12-04T12:15:06.1066282Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T12:15:06.1066725Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = 448.0
2025-12-04T12:15:06.1067311Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T12:15:06.1067823Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp6 = tmp5.to(tl.float32)
2025-12-04T12:15:06.1068370Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T12:15:06.1068920Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T12:15:06.1069288Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T12:15:06.1071246Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.1071799Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T12:15:06.1072905Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.1073538Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.1074453Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.1075307Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.1076204Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.1076993Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.1077691Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.1078458Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1078838Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T12:15:06.1079751Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1079856Z FAILED [0.7434s] [100%]
2025-12-04T12:15:06.1079862Z 
2025-12-04T12:15:06.1080008Z ==================================== RERUNS ====================================
2025-12-04T12:15:06.1080339Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda _
2025-12-04T12:15:06.1080467Z Traceback (most recent call last):
2025-12-04T12:15:06.1080893Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T12:15:06.1081046Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T12:15:06.1081536Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.1081799Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.1082314Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.1082508Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.1083033Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.1083178Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.1083765Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.1084087Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.1084613Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.1084790Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.1085379Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.1085519Z     return self._compile_to_module()
2025-12-04T12:15:06.1086005Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.1086171Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.1086704Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.1086842Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.1087339Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.1087618Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.1088207Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.1088354Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.1088855Z   File "/tmp/tmpp2r2fe96/yf/cyfrmu5oofhiqi7gdqlt24x4w5coydol7zmfuv5lbwr5rgz7fvmd.py", line 50, in <module>
2025-12-04T12:15:06.1089319Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.1089448Z     kernel.precompile(
2025-12-04T12:15:06.1090048Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.1090187Z     self._precompile_worker()
2025-12-04T12:15:06.1090785Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.1090969Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.1091582Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.1091785Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.1092242Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.1092507Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.1092951Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.1093307Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.1093536Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.1093850Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1093958Z ^
2025-12-04T12:15:06.1094423Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1094431Z 
2025-12-04T12:15:06.1095163Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.1095169Z 
2025-12-04T12:15:06.1095174Z 
2025-12-04T12:15:06.1095394Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.1096026Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T12:15:06.1096082Z 
2025-12-04T12:15:06.1096422Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.1096656Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.1096778Z frames [('total', 1)]
2025-12-04T12:15:06.1096898Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.1097365Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.1097641Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.1097744Z graph_break []
2025-12-04T12:15:06.1098077Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda _
2025-12-04T12:15:06.1098206Z Traceback (most recent call last):
2025-12-04T12:15:06.1098614Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T12:15:06.1098783Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T12:15:06.1099275Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.1099561Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.1100091Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.1100287Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.1100815Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.1100960Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.1101490Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.1101854Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.1102380Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.1102543Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.1103021Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.1103143Z     return self._compile_to_module()
2025-12-04T12:15:06.1103647Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.1103812Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.1104329Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.1104470Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.1104969Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.1105212Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.1105799Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.1105926Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.1106414Z   File "/tmp/tmpfes_8dbg/t4/ct4nuoctpw6xjefrqvcmsjk2byuv57oevub7n5jmumauwqhfd5oc.py", line 50, in <module>
2025-12-04T12:15:06.1106880Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.1107005Z     kernel.precompile(
2025-12-04T12:15:06.1107559Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.1107676Z     self._precompile_worker()
2025-12-04T12:15:06.1108326Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.1108508Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.1109106Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.1109319Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.1109804Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.1110069Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.1110511Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.1110846Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.1111090Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.1111400Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1111524Z ^
2025-12-04T12:15:06.1111993Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1111998Z 
2025-12-04T12:15:06.1112707Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.1112716Z 
2025-12-04T12:15:06.1112721Z 
2025-12-04T12:15:06.1112950Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.1113582Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T12:15:06.1113588Z 
2025-12-04T12:15:06.1113869Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.1114127Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.1114235Z frames [('total', 1)]
2025-12-04T12:15:06.1114370Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.1114834Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.1115069Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.1115172Z graph_break []
2025-12-04T12:15:06.1115391Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.1115506Z frames [('total', 1)]
2025-12-04T12:15:06.1115623Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.1115842Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.1116314Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.1116414Z graph_break []
2025-12-04T12:15:06.1116565Z =================================== FAILURES ===================================
2025-12-04T12:15:06.1116899Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda _
2025-12-04T12:15:06.1117023Z Traceback (most recent call last):
2025-12-04T12:15:06.1117446Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T12:15:06.1117599Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T12:15:06.1118091Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.1118354Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.1118865Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.1119073Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.1119616Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.1119766Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.1120310Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.1120631Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.1121178Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.1121342Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.1121823Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.1121959Z     return self._compile_to_module()
2025-12-04T12:15:06.1122445Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.1122608Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.1123172Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.1123300Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.1123806Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.1124042Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.1124625Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.1124765Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.1125268Z   File "/tmp/tmpdtqwhkrj/ae/caeu5yzsejv7nmnevpk5vutu2gbz4clgszbzwsqmlxvdwamdh2sw.py", line 50, in <module>
2025-12-04T12:15:06.1125763Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.1125891Z     kernel.precompile(
2025-12-04T12:15:06.1126448Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.1126577Z     self._precompile_worker()
2025-12-04T12:15:06.1127171Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.1127351Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.1127959Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.1128157Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.1128620Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.1128866Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.1129310Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.1129655Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.1129882Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.1130193Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1130295Z ^
2025-12-04T12:15:06.1130752Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1130757Z 
2025-12-04T12:15:06.1131477Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.1131483Z 
2025-12-04T12:15:06.1131490Z 
2025-12-04T12:15:06.1131756Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.1132399Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T12:15:06.1132405Z 
2025-12-04T12:15:06.1132675Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.1132930Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.1133047Z frames [('total', 1)]
2025-12-04T12:15:06.1133164Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.1133627Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.1133862Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.1133963Z graph_break []
2025-12-04T12:15:06.1134198Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.1134303Z frames [('total', 1)]
2025-12-04T12:15:06.1134417Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.1134712Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.1135172Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.1135274Z graph_break []
2025-12-04T12:15:06.1135505Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.1135608Z frames [('total', 1)]
2025-12-04T12:15:06.1135738Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.1135958Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.1136498Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.1136616Z graph_break []
2025-12-04T12:15:06.1137308Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-85d1d6e9267cc116.xml -
2025-12-04T12:15:06.1137489Z =========================== short test summary info ============================
2025-12-04T12:15:06.1138280Z FAILED [0.7434s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.1138595Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1138701Z ^
2025-12-04T12:15:06.1139161Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1139167Z 
2025-12-04T12:15:06.1139878Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.1139896Z 
2025-12-04T12:15:06.1139903Z 
2025-12-04T12:15:06.1140125Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.1140755Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T12:15:06.1140761Z 
2025-12-04T12:15:06.1141043Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.1141229Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:06.1141446Z ================== 1 failed, 187 deselected, 2 rerun in 5.50s ==================
2025-12-04T12:15:06.1141549Z Got exit code 1
2025-12-04T12:15:06.1142090Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T12:15:06.1142513Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:15:06.1143029Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7a610c26dd7fa0e9.xml
2025-12-04T12:15:06.1143197Z ============================= test session starts ==============================
2025-12-04T12:15:06.1143564Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:06.1143677Z cachedir: .pytest_cache
2025-12-04T12:15:06.1144208Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:06.1144376Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:06.1144487Z configfile: pytest.ini
2025-12-04T12:15:06.1145092Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:06.1145319Z collecting ... collected 188 items / 52 deselected / 136 selected
2025-12-04T12:15:06.1145464Z stepcurrent: skipping 52 already run items.
2025-12-04T12:15:06.1145600Z Running 136 items in this shard
2025-12-04T12:15:06.1145605Z 
2025-12-04T12:15:06.1146773Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T12:15:06.1147595Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1148157Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.1148743Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.1149290Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T12:15:06.1149737Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T12:15:06.1150360Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.1150880Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:06.1151346Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = -448.0
2025-12-04T12:15:06.1151917Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T12:15:06.1152364Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = 448.0
2025-12-04T12:15:06.1152950Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T12:15:06.1153472Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp6 = tmp5.to(tl.float32)
2025-12-04T12:15:06.1154018Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T12:15:06.1154574Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T12:15:06.1154945Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T12:15:06.1156675Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.1157221Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T12:15:06.1158322Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.1158954Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.1159878Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.1160606Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.1161516Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.1162295Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.1162907Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.1163722Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1164104Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T12:15:06.1165017Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1165155Z ('RERUN', {'yellow': True}) [3.8173s] [  0%]
2025-12-04T12:15:06.1166318Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T12:15:06.1167078Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1167627Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.1168210Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.1168715Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T12:15:06.1169169Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T12:15:06.1169774Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.1170328Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:06.1170801Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = -448.0
2025-12-04T12:15:06.1171584Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T12:15:06.1172107Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = 448.0
2025-12-04T12:15:06.1172677Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T12:15:06.1173191Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp6 = tmp5.to(tl.float32)
2025-12-04T12:15:06.1173739Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T12:15:06.1174292Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T12:15:06.1174719Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T12:15:06.1176453Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.1177071Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T12:15:06.1178114Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.1178762Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.1179658Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.1180343Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.1181251Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.1182022Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.1182649Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.1183400Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1183792Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T12:15:06.1184733Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1184868Z ('RERUN', {'yellow': True}) [0.7586s] [  0%]
2025-12-04T12:15:06.1186031Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T12:15:06.1186837Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1187400Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.1187967Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.1188484Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T12:15:06.1188959Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T12:15:06.1189560Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.1190091Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:06.1190539Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = -448.0
2025-12-04T12:15:06.1191120Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T12:15:06.1191598Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = 448.0
2025-12-04T12:15:06.1192328Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T12:15:06.1192855Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp6 = tmp5.to(tl.float32)
2025-12-04T12:15:06.1193387Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T12:15:06.1193953Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T12:15:06.1194318Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T12:15:06.1195995Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.1196550Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T12:15:06.1197594Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.1198243Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.1199204Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.1199903Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.1200826Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.1201612Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.1202224Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.1202979Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1203403Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T12:15:06.1204303Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1204421Z FAILED [0.7579s] [  0%]
2025-12-04T12:15:06.1204427Z 
2025-12-04T12:15:06.1204577Z ==================================== RERUNS ====================================
2025-12-04T12:15:06.1204907Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T12:15:06.1205083Z Traceback (most recent call last):
2025-12-04T12:15:06.1205493Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T12:15:06.1205686Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T12:15:06.1206236Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.1206485Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.1207022Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.1207216Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.1207738Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.1207886Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.1208428Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.1208760Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.1209284Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.1209434Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.1209932Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.1210077Z     return self._compile_to_module()
2025-12-04T12:15:06.1210571Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.1210736Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.1211252Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.1211449Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.1211945Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.1212192Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.1212775Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.1212944Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.1213463Z   File "/tmp/tmpglafxchy/gt/cgtrdiumaxbfkm3z4iktfl6rcsgrvwz4zy6dlm6ighmkh6qwxqbj.py", line 50, in <module>
2025-12-04T12:15:06.1213929Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.1214043Z     kernel.precompile(
2025-12-04T12:15:06.1214619Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.1214738Z     self._precompile_worker()
2025-12-04T12:15:06.1215345Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.1215561Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.1216156Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.1216439Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.1216896Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.1217156Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.1217599Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.1217976Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.1218224Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.1218538Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1218628Z ^
2025-12-04T12:15:06.1219102Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1219111Z 
2025-12-04T12:15:06.1219826Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.1219833Z 
2025-12-04T12:15:06.1219838Z 
2025-12-04T12:15:06.1220067Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.1220712Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:06.1220720Z 
2025-12-04T12:15:06.1221002Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.1221231Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.1221336Z frames [('total', 1)]
2025-12-04T12:15:06.1221469Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.1221935Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.1222160Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.1222273Z graph_break []
2025-12-04T12:15:06.1222603Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T12:15:06.1222741Z Traceback (most recent call last):
2025-12-04T12:15:06.1223145Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T12:15:06.1223330Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T12:15:06.1223832Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.1224083Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.1224595Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.1224832Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.1225339Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.1225499Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.1226033Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.1226355Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.1226891Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.1227075Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.1227570Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.1227700Z     return self._compile_to_module()
2025-12-04T12:15:06.1228182Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.1228361Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.1228875Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.1229006Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.1229549Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.1229784Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.1230383Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.1230512Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.1231011Z   File "/tmp/tmpftjp552m/ib/cibvr32quo63oqbwd35beydkb2mcnitnhf3qgzuspp5oa6piso5t.py", line 50, in <module>
2025-12-04T12:15:06.1231490Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.1231603Z     kernel.precompile(
2025-12-04T12:15:06.1232174Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.1232292Z     self._precompile_worker()
2025-12-04T12:15:06.1232895Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.1233087Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.1233685Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.1233882Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.1234348Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.1234594Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.1235050Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.1235386Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.1235615Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.1235991Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1236085Z ^
2025-12-04T12:15:06.1236552Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1236558Z 
2025-12-04T12:15:06.1237268Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.1237307Z 
2025-12-04T12:15:06.1237312Z 
2025-12-04T12:15:06.1237548Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.1238203Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:06.1238209Z 
2025-12-04T12:15:06.1238482Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.1238726Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.1238868Z frames [('total', 1)]
2025-12-04T12:15:06.1238985Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.1239466Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.1239690Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.1239809Z graph_break []
2025-12-04T12:15:06.1240034Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.1240139Z frames [('total', 1)]
2025-12-04T12:15:06.1240272Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.1240493Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.1240958Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.1241108Z graph_break []
2025-12-04T12:15:06.1241266Z =================================== FAILURES ===================================
2025-12-04T12:15:06.1241613Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T12:15:06.1241741Z Traceback (most recent call last):
2025-12-04T12:15:06.1242151Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T12:15:06.1242322Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T12:15:06.1242812Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.1243063Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.1243590Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.1243787Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.1244313Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.1244465Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.1245001Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.1245337Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.1245862Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.1246027Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.1246509Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.1246634Z     return self._compile_to_module()
2025-12-04T12:15:06.1247171Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.1247341Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.1247863Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.1248011Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.1248509Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.1248789Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.1249381Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.1249509Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.1250033Z   File "/tmp/tmpxr130svn/kh/ckhezzz5lqtipruw74qnrsgwepab2nbounkw3shmzhpipxbsqm3y.py", line 50, in <module>
2025-12-04T12:15:06.1250503Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.1250658Z     kernel.precompile(
2025-12-04T12:15:06.1251208Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.1251326Z     self._precompile_worker()
2025-12-04T12:15:06.1251936Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.1252118Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.1252714Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.1252924Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.1253406Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.1253670Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.1254117Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.1254450Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.1254687Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.1254998Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1255099Z ^
2025-12-04T12:15:06.1255557Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1255562Z 
2025-12-04T12:15:06.1256269Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.1256335Z 
2025-12-04T12:15:06.1256348Z 
2025-12-04T12:15:06.1256583Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.1257226Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:06.1257233Z 
2025-12-04T12:15:06.1257514Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.1257742Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.1257848Z frames [('total', 1)]
2025-12-04T12:15:06.1257980Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.1258444Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.1258679Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.1258781Z graph_break []
2025-12-04T12:15:06.1259049Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.1259168Z frames [('total', 1)]
2025-12-04T12:15:06.1259290Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.1259509Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.1259982Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.1260117Z graph_break []
2025-12-04T12:15:06.1260333Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.1260452Z frames [('total', 1)]
2025-12-04T12:15:06.1260569Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.1260801Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.1261257Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.1261358Z graph_break []
2025-12-04T12:15:06.1262033Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7a610c26dd7fa0e9.xml -
2025-12-04T12:15:06.1262250Z =========================== short test summary info ============================
2025-12-04T12:15:06.1263055Z FAILED [0.7579s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.1263369Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1263460Z ^
2025-12-04T12:15:06.1263930Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1263935Z 
2025-12-04T12:15:06.1264645Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.1264683Z 
2025-12-04T12:15:06.1264691Z 
2025-12-04T12:15:06.1264921Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.1265567Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:06.1265573Z 
2025-12-04T12:15:06.1265841Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.1266046Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:06.1266251Z ================== 1 failed, 52 deselected, 2 rerun in 5.38s ===================
2025-12-04T12:15:06.1266369Z Got exit code 1
2025-12-04T12:15:06.1266481Z Retrying single test...
2025-12-04T12:15:06.1266953Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-269f6089cafc9f3b.xml
2025-12-04T12:15:06.1267133Z ============================= test session starts ==============================
2025-12-04T12:15:06.1267488Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:06.1267602Z cachedir: .pytest_cache
2025-12-04T12:15:06.1268136Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:06.1268264Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:06.1268390Z configfile: pytest.ini
2025-12-04T12:15:06.1268976Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:06.1269200Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:06.1269934Z stepcurrent: skipping 52 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:06.1270051Z Running 1 items in this shard
2025-12-04T12:15:06.1270058Z 
2025-12-04T12:15:06.1271526Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T12:15:06.1272289Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1273495Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.1274083Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.1274588Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T12:15:06.1275042Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T12:15:06.1275701Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.1276216Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:06.1276681Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = -448.0
2025-12-04T12:15:06.1277250Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T12:15:06.1277706Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = 448.0
2025-12-04T12:15:06.1278393Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T12:15:06.1278928Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp6 = tmp5.to(tl.float32)
2025-12-04T12:15:06.1279456Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T12:15:06.1280007Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T12:15:06.1280386Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T12:15:06.1282075Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.1282630Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T12:15:06.1283679Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.1284326Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.1285348Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.1286032Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.1286939Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.1287745Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.1288371Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.1289132Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1289547Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T12:15:06.1290441Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1290577Z ('RERUN', {'yellow': True}) [3.8400s] [100%]
2025-12-04T12:15:06.1291735Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T12:15:06.1292516Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1293077Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.1293646Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.1294158Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T12:15:06.1294596Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T12:15:06.1295193Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.1295723Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:06.1296170Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = -448.0
2025-12-04T12:15:06.1296823Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T12:15:06.1297266Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = 448.0
2025-12-04T12:15:06.1297839Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T12:15:06.1298367Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp6 = tmp5.to(tl.float32)
2025-12-04T12:15:06.1298894Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T12:15:06.1299495Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T12:15:06.1299867Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T12:15:06.1301532Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.1302117Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T12:15:06.1303169Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.1303847Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.1304743Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.1305447Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.1306361Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.1307150Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.1307765Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.1308527Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1308914Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T12:15:06.1309812Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1309963Z ('RERUN', {'yellow': True}) [0.7598s] [100%]
2025-12-04T12:15:06.1311114Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T12:15:06.1311877Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1312428Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.1312993Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.1313542Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T12:15:06.1313982Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T12:15:06.1314595Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.1315139Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:06.1315585Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = -448.0
2025-12-04T12:15:06.1316166Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T12:15:06.1316612Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = 448.0
2025-12-04T12:15:06.1317193Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T12:15:06.1317738Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp6 = tmp5.to(tl.float32)
2025-12-04T12:15:06.1318266Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T12:15:06.1318831Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T12:15:06.1319199Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T12:15:06.1320914Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.1321459Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T12:15:06.1322513Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.1323143Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.1324058Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.1324744Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.1325639Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.1326439Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.1327047Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.1327847Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1328223Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T12:15:06.1329134Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1329276Z FAILED [0.7564s] [100%]
2025-12-04T12:15:06.1329283Z 
2025-12-04T12:15:06.1329428Z ==================================== RERUNS ====================================
2025-12-04T12:15:06.1329774Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T12:15:06.1329901Z Traceback (most recent call last):
2025-12-04T12:15:06.1330315Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T12:15:06.1330479Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T12:15:06.1331020Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.1331286Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.1331799Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.1331995Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.1332521Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.1332672Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.1333222Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.1333579Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.1334105Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.1334267Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.1334751Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.1334893Z     return self._compile_to_module()
2025-12-04T12:15:06.1335380Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.1335548Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.1336082Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.1336216Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.1336787Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.1337037Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.1337624Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.1337767Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.1338264Z   File "/tmp/tmpb725rz13/si/csiwejudcmit4ta34fze6lhmqx34pvu2t2ebnx7kjnxkwk6ejs4h.py", line 50, in <module>
2025-12-04T12:15:06.1338727Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.1338855Z     kernel.precompile(
2025-12-04T12:15:06.1339412Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.1339546Z     self._precompile_worker()
2025-12-04T12:15:06.1340187Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.1340372Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.1340978Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.1341210Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.1341665Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.1341927Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.1342372Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.1342724Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.1342957Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.1343299Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1343406Z ^
2025-12-04T12:15:06.1343863Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1343868Z 
2025-12-04T12:15:06.1344595Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.1344601Z 
2025-12-04T12:15:06.1344606Z 
2025-12-04T12:15:06.1344824Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.1345466Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:06.1345487Z 
2025-12-04T12:15:06.1345792Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.1346018Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.1346140Z frames [('total', 1)]
2025-12-04T12:15:06.1346257Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.1346722Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.1346962Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.1347068Z graph_break []
2025-12-04T12:15:06.1347397Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T12:15:06.1347533Z Traceback (most recent call last):
2025-12-04T12:15:06.1347940Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T12:15:06.1348101Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T12:15:06.1348594Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.1348845Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.1349375Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.1349569Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.1350096Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.1350244Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.1350778Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.1351111Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.1351667Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.1351822Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.1352319Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.1352441Z     return self._compile_to_module()
2025-12-04T12:15:06.1352933Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.1353128Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.1353641Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.1353785Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.1354283Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.1354530Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.1355114Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.1355275Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.1355790Z   File "/tmp/tmpahkop17p/mn/cmn6d6w7uw7bkz7axfb2xh6lp7nwiymocv7nqm6qjcqriqzuldhl.py", line 50, in <module>
2025-12-04T12:15:06.1356258Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.1356370Z     kernel.precompile(
2025-12-04T12:15:06.1356934Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.1357052Z     self._precompile_worker()
2025-12-04T12:15:06.1357689Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.1357874Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.1358473Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.1358685Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.1359137Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.1359398Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.1359841Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.1360177Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.1360417Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.1360732Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1360822Z ^
2025-12-04T12:15:06.1361291Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1361299Z 
2025-12-04T12:15:06.1362009Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.1362018Z 
2025-12-04T12:15:06.1362023Z 
2025-12-04T12:15:06.1362253Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.1362893Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:06.1362899Z 
2025-12-04T12:15:06.1363181Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.1363403Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.1363540Z frames [('total', 1)]
2025-12-04T12:15:06.1363672Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.1364137Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.1364358Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.1364471Z graph_break []
2025-12-04T12:15:06.1364720Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.1364836Z frames [('total', 1)]
2025-12-04T12:15:06.1364954Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.1365171Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.1365642Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.1365745Z graph_break []
2025-12-04T12:15:06.1365898Z =================================== FAILURES ===================================
2025-12-04T12:15:06.1366239Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T12:15:06.1366393Z Traceback (most recent call last):
2025-12-04T12:15:06.1366812Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T12:15:06.1366960Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T12:15:06.1367449Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.1367710Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.1368219Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.1368413Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.1368968Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.1369115Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.1369660Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.1369980Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.1370500Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.1370661Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.1371328Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.1371466Z     return self._compile_to_module()
2025-12-04T12:15:06.1371955Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.1372127Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.1372659Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.1372791Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.1373289Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.1373540Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.1374122Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.1374263Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.1374770Z   File "/tmp/tmpc1xvcb6p/mg/cmgvvw2kxh7ublbprutm5gnt3ve5cmrleevfnk4i44xwqujttayy.py", line 50, in <module>
2025-12-04T12:15:06.1375305Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.1375431Z     kernel.precompile(
2025-12-04T12:15:06.1375985Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.1376120Z     self._precompile_worker()
2025-12-04T12:15:06.1376778Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.1377027Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.1377632Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.1377831Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.1378291Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.1378543Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.1378985Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.1379376Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.1379603Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.1379919Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1380026Z ^
2025-12-04T12:15:06.1380481Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1380487Z 
2025-12-04T12:15:06.1381211Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.1381217Z 
2025-12-04T12:15:06.1381222Z 
2025-12-04T12:15:06.1381485Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.1382126Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:06.1382149Z 
2025-12-04T12:15:06.1382420Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.1382639Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.1382760Z frames [('total', 1)]
2025-12-04T12:15:06.1382877Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.1383347Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.1383581Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.1383683Z graph_break []
2025-12-04T12:15:06.1383916Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.1384026Z frames [('total', 1)]
2025-12-04T12:15:06.1384142Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.1384374Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.1384833Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.1384932Z graph_break []
2025-12-04T12:15:06.1385166Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.1385270Z frames [('total', 1)]
2025-12-04T12:15:06.1385385Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.1385618Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.1386074Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.1386186Z graph_break []
2025-12-04T12:15:06.1386873Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-269f6089cafc9f3b.xml -
2025-12-04T12:15:06.1387049Z =========================== short test summary info ============================
2025-12-04T12:15:06.1387875Z FAILED [0.7564s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.1388186Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1388322Z ^
2025-12-04T12:15:06.1388780Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1388785Z 
2025-12-04T12:15:06.1389492Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.1389498Z 
2025-12-04T12:15:06.1389517Z 
2025-12-04T12:15:06.1389742Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.1390382Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:06.1390418Z 
2025-12-04T12:15:06.1390703Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.1390886Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:06.1391091Z ================== 1 failed, 187 deselected, 2 rerun in 5.40s ==================
2025-12-04T12:15:06.1391208Z Got exit code 1
2025-12-04T12:15:06.1391319Z Retrying single test...
2025-12-04T12:15:06.1391805Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f11fe18ee197cc1f.xml
2025-12-04T12:15:06.1391972Z ============================= test session starts ==============================
2025-12-04T12:15:06.1392361Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:06.1392489Z cachedir: .pytest_cache
2025-12-04T12:15:06.1393014Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:06.1393142Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:06.1393267Z configfile: pytest.ini
2025-12-04T12:15:06.1393889Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:06.1394205Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:06.1394978Z stepcurrent: skipping 52 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:06.1395099Z Running 1 items in this shard
2025-12-04T12:15:06.1395105Z 
2025-12-04T12:15:06.1396284Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T12:15:06.1397049Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1397618Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.1398193Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.1398711Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T12:15:06.1399200Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T12:15:06.1399801Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.1400336Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:06.1400820Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = -448.0
2025-12-04T12:15:06.1401405Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T12:15:06.1401849Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = 448.0
2025-12-04T12:15:06.1402422Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T12:15:06.1402952Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp6 = tmp5.to(tl.float32)
2025-12-04T12:15:06.1403515Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T12:15:06.1404077Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T12:15:06.1404449Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T12:15:06.1406188Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.1406734Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T12:15:06.1407778Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.1408426Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.1409323Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.1410023Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.1410906Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.1411694Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.1412305Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.1413061Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1413480Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T12:15:06.1414382Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1414572Z ('RERUN', {'yellow': True}) [3.8573s] [100%]
2025-12-04T12:15:06.1415721Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T12:15:06.1416556Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1417110Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.1417717Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.1418231Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T12:15:06.1418671Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T12:15:06.1419282Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.1419797Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:06.1420287Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = -448.0
2025-12-04T12:15:06.1420869Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T12:15:06.1421317Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = 448.0
2025-12-04T12:15:06.1421901Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T12:15:06.1422414Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp6 = tmp5.to(tl.float32)
2025-12-04T12:15:06.1422946Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T12:15:06.1423517Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T12:15:06.1423884Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T12:15:06.1425570Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.1426114Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T12:15:06.1427236Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.1427871Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.1428779Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.1429495Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.1430380Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.1431171Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.1431824Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.1432591Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1432965Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T12:15:06.1433872Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1434046Z ('RERUN', {'yellow': True}) [0.7700s] [100%]
2025-12-04T12:15:06.1435212Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T12:15:06.1436107Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1436662Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.1437240Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.1437741Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T12:15:06.1438195Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T12:15:06.1438798Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.1439314Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = tmp0.to(tl.float32)
2025-12-04T12:15:06.1439779Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = -448.0
2025-12-04T12:15:06.1440346Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T12:15:06.1440799Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = 448.0
2025-12-04T12:15:06.1441434Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T12:15:06.1441955Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp6 = tmp5.to(tl.float32)
2025-12-04T12:15:06.1442500Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T12:15:06.1443087Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T12:15:06.1443464Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T12:15:06.1445143Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.1445732Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T12:15:06.1446785Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.1447417Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.1448377Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.1449066Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.1449975Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.1450747Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.1451368Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.1452129Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1452504Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T12:15:06.1453411Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1453517Z FAILED [0.7644s] [100%]
2025-12-04T12:15:06.1453524Z 
2025-12-04T12:15:06.1453682Z ==================================== RERUNS ====================================
2025-12-04T12:15:06.1454010Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T12:15:06.1454135Z Traceback (most recent call last):
2025-12-04T12:15:06.1454556Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T12:15:06.1454767Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T12:15:06.1455271Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.1455523Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.1456038Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.1456346Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.1456875Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.1457024Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.1457573Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.1457897Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.1458430Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.1458626Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.1459109Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.1459252Z     return self._compile_to_module()
2025-12-04T12:15:06.1459737Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.1459917Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.1460431Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.1460562Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.1461104Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.1461339Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.1461940Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.1462068Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.1462570Z   File "/tmp/tmp0p6vosii/ow/cowyk5fa37ssybq6rme34kpjpmxcejuynwrnyi3tx2boipbg6oge.py", line 50, in <module>
2025-12-04T12:15:06.1463046Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.1463159Z     kernel.precompile(
2025-12-04T12:15:06.1463711Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.1463841Z     self._precompile_worker()
2025-12-04T12:15:06.1464441Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.1464635Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.1465231Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.1465429Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.1465892Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.1466136Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.1466592Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.1466928Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.1467157Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.1467516Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1467612Z ^
2025-12-04T12:15:06.1468072Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1468078Z 
2025-12-04T12:15:06.1468800Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.1468841Z 
2025-12-04T12:15:06.1468846Z 
2025-12-04T12:15:06.1469065Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.1469719Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:06.1469726Z 
2025-12-04T12:15:06.1470001Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.1470240Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.1470384Z frames [('total', 1)]
2025-12-04T12:15:06.1470503Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.1471166Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.1471392Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.1471496Z graph_break []
2025-12-04T12:15:06.1471842Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T12:15:06.1471968Z Traceback (most recent call last):
2025-12-04T12:15:06.1472388Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T12:15:06.1472538Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T12:15:06.1473136Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.1473402Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.1473919Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.1474114Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.1479466Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.1479648Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.1480215Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.1480538Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.1481069Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.1481239Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.1481722Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.1481856Z     return self._compile_to_module()
2025-12-04T12:15:06.1482355Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.1482523Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.1483055Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.1483186Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.1483682Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.1483933Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.1484654Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.1484803Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.1485308Z   File "/tmp/tmpss8tq8h7/me/cmetmdfmdndlnpbb2a36fujjsxkodhwtv2hita3jt4dzvoaayfeu.py", line 50, in <module>
2025-12-04T12:15:06.1485772Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.1485949Z     kernel.precompile(
2025-12-04T12:15:06.1486506Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.1486626Z     self._precompile_worker()
2025-12-04T12:15:06.1487239Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.1487424Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.1488040Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.1488288Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.1488739Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.1489007Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.1489449Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.1489797Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.1490028Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.1490337Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1490578Z ^
2025-12-04T12:15:06.1491042Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1491051Z 
2025-12-04T12:15:06.1491763Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.1491783Z 
2025-12-04T12:15:06.1491790Z 
2025-12-04T12:15:06.1492007Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.1492651Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:06.1492658Z 
2025-12-04T12:15:06.1492942Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.1493166Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.1493293Z frames [('total', 1)]
2025-12-04T12:15:06.1493417Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.1493885Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.1494128Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.1494233Z graph_break []
2025-12-04T12:15:06.1494456Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.1494580Z frames [('total', 1)]
2025-12-04T12:15:06.1494693Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.1494913Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.1495382Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.1495481Z graph_break []
2025-12-04T12:15:06.1495639Z =================================== FAILURES ===================================
2025-12-04T12:15:06.1496005Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T12:15:06.1496134Z Traceback (most recent call last):
2025-12-04T12:15:06.1496677Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T12:15:06.1496827Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T12:15:06.1497333Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.1497621Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.1498131Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.1498341Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.1498848Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.1499003Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.1499547Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.1499902Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.1500440Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.1500593Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.1501074Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.1501207Z     return self._compile_to_module()
2025-12-04T12:15:06.1501690Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.1501864Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.1502415Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.1502551Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.1503062Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.1503296Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.1503882Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.1504022Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.1504526Z   File "/tmp/tmpfuj1hu4w/qv/cqvj6nhtmpiaob3lx7ddo35lewveggcp7cpbeulpxkm5kn4exrbs.py", line 50, in <module>
2025-12-04T12:15:06.1505000Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.1505116Z     kernel.precompile(
2025-12-04T12:15:06.1505670Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.1505799Z     self._precompile_worker()
2025-12-04T12:15:06.1506393Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.1506583Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.1507173Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.1507371Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.1507835Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.1508081Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.1508565Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.1508916Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.1509142Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.1509465Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1509587Z ^
2025-12-04T12:15:06.1510047Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1510053Z 
2025-12-04T12:15:06.1510780Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.1510786Z 
2025-12-04T12:15:06.1510791Z 
2025-12-04T12:15:06.1511010Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.1511668Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:06.1511706Z 
2025-12-04T12:15:06.1511977Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.1512213Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.1512322Z frames [('total', 1)]
2025-12-04T12:15:06.1512439Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.1512913Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.1513136Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.1513238Z graph_break []
2025-12-04T12:15:06.1513471Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.1513575Z frames [('total', 1)]
2025-12-04T12:15:06.1513726Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.1513956Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.1514417Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.1514533Z graph_break []
2025-12-04T12:15:06.1514750Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.1514856Z frames [('total', 1)]
2025-12-04T12:15:06.1514984Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.1515204Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.1515661Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.1515771Z graph_break []
2025-12-04T12:15:06.1516426Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f11fe18ee197cc1f.xml -
2025-12-04T12:15:06.1516614Z =========================== short test summary info ============================
2025-12-04T12:15:06.1517411Z FAILED [0.7644s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.1517720Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1517826Z ^
2025-12-04T12:15:06.1518284Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1518290Z 
2025-12-04T12:15:06.1519008Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.1519014Z 
2025-12-04T12:15:06.1519018Z 
2025-12-04T12:15:06.1519236Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.1519912Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:06.1519933Z 
2025-12-04T12:15:06.1520200Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.1520379Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:06.1520629Z ================== 1 failed, 187 deselected, 2 rerun in 5.43s ==================
2025-12-04T12:15:06.1520729Z Got exit code 1
2025-12-04T12:15:06.1521284Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:06.1521707Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:15:06.1522179Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0b8acd36d7258295.xml
2025-12-04T12:15:06.1522363Z ============================= test session starts ==============================
2025-12-04T12:15:06.1522713Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:06.1522884Z cachedir: .pytest_cache
2025-12-04T12:15:06.1523415Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:06.1523543Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:06.1523653Z configfile: pytest.ini
2025-12-04T12:15:06.1524256Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:06.1524482Z collecting ... collected 188 items / 53 deselected / 135 selected
2025-12-04T12:15:06.1524639Z stepcurrent: skipping 53 already run items.
2025-12-04T12:15:06.1524756Z Running 135 items in this shard
2025-12-04T12:15:06.1524762Z 
2025-12-04T12:15:06.1525305Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e5m2_shape_16,16,16_cuda PASSED [4.1125s] [  0%]
2025-12-04T12:15:06.1525835Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e5m2_shape_4,2048,4096_cuda PASSED [0.8458s] [  1%]
2025-12-04T12:15:06.1526966Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda E1204 12:08:05.425000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T12:15:06.1527738Z E1204 12:08:05.425000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1528283Z E1204 12:08:05.425000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.1528869Z E1204 12:08:05.425000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.1529369Z E1204 12:08:05.425000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T12:15:06.1529810Z E1204 12:08:05.425000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T12:15:06.1530370Z E1204 12:08:05.425000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.1530819Z E1204 12:08:05.425000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = -448.0
2025-12-04T12:15:06.1531400Z E1204 12:08:05.425000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = triton_helpers.maximum(tmp0, tmp1)
2025-12-04T12:15:06.1531885Z E1204 12:08:05.425000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = 448.0
2025-12-04T12:15:06.1532452Z E1204 12:08:05.425000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = triton_helpers.minimum(tmp2, tmp3)
2025-12-04T12:15:06.1533000Z E1204 12:08:05.425000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = tmp4.to(tl.float8e4nv)
2025-12-04T12:15:06.1533548Z E1204 12:08:05.425000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp5, xmask)
2025-12-04T12:15:06.1533961Z E1204 12:08:05.425000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T12:15:06.1535640Z E1204 12:08:05.425000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.1536232Z E1204 12:08:05.425000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T12:15:06.1537363Z E1204 12:08:05.425000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.1538000Z E1204 12:08:05.425000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.1538926Z E1204 12:08:05.425000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.1539808Z E1204 12:08:05.425000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.1540753Z E1204 12:08:05.425000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.1541532Z E1204 12:08:05.425000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.1542152Z E1204 12:08:05.425000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.1542911Z E1204 12:08:05.425000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1543283Z E1204 12:08:05.425000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T12:15:06.1544197Z E1204 12:08:05.425000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1544334Z ('RERUN', {'yellow': True}) [0.5372s] [  2%]
2025-12-04T12:15:06.1545573Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda E1204 12:08:06.155000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T12:15:06.1546326Z E1204 12:08:06.155000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1546931Z E1204 12:08:06.155000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.1547502Z E1204 12:08:06.155000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.1548001Z E1204 12:08:06.155000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T12:15:06.1548485Z E1204 12:08:06.155000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T12:15:06.1549029Z E1204 12:08:06.155000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.1549493Z E1204 12:08:06.155000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = -448.0
2025-12-04T12:15:06.1550070Z E1204 12:08:06.155000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = triton_helpers.maximum(tmp0, tmp1)
2025-12-04T12:15:06.1550546Z E1204 12:08:06.155000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = 448.0
2025-12-04T12:15:06.1551127Z E1204 12:08:06.155000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = triton_helpers.minimum(tmp2, tmp3)
2025-12-04T12:15:06.1551654Z E1204 12:08:06.155000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = tmp4.to(tl.float8e4nv)
2025-12-04T12:15:06.1552214Z E1204 12:08:06.155000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp5, xmask)
2025-12-04T12:15:06.1552582Z E1204 12:08:06.155000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T12:15:06.1554300Z E1204 12:08:06.155000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.1554843Z E1204 12:08:06.155000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T12:15:06.1555898Z E1204 12:08:06.155000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.1556538Z E1204 12:08:06.155000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.1557441Z E1204 12:08:06.155000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.1558147Z E1204 12:08:06.155000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.1559043Z E1204 12:08:06.155000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.1559836Z E1204 12:08:06.155000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.1560486Z E1204 12:08:06.155000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.1561265Z E1204 12:08:06.155000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1561641Z E1204 12:08:06.155000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T12:15:06.1562543Z E1204 12:08:06.155000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1562722Z ('RERUN', {'yellow': True}) [0.7003s] [  2%]
2025-12-04T12:15:06.1563233Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda FAILED [0.9874s] [  2%]
2025-12-04T12:15:06.1563239Z 
2025-12-04T12:15:06.1563399Z ==================================== RERUNS ====================================
2025-12-04T12:15:06.1563721Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda _
2025-12-04T12:15:06.1563848Z Traceback (most recent call last):
2025-12-04T12:15:06.1564302Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T12:15:06.1564455Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T12:15:06.1564948Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.1565222Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.1565739Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.1565950Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.1566466Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.1566650Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.1567201Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.1567526Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.1568065Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.1568214Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.1568697Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.1568835Z     return self._compile_to_module()
2025-12-04T12:15:06.1569319Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.1569489Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.1570022Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.1570157Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.1570669Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.1570901Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.1571686Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.1571831Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.1572334Z   File "/tmp/tmpuzy8y7ce/b6/cb64tokfvetyct5qwmegpfoqpi7vhmuw3rfzaf7ogpqiwu6ucwh7.py", line 48, in <module>
2025-12-04T12:15:06.1572807Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.1572923Z     kernel.precompile(
2025-12-04T12:15:06.1573553Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.1573690Z     self._precompile_worker()
2025-12-04T12:15:06.1574288Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.1574532Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.1575142Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.1575340Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.1575803Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.1576047Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.1576561Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.1576972Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.1577197Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.1577519Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1577611Z ^
2025-12-04T12:15:06.1578070Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1578076Z 
2025-12-04T12:15:06.1578803Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.1578810Z 
2025-12-04T12:15:06.1578815Z 
2025-12-04T12:15:06.1579031Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.1579726Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T12:15:06.1579736Z 
2025-12-04T12:15:06.1580003Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.1580229Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.1580349Z frames [('total', 1)]
2025-12-04T12:15:06.1580466Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.1580702Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.1580940Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.1581040Z graph_break []
2025-12-04T12:15:06.1581371Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda _
2025-12-04T12:15:06.1581494Z Traceback (most recent call last):
2025-12-04T12:15:06.1581906Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T12:15:06.1582075Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T12:15:06.1582572Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.1582835Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.1583348Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.1583543Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.1584069Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.1584216Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.1584752Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.1585120Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.1585644Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.1585806Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.1586286Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.1586444Z     return self._compile_to_module()
2025-12-04T12:15:06.1586940Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.1587104Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.1587634Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.1587764Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.1588264Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.1588541Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.1589128Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.1589258Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.1589777Z   File "/tmp/tmp8oergiu0/yr/cyrbzp3fehkavilzeoot5t43mmzhncemqmlaqmax2sbvm4b5ddeb.py", line 48, in <module>
2025-12-04T12:15:06.1590240Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.1590365Z     kernel.precompile(
2025-12-04T12:15:06.1590918Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.1591067Z     self._precompile_worker()
2025-12-04T12:15:06.1591680Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.1591864Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.1592473Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.1592674Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.1593128Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.1593388Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.1593831Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.1594168Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.1594410Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.1594723Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1594826Z ^
2025-12-04T12:15:06.1595283Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1595288Z 
2025-12-04T12:15:06.1596000Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.1596018Z 
2025-12-04T12:15:06.1596023Z 
2025-12-04T12:15:06.1596239Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.1596872Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T12:15:06.1596878Z 
2025-12-04T12:15:06.1597197Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.1597422Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.1597548Z frames [('total', 1)]
2025-12-04T12:15:06.1597668Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.1597893Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.1598141Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.1598271Z graph_break []
2025-12-04T12:15:06.1598493Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.1598610Z frames [('total', 1)]
2025-12-04T12:15:06.1598727Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.1598945Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.1599192Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.1599292Z graph_break []
2025-12-04T12:15:06.1599454Z =================================== FAILURES ===================================
2025-12-04T12:15:06.1599768Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda _
2025-12-04T12:15:06.1599925Z Traceback (most recent call last):
2025-12-04T12:15:06.1600350Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T12:15:06.1600500Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T12:15:06.1600992Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.1601255Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.1601765Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.1601972Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.1602512Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.1602663Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.1603216Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.1603536Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.1604068Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.1604215Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.1604694Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.1604829Z     return self._compile_to_module()
2025-12-04T12:15:06.1605317Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.1605484Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.1606015Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.1606144Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.1606651Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.1606884Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.1607470Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.1607609Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.1608082Z   File "/tmp/tmp_obofr6o/o7/co7thpiwcdsglmsh4mbxk6fqgabth7jbut74r3wqihix6g3mj5xi.py", line 80, in <module>
2025-12-04T12:15:06.1608586Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait
2025-12-04T12:15:06.1608702Z     self._wait_futures(scope)
2025-12-04T12:15:06.1609199Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures
2025-12-04T12:15:06.1609327Z     kernel = result.result()
2025-12-04T12:15:06.1609768Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result
2025-12-04T12:15:06.1609915Z     return self.result_fn()
2025-12-04T12:15:06.1610410Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result
2025-12-04T12:15:06.1610542Z     raise e.with_name(kernel_name) from e
2025-12-04T12:15:06.1610939Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T12:15:06.1610945Z 
2025-12-04T12:15:06.1611081Z Name=triton_poi_fused__to_copy_clamp_0
2025-12-04T12:15:06.1611207Z Traceback (most recent call last):
2025-12-04T12:15:06.1611759Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T12:15:06.1611893Z     result = job()
2025-12-04T12:15:06.1612487Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T12:15:06.1612644Z     kernel.precompile(warm_cache_only=True)
2025-12-04T12:15:06.1613201Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T12:15:06.1613331Z     self._precompile_worker()
2025-12-04T12:15:06.1613928Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.1614109Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.1614755Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.1614954Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.1615420Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.1615666Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.1616113Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.1616540Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.1616724Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.1617074Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1617178Z ^
2025-12-04T12:15:06.1617641Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1617647Z 
2025-12-04T12:15:06.1617652Z 
2025-12-04T12:15:06.1618380Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.1618386Z 
2025-12-04T12:15:06.1618390Z 
2025-12-04T12:15:06.1618610Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.1619255Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T12:15:06.1619261Z 
2025-12-04T12:15:06.1619532Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.1619757Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.1619891Z frames [('total', 1)]
2025-12-04T12:15:06.1620008Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.1620293Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.1620545Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.1620651Z graph_break []
2025-12-04T12:15:06.1620891Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.1620999Z frames [('total', 1)]
2025-12-04T12:15:06.1621116Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.1621396Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.1621634Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.1621736Z graph_break []
2025-12-04T12:15:06.1621970Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.1622076Z frames [('total', 1)]
2025-12-04T12:15:06.1622206Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.1622427Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.1622792Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)]
2025-12-04T12:15:06.1622939Z graph_break []
2025-12-04T12:15:06.1623594Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0b8acd36d7258295.xml -
2025-12-04T12:15:06.1623771Z =========================== short test summary info ============================
2025-12-04T12:15:06.1624734Z FAILED [0.9874s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda - torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T12:15:06.1624741Z 
2025-12-04T12:15:06.1624876Z Name=triton_poi_fused__to_copy_clamp_0
2025-12-04T12:15:06.1625021Z Traceback (most recent call last):
2025-12-04T12:15:06.1625572Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T12:15:06.1625714Z     result = job()
2025-12-04T12:15:06.1626331Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T12:15:06.1626479Z     kernel.precompile(warm_cache_only=True)
2025-12-04T12:15:06.1627052Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T12:15:06.1627174Z     self._precompile_worker()
2025-12-04T12:15:06.1627770Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.1627966Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.1628561Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.1628765Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.1629236Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.1629483Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.1629944Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.1630285Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.1630473Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.1630801Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1630894Z ^
2025-12-04T12:15:06.1631367Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1631373Z 
2025-12-04T12:15:06.1631377Z 
2025-12-04T12:15:06.1632138Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.1632145Z 
2025-12-04T12:15:06.1632152Z 
2025-12-04T12:15:06.1632378Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.1633025Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T12:15:06.1633067Z 
2025-12-04T12:15:06.1633339Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.1633536Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:06.1633752Z ============= 1 failed, 2 passed, 53 deselected, 2 rerun in 7.23s ==============
2025-12-04T12:15:06.1633853Z Got exit code 1
2025-12-04T12:15:06.1633973Z Retrying single test...
2025-12-04T12:15:06.1634453Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-babe12520ea62fea.xml
2025-12-04T12:15:06.1634634Z ============================= test session starts ==============================
2025-12-04T12:15:06.1635026Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:06.1635136Z cachedir: .pytest_cache
2025-12-04T12:15:06.1635674Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:06.1635802Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:06.1635911Z configfile: pytest.ini
2025-12-04T12:15:06.1636520Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:06.1636743Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:06.1637496Z stepcurrent: skipping 55 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T12:15:06.1637617Z Running 1 items in this shard
2025-12-04T12:15:06.1637622Z 
2025-12-04T12:15:06.1638753Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda E1204 12:08:24.345000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T12:15:06.1639527Z E1204 12:08:24.345000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1640078Z E1204 12:08:24.345000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.1640655Z E1204 12:08:24.345000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.1641162Z E1204 12:08:24.345000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T12:15:06.1641616Z E1204 12:08:24.345000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T12:15:06.1642163Z E1204 12:08:24.345000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.1642621Z E1204 12:08:24.345000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = -448.0
2025-12-04T12:15:06.1643208Z E1204 12:08:24.345000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = triton_helpers.maximum(tmp0, tmp1)
2025-12-04T12:15:06.1643652Z E1204 12:08:24.345000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = 448.0
2025-12-04T12:15:06.1644274Z E1204 12:08:24.345000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = triton_helpers.minimum(tmp2, tmp3)
2025-12-04T12:15:06.1644810Z E1204 12:08:24.345000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = tmp4.to(tl.float8e4nv)
2025-12-04T12:15:06.1645368Z E1204 12:08:24.345000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp5, xmask)
2025-12-04T12:15:06.1645747Z E1204 12:08:24.345000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T12:15:06.1647458Z E1204 12:08:24.345000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.1648016Z E1204 12:08:24.345000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T12:15:06.1649104Z E1204 12:08:24.345000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.1649748Z E1204 12:08:24.345000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.1650640Z E1204 12:08:24.345000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.1651371Z E1204 12:08:24.345000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.1652271Z E1204 12:08:24.345000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.1653044Z E1204 12:08:24.345000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.1653668Z E1204 12:08:24.345000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.1654422Z E1204 12:08:24.345000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1654810Z E1204 12:08:24.345000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T12:15:06.1655705Z E1204 12:08:24.345000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1655854Z ('RERUN', {'yellow': True}) [3.9235s] [100%]
2025-12-04T12:15:06.1657099Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda E1204 12:08:25.077000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T12:15:06.1657854Z E1204 12:08:25.077000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1658420Z E1204 12:08:25.077000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.1659040Z E1204 12:08:25.077000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.1659564Z E1204 12:08:25.077000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T12:15:06.1660004Z E1204 12:08:25.077000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T12:15:06.1660585Z E1204 12:08:25.077000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.1661047Z E1204 12:08:25.077000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = -448.0
2025-12-04T12:15:06.1661617Z E1204 12:08:25.077000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = triton_helpers.maximum(tmp0, tmp1)
2025-12-04T12:15:06.1662074Z E1204 12:08:25.077000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = 448.0
2025-12-04T12:15:06.1662647Z E1204 12:08:25.077000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = triton_helpers.minimum(tmp2, tmp3)
2025-12-04T12:15:06.1663210Z E1204 12:08:25.077000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = tmp4.to(tl.float8e4nv)
2025-12-04T12:15:06.1663773Z E1204 12:08:25.077000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp5, xmask)
2025-12-04T12:15:06.1664143Z E1204 12:08:25.077000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T12:15:06.1665874Z E1204 12:08:25.077000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.1666412Z E1204 12:08:25.077000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T12:15:06.1667481Z E1204 12:08:25.077000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.1668113Z E1204 12:08:25.077000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.1669024Z E1204 12:08:25.077000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.1669706Z E1204 12:08:25.077000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.1670589Z E1204 12:08:25.077000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.1671597Z E1204 12:08:25.077000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.1672209Z E1204 12:08:25.077000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.1673087Z E1204 12:08:25.077000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1673462Z E1204 12:08:25.077000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T12:15:06.1674378Z E1204 12:08:25.077000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1674562Z ('RERUN', {'yellow': True}) [0.6941s] [100%]
2025-12-04T12:15:06.1675693Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda E1204 12:08:25.763000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T12:15:06.1676463Z E1204 12:08:25.763000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1677011Z E1204 12:08:25.763000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.1677644Z E1204 12:08:25.763000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.1678141Z E1204 12:08:25.763000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T12:15:06.1678582Z E1204 12:08:25.763000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T12:15:06.1679149Z E1204 12:08:25.763000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.1679594Z E1204 12:08:25.763000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = -448.0
2025-12-04T12:15:06.1680228Z E1204 12:08:25.763000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = triton_helpers.maximum(tmp0, tmp1)
2025-12-04T12:15:06.1680675Z E1204 12:08:25.763000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = 448.0
2025-12-04T12:15:06.1681242Z E1204 12:08:25.763000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = triton_helpers.minimum(tmp2, tmp3)
2025-12-04T12:15:06.1681791Z E1204 12:08:25.763000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = tmp4.to(tl.float8e4nv)
2025-12-04T12:15:06.1682339Z E1204 12:08:25.763000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp5, xmask)
2025-12-04T12:15:06.1682721Z E1204 12:08:25.763000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T12:15:06.1684382Z E1204 12:08:25.763000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.1684936Z E1204 12:08:25.763000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T12:15:06.1685978Z E1204 12:08:25.763000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.1686626Z E1204 12:08:25.763000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.1687553Z E1204 12:08:25.763000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.1688241Z E1204 12:08:25.763000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.1689172Z E1204 12:08:25.763000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.1689946Z E1204 12:08:25.763000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.1690571Z E1204 12:08:25.763000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.1691329Z E1204 12:08:25.763000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1691828Z E1204 12:08:25.763000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T12:15:06.1692723Z E1204 12:08:25.763000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1692827Z FAILED [0.6842s] [100%]
2025-12-04T12:15:06.1692833Z 
2025-12-04T12:15:06.1692997Z ==================================== RERUNS ====================================
2025-12-04T12:15:06.1693315Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda _
2025-12-04T12:15:06.1693511Z Traceback (most recent call last):
2025-12-04T12:15:06.1693926Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T12:15:06.1694078Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T12:15:06.1694582Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.1694833Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.1695348Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.1695557Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.1696067Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.1696230Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.1696841Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.1697163Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.1697700Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.1697849Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.1698346Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.1698469Z     return self._compile_to_module()
2025-12-04T12:15:06.1698953Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.1699133Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.1699652Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.1699831Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.1700341Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.1700577Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.1701176Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.1701335Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.1701827Z   File "/tmp/tmp355j3ia6/6q/c6q5lrws2s2lfh7tb3zi5gdm5ownwn5kgk366qipxg6w45ivw4bz.py", line 48, in <module>
2025-12-04T12:15:06.1702303Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.1702415Z     kernel.precompile(
2025-12-04T12:15:06.1702986Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.1703105Z     self._precompile_worker()
2025-12-04T12:15:06.1703726Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.1703917Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.1704508Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.1704711Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.1705176Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.1705423Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.1705878Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.1706245Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.1706476Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.1706805Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1706895Z ^
2025-12-04T12:15:06.1707364Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1707373Z 
2025-12-04T12:15:06.1708083Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.1708089Z 
2025-12-04T12:15:06.1708094Z 
2025-12-04T12:15:06.1708314Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.1708964Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T12:15:06.1708970Z 
2025-12-04T12:15:06.1709240Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.1709485Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.1709593Z frames [('total', 1)]
2025-12-04T12:15:06.1709712Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.1709970Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.1710192Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.1710309Z graph_break []
2025-12-04T12:15:06.1710624Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda _
2025-12-04T12:15:06.1710752Z Traceback (most recent call last):
2025-12-04T12:15:06.1711171Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T12:15:06.1711323Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T12:15:06.1711848Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.1712118Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.1712634Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.1712870Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.1713379Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.1713528Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.1714077Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.1714400Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.1714941Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.1715146Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.1715791Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.1715964Z     return self._compile_to_module()
2025-12-04T12:15:06.1716456Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.1716626Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.1717160Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.1717294Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.1717854Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.1718092Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.1718677Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.1718821Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.1719320Z   File "/tmp/tmpdgvtg6yp/ex/cex2xpd4z4rp3zd4aewrxwc5fgvspiaio2kha6m4gsaedqqxoju7.py", line 48, in <module>
2025-12-04T12:15:06.1719801Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.1719917Z     kernel.precompile(
2025-12-04T12:15:06.1720478Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.1720612Z     self._precompile_worker()
2025-12-04T12:15:06.1721214Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.1721395Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.1722010Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.1722211Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.1722680Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.1722926Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.1723370Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.1723723Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.1723951Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.1724307Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1724402Z ^
2025-12-04T12:15:06.1724861Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1724867Z 
2025-12-04T12:15:06.1725587Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.1725631Z 
2025-12-04T12:15:06.1725636Z 
2025-12-04T12:15:06.1725854Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.1726497Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T12:15:06.1726503Z 
2025-12-04T12:15:06.1726771Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.1727000Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.1727118Z frames [('total', 1)]
2025-12-04T12:15:06.1727270Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.1727508Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.1727741Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.1727841Z graph_break []
2025-12-04T12:15:06.1728076Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.1728180Z frames [('total', 1)]
2025-12-04T12:15:06.1728294Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.1728523Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.1728761Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.1728860Z graph_break []
2025-12-04T12:15:06.1729021Z =================================== FAILURES ===================================
2025-12-04T12:15:06.1729398Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda _
2025-12-04T12:15:06.1729536Z Traceback (most recent call last):
2025-12-04T12:15:06.1729946Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T12:15:06.1730095Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T12:15:06.1730601Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.1730853Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.1731370Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.1731577Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.1732082Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.1732246Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.1732779Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.1733101Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.1733634Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.1733787Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.1734281Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.1734406Z     return self._compile_to_module()
2025-12-04T12:15:06.1734892Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.1735070Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.1735649Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.1735782Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.1736371Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.1736609Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.1737256Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.1737387Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.1737892Z   File "/tmp/tmpb9r7p03z/id/cidalxryarvebqahbawkdsf3vyl2c3mm6ckmtgw6gi7o7txg6djg.py", line 48, in <module>
2025-12-04T12:15:06.1738372Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.1738489Z     kernel.precompile(
2025-12-04T12:15:06.1739061Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.1739220Z     self._precompile_worker()
2025-12-04T12:15:06.1739822Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.1740020Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.1740624Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.1740824Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.1741291Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.1741536Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.1742039Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.1742376Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.1742605Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.1742929Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1743022Z ^
2025-12-04T12:15:06.1743492Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1743497Z 
2025-12-04T12:15:06.1744207Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.1744214Z 
2025-12-04T12:15:06.1744218Z 
2025-12-04T12:15:06.1744433Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.1745080Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T12:15:06.1745088Z 
2025-12-04T12:15:06.1745361Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.1745594Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.1745701Z frames [('total', 1)]
2025-12-04T12:15:06.1745818Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.1746072Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.1746294Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.1746393Z graph_break []
2025-12-04T12:15:06.1746627Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.1746730Z frames [('total', 1)]
2025-12-04T12:15:06.1746857Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.1747121Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.1747358Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.1747474Z graph_break []
2025-12-04T12:15:06.1747690Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.1747793Z frames [('total', 1)]
2025-12-04T12:15:06.1747922Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.1748174Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.1748419Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.1748519Z graph_break []
2025-12-04T12:15:06.1749171Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-babe12520ea62fea.xml -
2025-12-04T12:15:06.1749359Z =========================== short test summary info ============================
2025-12-04T12:15:06.1750146Z FAILED [0.6842s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.1750494Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1750599Z ^
2025-12-04T12:15:06.1751055Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1751063Z 
2025-12-04T12:15:06.1751782Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.1751788Z 
2025-12-04T12:15:06.1751794Z 
2025-12-04T12:15:06.1752011Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.1752655Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T12:15:06.1752660Z 
2025-12-04T12:15:06.1752962Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.1753144Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:06.1753362Z ================== 1 failed, 187 deselected, 2 rerun in 5.35s ==================
2025-12-04T12:15:06.1753463Z Got exit code 1
2025-12-04T12:15:06.1753573Z Retrying single test...
2025-12-04T12:15:06.1754062Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-08a6bb29b776e6ca.xml
2025-12-04T12:15:06.1754231Z ============================= test session starts ==============================
2025-12-04T12:15:06.1754595Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:06.1754707Z cachedir: .pytest_cache
2025-12-04T12:15:06.1755226Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:06.1755371Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:06.1755481Z configfile: pytest.ini
2025-12-04T12:15:06.1756074Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:06.1756310Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:06.1757158Z stepcurrent: skipping 55 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T12:15:06.1757299Z Running 1 items in this shard
2025-12-04T12:15:06.1757304Z 
2025-12-04T12:15:06.1758441Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda E1204 12:08:44.323000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T12:15:06.1759282Z E1204 12:08:44.323000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1759838Z E1204 12:08:44.323000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.1760408Z E1204 12:08:44.323000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.1760977Z E1204 12:08:44.323000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T12:15:06.1761418Z E1204 12:08:44.323000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T12:15:06.1761981Z E1204 12:08:44.323000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.1762434Z E1204 12:08:44.323000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = -448.0
2025-12-04T12:15:06.1763038Z E1204 12:08:44.323000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = triton_helpers.maximum(tmp0, tmp1)
2025-12-04T12:15:06.1763497Z E1204 12:08:44.323000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = 448.0
2025-12-04T12:15:06.1764067Z E1204 12:08:44.323000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = triton_helpers.minimum(tmp2, tmp3)
2025-12-04T12:15:06.1764607Z E1204 12:08:44.323000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = tmp4.to(tl.float8e4nv)
2025-12-04T12:15:06.1765159Z E1204 12:08:44.323000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp5, xmask)
2025-12-04T12:15:06.1765562Z E1204 12:08:44.323000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T12:15:06.1767246Z E1204 12:08:44.323000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.1767790Z E1204 12:08:44.323000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T12:15:06.1768866Z E1204 12:08:44.323000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.1769500Z E1204 12:08:44.323000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.1770417Z E1204 12:08:44.323000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.1771338Z E1204 12:08:44.323000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.1772245Z E1204 12:08:44.323000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.1773107Z E1204 12:08:44.323000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.1773721Z E1204 12:08:44.323000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.1774497Z E1204 12:08:44.323000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1774921Z E1204 12:08:44.323000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T12:15:06.1775838Z E1204 12:08:44.323000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1775977Z ('RERUN', {'yellow': True}) [3.9491s] [100%]
2025-12-04T12:15:06.1777199Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda E1204 12:08:45.064000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T12:15:06.1778016Z E1204 12:08:45.064000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1778567Z E1204 12:08:45.064000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.1779158Z E1204 12:08:45.064000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.1779672Z E1204 12:08:45.064000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T12:15:06.1780175Z E1204 12:08:45.064000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T12:15:06.1780725Z E1204 12:08:45.064000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.1781176Z E1204 12:08:45.064000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = -448.0
2025-12-04T12:15:06.1781763Z E1204 12:08:45.064000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = triton_helpers.maximum(tmp0, tmp1)
2025-12-04T12:15:06.1782211Z E1204 12:08:45.064000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = 448.0
2025-12-04T12:15:06.1782799Z E1204 12:08:45.064000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = triton_helpers.minimum(tmp2, tmp3)
2025-12-04T12:15:06.1783329Z E1204 12:08:45.064000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = tmp4.to(tl.float8e4nv)
2025-12-04T12:15:06.1783900Z E1204 12:08:45.064000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp5, xmask)
2025-12-04T12:15:06.1784272Z E1204 12:08:45.064000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T12:15:06.1785947Z E1204 12:08:45.064000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.1786506Z E1204 12:08:45.064000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T12:15:06.1787600Z E1204 12:08:45.064000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.1788260Z E1204 12:08:45.064000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.1789160Z E1204 12:08:45.064000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.1789901Z E1204 12:08:45.064000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.1790791Z E1204 12:08:45.064000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.1791581Z E1204 12:08:45.064000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.1792232Z E1204 12:08:45.064000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.1792981Z E1204 12:08:45.064000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1793372Z E1204 12:08:45.064000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T12:15:06.1794309Z E1204 12:08:45.064000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1794463Z ('RERUN', {'yellow': True}) [0.7026s] [100%]
2025-12-04T12:15:06.1795590Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda E1204 12:08:45.757000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T12:15:06.1796354Z E1204 12:08:45.757000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1796929Z E1204 12:08:45.757000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.1797496Z E1204 12:08:45.757000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.1798016Z E1204 12:08:45.757000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T12:15:06.1798458Z E1204 12:08:45.757000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T12:15:06.1799025Z E1204 12:08:45.757000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.1799478Z E1204 12:08:45.757000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = -448.0
2025-12-04T12:15:06.1800047Z E1204 12:08:45.757000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = triton_helpers.maximum(tmp0, tmp1)
2025-12-04T12:15:06.1800508Z E1204 12:08:45.757000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = 448.0
2025-12-04T12:15:06.1801081Z E1204 12:08:45.757000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = triton_helpers.minimum(tmp2, tmp3)
2025-12-04T12:15:06.1801667Z E1204 12:08:45.757000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = tmp4.to(tl.float8e4nv)
2025-12-04T12:15:06.1802224Z E1204 12:08:45.757000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp5, xmask)
2025-12-04T12:15:06.1802587Z E1204 12:08:45.757000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T12:15:06.1804319Z E1204 12:08:45.757000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.1804867Z E1204 12:08:45.757000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T12:15:06.1805951Z E1204 12:08:45.757000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.1806590Z E1204 12:08:45.757000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.1807496Z E1204 12:08:45.757000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.1808216Z E1204 12:08:45.757000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.1809113Z E1204 12:08:45.757000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.1809887Z E1204 12:08:45.757000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.1810494Z E1204 12:08:45.757000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.1811259Z E1204 12:08:45.757000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1811628Z E1204 12:08:45.757000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T12:15:06.1812536Z E1204 12:08:45.757000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1812643Z FAILED [0.6908s] [100%]
2025-12-04T12:15:06.1812649Z 
2025-12-04T12:15:06.1812794Z ==================================== RERUNS ====================================
2025-12-04T12:15:06.1813125Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda _
2025-12-04T12:15:06.1813251Z Traceback (most recent call last):
2025-12-04T12:15:06.1813673Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T12:15:06.1813822Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T12:15:06.1814312Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.1814583Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.1815150Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.1815360Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.1815873Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.1816073Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.1816687Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.1817012Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.1817532Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.1817698Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.1818184Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.1818359Z     return self._compile_to_module()
2025-12-04T12:15:06.1818844Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.1819009Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.1819546Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.1819678Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.1820188Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.1820420Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.1821041Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.1821183Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.1821693Z   File "/tmp/tmpiqzjxtdx/qw/cqwl6dvzf6hiuzyhu4ecf3hfxlro2wctkaasadwz2vxywthnrjvn.py", line 48, in <module>
2025-12-04T12:15:06.1822156Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.1822284Z     kernel.precompile(
2025-12-04T12:15:06.1822838Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.1822970Z     self._precompile_worker()
2025-12-04T12:15:06.1823561Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.1823742Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.1824352Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.1824550Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.1825014Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.1825261Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.1825708Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.1826056Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.1826285Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.1826594Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1826697Z ^
2025-12-04T12:15:06.1827194Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1827201Z 
2025-12-04T12:15:06.1827934Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.1827942Z 
2025-12-04T12:15:06.1827947Z 
2025-12-04T12:15:06.1828165Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.1828860Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T12:15:06.1828865Z 
2025-12-04T12:15:06.1829137Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.1829361Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.1829480Z frames [('total', 1)]
2025-12-04T12:15:06.1829598Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.1829841Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.1830075Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.1830207Z graph_break []
2025-12-04T12:15:06.1830537Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda _
2025-12-04T12:15:06.1830661Z Traceback (most recent call last):
2025-12-04T12:15:06.1831071Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T12:15:06.1831234Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T12:15:06.1831721Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.1831970Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.1832495Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.1832731Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.1833255Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.1833404Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.1833937Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.1834274Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.1834795Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.1834956Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.1835433Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.1835556Z     return self._compile_to_module()
2025-12-04T12:15:06.1836058Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.1836222Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.1836735Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.1836878Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.1837378Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.1837623Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.1838212Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.1838338Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.1838858Z   File "/tmp/tmp68hil5v3/x6/cx6n6eh7bdyvggwbtpgtufw6pak3eeepotow2x7id2elvlgd24by.py", line 48, in <module>
2025-12-04T12:15:06.1839361Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.1839489Z     kernel.precompile(
2025-12-04T12:15:06.1840042Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.1840160Z     self._precompile_worker()
2025-12-04T12:15:06.1840801Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.1840978Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.1841573Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.1841783Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.1842236Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.1842495Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.1842975Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.1843309Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.1843550Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.1843859Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1843963Z ^
2025-12-04T12:15:06.1844420Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1844426Z 
2025-12-04T12:15:06.1845168Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.1845176Z 
2025-12-04T12:15:06.1845180Z 
2025-12-04T12:15:06.1845413Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.1846044Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T12:15:06.1846050Z 
2025-12-04T12:15:06.1846333Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.1846559Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.1846668Z frames [('total', 1)]
2025-12-04T12:15:06.1846803Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.1847042Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.1847278Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.1847387Z graph_break []
2025-12-04T12:15:06.1847613Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.1847731Z frames [('total', 1)]
2025-12-04T12:15:06.1847850Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.1848068Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.1848321Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.1848423Z graph_break []
2025-12-04T12:15:06.1848573Z =================================== FAILURES ===================================
2025-12-04T12:15:06.1848903Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda _
2025-12-04T12:15:06.1849026Z Traceback (most recent call last):
2025-12-04T12:15:06.1849444Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T12:15:06.1849596Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T12:15:06.1850090Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.1850386Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.1850904Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.1851097Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.1851623Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.1851806Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.1852353Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.1852674Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.1853195Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.1853360Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.1853840Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.1854017Z     return self._compile_to_module()
2025-12-04T12:15:06.1854502Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.1854669Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.1855202Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.1855333Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.1855829Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.1856074Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.1856800Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.1856953Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.1857465Z   File "/tmp/tmpoj5idkhc/cp/ccph6a4leq7aciraybmukyegynqx6xxwbjinolcqutbbvelo3x5e.py", line 48, in <module>
2025-12-04T12:15:06.1857930Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.1858064Z     kernel.precompile(
2025-12-04T12:15:06.1858621Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.1858754Z     self._precompile_worker()
2025-12-04T12:15:06.1859538Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.1859725Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.1860337Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.1860539Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.1861007Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.1861258Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.1861707Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.1862058Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.1862286Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.1862596Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1862706Z ^
2025-12-04T12:15:06.1863215Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1863225Z 
2025-12-04T12:15:06.1863952Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.1863958Z 
2025-12-04T12:15:06.1863992Z 
2025-12-04T12:15:06.1864213Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.1864843Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T12:15:06.1864863Z 
2025-12-04T12:15:06.1865136Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.1865360Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.1865482Z frames [('total', 1)]
2025-12-04T12:15:06.1865605Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.1865843Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.1866126Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.1866228Z graph_break []
2025-12-04T12:15:06.1866464Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.1866569Z frames [('total', 1)]
2025-12-04T12:15:06.1866691Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.1866922Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.1867157Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.1867258Z graph_break []
2025-12-04T12:15:06.1867483Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.1867587Z frames [('total', 1)]
2025-12-04T12:15:06.1867704Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.1867969Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.1868202Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.1868318Z graph_break []
2025-12-04T12:15:06.1868964Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-08a6bb29b776e6ca.xml -
2025-12-04T12:15:06.1869137Z =========================== short test summary info ============================
2025-12-04T12:15:06.1869927Z FAILED [0.6908s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.1870238Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1870328Z ^
2025-12-04T12:15:06.1870800Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1870806Z 
2025-12-04T12:15:06.1871748Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.1871758Z 
2025-12-04T12:15:06.1871763Z 
2025-12-04T12:15:06.1871997Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.1872623Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T12:15:06.1872631Z 
2025-12-04T12:15:06.1872916Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.1873098Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:06.1873303Z ================== 1 failed, 187 deselected, 2 rerun in 5.39s ==================
2025-12-04T12:15:06.1873420Z Got exit code 1
2025-12-04T12:15:06.1874070Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T12:15:06.1874497Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:15:06.1874972Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ef8db3fa00c6c1d7.xml
2025-12-04T12:15:06.1875139Z ============================= test session starts ==============================
2025-12-04T12:15:06.1875555Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:06.1875669Z cachedir: .pytest_cache
2025-12-04T12:15:06.1876187Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:06.1876326Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:06.1876435Z configfile: pytest.ini
2025-12-04T12:15:06.1877040Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:06.1877271Z collecting ... collected 188 items / 56 deselected / 132 selected
2025-12-04T12:15:06.1877462Z stepcurrent: skipping 56 already run items.
2025-12-04T12:15:06.1877594Z Running 132 items in this shard
2025-12-04T12:15:06.1877599Z 
2025-12-04T12:15:06.1878759Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda E1204 12:09:04.070000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T12:15:06.1879533Z E1204 12:09:04.070000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1880086Z E1204 12:09:04.070000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.1880708Z E1204 12:09:04.070000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.1881229Z E1204 12:09:04.070000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T12:15:06.1881668Z E1204 12:09:04.070000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T12:15:06.1882229Z E1204 12:09:04.070000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.1882681Z E1204 12:09:04.070000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = -448.0
2025-12-04T12:15:06.1883250Z E1204 12:09:04.070000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = triton_helpers.maximum(tmp0, tmp1)
2025-12-04T12:15:06.1883707Z E1204 12:09:04.070000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = 448.0
2025-12-04T12:15:06.1884271Z E1204 12:09:04.070000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = triton_helpers.minimum(tmp2, tmp3)
2025-12-04T12:15:06.1884818Z E1204 12:09:04.070000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = tmp4.to(tl.float8e4nv)
2025-12-04T12:15:06.1885367Z E1204 12:09:04.070000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp5, xmask)
2025-12-04T12:15:06.1885742Z E1204 12:09:04.070000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T12:15:06.1887455Z E1204 12:09:04.070000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.1887998Z E1204 12:09:04.070000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T12:15:06.1889058Z E1204 12:09:04.070000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.1889804Z E1204 12:09:04.070000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.1890717Z E1204 12:09:04.070000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.1891405Z E1204 12:09:04.070000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.1892335Z E1204 12:09:04.070000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.1893110Z E1204 12:09:04.070000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.1893736Z E1204 12:09:04.070000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.1894553Z E1204 12:09:04.070000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1894929Z E1204 12:09:04.070000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T12:15:06.1895854Z E1204 12:09:04.070000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1895993Z ('RERUN', {'yellow': True}) [3.7823s] [  0%]
2025-12-04T12:15:06.1897235Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda E1204 12:09:04.816000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T12:15:06.1897997Z E1204 12:09:04.816000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1898550Z E1204 12:09:04.816000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.1899133Z E1204 12:09:04.816000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.1899632Z E1204 12:09:04.816000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T12:15:06.1900087Z E1204 12:09:04.816000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T12:15:06.1900636Z E1204 12:09:04.816000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.1901102Z E1204 12:09:04.816000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = -448.0
2025-12-04T12:15:06.1901721Z E1204 12:09:04.816000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = triton_helpers.maximum(tmp0, tmp1)
2025-12-04T12:15:06.1902171Z E1204 12:09:04.816000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = 448.0
2025-12-04T12:15:06.1902754Z E1204 12:09:04.816000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = triton_helpers.minimum(tmp2, tmp3)
2025-12-04T12:15:06.1903317Z E1204 12:09:04.816000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = tmp4.to(tl.float8e4nv)
2025-12-04T12:15:06.1903877Z E1204 12:09:04.816000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp5, xmask)
2025-12-04T12:15:06.1904241Z E1204 12:09:04.816000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T12:15:06.1905916Z E1204 12:09:04.816000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.1906519Z E1204 12:09:04.816000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T12:15:06.1907565Z E1204 12:09:04.816000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.1908242Z E1204 12:09:04.816000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.1909144Z E1204 12:09:04.816000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.1909844Z E1204 12:09:04.816000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.1910727Z E1204 12:09:04.816000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.1911515Z E1204 12:09:04.816000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.1912128Z E1204 12:09:04.816000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.1912883Z E1204 12:09:04.816000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1913272Z E1204 12:09:04.816000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T12:15:06.1914179Z E1204 12:09:04.816000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1914327Z ('RERUN', {'yellow': True}) [0.7077s] [  0%]
2025-12-04T12:15:06.1915510Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda E1204 12:09:05.526000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T12:15:06.1916274Z E1204 12:09:05.526000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1916825Z E1204 12:09:05.526000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.1917425Z E1204 12:09:05.526000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.1917937Z E1204 12:09:05.526000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T12:15:06.1918372Z E1204 12:09:05.526000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T12:15:06.1918936Z E1204 12:09:05.526000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.1919383Z E1204 12:09:05.526000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = -448.0
2025-12-04T12:15:06.1919979Z E1204 12:09:05.526000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = triton_helpers.maximum(tmp0, tmp1)
2025-12-04T12:15:06.1920429Z E1204 12:09:05.526000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = 448.0
2025-12-04T12:15:06.1921001Z E1204 12:09:05.526000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = triton_helpers.minimum(tmp2, tmp3)
2025-12-04T12:15:06.1921546Z E1204 12:09:05.526000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = tmp4.to(tl.float8e4nv)
2025-12-04T12:15:06.1922128Z E1204 12:09:05.526000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp5, xmask)
2025-12-04T12:15:06.1922498Z E1204 12:09:05.526000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T12:15:06.1924172Z E1204 12:09:05.526000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.1924710Z E1204 12:09:05.526000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T12:15:06.1925776Z E1204 12:09:05.526000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.1926403Z E1204 12:09:05.526000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.1927317Z E1204 12:09:05.526000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.1928004Z E1204 12:09:05.526000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.1928895Z E1204 12:09:05.526000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.1929701Z E1204 12:09:05.526000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.1930315Z E1204 12:09:05.526000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.1931079Z E1204 12:09:05.526000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1931481Z E1204 12:09:05.526000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T12:15:06.1932390Z E1204 12:09:05.526000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1932495Z FAILED [0.7078s] [  0%]
2025-12-04T12:15:06.1932501Z 
2025-12-04T12:15:06.1932653Z ==================================== RERUNS ====================================
2025-12-04T12:15:06.1932995Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T12:15:06.1933150Z Traceback (most recent call last):
2025-12-04T12:15:06.1933568Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T12:15:06.1933718Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T12:15:06.1934216Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.1934484Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.1935002Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.1935214Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.1935762Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.1935913Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.1936538Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.1936860Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.1937386Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.1937550Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.1938032Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.1938170Z     return self._compile_to_module()
2025-12-04T12:15:06.1938656Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.1938824Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.1939356Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.1939491Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.1939999Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.1940233Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.1940820Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.1940962Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.1941447Z   File "/tmp/tmpx98mpn5_/lw/clw6xaxj6sudxztfcfu6io3ulrezf4tprx7jroekbdgwjmiiquzl.py", line 48, in <module>
2025-12-04T12:15:06.1941955Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.1942082Z     kernel.precompile(
2025-12-04T12:15:06.1942640Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.1942771Z     self._precompile_worker()
2025-12-04T12:15:06.1943381Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.1943593Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.1944200Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.1944402Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.1944867Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.1945120Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.1945566Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.1945953Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.1946186Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.1946502Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1946609Z ^
2025-12-04T12:15:06.1947068Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1947075Z 
2025-12-04T12:15:06.1947808Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.1947814Z 
2025-12-04T12:15:06.1947849Z 
2025-12-04T12:15:06.1948072Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.1948732Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:06.1948741Z 
2025-12-04T12:15:06.1949011Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.1949239Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.1949363Z frames [('total', 1)]
2025-12-04T12:15:06.1949484Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.1949723Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.1949961Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.1950063Z graph_break []
2025-12-04T12:15:06.1950412Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T12:15:06.1950540Z Traceback (most recent call last):
2025-12-04T12:15:06.1950954Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T12:15:06.1951123Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T12:15:06.1951615Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.1951867Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.1952402Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.1952596Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.1953121Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.1953272Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.1953866Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.1954208Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.1954739Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.1954903Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.1955439Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.1955561Z     return self._compile_to_module()
2025-12-04T12:15:06.1956063Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.1956229Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.1956750Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.1956899Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.1957394Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.1957669Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.1958255Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.1958386Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.1958878Z   File "/tmp/tmppoz_38bc/xh/cxhicdx3rgo2cr6w24iitqtoq4q7nzkqsghyojlki6goph7hz222.py", line 48, in <module>
2025-12-04T12:15:06.1959341Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.1959463Z     kernel.precompile(
2025-12-04T12:15:06.1960048Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.1960169Z     self._precompile_worker()
2025-12-04T12:15:06.1960777Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.1960957Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.1961550Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.1961769Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.1962218Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.1962477Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.1962921Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.1963258Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.1963499Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.1963810Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1963915Z ^
2025-12-04T12:15:06.1964369Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1964377Z 
2025-12-04T12:15:06.1965086Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.1965092Z 
2025-12-04T12:15:06.1965110Z 
2025-12-04T12:15:06.1965327Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.1965998Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:06.1966005Z 
2025-12-04T12:15:06.1966287Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.1966514Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.1966619Z frames [('total', 1)]
2025-12-04T12:15:06.1966752Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.1967024Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.1967260Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.1967362Z graph_break []
2025-12-04T12:15:06.1967582Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.1967702Z frames [('total', 1)]
2025-12-04T12:15:06.1967817Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.1968035Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.1968286Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.1968385Z graph_break []
2025-12-04T12:15:06.1968532Z =================================== FAILURES ===================================
2025-12-04T12:15:06.1968931Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T12:15:06.1969056Z Traceback (most recent call last):
2025-12-04T12:15:06.1969471Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T12:15:06.1969623Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T12:15:06.1970110Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.1970372Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.1970884Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.1971377Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.1971894Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.1972046Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.1972595Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.1972919Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.1973442Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.1973609Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.1974092Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.1974230Z     return self._compile_to_module()
2025-12-04T12:15:06.1974721Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.1974890Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.1975422Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.1975552Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.1976062Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.1976366Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.1976955Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.1977103Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.1977654Z   File "/tmp/tmp561o8arb/4l/c4lqqty7unedmq22u77uoqgnvprm5hukzraavwdhhir5bm6njnux.py", line 48, in <module>
2025-12-04T12:15:06.1978123Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.1978254Z     kernel.precompile(
2025-12-04T12:15:06.1978811Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.1978992Z     self._precompile_worker()
2025-12-04T12:15:06.1979592Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.1979772Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.1980382Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.1980579Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.1981050Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.1981294Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.1981780Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.1982128Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.1982359Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.1982666Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1982766Z ^
2025-12-04T12:15:06.1983225Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1983232Z 
2025-12-04T12:15:06.1983986Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.1983993Z 
2025-12-04T12:15:06.1983997Z 
2025-12-04T12:15:06.1984217Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.1984866Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:06.1984872Z 
2025-12-04T12:15:06.1985144Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.1985365Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.1985483Z frames [('total', 1)]
2025-12-04T12:15:06.1985601Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.1985836Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.1986068Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.1986170Z graph_break []
2025-12-04T12:15:06.1986406Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.1986510Z frames [('total', 1)]
2025-12-04T12:15:06.1986628Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.1986862Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.1987096Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.1987196Z graph_break []
2025-12-04T12:15:06.1987429Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.1987532Z frames [('total', 1)]
2025-12-04T12:15:06.1987645Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.1987872Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.1988102Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.1988214Z graph_break []
2025-12-04T12:15:06.1988902Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ef8db3fa00c6c1d7.xml -
2025-12-04T12:15:06.1989079Z =========================== short test summary info ============================
2025-12-04T12:15:06.1989885Z FAILED [0.7078s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.1990195Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.1990332Z ^
2025-12-04T12:15:06.1990788Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.1990795Z 
2025-12-04T12:15:06.1991503Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.1991508Z 
2025-12-04T12:15:06.1991513Z 
2025-12-04T12:15:06.1991747Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.1992386Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:06.1992423Z 
2025-12-04T12:15:06.1992703Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.1992885Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:06.1993092Z ================== 1 failed, 56 deselected, 2 rerun in 5.24s ===================
2025-12-04T12:15:06.1993208Z Got exit code 1
2025-12-04T12:15:06.1993317Z Retrying single test...
2025-12-04T12:15:06.1993809Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7a3ac84fc91fa02b.xml
2025-12-04T12:15:06.1993974Z ============================= test session starts ==============================
2025-12-04T12:15:06.1994358Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:06.1994487Z cachedir: .pytest_cache
2025-12-04T12:15:06.1995010Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:06.1995139Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:06.1995263Z configfile: pytest.ini
2025-12-04T12:15:06.1995857Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:06.1996096Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:06.1996810Z stepcurrent: skipping 56 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:06.1996927Z Running 1 items in this shard
2025-12-04T12:15:06.1996932Z 
2025-12-04T12:15:06.1998105Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda E1204 12:09:24.259000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T12:15:06.2000201Z E1204 12:09:24.259000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.2001645Z E1204 12:09:24.259000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.2002924Z E1204 12:09:24.259000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.2004143Z E1204 12:09:24.259000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T12:15:06.2005291Z E1204 12:09:24.259000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T12:15:06.2006434Z E1204 12:09:24.259000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.2007629Z E1204 12:09:24.259000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = -448.0
2025-12-04T12:15:06.2008800Z E1204 12:09:24.259000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = triton_helpers.maximum(tmp0, tmp1)
2025-12-04T12:15:06.2009990Z E1204 12:09:24.259000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = 448.0
2025-12-04T12:15:06.2011138Z E1204 12:09:24.259000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = triton_helpers.minimum(tmp2, tmp3)
2025-12-04T12:15:06.2012371Z E1204 12:09:24.259000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = tmp4.to(tl.float8e4nv)
2025-12-04T12:15:06.2013597Z E1204 12:09:24.259000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp5, xmask)
2025-12-04T12:15:06.2014694Z E1204 12:09:24.259000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T12:15:06.2016966Z E1204 12:09:24.259000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.2019314Z E1204 12:09:24.259000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T12:15:06.2021093Z E1204 12:09:24.259000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.2022901Z E1204 12:09:24.259000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.2024581Z E1204 12:09:24.259000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.2026301Z E1204 12:09:24.259000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.2028022Z E1204 12:09:24.259000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.2029832Z E1204 12:09:24.259000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.2031350Z E1204 12:09:24.259000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.2032870Z E1204 12:09:24.259000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.2034144Z E1204 12:09:24.259000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T12:15:06.2035598Z E1204 12:09:24.259000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.2036760Z ('RERUN', {'yellow': True}) [3.7988s] [100%]
2025-12-04T12:15:06.2038178Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda E1204 12:09:25.013000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T12:15:06.2040253Z E1204 12:09:25.013000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.2041859Z E1204 12:09:25.013000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.2043155Z E1204 12:09:25.013000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.2044371Z E1204 12:09:25.013000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T12:15:06.2045517Z E1204 12:09:25.013000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T12:15:06.2046657Z E1204 12:09:25.013000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.2047810Z E1204 12:09:25.013000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = -448.0
2025-12-04T12:15:06.2048974Z E1204 12:09:25.013000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = triton_helpers.maximum(tmp0, tmp1)
2025-12-04T12:15:06.2050154Z E1204 12:09:25.013000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = 448.0
2025-12-04T12:15:06.2051352Z E1204 12:09:25.013000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = triton_helpers.minimum(tmp2, tmp3)
2025-12-04T12:15:06.2052609Z E1204 12:09:25.013000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = tmp4.to(tl.float8e4nv)
2025-12-04T12:15:06.2053834Z E1204 12:09:25.013000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp5, xmask)
2025-12-04T12:15:06.2054912Z E1204 12:09:25.013000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T12:15:06.2057186Z E1204 12:09:25.013000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.2059550Z E1204 12:09:25.013000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T12:15:06.2061280Z E1204 12:09:25.013000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.2063088Z E1204 12:09:25.013000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.2064746Z E1204 12:09:25.013000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.2066521Z E1204 12:09:25.013000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.2068232Z E1204 12:09:25.013000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.2070048Z E1204 12:09:25.013000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.2071842Z E1204 12:09:25.013000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.2073355Z E1204 12:09:25.013000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.2074640Z E1204 12:09:25.013000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T12:15:06.2076070Z E1204 12:09:25.013000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.2077341Z ('RERUN', {'yellow': True}) [0.7157s] [100%]
2025-12-04T12:15:06.2078753Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda E1204 12:09:25.729000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T12:15:06.2080799Z E1204 12:09:25.729000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.2082311Z E1204 12:09:25.729000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.2083577Z E1204 12:09:25.729000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.2084789Z E1204 12:09:25.729000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T12:15:06.2085861Z E1204 12:09:25.729000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T12:15:06.2087001Z E1204 12:09:25.729000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.2088146Z E1204 12:09:25.729000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = -448.0
2025-12-04T12:15:06.2089310Z E1204 12:09:25.729000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = triton_helpers.maximum(tmp0, tmp1)
2025-12-04T12:15:06.2090471Z E1204 12:09:25.729000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = 448.0
2025-12-04T12:15:06.2091638Z E1204 12:09:25.729000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = triton_helpers.minimum(tmp2, tmp3)
2025-12-04T12:15:06.2092898Z E1204 12:09:25.729000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = tmp4.to(tl.float8e4nv)
2025-12-04T12:15:06.2094122Z E1204 12:09:25.729000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp5, xmask)
2025-12-04T12:15:06.2095177Z E1204 12:09:25.729000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T12:15:06.2097497Z E1204 12:09:25.729000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.2099855Z E1204 12:09:25.729000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T12:15:06.2101586Z E1204 12:09:25.729000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.2103460Z E1204 12:09:25.729000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.2105136Z E1204 12:09:25.729000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.2106837Z E1204 12:09:25.729000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.2108629Z E1204 12:09:25.729000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.2110529Z E1204 12:09:25.729000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.2112054Z E1204 12:09:25.729000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.2113625Z E1204 12:09:25.729000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.2114879Z E1204 12:09:25.729000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T12:15:06.2116305Z E1204 12:09:25.729000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.2117451Z FAILED [0.7141s] [100%]
2025-12-04T12:15:06.2117637Z 
2025-12-04T12:15:06.2117802Z ==================================== RERUNS ====================================
2025-12-04T12:15:06.2118418Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T12:15:06.2119026Z Traceback (most recent call last):
2025-12-04T12:15:06.2119683Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T12:15:06.2120392Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T12:15:06.2121162Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.2122053Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.2122967Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.2123816Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.2124666Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.2125483Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.2126307Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.2127298Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.2128331Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.2129160Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.2129934Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.2130676Z     return self._compile_to_module()
2025-12-04T12:15:06.2131453Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.2132258Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.2133070Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.2133866Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.2134636Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.2135512Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.2136567Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.2137422Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.2138170Z   File "/tmp/tmpsb_t3t81/pn/cpnj7uzwucqhgl2z5fsfdj7ymx5po6ue5rfhnfah4hosyiya6erj.py", line 48, in <module>
2025-12-04T12:15:06.2139264Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.2139973Z     kernel.precompile(
2025-12-04T12:15:06.2140725Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.2141553Z     self._precompile_worker()
2025-12-04T12:15:06.2142400Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.2143322Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.2144236Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.2145185Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.2145972Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.2146828Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.2147667Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.2148599Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.2149292Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.2149970Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.2150522Z ^
2025-12-04T12:15:06.2151097Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.2151705Z 
2025-12-04T12:15:06.2152418Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.2153281Z 
2025-12-04T12:15:06.2153286Z 
2025-12-04T12:15:06.2153508Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.2154506Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:06.2155279Z 
2025-12-04T12:15:06.2155566Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.2156233Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.2156713Z frames [('total', 1)]
2025-12-04T12:15:06.2157020Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.2157463Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.2158068Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.2158565Z graph_break []
2025-12-04T12:15:06.2159057Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T12:15:06.2159645Z Traceback (most recent call last):
2025-12-04T12:15:06.2160291Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T12:15:06.2160984Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T12:15:06.2161750Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.2162634Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.2163540Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.2164423Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.2165253Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.2166056Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.2166876Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.2167874Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.2168847Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.2169695Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.2170467Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.2171403Z     return self._compile_to_module()
2025-12-04T12:15:06.2172139Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.2172935Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.2173758Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.2174535Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.2175291Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.2176164Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.2177203Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.2178052Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.2178816Z   File "/tmp/tmpbf617sg9/7t/c7tfdw6utbmvjfoxsl22bakcthcus765bhk4sxl3zr4dnquos5mo.py", line 48, in <module>
2025-12-04T12:15:06.2179932Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.2180644Z     kernel.precompile(
2025-12-04T12:15:06.2181398Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.2182222Z     self._precompile_worker()
2025-12-04T12:15:06.2183044Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.2183948Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.2184943Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.2185893Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.2186689Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.2193400Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.2194410Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.2195349Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.2196050Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.2196731Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.2197282Z ^
2025-12-04T12:15:06.2197868Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.2198480Z 
2025-12-04T12:15:06.2199262Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.2200121Z 
2025-12-04T12:15:06.2200126Z 
2025-12-04T12:15:06.2200352Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.2201363Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:06.2202135Z 
2025-12-04T12:15:06.2202418Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.2203044Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.2203523Z frames [('total', 1)]
2025-12-04T12:15:06.2203825Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.2204326Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.2204928Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.2205399Z graph_break []
2025-12-04T12:15:06.2205776Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.2206232Z frames [('total', 1)]
2025-12-04T12:15:06.2206531Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.2206971Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.2207555Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.2208036Z graph_break []
2025-12-04T12:15:06.2208340Z =================================== FAILURES ===================================
2025-12-04T12:15:06.2208952Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T12:15:06.2209544Z Traceback (most recent call last):
2025-12-04T12:15:06.2210191Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T12:15:06.2210886Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T12:15:06.2211649Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.2212532Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.2213436Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.2214391Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.2215278Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.2216075Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.2216970Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.2218092Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.2219179Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.2219998Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.2220762Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.2221541Z     return self._compile_to_module()
2025-12-04T12:15:06.2222268Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.2223059Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.2223858Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.2224651Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.2225403Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.2226329Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.2227274Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.2228134Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.2228904Z   File "/tmp/tmpdlz471w8/ks/cksf4jmqddtjemybjlcyhewuloalp4s4lf5nqdasdtmjasynwwve.py", line 48, in <module>
2025-12-04T12:15:06.2230026Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.2230732Z     kernel.precompile(
2025-12-04T12:15:06.2231475Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.2232360Z     self._precompile_worker()
2025-12-04T12:15:06.2233169Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.2234094Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.2235012Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.2235957Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.2236738Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.2237573Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.2238402Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.2239325Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.2240017Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.2240698Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.2241244Z ^
2025-12-04T12:15:06.2241815Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.2242422Z 
2025-12-04T12:15:06.2243133Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.2243986Z 
2025-12-04T12:15:06.2243991Z 
2025-12-04T12:15:06.2244211Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.2245206Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:06.2245986Z 
2025-12-04T12:15:06.2246318Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.2246948Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.2247415Z frames [('total', 1)]
2025-12-04T12:15:06.2247712Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.2248155Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.2248783Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.2249243Z graph_break []
2025-12-04T12:15:06.2249623Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.2250078Z frames [('total', 1)]
2025-12-04T12:15:06.2250380Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.2250817Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.2251397Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.2251877Z graph_break []
2025-12-04T12:15:06.2252254Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.2252749Z frames [('total', 1)]
2025-12-04T12:15:06.2253044Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.2253482Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.2254072Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.2254528Z graph_break []
2025-12-04T12:15:06.2255336Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7a3ac84fc91fa02b.xml -
2025-12-04T12:15:06.2256390Z =========================== short test summary info ============================
2025-12-04T12:15:06.2257504Z FAILED [0.7141s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.2258770Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.2259314Z ^
2025-12-04T12:15:06.2259898Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.2260495Z 
2025-12-04T12:15:06.2261205Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.2262064Z 
2025-12-04T12:15:06.2262069Z 
2025-12-04T12:15:06.2262290Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.2263284Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:06.2264053Z 
2025-12-04T12:15:06.2264332Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.2264923Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:06.2265445Z ================== 1 failed, 187 deselected, 2 rerun in 5.27s ==================
2025-12-04T12:15:06.2265897Z Got exit code 1
2025-12-04T12:15:06.2266169Z Retrying single test...
2025-12-04T12:15:06.2266823Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e162f70cb76e49ff.xml
2025-12-04T12:15:06.2267606Z ============================= test session starts ==============================
2025-12-04T12:15:06.2268278Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:06.2268872Z cachedir: .pytest_cache
2025-12-04T12:15:06.2269606Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:06.2270384Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:06.2270739Z configfile: pytest.ini
2025-12-04T12:15:06.2271753Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:06.2272713Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:06.2273783Z stepcurrent: skipping 56 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:06.2274751Z Running 1 items in this shard
2025-12-04T12:15:06.2275009Z 
2025-12-04T12:15:06.2276166Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda E1204 12:09:44.203000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T12:15:06.2278232Z E1204 12:09:44.203000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.2279675Z E1204 12:09:44.203000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.2280975Z E1204 12:09:44.203000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.2282186Z E1204 12:09:44.203000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T12:15:06.2283255Z E1204 12:09:44.203000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T12:15:06.2284394Z E1204 12:09:44.203000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.2285530Z E1204 12:09:44.203000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = -448.0
2025-12-04T12:15:06.2286738Z E1204 12:09:44.203000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = triton_helpers.maximum(tmp0, tmp1)
2025-12-04T12:15:06.2287889Z E1204 12:09:44.203000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = 448.0
2025-12-04T12:15:06.2289046Z E1204 12:09:44.203000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = triton_helpers.minimum(tmp2, tmp3)
2025-12-04T12:15:06.2290282Z E1204 12:09:44.203000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = tmp4.to(tl.float8e4nv)
2025-12-04T12:15:06.2291498Z E1204 12:09:44.203000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp5, xmask)
2025-12-04T12:15:06.2292550Z E1204 12:09:44.203000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T12:15:06.2294733Z E1204 12:09:44.203000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.2297139Z E1204 12:09:44.203000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T12:15:06.2298890Z E1204 12:09:44.203000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.2300748Z E1204 12:09:44.203000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.2302507Z E1204 12:09:44.203000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.2304215Z E1204 12:09:44.203000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.2305959Z E1204 12:09:44.203000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.2307757Z E1204 12:09:44.203000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.2309278Z E1204 12:09:44.203000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.2310776Z E1204 12:09:44.203000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.2312061Z E1204 12:09:44.203000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T12:15:06.2313482Z E1204 12:09:44.203000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.2314658Z ('RERUN', {'yellow': True}) [3.7868s] [100%]
2025-12-04T12:15:06.2316072Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda E1204 12:09:44.949000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T12:15:06.2318131Z E1204 12:09:44.949000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.2319586Z E1204 12:09:44.949000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.2320851Z E1204 12:09:44.949000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.2322066Z E1204 12:09:44.949000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T12:15:06.2323132Z E1204 12:09:44.949000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T12:15:06.2324260Z E1204 12:09:44.949000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.2325399Z E1204 12:09:44.949000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = -448.0
2025-12-04T12:15:06.2326561Z E1204 12:09:44.949000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = triton_helpers.maximum(tmp0, tmp1)
2025-12-04T12:15:06.2327704Z E1204 12:09:44.949000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = 448.0
2025-12-04T12:15:06.2328841Z E1204 12:09:44.949000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = triton_helpers.minimum(tmp2, tmp3)
2025-12-04T12:15:06.2330081Z E1204 12:09:44.949000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = tmp4.to(tl.float8e4nv)
2025-12-04T12:15:06.2331308Z E1204 12:09:44.949000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp5, xmask)
2025-12-04T12:15:06.2332414Z E1204 12:09:44.949000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T12:15:06.2334586Z E1204 12:09:44.949000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.2337041Z E1204 12:09:44.949000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T12:15:06.2338789Z E1204 12:09:44.949000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.2340604Z E1204 12:09:44.949000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.2342314Z E1204 12:09:44.949000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.2344030Z E1204 12:09:44.949000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.2345718Z E1204 12:09:44.949000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.2347555Z E1204 12:09:44.949000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.2349076Z E1204 12:09:44.949000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.2350592Z E1204 12:09:44.949000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.2351866Z E1204 12:09:44.949000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T12:15:06.2353274Z E1204 12:09:44.949000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.2354440Z ('RERUN', {'yellow': True}) [0.7081s] [100%]
2025-12-04T12:15:06.2355859Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda E1204 12:09:45.662000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T12:15:06.2357910Z E1204 12:09:45.662000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.2359354Z E1204 12:09:45.662000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.2360625Z E1204 12:09:45.662000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.2361838Z E1204 12:09:45.662000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T12:15:06.2362961Z E1204 12:09:45.662000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T12:15:06.2364079Z E1204 12:09:45.662000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.2365221Z E1204 12:09:45.662000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = -448.0
2025-12-04T12:15:06.2366385Z E1204 12:09:45.662000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = triton_helpers.maximum(tmp0, tmp1)
2025-12-04T12:15:06.2367587Z E1204 12:09:45.662000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = 448.0
2025-12-04T12:15:06.2368741Z E1204 12:09:45.662000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = triton_helpers.minimum(tmp2, tmp3)
2025-12-04T12:15:06.2369979Z E1204 12:09:45.662000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = tmp4.to(tl.float8e4nv)
2025-12-04T12:15:06.2371400Z E1204 12:09:45.662000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp5, xmask)
2025-12-04T12:15:06.2372537Z E1204 12:09:45.662000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T12:15:06.2374722Z E1204 12:09:45.662000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.2377161Z E1204 12:09:45.662000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T12:15:06.2378944Z E1204 12:09:45.662000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.2380775Z E1204 12:09:45.662000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.2382458Z E1204 12:09:45.662000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.2384193Z E1204 12:09:45.662000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.2385929Z E1204 12:09:45.662000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.2387731Z E1204 12:09:45.662000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.2389268Z E1204 12:09:45.662000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.2390786Z E1204 12:09:45.662000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.2392062Z E1204 12:09:45.662000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T12:15:06.2393547Z E1204 12:09:45.662000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.2394684Z FAILED [0.7100s] [100%]
2025-12-04T12:15:06.2394887Z 
2025-12-04T12:15:06.2395039Z ==================================== RERUNS ====================================
2025-12-04T12:15:06.2395679Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T12:15:06.2396273Z Traceback (most recent call last):
2025-12-04T12:15:06.2396993Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T12:15:06.2397703Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T12:15:06.2398636Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.2399533Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.2400459Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.2401327Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.2402234Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.2403036Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.2403863Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.2404873Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.2405864Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.2406696Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.2407474Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.2408263Z     return self._compile_to_module()
2025-12-04T12:15:06.2408987Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.2409794Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.2410616Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.2411413Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.2412166Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.2413046Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.2414013Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.2414864Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.2415639Z   File "/tmp/tmp8e1yhjx4/no/cnopm3fdvd4yz7kcpibin3dcbfdh7fat63u5mupswtlcu553sgha.py", line 48, in <module>
2025-12-04T12:15:06.2416844Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.2417580Z     kernel.precompile(
2025-12-04T12:15:06.2418316Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.2419146Z     self._precompile_worker()
2025-12-04T12:15:06.2419966Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.2420883Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.2421787Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.2422734Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.2423583Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.2424421Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.2425257Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.2426184Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.2426939Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.2427604Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.2428158Z ^
2025-12-04T12:15:06.2428752Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.2429340Z 
2025-12-04T12:15:06.2430075Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.2430957Z 
2025-12-04T12:15:06.2430963Z 
2025-12-04T12:15:06.2431185Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.2432190Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:06.2432982Z 
2025-12-04T12:15:06.2433254Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.2433893Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.2434356Z frames [('total', 1)]
2025-12-04T12:15:06.2434663Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.2435126Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.2435711Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.2436220Z graph_break []
2025-12-04T12:15:06.2436712Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T12:15:06.2437312Z Traceback (most recent call last):
2025-12-04T12:15:06.2437950Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T12:15:06.2438647Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T12:15:06.2439431Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.2440302Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.2441211Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.2442138Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.2442991Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.2443806Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.2444626Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.2445629Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.2446620Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.2447452Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.2448220Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.2448979Z     return self._compile_to_module()
2025-12-04T12:15:06.2449721Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.2450591Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.2451428Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.2452234Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.2453005Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.2453912Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.2454883Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.2455751Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.2456600Z   File "/tmp/tmpywvjwlis/ym/cymldqugyfrcg7f2sj4ovu6wlfsq6vwwkh3mt4z4kht7mfzuiybj.py", line 48, in <module>
2025-12-04T12:15:06.2457734Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.2458474Z     kernel.precompile(
2025-12-04T12:15:06.2459243Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.2460098Z     self._precompile_worker()
2025-12-04T12:15:06.2460931Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.2461866Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.2462782Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.2463724Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.2464532Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.2465436Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.2466415Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.2467354Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.2468071Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.2468755Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.2469301Z ^
2025-12-04T12:15:06.2469892Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.2470502Z 
2025-12-04T12:15:06.2471434Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.2472284Z 
2025-12-04T12:15:06.2472288Z 
2025-12-04T12:15:06.2472533Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.2473539Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:06.2474324Z 
2025-12-04T12:15:06.2474598Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.2475248Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.2475720Z frames [('total', 1)]
2025-12-04T12:15:06.2476012Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.2476467Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.2477068Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.2477535Z graph_break []
2025-12-04T12:15:06.2477901Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.2478377Z frames [('total', 1)]
2025-12-04T12:15:06.2478682Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.2479222Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.2479834Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.2480311Z graph_break []
2025-12-04T12:15:06.2480606Z =================================== FAILURES ===================================
2025-12-04T12:15:06.2481242Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T12:15:06.2481896Z Traceback (most recent call last):
2025-12-04T12:15:06.2482556Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T12:15:06.2483249Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T12:15:06.2484030Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.2484914Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.2485809Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.2486714Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.2487554Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.2488363Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.2489173Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.2490179Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.2491180Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.2491995Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.2492800Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.2493556Z     return self._compile_to_module()
2025-12-04T12:15:06.2494295Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.2495074Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.2495893Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.2496772Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.2497533Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.2498393Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.2499365Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.2500226Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.2500987Z   File "/tmp/tmp9ugzbsiq/44/c444otvirizvptr6l2p2f2pqnptejoxwcpdzsujgljt4ih5n5inl.py", line 48, in <module>
2025-12-04T12:15:06.2502075Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.2502800Z     kernel.precompile(
2025-12-04T12:15:06.2503556Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.2504361Z     self._precompile_worker()
2025-12-04T12:15:06.2505185Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.2506108Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.2507067Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.2507271Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.2507725Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.2507988Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.2508433Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.2509341Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.2509888Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.2510203Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.2510312Z ^
2025-12-04T12:15:06.2511246Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.2511262Z 
2025-12-04T12:15:06.2511993Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.2512047Z 
2025-12-04T12:15:06.2512052Z 
2025-12-04T12:15:06.2512271Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.2512915Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:06.2512935Z 
2025-12-04T12:15:06.2513204Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.2513429Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.2513551Z frames [('total', 1)]
2025-12-04T12:15:06.2513668Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.2513943Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.2514182Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.2514284Z graph_break []
2025-12-04T12:15:06.2514507Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.2514625Z frames [('total', 1)]
2025-12-04T12:15:06.2514742Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.2514977Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.2515221Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.2515321Z graph_break []
2025-12-04T12:15:06.2515550Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.2515654Z frames [('total', 1)]
2025-12-04T12:15:06.2515768Z stats [('calls_captured', 8)]
2025-12-04T12:15:06.2516005Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T12:15:06.2516241Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.2516345Z graph_break []
2025-12-04T12:15:06.2517015Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e162f70cb76e49ff.xml -
2025-12-04T12:15:06.2517193Z =========================== short test summary info ============================
2025-12-04T12:15:06.2518009Z FAILED [0.7100s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.2518324Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.2518415Z ^
2025-12-04T12:15:06.2518887Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.2518893Z 
2025-12-04T12:15:06.2519641Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.2519648Z 
2025-12-04T12:15:06.2519653Z 
2025-12-04T12:15:06.2519889Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.2520535Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:06.2520542Z 
2025-12-04T12:15:06.2520858Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.2521041Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:06.2521246Z ================== 1 failed, 187 deselected, 2 rerun in 5.25s ==================
2025-12-04T12:15:06.2521362Z Got exit code 1
2025-12-04T12:15:06.2521919Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T12:15:06.2522339Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:15:06.2522829Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e70a5c274fb86b8e.xml
2025-12-04T12:15:06.2523030Z ============================= test session starts ==============================
2025-12-04T12:15:06.2523398Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:06.2523513Z cachedir: .pytest_cache
2025-12-04T12:15:06.2524033Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:06.2524175Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:06.2524284Z configfile: pytest.ini
2025-12-04T12:15:06.2524903Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:06.2525189Z collecting ... collected 188 items / 57 deselected / 131 selected
2025-12-04T12:15:06.2525341Z stepcurrent: skipping 57 already run items.
2025-12-04T12:15:06.2525472Z Running 131 items in this shard
2025-12-04T12:15:06.2525480Z 
2025-12-04T12:15:06.2525991Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e5m2_shape_16,16,16_cuda PASSED [4.0439s] [  0%]
2025-12-04T12:15:06.2526500Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e5m2_shape_4,2048,4096_cuda PASSED [0.7977s] [  1%]
2025-12-04T12:15:06.2527018Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 PASSED [0.1583s] [  2%]
2025-12-04T12:15:06.2527546Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 PASSED [0.1683s] [  3%]
2025-12-04T12:15:06.2528650Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T12:15:06.2529416Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.2529988Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.2530555Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.2531048Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.2531496Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.2532131Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.2532671Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.2533181Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T12:15:06.2533722Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T12:15:06.2534240Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T12:15:06.2534788Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T12:15:06.2535353Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T12:15:06.2535721Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.2537636Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.2538177Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.2539083Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.2539611Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.2540450Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T12:15:06.2541179Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T12:15:06.2542040Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.2542565Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.2543408Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T12:15:06.2544050Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T12:15:06.2544943Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T12:15:06.2545760Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T12:15:06.2546656Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T12:15:06.2547357Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T12:15:06.2548215Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T12:15:06.2548933Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T12:15:06.2549841Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.2550206Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.2550917Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T12:15:06.2551298Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.2551836Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.2552884Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.2554130Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.2555022Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.2555725Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.2556612Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.2557399Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.2558014Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T12:15:06.2558796Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.2559336Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.2559903Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.2560408Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.2560839Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.2561474Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.2562002Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.2562419Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T12:15:06.2563290Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.2563426Z ('RERUN', {'yellow': True}) [0.2462s] [  3%]
2025-12-04T12:15:06.2564530Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T12:15:06.2565293Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.2566707Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.2568141Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.2569527Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.2569980Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.2570633Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.2571366Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.2571885Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T12:15:06.2572396Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T12:15:06.2572916Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T12:15:06.2573462Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T12:15:06.2574019Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T12:15:06.2574384Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.2576217Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.2576828Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.2577793Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.2578313Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.2579150Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T12:15:06.2579921Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T12:15:06.2580771Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.2581293Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.2582132Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T12:15:06.2582833Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T12:15:06.2583718Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T12:15:06.2584530Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T12:15:06.2585507Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T12:15:06.2586212Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T12:15:06.2587075Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T12:15:06.2587766Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T12:15:06.2588668Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.2589036Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.2589721Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T12:15:06.2590103Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.2590639Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.2591693Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.2592368Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.2593421Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.2594131Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.2595064Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.2595858Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.2596477Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T12:15:06.2597284Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.2597826Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.2598388Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.2598964Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.2599406Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.2600054Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.2600581Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.2600998Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T12:15:06.2601835Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.2601970Z ('RERUN', {'yellow': True}) [0.4453s] [  3%]
2025-12-04T12:15:06.2603087Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T12:15:06.2603846Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.2604404Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.2604962Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.2605456Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.2605903Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.2606497Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.2607070Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.2607580Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T12:15:06.2608087Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T12:15:06.2608637Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T12:15:06.2609182Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T12:15:06.2609734Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T12:15:06.2610098Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.2611937Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.2612471Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.2613369Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.2613880Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.2614715Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T12:15:06.2615435Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T12:15:06.2616361Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.2616886Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.2617731Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T12:15:06.2618378Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T12:15:06.2619253Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T12:15:06.2620064Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T12:15:06.2620962Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T12:15:06.2621657Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T12:15:06.2622517Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T12:15:06.2623855Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T12:15:06.2625526Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.2625902Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.2626579Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T12:15:06.2627005Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.2629000Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.2630201Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.2630879Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.2631778Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.2632476Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.2633360Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.2634146Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.2634767Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T12:15:06.2635544Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.2636091Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.2636648Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.2637154Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.2637585Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.2638229Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.2638761Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.2639192Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T12:15:06.2640074Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.2640183Z FAILED [0.4130s] [  3%]
2025-12-04T12:15:06.2640189Z 
2025-12-04T12:15:06.2640351Z ==================================== RERUNS ====================================
2025-12-04T12:15:06.2640667Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 _
2025-12-04T12:15:06.2640794Z Traceback (most recent call last):
2025-12-04T12:15:06.2641188Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T12:15:06.2641318Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T12:15:06.2641859Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.2642110Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.2642627Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.2642834Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.2643346Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.2643504Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.2644069Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.2644395Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.2644931Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.2645079Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.2645560Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.2645700Z     return self._compile_to_module()
2025-12-04T12:15:06.2646188Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.2646367Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.2646883Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.2647020Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.2647535Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.2647774Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.2648378Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.2648510Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.2649016Z   File "/tmp/tmplmuksdhk/my/cmyqtfcskab3ydlogxb3r6dtgztlq5pbmlcnzdf5yowooyb3qrwb.py", line 51, in <module>
2025-12-04T12:15:06.2649494Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.2649609Z     kernel.precompile(
2025-12-04T12:15:06.2650168Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.2650338Z     self._precompile_worker()
2025-12-04T12:15:06.2650935Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.2651134Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.2651730Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.2651962Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.2652433Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.2652681Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.2653144Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.2653486Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.2653724Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T12:15:06.2654093Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.2654221Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.2654364Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.2654492Z     xmask = xindex < xnumel
2025-12-04T12:15:06.2654591Z     x0 = xindex
2025-12-04T12:15:06.2654780Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.2654903Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.2654999Z            ^
2025-12-04T12:15:06.2655402Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.2655409Z 
2025-12-04T12:15:06.2656155Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.2656162Z 
2025-12-04T12:15:06.2656168Z 
2025-12-04T12:15:06.2656484Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.2657113Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16
2025-12-04T12:15:06.2657118Z 
2025-12-04T12:15:06.2657395Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.2657639Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.2657748Z frames [('total', 1)]
2025-12-04T12:15:06.2657869Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.2658109Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.2658579Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.2658697Z graph_break []
2025-12-04T12:15:06.2659019Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 _
2025-12-04T12:15:06.2659147Z Traceback (most recent call last):
2025-12-04T12:15:06.2659536Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T12:15:06.2659663Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T12:15:06.2660156Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.2660418Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.2660930Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.2661139Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.2661649Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.2661848Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.2662401Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.2662724Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.2663253Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.2663440Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.2663919Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.2664054Z     return self._compile_to_module()
2025-12-04T12:15:06.2664539Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.2664720Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.2665242Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.2665406Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.2665917Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.2666151Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.2666740Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.2666882Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.2667364Z   File "/tmp/tmpvltzk6_5/yb/cyboltm5bweutm2o3lswgzf32bdwzhpvb32eanp77hnagze5ck47.py", line 51, in <module>
2025-12-04T12:15:06.2667843Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.2667991Z     kernel.precompile(
2025-12-04T12:15:06.2668550Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.2668684Z     self._precompile_worker()
2025-12-04T12:15:06.2669283Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.2669481Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.2670079Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.2670279Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.2670743Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.2671168Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.2671618Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.2671969Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.2672202Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T12:15:06.2672538Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.2672667Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.2672808Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.2672934Z     xmask = xindex < xnumel
2025-12-04T12:15:06.2673033Z     x0 = xindex
2025-12-04T12:15:06.2673203Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.2673340Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.2673433Z            ^
2025-12-04T12:15:06.2673840Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.2673932Z 
2025-12-04T12:15:06.2674653Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.2674662Z 
2025-12-04T12:15:06.2674667Z 
2025-12-04T12:15:06.2674889Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.2675578Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16
2025-12-04T12:15:06.2675585Z 
2025-12-04T12:15:06.2675859Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.2676103Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.2676211Z frames [('total', 1)]
2025-12-04T12:15:06.2676333Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.2676576Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.2677045Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.2677210Z graph_break []
2025-12-04T12:15:06.2677432Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.2677539Z frames [('total', 1)]
2025-12-04T12:15:06.2677670Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.2677893Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.2678351Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.2678465Z graph_break []
2025-12-04T12:15:06.2678612Z =================================== FAILURES ===================================
2025-12-04T12:15:06.2678927Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 _
2025-12-04T12:15:06.2679106Z Traceback (most recent call last):
2025-12-04T12:15:06.2679487Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T12:15:06.2679633Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T12:15:06.2680120Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.2680368Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.2680897Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.2681094Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.2683546Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.2683705Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.2684312Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.2684651Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.2685931Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.2686893Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.2687440Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.2687563Z     return self._compile_to_module()
2025-12-04T12:15:06.2688064Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.2689404Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.2689986Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.2690227Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.2691729Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.2691988Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.2692580Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.2692754Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.2693248Z   File "/tmp/tmpz_ft4e9s/zc/czcpsk3mcmiadknd77dy3sn35d6awyvzophsapvkjkr5udk3zhyb.py", line 51, in <module>
2025-12-04T12:15:06.2693714Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.2693828Z     kernel.precompile(
2025-12-04T12:15:06.2694402Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.2694526Z     self._precompile_worker()
2025-12-04T12:15:06.2695139Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.2695354Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.2695957Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.2696172Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.2696701Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.2696964Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.2697406Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.2697780Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.2698027Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T12:15:06.2698351Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.2698476Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.2698630Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.2698745Z     xmask = xindex < xnumel
2025-12-04T12:15:06.2698854Z     x0 = xindex
2025-12-04T12:15:06.2699023Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.2699143Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.2699250Z            ^
2025-12-04T12:15:06.2699639Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.2699646Z 
2025-12-04T12:15:06.2700365Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.2700386Z 
2025-12-04T12:15:06.2700391Z 
2025-12-04T12:15:06.2700611Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.2701238Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16
2025-12-04T12:15:06.2701247Z 
2025-12-04T12:15:06.2701532Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.2701757Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.2701882Z frames [('total', 1)]
2025-12-04T12:15:06.2702005Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.2702228Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.2702712Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.2702851Z graph_break []
2025-12-04T12:15:06.2703073Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.2703191Z frames [('total', 1)]
2025-12-04T12:15:06.2703307Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.2703528Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.2704001Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.2704134Z graph_break []
2025-12-04T12:15:06.2704366Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.2704471Z frames [('total', 1)]
2025-12-04T12:15:06.2704586Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.2704817Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.2705279Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.2705381Z graph_break []
2025-12-04T12:15:06.2706046Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e70a5c274fb86b8e.xml -
2025-12-04T12:15:06.2706253Z =========================== short test summary info ============================
2025-12-04T12:15:06.2707033Z FAILED [0.4130s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 - torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T12:15:06.2707361Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.2707488Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.2707643Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.2707754Z     xmask = xindex < xnumel
2025-12-04T12:15:06.2707848Z     x0 = xindex
2025-12-04T12:15:06.2708046Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.2708202Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.2708309Z            ^
2025-12-04T12:15:06.2708703Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.2708712Z 
2025-12-04T12:15:06.2709423Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.2709432Z 
2025-12-04T12:15:06.2709451Z 
2025-12-04T12:15:06.2709671Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.2710298Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16
2025-12-04T12:15:06.2710304Z 
2025-12-04T12:15:06.2710636Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.2710825Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:06.2711049Z ============= 1 failed, 4 passed, 57 deselected, 2 rerun in 6.33s ==============
2025-12-04T12:15:06.2711169Z Got exit code 1
2025-12-04T12:15:06.2711283Z Retrying single test...
2025-12-04T12:15:06.2711768Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0c17434f07767682.xml
2025-12-04T12:15:06.2711938Z ============================= test session starts ==============================
2025-12-04T12:15:06.2712297Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:06.2712425Z cachedir: .pytest_cache
2025-12-04T12:15:06.2712949Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:06.2713078Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:06.2713204Z configfile: pytest.ini
2025-12-04T12:15:06.2713828Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:06.2714071Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:06.2714779Z stepcurrent: skipping 61 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16
2025-12-04T12:15:06.2714899Z Running 1 items in this shard
2025-12-04T12:15:06.2714935Z 
2025-12-04T12:15:06.2716047Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T12:15:06.2716814Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.2717378Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.2717974Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.2718483Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.2718917Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.2719514Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.2720061Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.2720599Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T12:15:06.2721132Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T12:15:06.2721642Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T12:15:06.2722190Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T12:15:06.2722756Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T12:15:06.2723123Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.2724949Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.2725492Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.2726381Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.2726885Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.2727751Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T12:15:06.2728473Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T12:15:06.2729330Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.2729886Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.2730726Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T12:15:06.2731375Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T12:15:06.2732276Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T12:15:06.2733091Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T12:15:06.2733944Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T12:15:06.2734687Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T12:15:06.2735550Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T12:15:06.2736233Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T12:15:06.2737202Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.2737567Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.2738504Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T12:15:06.2738885Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.2739628Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.2741173Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.2742782Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.2743745Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.2745043Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.2745939Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.2747737Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.2748350Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T12:15:06.2749128Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.2749677Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.2751062Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.2751669Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.2752161Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.2752870Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.2753396Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.2753869Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T12:15:06.2754699Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.2754836Z ('RERUN', {'yellow': True}) [3.4912s] [100%]
2025-12-04T12:15:06.2756200Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T12:15:06.2757017Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.2757634Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.2758253Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.2758843Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.2759335Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.2759987Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.2760529Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.2761039Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T12:15:06.2761702Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T12:15:06.2762210Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T12:15:06.2762758Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T12:15:06.2763354Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T12:15:06.2763714Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.2765532Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.2766101Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.2766981Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.2767488Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.2768349Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T12:15:06.2769076Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T12:15:06.2769935Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.2770458Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.2771483Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T12:15:06.2772139Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T12:15:06.2773012Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T12:15:06.2773836Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T12:15:06.2774712Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T12:15:06.2775412Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T12:15:06.2776426Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T12:15:06.2777129Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T12:15:06.2778070Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.2778435Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.2779116Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T12:15:06.2779497Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.2780033Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.2781125Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.2781756Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.2782665Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.2783396Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.2784285Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.2785068Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.2785680Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T12:15:06.2786451Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.2786998Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.2787572Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.2788067Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.2788501Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.2789110Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.2789641Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.2790121Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T12:15:06.2790942Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.2791081Z ('RERUN', {'yellow': True}) [0.4372s] [100%]
2025-12-04T12:15:06.2792186Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T12:15:06.2792982Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.2793542Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.2794097Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.2794810Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.2795306Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.2796022Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.2796563Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.2797244Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T12:15:06.2797862Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T12:15:06.2798437Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T12:15:06.2798987Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T12:15:06.2799721Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T12:15:06.2800088Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.2802127Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.2802664Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.2803858Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.2804422Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.2805432Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T12:15:06.2806157Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T12:15:06.2807087Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.2807755Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.2808602Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T12:15:06.2809254Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T12:15:06.2810124Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T12:15:06.2811068Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T12:15:06.2812161Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T12:15:06.2812858Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T12:15:06.2813885Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T12:15:06.2814576Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T12:15:06.2815479Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.2815846Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.2816605Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T12:15:06.2816973Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.2817504Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.2818559Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.2819187Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.2820095Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.2820818Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.2821874Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.2822660Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.2823319Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T12:15:06.2824093Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.2824638Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.2825244Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.2825743Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.2826174Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.2826780Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.2827306Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.2827764Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T12:15:06.2828588Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.2828698Z FAILED [0.4331s] [100%]
2025-12-04T12:15:06.2828717Z 
2025-12-04T12:15:06.2828865Z ==================================== RERUNS ====================================
2025-12-04T12:15:06.2829184Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 _
2025-12-04T12:15:06.2829324Z Traceback (most recent call last):
2025-12-04T12:15:06.2829702Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T12:15:06.2829835Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T12:15:06.2830341Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.2830601Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.2831132Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.2831332Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.2831843Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.2832009Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.2832545Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.2832867Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.2833402Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.2833555Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.2834083Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.2834216Z     return self._compile_to_module()
2025-12-04T12:15:06.2834702Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.2834883Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.2835434Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.2835583Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.2836080Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.2836315Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.2836918Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.2837049Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.2837579Z   File "/tmp/tmpb3b7m996/i3/ci3i5t3cwbigr6gq4kn5uylj6623h24vvwgdqbszvaaqeooimkcw.py", line 51, in <module>
2025-12-04T12:15:06.2838058Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.2838177Z     kernel.precompile(
2025-12-04T12:15:06.2838747Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.2838870Z     self._precompile_worker()
2025-12-04T12:15:06.2839469Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.2839671Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.2840300Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.2840522Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.2840975Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.2841225Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.2841685Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.2842020Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.2842253Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T12:15:06.2842590Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.2842716Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.2842875Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.2842988Z     xmask = xindex < xnumel
2025-12-04T12:15:06.2843088Z     x0 = xindex
2025-12-04T12:15:06.2843275Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.2843397Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.2843489Z            ^
2025-12-04T12:15:06.2843892Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.2843901Z 
2025-12-04T12:15:06.2844618Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.2844625Z 
2025-12-04T12:15:06.2844629Z 
2025-12-04T12:15:06.2844862Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.2845485Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16
2025-12-04T12:15:06.2845523Z 
2025-12-04T12:15:06.2845807Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.2846036Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.2846142Z frames [('total', 1)]
2025-12-04T12:15:06.2846273Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.2846734Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.2847005Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.2847120Z graph_break []
2025-12-04T12:15:06.2847436Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 _
2025-12-04T12:15:06.2847573Z Traceback (most recent call last):
2025-12-04T12:15:06.2847949Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T12:15:06.2848079Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T12:15:06.2848584Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.2848866Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.2849375Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.2849585Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.2850094Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.2850260Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.2850794Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.2851144Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.2851686Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.2851837Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.2852331Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.2852457Z     return self._compile_to_module()
2025-12-04T12:15:06.2852947Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.2853129Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.2853647Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.2853781Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.2854295Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.2854527Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.2855122Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.2855250Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.2855755Z   File "/tmp/tmpcj5k8xzb/23/c23abf7ytetlyzxb3s7egzkdo7cpfa2bemghxqitrhte6qiyfdju.py", line 51, in <module>
2025-12-04T12:15:06.2856238Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.2856451Z     kernel.precompile(
2025-12-04T12:15:06.2857026Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.2857147Z     self._precompile_worker()
2025-12-04T12:15:06.2857801Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.2857998Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.2858595Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.2858796Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.2859297Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.2859546Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.2860002Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.2860339Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.2860575Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T12:15:06.2860918Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.2861078Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.2861219Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.2861344Z     xmask = xindex < xnumel
2025-12-04T12:15:06.2861441Z     x0 = xindex
2025-12-04T12:15:06.2861624Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.2861749Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.2861841Z            ^
2025-12-04T12:15:06.2862243Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.2862248Z 
2025-12-04T12:15:06.2862961Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.2862968Z 
2025-12-04T12:15:06.2862972Z 
2025-12-04T12:15:06.2863237Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.2863859Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16
2025-12-04T12:15:06.2863867Z 
2025-12-04T12:15:06.2864134Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.2864371Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.2864480Z frames [('total', 1)]
2025-12-04T12:15:06.2864610Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.2865071Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.2865296Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.2865410Z graph_break []
2025-12-04T12:15:06.2865638Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.2865744Z frames [('total', 1)]
2025-12-04T12:15:06.2865876Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.2866096Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.2866560Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.2866672Z graph_break []
2025-12-04T12:15:06.2866821Z =================================== FAILURES ===================================
2025-12-04T12:15:06.2867149Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 _
2025-12-04T12:15:06.2867272Z Traceback (most recent call last):
2025-12-04T12:15:06.2867645Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T12:15:06.2867788Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T12:15:06.2868276Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.2868573Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.2869168Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.2869395Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.2869929Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.2870120Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.2870653Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.2871166Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.2871688Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.2871855Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.2872336Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.2872533Z     return self._compile_to_module()
2025-12-04T12:15:06.2873038Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.2873206Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.2873737Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.2873869Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.2874368Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.2874619Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.2875254Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.2875388Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.2875906Z   File "/tmp/tmpcrwducs3/3i/c3i6ijzztcvlfmucnu3llmrfhfa3cmsb3qtgms2uj7mn7z3wxqq6.py", line 51, in <module>
2025-12-04T12:15:06.2876372Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.2876500Z     kernel.precompile(
2025-12-04T12:15:06.2877055Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.2877174Z     self._precompile_worker()
2025-12-04T12:15:06.2877779Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.2877960Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.2878573Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.2878775Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.2879227Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.2879489Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.2879936Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.2880272Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.2880523Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T12:15:06.2880844Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.2880984Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.2881168Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.2881282Z     xmask = xindex < xnumel
2025-12-04T12:15:06.2881395Z     x0 = xindex
2025-12-04T12:15:06.2881567Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.2881686Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.2881794Z            ^
2025-12-04T12:15:06.2882185Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.2882234Z 
2025-12-04T12:15:06.2882966Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.2882973Z 
2025-12-04T12:15:06.2882977Z 
2025-12-04T12:15:06.2883198Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.2883822Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16
2025-12-04T12:15:06.2883841Z 
2025-12-04T12:15:06.2884110Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.2884366Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.2884491Z frames [('total', 1)]
2025-12-04T12:15:06.2884608Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.2885078Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.2885317Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.2885418Z graph_break []
2025-12-04T12:15:06.2885638Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.2885756Z frames [('total', 1)]
2025-12-04T12:15:06.2885870Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.2886129Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.2886595Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.2886697Z graph_break []
2025-12-04T12:15:06.2886944Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.2887049Z frames [('total', 1)]
2025-12-04T12:15:06.2887163Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.2887400Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.2887865Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.2887984Z graph_break []
2025-12-04T12:15:06.2888630Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0c17434f07767682.xml -
2025-12-04T12:15:06.2888808Z =========================== short test summary info ============================
2025-12-04T12:15:06.2889596Z FAILED [0.4331s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 - torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T12:15:06.2889922Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.2890063Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.2890205Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.2890320Z     xmask = xindex < xnumel
2025-12-04T12:15:06.2890431Z     x0 = xindex
2025-12-04T12:15:06.2890604Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.2890727Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.2890835Z            ^
2025-12-04T12:15:06.2891226Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.2891231Z 
2025-12-04T12:15:06.2892010Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.2892016Z 
2025-12-04T12:15:06.2892023Z 
2025-12-04T12:15:06.2892246Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.2892866Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16
2025-12-04T12:15:06.2892902Z 
2025-12-04T12:15:06.2893189Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.2893375Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:06.2893598Z ================== 1 failed, 187 deselected, 2 rerun in 4.41s ==================
2025-12-04T12:15:06.2893704Z Got exit code 1
2025-12-04T12:15:06.2893814Z Retrying single test...
2025-12-04T12:15:06.2894307Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3815c1aa47a06d85.xml
2025-12-04T12:15:06.2894478Z ============================= test session starts ==============================
2025-12-04T12:15:06.2894867Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:06.2894994Z cachedir: .pytest_cache
2025-12-04T12:15:06.2895517Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:06.2895664Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:06.2895777Z configfile: pytest.ini
2025-12-04T12:15:06.2896433Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:06.2896680Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:06.2897426Z stepcurrent: skipping 61 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16
2025-12-04T12:15:06.2897564Z Running 1 items in this shard
2025-12-04T12:15:06.2897569Z 
2025-12-04T12:15:06.2898660Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T12:15:06.2899428Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.2899995Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.2900559Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.2901073Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.2901512Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.2902110Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.2902653Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.2903162Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T12:15:06.2903683Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T12:15:06.2904190Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T12:15:06.2904783Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T12:15:06.2905331Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T12:15:06.2905694Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.2907546Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.2908083Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.2908993Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.2909505Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.2910348Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T12:15:06.2911052Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T12:15:06.2911937Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.2912465Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.2913309Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T12:15:06.2913958Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T12:15:06.2914836Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T12:15:06.2915666Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T12:15:06.2916508Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T12:15:06.2917206Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T12:15:06.2918063Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T12:15:06.2918779Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T12:15:06.2919682Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.2920048Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.2920771Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T12:15:06.2921134Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.2921670Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.2922725Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.2923387Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.2924292Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.2924973Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.2925906Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.2926675Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.2927291Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T12:15:06.2928067Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.2928608Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.2929182Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.2929679Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.2930127Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.2930721Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.2931243Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.2931680Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T12:15:06.2932503Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.2932690Z ('RERUN', {'yellow': True}) [3.4816s] [100%]
2025-12-04T12:15:06.2933767Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T12:15:06.2934528Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.2935123Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.2935684Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.2936197Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.2936692Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.2937328Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.2937868Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.2938379Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T12:15:06.2938905Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T12:15:06.2939410Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T12:15:06.2940023Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T12:15:06.2940573Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T12:15:06.2940939Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.2942767Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.2943348Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.2944228Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.2944732Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.2945577Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T12:15:06.2946284Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T12:15:06.2947168Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.2947694Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.2948534Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T12:15:06.2949226Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T12:15:06.2950090Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T12:15:06.2950924Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T12:15:06.2951805Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T12:15:06.2952503Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T12:15:06.2953358Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T12:15:06.2954073Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T12:15:06.2954970Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.2955336Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.2956032Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T12:15:06.2956392Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.2956924Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.2957977Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.2958609Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.2959511Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.2960190Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.2961094Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.2961896Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.2962513Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T12:15:06.2963313Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.2963853Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.2964426Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.2964921Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.2965393Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.2965986Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.2966512Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.2966940Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T12:15:06.2967760Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.2967941Z ('RERUN', {'yellow': True}) [0.4281s] [100%]
2025-12-04T12:15:06.2969036Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T12:15:06.2969799Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.2970478Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.2971215Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.2971730Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.2972166Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.2972770Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.2973309Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.2973818Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T12:15:06.2974344Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T12:15:06.2974844Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T12:15:06.2975495Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T12:15:06.2976047Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T12:15:06.2976472Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.2978429Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.2978972Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.2979908Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.2980416Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.2981263Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T12:15:06.2981969Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T12:15:06.2982869Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.2983390Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.2984238Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T12:15:06.2984892Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T12:15:06.2985769Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T12:15:06.2986612Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T12:15:06.2987458Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T12:15:06.2988155Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T12:15:06.2995819Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T12:15:06.2996640Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T12:15:06.2997544Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.2997917Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.2998650Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T12:15:06.2999014Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.2999550Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.3000609Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.3001303Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.3002214Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.3002893Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.3003793Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.3004597Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.3005213Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T12:15:06.3005990Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3006539Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3007114Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3007611Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.3008046Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.3008658Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.3009186Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3009618Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T12:15:06.3010438Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3010545Z FAILED [0.4306s] [100%]
2025-12-04T12:15:06.3010568Z 
2025-12-04T12:15:06.3010754Z ==================================== RERUNS ====================================
2025-12-04T12:15:06.3011074Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 _
2025-12-04T12:15:06.3011216Z Traceback (most recent call last):
2025-12-04T12:15:06.3011588Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T12:15:06.3011751Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T12:15:06.3012256Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.3012507Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.3013035Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.3013227Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.3013743Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.3013938Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.3014476Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.3014799Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.3015341Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.3015489Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.3015988Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.3016114Z     return self._compile_to_module()
2025-12-04T12:15:06.3016752Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.3016936Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.3017454Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.3017596Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.3018088Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.3018321Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.3018915Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.3019043Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.3019540Z   File "/tmp/tmp326ewhxr/ei/ceilwcfm3zbt52h2etvmwfnzahy2fjygyu5fyod7cxvfuxbjibsb.py", line 51, in <module>
2025-12-04T12:15:06.3020017Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.3020129Z     kernel.precompile(
2025-12-04T12:15:06.3020698Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.3020816Z     self._precompile_worker()
2025-12-04T12:15:06.3021407Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.3021604Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.3022195Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.3022405Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.3022855Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.3023131Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.3023582Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.3023915Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.3024157Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T12:15:06.3024581Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3024705Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3024854Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3024961Z     xmask = xindex < xnumel
2025-12-04T12:15:06.3025056Z     x0 = xindex
2025-12-04T12:15:06.3025232Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.3025351Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3025442Z            ^
2025-12-04T12:15:06.3025840Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3025878Z 
2025-12-04T12:15:06.3026596Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.3026603Z 
2025-12-04T12:15:06.3026608Z 
2025-12-04T12:15:06.3026841Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.3027464Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16
2025-12-04T12:15:06.3027470Z 
2025-12-04T12:15:06.3027747Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.3027971Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.3028076Z frames [('total', 1)]
2025-12-04T12:15:06.3028228Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.3028694Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.3028919Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.3029027Z graph_break []
2025-12-04T12:15:06.3029340Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 _
2025-12-04T12:15:06.3029476Z Traceback (most recent call last):
2025-12-04T12:15:06.3029848Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T12:15:06.3029974Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T12:15:06.3030469Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.3030714Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.3031234Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.3031439Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.3031953Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.3032109Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.3032643Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.3032962Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.3033494Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.3033639Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.3034131Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.3034286Z     return self._compile_to_module()
2025-12-04T12:15:06.3034771Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.3034949Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.3035461Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.3035623Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.3036129Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.3036357Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.3036953Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.3037080Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.3037584Z   File "/tmp/tmpkual5uit/bk/cbkktboxumkejm2d2l6vpn5ws4em3tvuhn4q72hetkujcqovsh66.py", line 51, in <module>
2025-12-04T12:15:06.3038087Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.3038199Z     kernel.precompile(
2025-12-04T12:15:06.3038761Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.3038882Z     self._precompile_worker()
2025-12-04T12:15:06.3039474Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.3039664Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.3040262Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.3040493Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.3040952Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.3041202Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.3041651Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.3041987Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.3042218Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T12:15:06.3042547Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3042669Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3042814Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3042926Z     xmask = xindex < xnumel
2025-12-04T12:15:06.3043021Z     x0 = xindex
2025-12-04T12:15:06.3043202Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.3043322Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3043416Z            ^
2025-12-04T12:15:06.3043817Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3043823Z 
2025-12-04T12:15:06.3044530Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.3044539Z 
2025-12-04T12:15:06.3044544Z 
2025-12-04T12:15:06.3044772Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.3045394Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16
2025-12-04T12:15:06.3045399Z 
2025-12-04T12:15:06.3045666Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.3045933Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.3046040Z frames [('total', 1)]
2025-12-04T12:15:06.3046167Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.3046634Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.3046853Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.3047010Z graph_break []
2025-12-04T12:15:06.3047226Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.3047328Z frames [('total', 1)]
2025-12-04T12:15:06.3047448Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.3047664Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.3048131Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.3048236Z graph_break []
2025-12-04T12:15:06.3048385Z =================================== FAILURES ===================================
2025-12-04T12:15:06.3048729Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 _
2025-12-04T12:15:06.3048854Z Traceback (most recent call last):
2025-12-04T12:15:06.3049230Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T12:15:06.3049369Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T12:15:06.3049857Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.3050111Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.3050623Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.3050816Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.3051370Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.3051524Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.3052050Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.3052377Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.3052895Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.3053053Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.3053531Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.3053649Z     return self._compile_to_module()
2025-12-04T12:15:06.3054147Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.3054308Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.3054834Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.3054963Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.3055454Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.3055698Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.3056361Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.3056495Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.3057001Z   File "/tmp/tmp3vcz5kjy/ma/cmaqxfqbp3fmiawf67x57kyjg6syy2ickuczmsulf2afa5dlc5rr.py", line 51, in <module>
2025-12-04T12:15:06.3057504Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.3057630Z     kernel.precompile(
2025-12-04T12:15:06.3058181Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.3058297Z     self._precompile_worker()
2025-12-04T12:15:06.3058903Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.3059113Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.3059715Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.3059912Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.3060361Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.3060617Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.3061089Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.3061423Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.3061659Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T12:15:06.3061980Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3062111Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3062249Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3062359Z     xmask = xindex < xnumel
2025-12-04T12:15:06.3062467Z     x0 = xindex
2025-12-04T12:15:06.3062636Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.3062757Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3062889Z            ^
2025-12-04T12:15:06.3063283Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3063292Z 
2025-12-04T12:15:06.3064018Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.3064024Z 
2025-12-04T12:15:06.3064031Z 
2025-12-04T12:15:06.3064249Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.3064872Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16
2025-12-04T12:15:06.3064886Z 
2025-12-04T12:15:06.3065155Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.3065379Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.3065496Z frames [('total', 1)]
2025-12-04T12:15:06.3065618Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.3066082Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.3066319Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.3066421Z graph_break []
2025-12-04T12:15:06.3066649Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.3066755Z frames [('total', 1)]
2025-12-04T12:15:06.3066872Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.3067103Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.3067564Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.3067667Z graph_break []
2025-12-04T12:15:06.3067895Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.3068001Z frames [('total', 1)]
2025-12-04T12:15:06.3068158Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.3068387Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.3068849Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.3068956Z graph_break []
2025-12-04T12:15:06.3069604Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3815c1aa47a06d85.xml -
2025-12-04T12:15:06.3069819Z =========================== short test summary info ============================
2025-12-04T12:15:06.3070597Z FAILED [0.4306s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 - torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T12:15:06.3070921Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3071321Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3071466Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3071665Z     xmask = xindex < xnumel
2025-12-04T12:15:06.3071766Z     x0 = xindex
2025-12-04T12:15:06.3071937Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.3072056Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3072159Z            ^
2025-12-04T12:15:06.3072553Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3072560Z 
2025-12-04T12:15:06.3073280Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.3073285Z 
2025-12-04T12:15:06.3073290Z 
2025-12-04T12:15:06.3073507Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.3074170Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16
2025-12-04T12:15:06.3074177Z 
2025-12-04T12:15:06.3074461Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.3074641Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:06.3074854Z ================== 1 failed, 187 deselected, 2 rerun in 4.38s ==================
2025-12-04T12:15:06.3074956Z Got exit code 1
2025-12-04T12:15:06.3075495Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16
2025-12-04T12:15:06.3075913Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:15:06.3076380Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-69850f25ab7699fd.xml
2025-12-04T12:15:06.3076557Z ============================= test session starts ==============================
2025-12-04T12:15:06.3076916Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:06.3077031Z cachedir: .pytest_cache
2025-12-04T12:15:06.3077559Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:06.3077683Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:06.3077795Z configfile: pytest.ini
2025-12-04T12:15:06.3078398Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:06.3078625Z collecting ... collected 188 items / 62 deselected / 126 selected
2025-12-04T12:15:06.3078776Z stepcurrent: skipping 62 already run items.
2025-12-04T12:15:06.3078889Z Running 126 items in this shard
2025-12-04T12:15:06.3078894Z 
2025-12-04T12:15:06.3080058Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T12:15:06.3080835Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3081374Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3081985Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3082483Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.3082916Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.3083530Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.3084086Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3084603Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T12:15:06.3085110Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T12:15:06.3085624Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T12:15:06.3086170Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T12:15:06.3086748Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T12:15:06.3087124Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.3088923Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.3089474Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.3090341Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.3090859Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.3091689Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T12:15:06.3092398Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T12:15:06.3093265Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.3093826Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.3094683Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T12:15:06.3095315Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T12:15:06.3096228Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T12:15:06.3097112Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T12:15:06.3097957Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T12:15:06.3098699Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T12:15:06.3099547Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T12:15:06.3100244Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T12:15:06.3101160Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3101538Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.3102213Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T12:15:06.3102573Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.3103111Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.3104151Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.3104794Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.3105676Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.3106367Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.3107241Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.3108047Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.3108665Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T12:15:06.3109427Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3110013Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3110569Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3111073Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.3111504Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.3112097Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.3112674Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3113088Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T12:15:06.3113923Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3114058Z ('RERUN', {'yellow': True}) [3.4849s] [  0%]
2025-12-04T12:15:06.3115203Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T12:15:06.3115976Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3116518Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3117089Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3117585Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.3118011Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.3118621Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.3119146Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3119660Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T12:15:06.3120173Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T12:15:06.3120678Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T12:15:06.3121222Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T12:15:06.3121825Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T12:15:06.3122199Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.3123998Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.3124568Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.3125437Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.3125987Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.3126818Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T12:15:06.3127526Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T12:15:06.3128391Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.3128928Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.3129791Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T12:15:06.3130421Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T12:15:06.3131307Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T12:15:06.3132123Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T12:15:06.3132966Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T12:15:06.3133672Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T12:15:06.3134515Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T12:15:06.3135209Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T12:15:06.3136124Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3136562Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.3137241Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T12:15:06.3137639Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.3138187Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.3139217Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.3139860Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.3140787Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.3141477Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.3142357Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.3143163Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.3143790Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T12:15:06.3144547Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3145103Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3145662Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3146168Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.3146604Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.3147200Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.3147734Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3148149Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T12:15:06.3149036Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3149219Z ('RERUN', {'yellow': True}) [0.4389s] [  0%]
2025-12-04T12:15:06.3150414Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T12:15:06.3151175Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3151718Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3152314Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3152807Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.3153237Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.3153839Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.3154393Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3154915Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T12:15:06.3155422Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T12:15:06.3155938Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T12:15:06.3156482Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T12:15:06.3157056Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T12:15:06.3157446Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.3159256Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.3159808Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.3160680Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.3161202Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.3162042Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T12:15:06.3162754Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T12:15:06.3163622Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.3164168Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.3165032Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T12:15:06.3165671Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T12:15:06.3166589Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T12:15:06.3167411Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T12:15:06.3168256Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T12:15:06.3168997Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T12:15:06.3169844Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T12:15:06.3170545Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T12:15:06.3171673Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3172059Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.3172741Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T12:15:06.3173104Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.3173657Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.3174698Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.3175349Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.3176242Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.3176995Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.3177890Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.3178661Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.3179352Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T12:15:06.3180111Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3180672Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3181274Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3181783Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.3182218Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.3182818Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.3183404Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3183822Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T12:15:06.3184663Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3184773Z FAILED [0.4426s] [  0%]
2025-12-04T12:15:06.3184780Z 
2025-12-04T12:15:06.3184929Z ==================================== RERUNS ====================================
2025-12-04T12:15:06.3185310Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 _
2025-12-04T12:15:06.3185446Z Traceback (most recent call last):
2025-12-04T12:15:06.3185838Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T12:15:06.3185974Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T12:15:06.3186469Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.3186738Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.3187254Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.3187447Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.3187971Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.3188120Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.3188671Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.3188998Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.3189516Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.3189680Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.3190158Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.3190294Z     return self._compile_to_module()
2025-12-04T12:15:06.3190778Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.3190940Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.3191517Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.3191651Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.3192150Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.3192396Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.3192982Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.3193154Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.3193664Z   File "/tmp/tmp80bbx3c6/wi/cwivymddcbyswrvb4lnwapcjxwlo2mbmdks5ttplf6lzedjniysb.py", line 51, in <module>
2025-12-04T12:15:06.3194128Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.3194251Z     kernel.precompile(
2025-12-04T12:15:06.3194809Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.3194941Z     self._precompile_worker()
2025-12-04T12:15:06.3195597Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.3195781Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.3196390Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.3196590Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.3197037Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.3197295Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.3197772Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.3198131Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.3198368Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T12:15:06.3198689Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3198832Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3198975Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3199096Z     xmask = xindex < xnumel
2025-12-04T12:15:06.3199193Z     x0 = xindex
2025-12-04T12:15:06.3199360Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.3199492Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3199585Z            ^
2025-12-04T12:15:06.3199974Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3199980Z 
2025-12-04T12:15:06.3200724Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.3200736Z 
2025-12-04T12:15:06.3200741Z 
2025-12-04T12:15:06.3201054Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.3201712Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16
2025-12-04T12:15:06.3201720Z 
2025-12-04T12:15:06.3201988Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.3202213Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.3202333Z frames [('total', 1)]
2025-12-04T12:15:06.3202451Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.3202933Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.3203203Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.3203309Z graph_break []
2025-12-04T12:15:06.3203652Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 _
2025-12-04T12:15:06.3203778Z Traceback (most recent call last):
2025-12-04T12:15:06.3204149Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T12:15:06.3204326Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T12:15:06.3204823Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.3205087Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.3205604Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.3205798Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.3206327Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.3206506Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.3207055Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.3207374Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.3207897Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.3208059Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.3208539Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.3208662Z     return self._compile_to_module()
2025-12-04T12:15:06.3209196Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.3209364Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.3209896Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.3210025Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.3210521Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.3210767Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.3211351Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.3211491Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.3211965Z   File "/tmp/tmp_gicfj3x/mb/cmbbr5crlh2ptxfo7qok6n3up7mmeimuqe4xybef6pb52lexcftt.py", line 51, in <module>
2025-12-04T12:15:06.3212434Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.3212559Z     kernel.precompile(
2025-12-04T12:15:06.3213119Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.3213237Z     self._precompile_worker()
2025-12-04T12:15:06.3213845Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.3214028Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.3214632Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.3214830Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.3215281Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.3215575Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.3216022Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.3216437Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.3216673Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T12:15:06.3217031Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3217170Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3217310Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3217422Z     xmask = xindex < xnumel
2025-12-04T12:15:06.3217532Z     x0 = xindex
2025-12-04T12:15:06.3217702Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.3217822Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3217928Z            ^
2025-12-04T12:15:06.3218321Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3218355Z 
2025-12-04T12:15:06.3219079Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.3219086Z 
2025-12-04T12:15:06.3219091Z 
2025-12-04T12:15:06.3219312Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.3219965Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16
2025-12-04T12:15:06.3219971Z 
2025-12-04T12:15:06.3220240Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.3220463Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.3220582Z frames [('total', 1)]
2025-12-04T12:15:06.3220729Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.3221202Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.3221441Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.3221543Z graph_break []
2025-12-04T12:15:06.3221774Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.3221883Z frames [('total', 1)]
2025-12-04T12:15:06.3222000Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.3222240Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.3222703Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.3222804Z graph_break []
2025-12-04T12:15:06.3222964Z =================================== FAILURES ===================================
2025-12-04T12:15:06.3223292Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 _
2025-12-04T12:15:06.3223435Z Traceback (most recent call last):
2025-12-04T12:15:06.3223813Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T12:15:06.3223942Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T12:15:06.3224448Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.3224703Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.3225215Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.3225423Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.3225930Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.3226094Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.3226729Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.3227053Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.3227585Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.3227765Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.3228259Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.3228383Z     return self._compile_to_module()
2025-12-04T12:15:06.3228869Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.3229048Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.3229570Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.3229734Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.3230244Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.3230473Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.3231072Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.3231201Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.3231678Z   File "/tmp/tmp4u_71oi1/nx/cnxgc4rmxwyznuxvync4cnctmjmzcj4itoxroyo7obcx3jrk2mci.py", line 51, in <module>
2025-12-04T12:15:06.3232155Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.3232269Z     kernel.precompile(
2025-12-04T12:15:06.3232866Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.3232989Z     self._precompile_worker()
2025-12-04T12:15:06.3233585Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.3233778Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.3234377Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.3234577Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.3235043Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.3235291Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.3235754Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.3236093Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.3236331Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T12:15:06.3236670Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3236800Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3236959Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3237072Z     xmask = xindex < xnumel
2025-12-04T12:15:06.3237171Z     x0 = xindex
2025-12-04T12:15:06.3237355Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.3237478Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3237573Z            ^
2025-12-04T12:15:06.3237976Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3237982Z 
2025-12-04T12:15:06.3238731Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.3238740Z 
2025-12-04T12:15:06.3238745Z 
2025-12-04T12:15:06.3238980Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.3239617Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16
2025-12-04T12:15:06.3239670Z 
2025-12-04T12:15:06.3239939Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.3240178Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.3240286Z frames [('total', 1)]
2025-12-04T12:15:06.3240419Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.3240891Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.3241118Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.3241268Z graph_break []
2025-12-04T12:15:06.3241491Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.3241597Z frames [('total', 1)]
2025-12-04T12:15:06.3241730Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.3241950Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.3242430Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.3242534Z graph_break []
2025-12-04T12:15:06.3242754Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.3242876Z frames [('total', 1)]
2025-12-04T12:15:06.3242996Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.3243218Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.3243740Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.3243846Z graph_break []
2025-12-04T12:15:06.3244509Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-69850f25ab7699fd.xml -
2025-12-04T12:15:06.3244684Z =========================== short test summary info ============================
2025-12-04T12:15:06.3245479Z FAILED [0.4426s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 - torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T12:15:06.3245815Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3245941Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3246082Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3246205Z     xmask = xindex < xnumel
2025-12-04T12:15:06.3246302Z     x0 = xindex
2025-12-04T12:15:06.3246486Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.3246609Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3246698Z            ^
2025-12-04T12:15:06.3247101Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3247108Z 
2025-12-04T12:15:06.3247817Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.3247825Z 
2025-12-04T12:15:06.3247829Z 
2025-12-04T12:15:06.3248058Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.3248697Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16
2025-12-04T12:15:06.3248703Z 
2025-12-04T12:15:06.3248971Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.3249201Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:06.3249407Z ================== 1 failed, 62 deselected, 2 rerun in 4.41s ===================
2025-12-04T12:15:06.3249518Z Got exit code 1
2025-12-04T12:15:06.3249628Z Retrying single test...
2025-12-04T12:15:06.3250104Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-da23a1d59c747be6.xml
2025-12-04T12:15:06.3250316Z ============================= test session starts ==============================
2025-12-04T12:15:06.3250671Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:06.3250780Z cachedir: .pytest_cache
2025-12-04T12:15:06.3251313Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:06.3251442Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:06.3251555Z configfile: pytest.ini
2025-12-04T12:15:06.3252161Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:06.3252416Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:06.3253148Z stepcurrent: skipping 62 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16
2025-12-04T12:15:06.3253268Z Running 1 items in this shard
2025-12-04T12:15:06.3253273Z 
2025-12-04T12:15:06.3254399Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T12:15:06.3255197Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3255745Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3256415Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3256914Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.3257357Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.3257954Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.3258484Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3259002Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T12:15:06.3259518Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T12:15:06.3260033Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T12:15:06.3260580Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T12:15:06.3261124Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T12:15:06.3261504Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.3263350Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.3263930Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.3264797Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.3265324Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.3266157Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T12:15:06.3266909Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T12:15:06.3267766Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.3268271Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.3269158Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T12:15:06.3269791Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T12:15:06.3270671Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T12:15:06.3271690Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T12:15:06.3272553Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T12:15:06.3273254Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T12:15:06.3274100Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T12:15:06.3274801Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T12:15:06.3275682Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3276061Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.3276822Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T12:15:06.3277200Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.3277738Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.3278828Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.3279471Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.3280366Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.3281113Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.3281994Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.3282778Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.3283388Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T12:15:06.3284191Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3284757Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3285316Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3285893Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.3286370Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.3286960Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.3287508Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3287923Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T12:15:06.3288766Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3288906Z ('RERUN', {'yellow': True}) [3.5066s] [100%]
2025-12-04T12:15:06.3290032Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T12:15:06.3290858Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3291408Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3291986Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3292479Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.3292955Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.3293546Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.3294075Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3294600Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T12:15:06.3295147Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T12:15:06.3295663Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T12:15:06.3296207Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T12:15:06.3296818Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T12:15:06.3297198Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.3299048Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.3299603Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.3300465Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.3300989Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.3301832Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T12:15:06.3302563Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T12:15:06.3303427Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.3303930Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.3305051Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T12:15:06.3305697Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T12:15:06.3306585Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T12:15:06.3307443Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T12:15:06.3308299Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T12:15:06.3309001Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T12:15:06.3309875Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T12:15:06.3310576Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T12:15:06.3311462Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3311839Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.3312550Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T12:15:06.3312928Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.3313458Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.3314501Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.3315145Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.3316043Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.3316732Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.3317624Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.3318409Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.3319020Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T12:15:06.3319816Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3320376Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3320933Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3321471Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.3321900Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.3322503Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.3323037Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3323454Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T12:15:06.3324323Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3324473Z ('RERUN', {'yellow': True}) [0.4415s] [100%]
2025-12-04T12:15:06.3325609Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T12:15:06.3326399Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3326947Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3327524Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3328017Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.3328469Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.3329068Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.3329591Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3330116Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T12:15:06.3330629Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T12:15:06.3331149Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T12:15:06.3331698Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T12:15:06.3332259Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T12:15:06.3332621Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.3334457Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.3335042Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.3335904Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.3336501Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.3337339Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T12:15:06.3338103Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T12:15:06.3338956Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.3339463Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.3340370Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T12:15:06.3341016Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T12:15:06.3341908Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T12:15:06.3342731Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T12:15:06.3343592Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T12:15:06.3344292Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T12:15:06.3345145Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T12:15:06.3345829Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T12:15:06.3346713Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3347090Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.3347814Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T12:15:06.3348191Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.3348726Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.3349764Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.3350441Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.3351341Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.3352037Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.3352950Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.3353737Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.3354355Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T12:15:06.3355147Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3355703Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3356268Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3356781Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.3357213Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.3357817Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.3358344Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3358757Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T12:15:06.3359592Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3359700Z FAILED [0.4396s] [100%]
2025-12-04T12:15:06.3359707Z 
2025-12-04T12:15:06.3359869Z ==================================== RERUNS ====================================
2025-12-04T12:15:06.3360197Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 _
2025-12-04T12:15:06.3360324Z Traceback (most recent call last):
2025-12-04T12:15:06.3360711Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T12:15:06.3360840Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T12:15:06.3361369Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.3361636Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.3362151Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.3362357Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.3362905Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.3363055Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.3363605Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.3363925Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.3364461Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.3364612Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.3365129Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.3365269Z     return self._compile_to_module()
2025-12-04T12:15:06.3365751Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.3365919Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.3366446Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.3366578Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.3367086Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.3367352Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.3367945Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.3368089Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.3368588Z   File "/tmp/tmpei49bwpg/td/ctdmtmrtmfcu6i67yfe6smnv2h6pms7leuoa24eqbt25ywc6tajy.py", line 51, in <module>
2025-12-04T12:15:06.3369069Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.3369185Z     kernel.precompile(
2025-12-04T12:15:06.3369742Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.3369875Z     self._precompile_worker()
2025-12-04T12:15:06.3370476Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.3370659Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.3371456Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.3371660Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.3372127Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.3372378Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.3372821Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.3373173Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.3373407Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T12:15:06.3373828Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3373958Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3374104Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3374233Z     xmask = xindex < xnumel
2025-12-04T12:15:06.3374333Z     x0 = xindex
2025-12-04T12:15:06.3374505Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.3374645Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3374784Z            ^
2025-12-04T12:15:06.3375172Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3375193Z 
2025-12-04T12:15:06.3375909Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.3375916Z 
2025-12-04T12:15:06.3375922Z 
2025-12-04T12:15:06.3376139Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.3376860Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16
2025-12-04T12:15:06.3376917Z 
2025-12-04T12:15:06.3377191Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.3377431Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.3377541Z frames [('total', 1)]
2025-12-04T12:15:06.3377659Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.3378141Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.3378366Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.3378467Z graph_break []
2025-12-04T12:15:06.3378809Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 _
2025-12-04T12:15:06.3378935Z Traceback (most recent call last):
2025-12-04T12:15:06.3379368Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T12:15:06.3379501Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T12:15:06.3379991Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.3380254Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.3380774Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.3380968Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.3381489Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.3381635Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.3382189Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.3382509Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.3383032Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.3383196Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.3383676Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.3383816Z     return self._compile_to_module()
2025-12-04T12:15:06.3384304Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.3384469Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.3385001Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.3385172Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.3385671Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.3385918Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.3386503Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.3386675Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.3387152Z   File "/tmp/tmps3_nni3i/ey/ceyh6hqunnyf3nragc52kdfbngbubqol4izvzo6wt7jyh543kmdz.py", line 51, in <module>
2025-12-04T12:15:06.3387616Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.3387740Z     kernel.precompile(
2025-12-04T12:15:06.3388295Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.3388428Z     self._precompile_worker()
2025-12-04T12:15:06.3389024Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.3389258Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.3389867Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.3390069Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.3390528Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.3390774Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.3391216Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.3391596Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.3391830Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T12:15:06.3392152Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3392291Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3392432Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3392557Z     xmask = xindex < xnumel
2025-12-04T12:15:06.3392654Z     x0 = xindex
2025-12-04T12:15:06.3392825Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.3392960Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3393053Z            ^
2025-12-04T12:15:06.3393444Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3393450Z 
2025-12-04T12:15:06.3394185Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.3394192Z 
2025-12-04T12:15:06.3394196Z 
2025-12-04T12:15:06.3394413Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.3395066Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16
2025-12-04T12:15:06.3395072Z 
2025-12-04T12:15:06.3395345Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.3395567Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.3395687Z frames [('total', 1)]
2025-12-04T12:15:06.3395804Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.3396282Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.3396504Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.3396607Z graph_break []
2025-12-04T12:15:06.3396871Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.3396980Z frames [('total', 1)]
2025-12-04T12:15:06.3397095Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.3397326Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.3397788Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.3397935Z graph_break []
2025-12-04T12:15:06.3398084Z =================================== FAILURES ===================================
2025-12-04T12:15:06.3398408Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 _
2025-12-04T12:15:06.3398543Z Traceback (most recent call last):
2025-12-04T12:15:06.3398917Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T12:15:06.3399047Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T12:15:06.3399550Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.3399835Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.3400359Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.3400557Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.3401066Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.3401227Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.3401762Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.3402098Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.3402660Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.3402815Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.3403311Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.3403436Z     return self._compile_to_module()
2025-12-04T12:15:06.3403928Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.3404110Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.3404629Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.3404774Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.3405276Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.3405511Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.3406115Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.3406246Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.3406761Z   File "/tmp/tmpb81ohzt6/ej/cejcyxr2mvh6skru7zv4rh45gmze7p7vd3efomayfjzvtgszzv5v.py", line 51, in <module>
2025-12-04T12:15:06.3407229Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.3407345Z     kernel.precompile(
2025-12-04T12:15:06.3407918Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.3408040Z     self._precompile_worker()
2025-12-04T12:15:06.3408679Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.3408873Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.3409469Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.3409685Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.3410174Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.3410423Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.3410884Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.3411223Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.3411469Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T12:15:06.3411797Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3411955Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3412108Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3412219Z     xmask = xindex < xnumel
2025-12-04T12:15:06.3412315Z     x0 = xindex
2025-12-04T12:15:06.3412497Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.3412621Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3412714Z            ^
2025-12-04T12:15:06.3413117Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3413122Z 
2025-12-04T12:15:06.3413834Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.3413839Z 
2025-12-04T12:15:06.3413844Z 
2025-12-04T12:15:06.3414110Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.3414754Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16
2025-12-04T12:15:06.3414762Z 
2025-12-04T12:15:06.3415047Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.3415272Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.3415384Z frames [('total', 1)]
2025-12-04T12:15:06.3415518Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.3415985Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.3416208Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.3416402Z graph_break []
2025-12-04T12:15:06.3416626Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.3416746Z frames [('total', 1)]
2025-12-04T12:15:06.3416865Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.3417081Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.3417559Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.3417659Z graph_break []
2025-12-04T12:15:06.3417875Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.3417996Z frames [('total', 1)]
2025-12-04T12:15:06.3418112Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.3418346Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.3418811Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.3418911Z graph_break []
2025-12-04T12:15:06.3419624Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-da23a1d59c747be6.xml -
2025-12-04T12:15:06.3419803Z =========================== short test summary info ============================
2025-12-04T12:15:06.3420605Z FAILED [0.4396s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 - torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T12:15:06.3420940Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3421112Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3421263Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3421373Z     xmask = xindex < xnumel
2025-12-04T12:15:06.3421468Z     x0 = xindex
2025-12-04T12:15:06.3421654Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.3421775Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3421866Z            ^
2025-12-04T12:15:06.3422276Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3422282Z 
2025-12-04T12:15:06.3422994Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.3423029Z 
2025-12-04T12:15:06.3423034Z 
2025-12-04T12:15:06.3423270Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.3423908Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16
2025-12-04T12:15:06.3423914Z 
2025-12-04T12:15:06.3424197Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.3424380Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:06.3424583Z ================== 1 failed, 187 deselected, 2 rerun in 4.43s ==================
2025-12-04T12:15:06.3424697Z Got exit code 1
2025-12-04T12:15:06.3424918Z Retrying single test...
2025-12-04T12:15:06.3425391Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-36993cd4956a89fe.xml
2025-12-04T12:15:06.3425572Z ============================= test session starts ==============================
2025-12-04T12:15:06.3425924Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:06.3426050Z cachedir: .pytest_cache
2025-12-04T12:15:06.3426572Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:06.3426697Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:06.3426821Z configfile: pytest.ini
2025-12-04T12:15:06.3427410Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:06.3427636Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:06.3428369Z stepcurrent: skipping 62 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16
2025-12-04T12:15:06.3428492Z Running 1 items in this shard
2025-12-04T12:15:06.3428497Z 
2025-12-04T12:15:06.3429627Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T12:15:06.3430395Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3430953Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3431549Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3432044Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.3432492Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.3433138Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.3433675Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3434180Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T12:15:06.3434691Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T12:15:06.3435207Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T12:15:06.3435783Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T12:15:06.3436341Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T12:15:06.3436708Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.3438561Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.3439098Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.3439962Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.3440489Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.3441324Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T12:15:06.3442053Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T12:15:06.3442906Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.3443424Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.3444267Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T12:15:06.3444901Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T12:15:06.3445812Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T12:15:06.3446630Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T12:15:06.3447519Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T12:15:06.3448213Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T12:15:06.3449071Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T12:15:06.3449752Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T12:15:06.3450664Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3451040Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.3451720Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T12:15:06.3452093Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.3452654Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.3453708Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.3454333Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.3455225Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.3455916Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.3456891Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.3457684Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.3458299Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T12:15:06.3459072Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3459618Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3460216Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3460724Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.3461157Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.3461791Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.3462314Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3462730Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T12:15:06.3463576Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3463742Z ('RERUN', {'yellow': True}) [3.4997s] [100%]
2025-12-04T12:15:06.3464864Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T12:15:06.3465625Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3466179Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3466768Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3467261Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.3467708Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.3468305Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.3468845Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3469353Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T12:15:06.3469869Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T12:15:06.3470390Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T12:15:06.3471105Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T12:15:06.3471665Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T12:15:06.3472032Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.3473909Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.3474450Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.3475313Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.3475875Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.3476709Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T12:15:06.3477433Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T12:15:06.3478326Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.3478845Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.3479688Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T12:15:06.3480317Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T12:15:06.3481242Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T12:15:06.3482065Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T12:15:06.3482924Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T12:15:06.3483619Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T12:15:06.3484482Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T12:15:06.3485163Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T12:15:06.3486059Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3486423Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.3487103Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T12:15:06.3487474Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.3488073Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.3489131Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.3489758Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.3490671Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.3491374Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.3492265Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.3493082Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.3493696Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T12:15:06.3494472Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3495016Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3495607Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3496122Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.3496618Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.3497234Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.3497759Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3498195Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T12:15:06.3499024Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3499164Z ('RERUN', {'yellow': True}) [0.4324s] [100%]
2025-12-04T12:15:06.3500302Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T12:15:06.3501063Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3501618Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3502223Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3502718Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.3503166Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.3503760Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.3504329Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3504839Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T12:15:06.3505355Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T12:15:06.3505879Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T12:15:06.3506457Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T12:15:06.3507022Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T12:15:06.3507387Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.3509244Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.3509784Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.3510662Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.3511170Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.3512006Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T12:15:06.3512729Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T12:15:06.3513582Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.3514104Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.3514946Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T12:15:06.3515593Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T12:15:06.3516494Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T12:15:06.3517311Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T12:15:06.3518172Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T12:15:06.3518906Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T12:15:06.3519764Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T12:15:06.3520447Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T12:15:06.3521377Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3521740Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.3522418Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T12:15:06.3522788Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.3523351Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.3524404Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.3525037Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.3525940Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.3526617Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.3527510Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.3528289Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.3528903Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T12:15:06.3529672Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3530214Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3530826Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3531323Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.3531753Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.3532389Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.3532913Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3533345Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T12:15:06.3534169Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3534308Z FAILED [0.4307s] [100%]
2025-12-04T12:15:06.3534315Z 
2025-12-04T12:15:06.3534476Z ==================================== RERUNS ====================================
2025-12-04T12:15:06.3534801Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 _
2025-12-04T12:15:06.3534942Z Traceback (most recent call last):
2025-12-04T12:15:06.3535318Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T12:15:06.3535447Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T12:15:06.3535951Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.3536203Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.3536839Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.3537053Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.3537567Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.3537727Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.3538261Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.3538583Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.3539116Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.3539265Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.3539760Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.3539886Z     return self._compile_to_module()
2025-12-04T12:15:06.3540374Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.3540554Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.3541072Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.3541205Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.3541720Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.3541953Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.3542551Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.3542682Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.3543207Z   File "/tmp/tmpi56d1yz9/pl/cpljnzs7kv6cv43347rm2j3rcqe4fqv6i4pzmm4f2gftqkzl35sj.py", line 51, in <module>
2025-12-04T12:15:06.3543692Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.3543805Z     kernel.precompile(
2025-12-04T12:15:06.3544372Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.3544524Z     self._precompile_worker()
2025-12-04T12:15:06.3545124Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.3545323Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.3545916Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.3546121Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.3546584Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.3546864Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.3547322Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.3547662Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.3547897Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T12:15:06.3548231Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3548359Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3548499Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3548621Z     xmask = xindex < xnumel
2025-12-04T12:15:06.3548746Z     x0 = xindex
2025-12-04T12:15:06.3548932Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.3549054Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3549148Z            ^
2025-12-04T12:15:06.3549546Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3549552Z 
2025-12-04T12:15:06.3550274Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.3550283Z 
2025-12-04T12:15:06.3550287Z 
2025-12-04T12:15:06.3550517Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.3551158Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16
2025-12-04T12:15:06.3551164Z 
2025-12-04T12:15:06.3551434Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.3551672Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.3551780Z frames [('total', 1)]
2025-12-04T12:15:06.3551912Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.3552375Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.3552596Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.3552715Z graph_break []
2025-12-04T12:15:06.3553040Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 _
2025-12-04T12:15:06.3553164Z Traceback (most recent call last):
2025-12-04T12:15:06.3553551Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T12:15:06.3553680Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T12:15:06.3554188Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.3554468Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.3554984Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.3555192Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.3555704Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.3555880Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.3556428Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.3556749Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.3557286Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.3557438Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.3557920Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.3558090Z     return self._compile_to_module()
2025-12-04T12:15:06.3558577Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.3558760Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.3559275Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.3559407Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.3559922Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.3560154Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.3560770Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.3560915Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.3561411Z   File "/tmp/tmp1fmpx0jd/ou/couh4d6bv2nmxvvtt5hffvzref3f5ysgl5sitr3f63rm4wyjo3i5.py", line 51, in <module>
2025-12-04T12:15:06.3561886Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.3562003Z     kernel.precompile(
2025-12-04T12:15:06.3562558Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.3562692Z     self._precompile_worker()
2025-12-04T12:15:06.3563291Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.3563489Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.3564089Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.3564294Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.3564757Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.3565005Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.3565448Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.3565798Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.3566030Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T12:15:06.3566369Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3566526Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3566669Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3566801Z     xmask = xindex < xnumel
2025-12-04T12:15:06.3566897Z     x0 = xindex
2025-12-04T12:15:06.3567070Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.3567204Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3567296Z            ^
2025-12-04T12:15:06.3567726Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3567732Z 
2025-12-04T12:15:06.3568450Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.3568456Z 
2025-12-04T12:15:06.3568461Z 
2025-12-04T12:15:06.3568680Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.3569341Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16
2025-12-04T12:15:06.3569347Z 
2025-12-04T12:15:06.3569650Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.3569882Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.3569988Z frames [('total', 1)]
2025-12-04T12:15:06.3570108Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.3570592Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.3570815Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.3570928Z graph_break []
2025-12-04T12:15:06.3571345Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.3571451Z frames [('total', 1)]
2025-12-04T12:15:06.3571584Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.3571878Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.3572346Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.3572466Z graph_break []
2025-12-04T12:15:06.3572617Z =================================== FAILURES ===================================
2025-12-04T12:15:06.3572956Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 _
2025-12-04T12:15:06.3573086Z Traceback (most recent call last):
2025-12-04T12:15:06.3573463Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T12:15:06.3573607Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T12:15:06.3574100Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.3574353Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.3574889Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.3575088Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.3575615Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.3575765Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.3576361Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.3576705Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.3577233Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.3577398Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.3578125Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.3578252Z     return self._compile_to_module()
2025-12-04T12:15:06.3578759Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.3578926Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.3579445Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.3579644Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.3580140Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.3580391Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.3580978Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.3581113Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.3581634Z   File "/tmp/tmpk0y73hyi/4w/c4wnjsxmagfq62yledwacpfirhrnhebcisqz5akbiqfawjcrs4yk.py", line 51, in <module>
2025-12-04T12:15:06.3582159Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.3582274Z     kernel.precompile(
2025-12-04T12:15:06.3582846Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.3582968Z     self._precompile_worker()
2025-12-04T12:15:06.3583584Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.3583767Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.3584399Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.3584616Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.3585070Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.3585337Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.3585781Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.3586120Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.3586367Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T12:15:06.3586692Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3586816Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3586965Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3587077Z     xmask = xindex < xnumel
2025-12-04T12:15:06.3587188Z     x0 = xindex
2025-12-04T12:15:06.3587357Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.3587480Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3587589Z            ^
2025-12-04T12:15:06.3587976Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3587981Z 
2025-12-04T12:15:06.3588695Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.3588715Z 
2025-12-04T12:15:06.3588719Z 
2025-12-04T12:15:06.3588937Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.3589581Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16
2025-12-04T12:15:06.3589588Z 
2025-12-04T12:15:06.3589907Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.3590132Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.3590250Z frames [('total', 1)]
2025-12-04T12:15:06.3590366Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.3590833Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.3591651Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.3591753Z graph_break []
2025-12-04T12:15:06.3591973Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.3592094Z frames [('total', 1)]
2025-12-04T12:15:06.3592213Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.3592430Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.3592908Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.3593013Z graph_break []
2025-12-04T12:15:06.3593244Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.3593386Z frames [('total', 1)]
2025-12-04T12:15:06.3593502Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.3593735Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.3594193Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.3594296Z graph_break []
2025-12-04T12:15:06.3594957Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-36993cd4956a89fe.xml -
2025-12-04T12:15:06.3595132Z =========================== short test summary info ============================
2025-12-04T12:15:06.3595971Z FAILED [0.4307s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 - torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T12:15:06.3596294Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3596423Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3596575Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3596683Z     xmask = xindex < xnumel
2025-12-04T12:15:06.3596793Z     x0 = xindex
2025-12-04T12:15:06.3596966Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T12:15:06.3597085Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3597192Z            ^
2025-12-04T12:15:06.3597581Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3597587Z 
2025-12-04T12:15:06.3598299Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.3598307Z 
2025-12-04T12:15:06.3598326Z 
2025-12-04T12:15:06.3598547Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.3599187Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16
2025-12-04T12:15:06.3599192Z 
2025-12-04T12:15:06.3599472Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.3599656Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:06.3599859Z ================== 1 failed, 187 deselected, 2 rerun in 4.41s ==================
2025-12-04T12:15:06.3599973Z Got exit code 1
2025-12-04T12:15:06.3600526Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16
2025-12-04T12:15:06.3600951Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:15:06.3601463Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-78153e5fcd212bc6.xml
2025-12-04T12:15:06.3601635Z ============================= test session starts ==============================
2025-12-04T12:15:06.3602004Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:06.3602118Z cachedir: .pytest_cache
2025-12-04T12:15:06.3602683Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:06.3602810Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:06.3602921Z configfile: pytest.ini
2025-12-04T12:15:06.3603523Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:06.3603752Z collecting ... collected 188 items / 63 deselected / 125 selected
2025-12-04T12:15:06.3603899Z stepcurrent: skipping 63 already run items.
2025-12-04T12:15:06.3604031Z Running 125 items in this shard
2025-12-04T12:15:06.3604036Z 
2025-12-04T12:15:06.3605163Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T12:15:06.3605942Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3606490Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3607069Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3607598Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.3608031Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.3608592Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.3609120Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3609646Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T12:15:06.3610157Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T12:15:06.3610665Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T12:15:06.3611225Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T12:15:06.3611774Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T12:15:06.3612149Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.3613957Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.3614539Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.3615412Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.3615951Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.3616879Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T12:15:06.3617592Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T12:15:06.3618463Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.3619015Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.3619876Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T12:15:06.3620514Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T12:15:06.3621410Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T12:15:06.3622242Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T12:15:06.3623082Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T12:15:06.3623793Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T12:15:06.3624640Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T12:15:06.3625338Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T12:15:06.3626217Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3626582Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.3627279Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T12:15:06.3627639Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.3628185Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.3629264Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.3629906Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.3630800Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.3631523Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.3632427Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.3633197Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.3633929Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T12:15:06.3634685Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3635246Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3635808Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3636336Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.3636790Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.3637334Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.3637876Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3638293Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T12:15:06.3639117Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3639272Z ('RERUN', {'yellow': True}) [3.4739s] [  0%]
2025-12-04T12:15:06.3640360Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T12:15:06.3641144Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3641690Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3642266Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3642760Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.3643229Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.3643788Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.3644312Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3644867Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T12:15:06.3645379Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T12:15:06.3645883Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T12:15:06.3646449Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T12:15:06.3647027Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T12:15:06.3647473Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.3649327Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.3649930Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.3650802Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.3651309Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.3652161Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T12:15:06.3652870Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T12:15:06.3653742Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.3654252Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.3655110Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T12:15:06.3655758Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T12:15:06.3656695Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T12:15:06.3657572Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T12:15:06.3658420Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T12:15:06.3659168Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T12:15:06.3660010Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T12:15:06.3660714Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T12:15:06.3661599Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3662000Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.3662693Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T12:15:06.3663061Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.3663608Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.3664685Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.3665338Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.3666234Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.3666914Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.3667815Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.3668596Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.3669234Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T12:15:06.3669995Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3670559Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3671298Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3671873Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.3672324Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.3672869Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.3673407Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3673872Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T12:15:06.3674696Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3674851Z ('RERUN', {'yellow': True}) [0.4403s] [  0%]
2025-12-04T12:15:06.3675939Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T12:15:06.3676755Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3677299Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3677873Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3678368Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.3678847Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.3679406Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.3679931Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3680454Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T12:15:06.3680964Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T12:15:06.3681468Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T12:15:06.3682036Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T12:15:06.3682579Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T12:15:06.3682952Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.3684746Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.3685355Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.3686225Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.3686736Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.3687613Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T12:15:06.3688327Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T12:15:06.3689194Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.3689699Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.3690585Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T12:15:06.3691219Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T12:15:06.3692086Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T12:15:06.3692948Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T12:15:06.3693795Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T12:15:06.3694499Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T12:15:06.3695344Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T12:15:06.3696045Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T12:15:06.3697001Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3697370Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.3698061Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T12:15:06.3698422Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.3698965Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.3700077Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.3700721Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.3701620Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.3702330Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.3703223Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.3703997Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.3704667Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T12:15:06.3705426Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3705991Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3706550Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3707083Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.3707531Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.3708074Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.3708610Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3709029Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T12:15:06.3709851Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3709971Z FAILED [0.4341s] [  0%]
2025-12-04T12:15:06.3709977Z 
2025-12-04T12:15:06.3710127Z ==================================== RERUNS ====================================
2025-12-04T12:15:06.3710458Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 _
2025-12-04T12:15:06.3710587Z Traceback (most recent call last):
2025-12-04T12:15:06.3710960Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T12:15:06.3711102Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T12:15:06.3711589Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.3711854Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.3712366Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.3712562Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.3713086Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.3713271Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.3713807Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.3714141Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.3714663Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.3714861Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.3715344Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.3715468Z     return self._compile_to_module()
2025-12-04T12:15:06.3715966Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.3716133Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.3716664Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.3716831Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.3717325Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.3717572Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.3718155Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.3718282Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.3718793Z   File "/tmp/tmp8ydd984h/5u/c5uv3ao4fcezwyqjnn4pj3wcxvmr7ou5xmuxgqjmmc7p3pbqvpwc.py", line 51, in <module>
2025-12-04T12:15:06.3719288Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.3719419Z     kernel.precompile(
2025-12-04T12:15:06.3719971Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.3720094Z     self._precompile_worker()
2025-12-04T12:15:06.3720702Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.3720885Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.3721488Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.3721691Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.3722145Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.3722403Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.3722849Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.3723185Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.3723430Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T12:15:06.3723747Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3723888Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3724029Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3724140Z     xmask = xindex < xnumel
2025-12-04T12:15:06.3724250Z     x0 = xindex
2025-12-04T12:15:06.3724375Z     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.3724496Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3724603Z            ^
2025-12-04T12:15:06.3724994Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3725043Z 
2025-12-04T12:15:06.3725770Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.3725779Z 
2025-12-04T12:15:06.3725784Z 
2025-12-04T12:15:06.3726002Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.3726662Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32
2025-12-04T12:15:06.3726681Z 
2025-12-04T12:15:06.3726951Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.3727178Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.3727297Z frames [('total', 1)]
2025-12-04T12:15:06.3727415Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.3727884Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.3728120Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.3728257Z graph_break []
2025-12-04T12:15:06.3728570Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 _
2025-12-04T12:15:06.3728707Z Traceback (most recent call last):
2025-12-04T12:15:06.3729083Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T12:15:06.3729225Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T12:15:06.3729717Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.3729964Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.3730488Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.3730733Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.3731255Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.3731406Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.3731940Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.3732276Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.3732793Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.3732949Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.3733442Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.3733568Z     return self._compile_to_module()
2025-12-04T12:15:06.3734070Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.3734240Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.3734755Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.3734903Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.3735403Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.3735652Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.3736240Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.3736448Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.3737013Z   File "/tmp/tmphn5b1r1s/vx/cvxtly3y3idm74gvnjkxho3zm6igvoxdg74xq27ji7sczms6fsqj.py", line 51, in <module>
2025-12-04T12:15:06.3737479Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.3737594Z     kernel.precompile(
2025-12-04T12:15:06.3738163Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.3738313Z     self._precompile_worker()
2025-12-04T12:15:06.3738922Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.3739102Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.3739698Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.3745263Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.3745831Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.3746081Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.3746649Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.3746995Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.3747247Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T12:15:06.3747569Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3747697Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3747851Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3747964Z     xmask = xindex < xnumel
2025-12-04T12:15:06.3748057Z     x0 = xindex
2025-12-04T12:15:06.3748196Z     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.3748356Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3748465Z            ^
2025-12-04T12:15:06.3748857Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3748867Z 
2025-12-04T12:15:06.3749582Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.3749591Z 
2025-12-04T12:15:06.3749596Z 
2025-12-04T12:15:06.3749828Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.3750450Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32
2025-12-04T12:15:06.3750457Z 
2025-12-04T12:15:06.3750741Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.3750970Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.3751079Z frames [('total', 1)]
2025-12-04T12:15:06.3751215Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.3751686Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.3751925Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.3752026Z graph_break []
2025-12-04T12:15:06.3752250Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.3752369Z frames [('total', 1)]
2025-12-04T12:15:06.3752486Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.3752706Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.3753180Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.3753280Z graph_break []
2025-12-04T12:15:06.3753433Z =================================== FAILURES ===================================
2025-12-04T12:15:06.3753801Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 _
2025-12-04T12:15:06.3753931Z Traceback (most recent call last):
2025-12-04T12:15:06.3754318Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T12:15:06.3754447Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T12:15:06.3754940Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.3755242Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.3755757Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.3755963Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.3756480Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.3756633Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.3757178Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.3757540Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.3758055Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.3758221Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.3758702Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.3758837Z     return self._compile_to_module()
2025-12-04T12:15:06.3759321Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.3759515Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.3760054Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.3760185Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.3760693Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.3760923Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.3761507Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.3761643Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.3762118Z   File "/tmp/tmp0_xwpdro/v2/cv2vr5qulpwknx2x7xlkexi6uhlcg3xeczvkdxgb3gweew2ujjyt.py", line 51, in <module>
2025-12-04T12:15:06.3762583Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.3762717Z     kernel.precompile(
2025-12-04T12:15:06.3763267Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.3763394Z     self._precompile_worker()
2025-12-04T12:15:06.3763989Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.3764170Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.3764779Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.3764976Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.3765437Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.3765686Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.3766169Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.3766517Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.3766752Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T12:15:06.3767075Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3767244Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3767387Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3767511Z     xmask = xindex < xnumel
2025-12-04T12:15:06.3767608Z     x0 = xindex
2025-12-04T12:15:06.3767733Z     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.3767861Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3767953Z            ^
2025-12-04T12:15:06.3768346Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3768355Z 
2025-12-04T12:15:06.3769076Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.3769113Z 
2025-12-04T12:15:06.3769118Z 
2025-12-04T12:15:06.3769336Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.3769971Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32
2025-12-04T12:15:06.3769977Z 
2025-12-04T12:15:06.3770245Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.3770478Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.3770580Z frames [('total', 1)]
2025-12-04T12:15:06.3770694Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.3771474Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.3771700Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.3771804Z graph_break []
2025-12-04T12:15:06.3772037Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.3772141Z frames [('total', 1)]
2025-12-04T12:15:06.3772258Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.3772496Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.3772962Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.3773079Z graph_break []
2025-12-04T12:15:06.3773297Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.3773403Z frames [('total', 1)]
2025-12-04T12:15:06.3773533Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.3773758Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.3774216Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.3774332Z graph_break []
2025-12-04T12:15:06.3775064Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-78153e5fcd212bc6.xml -
2025-12-04T12:15:06.3775286Z =========================== short test summary info ============================
2025-12-04T12:15:06.3776060Z FAILED [0.4341s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 - torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T12:15:06.3776453Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3776596Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3776738Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3776866Z     xmask = xindex < xnumel
2025-12-04T12:15:06.3777044Z     x0 = xindex
2025-12-04T12:15:06.3777168Z     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.3777300Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3777393Z            ^
2025-12-04T12:15:06.3777783Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3777788Z 
2025-12-04T12:15:06.3778558Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.3778565Z 
2025-12-04T12:15:06.3778570Z 
2025-12-04T12:15:06.3778787Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.3779419Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32
2025-12-04T12:15:06.3779426Z 
2025-12-04T12:15:06.3779702Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.3779885Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:06.3780167Z ================== 1 failed, 63 deselected, 2 rerun in 4.39s ===================
2025-12-04T12:15:06.3780270Z Got exit code 1
2025-12-04T12:15:06.3780393Z Retrying single test...
2025-12-04T12:15:06.3780866Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-04b538cf09549803.xml
2025-12-04T12:15:06.3781038Z ============================= test session starts ==============================
2025-12-04T12:15:06.3781404Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:06.3781516Z cachedir: .pytest_cache
2025-12-04T12:15:06.3782036Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:06.3782207Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:06.3782322Z configfile: pytest.ini
2025-12-04T12:15:06.3782924Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:06.3783150Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:06.3783853Z stepcurrent: skipping 63 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32
2025-12-04T12:15:06.3783981Z Running 1 items in this shard
2025-12-04T12:15:06.3783987Z 
2025-12-04T12:15:06.3785075Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T12:15:06.3785850Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3786394Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3786954Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3787458Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.3787894Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.3788448Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.3789002Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3789523Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T12:15:06.3790028Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T12:15:06.3790531Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T12:15:06.3791140Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T12:15:06.3791684Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T12:15:06.3792057Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.3793857Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.3794441Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.3795306Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.3795844Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.3796690Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T12:15:06.3797396Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T12:15:06.3798262Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.3798761Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.3799618Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T12:15:06.3800255Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T12:15:06.3801129Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T12:15:06.3801961Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T12:15:06.3802804Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T12:15:06.3803549Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T12:15:06.3804394Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T12:15:06.3805127Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T12:15:06.3806010Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3806373Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.3807075Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T12:15:06.3807469Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.3808014Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.3809055Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.3809688Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.3810607Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.3811290Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.3812178Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.3812947Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.3813568Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T12:15:06.3814328Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3814884Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3815441Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3815936Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.3816448Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.3816989Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.3817574Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3817995Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T12:15:06.3818813Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3818997Z ('RERUN', {'yellow': True}) [3.4962s] [100%]
2025-12-04T12:15:06.3820079Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T12:15:06.3820852Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3821396Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3821986Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3822498Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.3822934Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.3823492Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.3824013Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3824566Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T12:15:06.3825081Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T12:15:06.3825586Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T12:15:06.3826148Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T12:15:06.3826692Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T12:15:06.3827068Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.3828870Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.3829419Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.3830285Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.3830792Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.3831671Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T12:15:06.3832384Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T12:15:06.3833298Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.3833804Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.3834663Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T12:15:06.3835303Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T12:15:06.3836201Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T12:15:06.3837031Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T12:15:06.3837878Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T12:15:06.3838622Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T12:15:06.3839467Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T12:15:06.3840162Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T12:15:06.3841040Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3841402Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.3842098Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T12:15:06.3842459Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.3843004Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.3844042Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.3844683Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.3845610Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.3846287Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.3847178Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.3848054Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.3848677Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T12:15:06.3849442Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3849998Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3850601Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3851100Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.3851546Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.3852082Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.3852649Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3853067Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T12:15:06.3853890Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3854041Z ('RERUN', {'yellow': True}) [0.4248s] [100%]
2025-12-04T12:15:06.3855135Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T12:15:06.3855903Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3856550Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3857114Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3857626Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.3858058Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.3858609Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.3859130Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3859693Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T12:15:06.3860204Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T12:15:06.3860707Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T12:15:06.3861302Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T12:15:06.3861846Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T12:15:06.3862219Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.3864024Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.3864603Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.3865471Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.3865976Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.3866854Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T12:15:06.3867567Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T12:15:06.3868431Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.3868939Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.3869791Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T12:15:06.3870427Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T12:15:06.3871482Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T12:15:06.3872316Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T12:15:06.3873160Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T12:15:06.3873947Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T12:15:06.3874791Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T12:15:06.3875490Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T12:15:06.3876424Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3876783Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.3877481Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T12:15:06.3877842Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.3878429Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.3879471Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.3880115Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.3881004Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.3881728Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.3882624Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.3883397Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.3884020Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T12:15:06.3884782Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3885334Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3885894Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3886389Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.3886839Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.3887378Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.3887915Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3888381Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T12:15:06.3889207Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3889326Z FAILED [0.4280s] [100%]
2025-12-04T12:15:06.3889362Z 
2025-12-04T12:15:06.3889511Z ==================================== RERUNS ====================================
2025-12-04T12:15:06.3889838Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 _
2025-12-04T12:15:06.3889966Z Traceback (most recent call last):
2025-12-04T12:15:06.3890336Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T12:15:06.3890478Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T12:15:06.3890975Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.3891226Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.3891784Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.3891979Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.3892503Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.3892653Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.3893188Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.3893523Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.3894076Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.3894238Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.3894725Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.3894848Z     return self._compile_to_module()
2025-12-04T12:15:06.3895342Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.3895509Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.3896030Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.3896174Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.3896731Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.3896982Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.3897568Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.3897699Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.3898220Z   File "/tmp/tmpazmxrjke/mg/cmgd3spxrrgsie7ojrcnagmaoswnkj4hm7jd2ehgjvt343c6nvvt.py", line 51, in <module>
2025-12-04T12:15:06.3898680Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.3898814Z     kernel.precompile(
2025-12-04T12:15:06.3899373Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.3899490Z     self._precompile_worker()
2025-12-04T12:15:06.3900100Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.3900283Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.3900919Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.3901133Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.3901585Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.3901878Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.3902319Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.3902653Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.3902895Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T12:15:06.3903214Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3903360Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3903500Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3903641Z     xmask = xindex < xnumel
2025-12-04T12:15:06.3903750Z     x0 = xindex
2025-12-04T12:15:06.3903873Z     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.3903992Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3904102Z            ^
2025-12-04T12:15:06.3904492Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3904499Z 
2025-12-04T12:15:06.3905226Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.3905232Z 
2025-12-04T12:15:06.3905237Z 
2025-12-04T12:15:06.3905456Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.3906125Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32
2025-12-04T12:15:06.3906133Z 
2025-12-04T12:15:06.3906423Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.3906648Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.3906768Z frames [('total', 1)]
2025-12-04T12:15:06.3906885Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.3907358Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.3907594Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.3907696Z graph_break []
2025-12-04T12:15:06.3908009Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 _
2025-12-04T12:15:06.3908144Z Traceback (most recent call last):
2025-12-04T12:15:06.3908521Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T12:15:06.3908664Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T12:15:06.3909152Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.3909403Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.3909927Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.3910122Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.3910629Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.3910786Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.3911317Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.3911682Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.3912203Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.3912354Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.3912848Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.3913003Z     return self._compile_to_module()
2025-12-04T12:15:06.3913495Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.3913659Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.3914186Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.3914328Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.3914828Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.3915075Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.3915693Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.3915825Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.3916335Z   File "/tmp/tmprua1fz53/cc/ccce4o7a6hxb477ydo4e2rhu72dcnnp3fbfszb2hh4c3kyeqpjt3.py", line 51, in <module>
2025-12-04T12:15:06.3916800Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.3916914Z     kernel.precompile(
2025-12-04T12:15:06.3917482Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.3917604Z     self._precompile_worker()
2025-12-04T12:15:06.3918244Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.3918432Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.3919026Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.3919242Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.3919697Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.3919960Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.3920406Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.3920744Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.3920995Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T12:15:06.3921319Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3921448Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3921607Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3921720Z     xmask = xindex < xnumel
2025-12-04T12:15:06.3921832Z     x0 = xindex
2025-12-04T12:15:06.3921962Z     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.3922085Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3922195Z            ^
2025-12-04T12:15:06.3922584Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3922590Z 
2025-12-04T12:15:06.3923302Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.3923309Z 
2025-12-04T12:15:06.3923331Z 
2025-12-04T12:15:06.3923581Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.3924208Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32
2025-12-04T12:15:06.3924216Z 
2025-12-04T12:15:06.3924502Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.3924759Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.3924871Z frames [('total', 1)]
2025-12-04T12:15:06.3925003Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.3925472Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.3925716Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.3925819Z graph_break []
2025-12-04T12:15:06.3926044Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.3926168Z frames [('total', 1)]
2025-12-04T12:15:06.3926282Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.3926534Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.3927013Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.3927147Z graph_break []
2025-12-04T12:15:06.3927312Z =================================== FAILURES ===================================
2025-12-04T12:15:06.3927625Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 _
2025-12-04T12:15:06.3927748Z Traceback (most recent call last):
2025-12-04T12:15:06.3928136Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T12:15:06.3928263Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T12:15:06.3928804Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.3929068Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.3929586Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.3929795Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.3930305Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.3930454Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.3931000Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.3931320Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.3931855Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.3932008Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.3932489Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.3932627Z     return self._compile_to_module()
2025-12-04T12:15:06.3933113Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.3933280Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.3933810Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.3933937Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.3934448Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.3934682Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.3935321Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.3935467Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.3935965Z   File "/tmp/tmpvv7ocoss/2c/c2c2lga7tiaygn4ikuv35kyz2khvqjwp443pfkaqsspj6zi4d3yo.py", line 51, in <module>
2025-12-04T12:15:06.3936519Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.3936679Z     kernel.precompile(
2025-12-04T12:15:06.3937236Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.3937368Z     self._precompile_worker()
2025-12-04T12:15:06.3937963Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.3938147Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.3938756Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.3938987Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.3939449Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.3939701Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.3940143Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.3940492Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.3940725Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T12:15:06.3941058Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3941215Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3941357Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3941483Z     xmask = xindex < xnumel
2025-12-04T12:15:06.3941579Z     x0 = xindex
2025-12-04T12:15:06.3941703Z     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.3941835Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3941928Z            ^
2025-12-04T12:15:06.3942315Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3942321Z 
2025-12-04T12:15:06.3943045Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.3943051Z 
2025-12-04T12:15:06.3943056Z 
2025-12-04T12:15:06.3943271Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.3943913Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32
2025-12-04T12:15:06.3943919Z 
2025-12-04T12:15:06.3944190Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.3944425Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.3944530Z frames [('total', 1)]
2025-12-04T12:15:06.3944648Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.3945128Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.3945353Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.3945454Z graph_break []
2025-12-04T12:15:06.3945687Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.3945790Z frames [('total', 1)]
2025-12-04T12:15:06.3945918Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.3946174Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.3946635Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.3946750Z graph_break []
2025-12-04T12:15:06.3946965Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.3947071Z frames [('total', 1)]
2025-12-04T12:15:06.3947202Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.3947454Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.3947911Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.3948023Z graph_break []
2025-12-04T12:15:06.3948671Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-04b538cf09549803.xml -
2025-12-04T12:15:06.3948862Z =========================== short test summary info ============================
2025-12-04T12:15:06.3949632Z FAILED [0.4280s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 - torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T12:15:06.3949993Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3950132Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3950275Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3950398Z     xmask = xindex < xnumel
2025-12-04T12:15:06.3950493Z     x0 = xindex
2025-12-04T12:15:06.3950616Z     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.3950748Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3950839Z            ^
2025-12-04T12:15:06.3951226Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3951232Z 
2025-12-04T12:15:06.3951987Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.3951996Z 
2025-12-04T12:15:06.3952001Z 
2025-12-04T12:15:06.3952219Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.3952855Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32
2025-12-04T12:15:06.3952863Z 
2025-12-04T12:15:06.3953130Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.3953328Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:06.3953532Z ================== 1 failed, 187 deselected, 2 rerun in 4.39s ==================
2025-12-04T12:15:06.3953635Z Got exit code 1
2025-12-04T12:15:06.3953759Z Retrying single test...
2025-12-04T12:15:06.3954235Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d91f9f6b0d5ec125.xml
2025-12-04T12:15:06.3954402Z ============================= test session starts ==============================
2025-12-04T12:15:06.3954771Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:06.3954882Z cachedir: .pytest_cache
2025-12-04T12:15:06.3955416Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:06.3955543Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:06.3955652Z configfile: pytest.ini
2025-12-04T12:15:06.3956257Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:06.3956481Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:06.3957217Z stepcurrent: skipping 63 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32
2025-12-04T12:15:06.3957347Z Running 1 items in this shard
2025-12-04T12:15:06.3957352Z 
2025-12-04T12:15:06.3958441Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T12:15:06.3959245Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3959791Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3960364Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3960862Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.3961326Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.3961880Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.3962408Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3962930Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T12:15:06.3963440Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T12:15:06.3963971Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T12:15:06.3964529Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T12:15:06.3965077Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T12:15:06.3965452Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.3967297Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.3967981Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.3968851Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.3969359Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.3970199Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T12:15:06.3970909Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T12:15:06.3972028Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.3972542Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.3973399Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T12:15:06.3974082Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T12:15:06.3974957Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T12:15:06.3975793Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T12:15:06.3976755Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T12:15:06.3977467Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T12:15:06.3978310Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T12:15:06.3979061Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T12:15:06.3979950Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3980316Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.3981012Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T12:15:06.3981371Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.3981916Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.3982959Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.3983607Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.3987074Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.3987758Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.3990541Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.3991337Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.3991951Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T12:15:06.3992748Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3993302Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.3993883Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.3994381Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.3994827Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.3995375Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.3995911Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.3996330Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T12:15:06.3997150Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.3997346Z ('RERUN', {'yellow': True}) [3.4692s] [100%]
2025-12-04T12:15:06.3998440Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T12:15:06.3999209Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.3999754Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.4000330Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.4000830Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.4001263Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.4001820Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.4002344Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.4002935Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T12:15:06.4003445Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T12:15:06.4003950Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T12:15:06.4004586Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T12:15:06.4005135Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T12:15:06.4005513Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.4007318Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.4007875Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.4008739Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.4009247Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.4010092Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T12:15:06.4010803Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T12:15:06.4011714Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.4012220Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.4013082Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T12:15:06.4013725Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T12:15:06.4014593Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T12:15:06.4015429Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T12:15:06.4016274Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T12:15:06.4017155Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T12:15:06.4017999Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T12:15:06.4018782Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T12:15:06.4019682Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.4020049Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.4020749Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T12:15:06.4021107Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.4021660Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.4022727Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.4023481Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.4024381Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.4025064Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.4025957Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.4026800Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.4027431Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T12:15:06.4028194Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4028759Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.4029320Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.4029820Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.4030268Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.4030817Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.4031414Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.4031832Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T12:15:06.4032659Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.4032872Z ('RERUN', {'yellow': True}) [0.4420s] [100%]
2025-12-04T12:15:06.4033992Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T12:15:06.4034771Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4035316Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.4035891Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.4036386Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.4036832Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.4037388Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.4037919Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.4038439Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T12:15:06.4038948Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T12:15:06.4039452Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T12:15:06.4040064Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T12:15:06.4040609Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T12:15:06.4040982Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.4042785Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.4043342Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.4044205Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.4044710Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.4045624Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T12:15:06.4046333Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T12:15:06.4047271Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.4047777Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.4048631Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T12:15:06.4049270Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T12:15:06.4050139Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T12:15:06.4050974Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T12:15:06.4051813Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T12:15:06.4052522Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T12:15:06.4053366Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T12:15:06.4054067Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T12:15:06.4054984Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.4055350Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.4056043Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T12:15:06.4056507Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.4057053Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.4058093Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.4058741Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.4059627Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.4060417Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.4061309Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.4062151Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.4062774Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T12:15:06.4063535Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4064090Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.4064650Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.4065149Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.4065597Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.4066141Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.4066677Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.4067096Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T12:15:06.4067917Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.4068042Z FAILED [0.4395s] [100%]
2025-12-04T12:15:06.4068049Z 
2025-12-04T12:15:06.4068233Z ==================================== RERUNS ====================================
2025-12-04T12:15:06.4068565Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 _
2025-12-04T12:15:06.4068693Z Traceback (most recent call last):
2025-12-04T12:15:06.4069070Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T12:15:06.4069216Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T12:15:06.4069709Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.4069961Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.4070486Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.4070681Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.4071424Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.4071579Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.4072113Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.4072452Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.4072973Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.4073234Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.4073714Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.4073840Z     return self._compile_to_module()
2025-12-04T12:15:06.4074344Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.4074613Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.4075136Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.4075281Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.4075779Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.4076035Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.4076625Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.4076754Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.4077273Z   File "/tmp/tmpt6w16nnv/du/cduc2masbadbpu7gctirhfo35eloemknautmizriuqaxr6z5sq6a.py", line 51, in <module>
2025-12-04T12:15:06.4077743Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.4077871Z     kernel.precompile(
2025-12-04T12:15:06.4078426Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.4078547Z     self._precompile_worker()
2025-12-04T12:15:06.4079156Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.4079340Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.4079936Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.4080149Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.4080597Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.4080904Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.4081353Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.4081689Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.4081936Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T12:15:06.4082259Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4082399Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.4082540Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.4082651Z     xmask = xindex < xnumel
2025-12-04T12:15:06.4082762Z     x0 = xindex
2025-12-04T12:15:06.4082887Z     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.4083009Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.4083120Z            ^
2025-12-04T12:15:06.4083512Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.4083519Z 
2025-12-04T12:15:06.4084245Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.4084252Z 
2025-12-04T12:15:06.4084257Z 
2025-12-04T12:15:06.4084472Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.4085153Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32
2025-12-04T12:15:06.4085160Z 
2025-12-04T12:15:06.4085441Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.4085668Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.4085822Z frames [('total', 1)]
2025-12-04T12:15:06.4085944Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.4086443Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.4086681Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.4086780Z graph_break []
2025-12-04T12:15:06.4087120Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 _
2025-12-04T12:15:06.4087246Z Traceback (most recent call last):
2025-12-04T12:15:06.4087636Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T12:15:06.4087767Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T12:15:06.4088257Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.4088523Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.4089042Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.4089242Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.4089772Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.4089923Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.4090471Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.4090796Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.4091315Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.4091483Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.4091966Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.4092154Z     return self._compile_to_module()
2025-12-04T12:15:06.4092648Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.4092820Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.4093350Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.4093485Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.4093986Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.4094237Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.4094823Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.4094969Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.4095479Z   File "/tmp/tmp76qdwytu/tr/ctrjd2h5qp4dv7rwfmx4kjgko243xboyncysg57ozmyhlbe6n3fc.py", line 51, in <module>
2025-12-04T12:15:06.4095946Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.4096076Z     kernel.precompile(
2025-12-04T12:15:06.4096713Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.4096901Z     self._precompile_worker()
2025-12-04T12:15:06.4097501Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.4097680Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.4098288Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.4098526Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.4099008Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.4099273Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.4099716Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.4100068Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.4100305Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T12:15:06.4100628Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4100771Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.4100917Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.4101028Z     xmask = xindex < xnumel
2025-12-04T12:15:06.4101144Z     x0 = xindex
2025-12-04T12:15:06.4101272Z     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.4101414Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.4101510Z            ^
2025-12-04T12:15:06.4101897Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.4101903Z 
2025-12-04T12:15:06.4102628Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.4102637Z 
2025-12-04T12:15:06.4102641Z 
2025-12-04T12:15:06.4102858Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.4103495Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32
2025-12-04T12:15:06.4103501Z 
2025-12-04T12:15:06.4103772Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.4104038Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.4104162Z frames [('total', 1)]
2025-12-04T12:15:06.4104279Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.4104758Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.4104980Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.4105080Z graph_break []
2025-12-04T12:15:06.4105313Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.4105417Z frames [('total', 1)]
2025-12-04T12:15:06.4105533Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.4105766Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.4106228Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.4106342Z graph_break []
2025-12-04T12:15:06.4106496Z =================================== FAILURES ===================================
2025-12-04T12:15:06.4106809Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 _
2025-12-04T12:15:06.4106947Z Traceback (most recent call last):
2025-12-04T12:15:06.4107320Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T12:15:06.4107447Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T12:15:06.4107993Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.4108241Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.4108774Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.4108969Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.4109520Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.4109722Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.4110257Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.4110578Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.4111109Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.4111261Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.4111754Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.4111878Z     return self._compile_to_module()
2025-12-04T12:15:06.4112363Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.4112547Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.4113063Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.4113209Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.4113705Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.4113938Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.4114540Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.4114668Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.4115169Z   File "/tmp/tmptja5smb2/mb/cmbn3dj5wnso24f63dqvjd4pgdznbtmuljhs5h6bljqnndm2i7vy.py", line 51, in <module>
2025-12-04T12:15:06.4115683Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.4115799Z     kernel.precompile(
2025-12-04T12:15:06.4116369Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.4116488Z     self._precompile_worker()
2025-12-04T12:15:06.4117085Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.4117283Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.4117876Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.4118086Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.4118537Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.4118789Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.4119251Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.4119586Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.4119817Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T12:15:06.4120150Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4120307Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.4120461Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.4120573Z     xmask = xindex < xnumel
2025-12-04T12:15:06.4120669Z     x0 = xindex
2025-12-04T12:15:06.4120809Z     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.4120930Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.4121054Z            ^
2025-12-04T12:15:06.4121457Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.4121493Z 
2025-12-04T12:15:06.4122205Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.4122211Z 
2025-12-04T12:15:06.4122216Z 
2025-12-04T12:15:06.4122447Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.4123075Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32
2025-12-04T12:15:06.4123081Z 
2025-12-04T12:15:06.4123367Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.4123620Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.4123766Z frames [('total', 1)]
2025-12-04T12:15:06.4123943Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.4124435Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.4124657Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.4124771Z graph_break []
2025-12-04T12:15:06.4124989Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.4125099Z frames [('total', 1)]
2025-12-04T12:15:06.4125228Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.4125451Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.4125925Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.4126024Z graph_break []
2025-12-04T12:15:06.4126239Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.4126359Z frames [('total', 1)]
2025-12-04T12:15:06.4126474Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.4126758Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.4127232Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.4127332Z graph_break []
2025-12-04T12:15:06.4127997Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d91f9f6b0d5ec125.xml -
2025-12-04T12:15:06.4128177Z =========================== short test summary info ============================
2025-12-04T12:15:06.4128940Z FAILED [0.4395s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 - torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T12:15:06.4129276Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4129404Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.4129558Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.4129672Z     xmask = xindex < xnumel
2025-12-04T12:15:06.4129768Z     x0 = xindex
2025-12-04T12:15:06.4129907Z     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.4130026Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.4130118Z            ^
2025-12-04T12:15:06.4130522Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.4130528Z 
2025-12-04T12:15:06.4131280Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.4131286Z 
2025-12-04T12:15:06.4131291Z 
2025-12-04T12:15:06.4131521Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.4132147Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32
2025-12-04T12:15:06.4132191Z 
2025-12-04T12:15:06.4132494Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.4132690Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:06.4132893Z ================== 1 failed, 187 deselected, 2 rerun in 4.39s ==================
2025-12-04T12:15:06.4133011Z Got exit code 1
2025-12-04T12:15:06.4133549Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32
2025-12-04T12:15:06.4133959Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:15:06.4134442Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ebbb316cfb6210df.xml
2025-12-04T12:15:06.4134613Z ============================= test session starts ==============================
2025-12-04T12:15:06.4134983Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:06.4135102Z cachedir: .pytest_cache
2025-12-04T12:15:06.4135623Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:06.4135766Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:06.4135878Z configfile: pytest.ini
2025-12-04T12:15:06.4136633Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:06.4136888Z collecting ... collected 188 items / 64 deselected / 124 selected
2025-12-04T12:15:06.4137033Z stepcurrent: skipping 64 already run items.
2025-12-04T12:15:06.4137166Z Running 124 items in this shard
2025-12-04T12:15:06.4137172Z 
2025-12-04T12:15:06.4138346Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T12:15:06.4139119Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4139681Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.4140243Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.4140750Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.4141183Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.4141730Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.4142266Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.4142770Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T12:15:06.4143292Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T12:15:06.4143837Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T12:15:06.4144392Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T12:15:06.4144981Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T12:15:06.4145377Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.4147193Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.4147731Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.4148612Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.4149119Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.4149964Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T12:15:06.4150674Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T12:15:06.4151529Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.4152085Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.4152930Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T12:15:06.4153582Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T12:15:06.4154455Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T12:15:06.4155289Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T12:15:06.4156146Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T12:15:06.4156839Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T12:15:06.4157696Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T12:15:06.4158414Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T12:15:06.4159308Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.4159737Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.4160429Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T12:15:06.4160791Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.4161323Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.4162377Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.4163008Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.4163920Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.4164602Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.4165507Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.4166278Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.4166932Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T12:15:06.4167703Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4168244Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.4168822Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.4169316Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.4169765Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.4170317Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.4170841Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.4171484Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T12:15:06.4172316Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.4172559Z ('RERUN', {'yellow': True}) [3.4763s] [  0%]
2025-12-04T12:15:06.4173669Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T12:15:06.4174534Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4175096Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.4175656Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.4176169Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.4176658Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.4177201Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.4177752Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.4178258Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T12:15:06.4178785Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T12:15:06.4179293Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T12:15:06.4179850Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T12:15:06.4180393Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T12:15:06.4180806Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.4182624Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.4183160Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.4184034Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.4184545Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.4185393Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T12:15:06.4186105Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T12:15:06.4186999Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.4187517Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.4188437Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T12:15:06.4189089Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T12:15:06.4189959Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T12:15:06.4190787Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T12:15:06.4191631Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T12:15:06.4192328Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T12:15:06.4193179Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T12:15:06.4193866Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T12:15:06.4194758Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.4195160Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.4195855Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T12:15:06.4196215Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.4196752Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.4197805Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.4198431Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.4199335Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.4200012Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.4200939Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.4201709Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.4202358Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T12:15:06.4203162Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4203706Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.4204280Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.4204773Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.4205218Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.4205767Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.4206290Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.4206719Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T12:15:06.4207539Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.4207690Z ('RERUN', {'yellow': True}) [0.4351s] [  0%]
2025-12-04T12:15:06.4208801Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T12:15:06.4209600Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4210157Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.4210720Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.4211226Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.4211656Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.4212198Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.4212741Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.4213249Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T12:15:06.4213773Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T12:15:06.4214276Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T12:15:06.4214900Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T12:15:06.4215443Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T12:15:06.4215860Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.4217781Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.4218321Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.4219202Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.4219719Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.4220566Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T12:15:06.4221275Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T12:15:06.4222130Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.4222655Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.4223533Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T12:15:06.4224182Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T12:15:06.4225057Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T12:15:06.4225885Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T12:15:06.4226725Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T12:15:06.4227427Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T12:15:06.4228285Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T12:15:06.4229002Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T12:15:06.4229891Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.4230302Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.4231028Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T12:15:06.4231391Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.4231927Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.4232981Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.4233604Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.4234509Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.4235190Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.4236082Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.4236851Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.4237460Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T12:15:06.4238265Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4238812Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.4239384Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.4239880Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.4240321Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.4240861Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.4241389Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.4241819Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T12:15:06.4242639Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.4242789Z FAILED [0.4329s] [  0%]
2025-12-04T12:15:06.4242796Z 
2025-12-04T12:15:06.4242945Z ==================================== RERUNS ====================================
2025-12-04T12:15:06.4243272Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 _
2025-12-04T12:15:06.4243410Z Traceback (most recent call last):
2025-12-04T12:15:06.4243814Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T12:15:06.4243972Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T12:15:06.4244480Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.4244730Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.4245257Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.4245454Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.4245964Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.4246126Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.4246659Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.4246998Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.4247521Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.4247676Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.4248170Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.4248298Z     return self._compile_to_module()
2025-12-04T12:15:06.4248781Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.4248964Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.4249482Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.4249635Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.4250166Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.4250402Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.4250998Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.4251126Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.4251600Z   File "/tmp/tmp_c_zu4r5/uu/cuukewa6wbxqjvqj2skh52k5pc2mvc6crv56al2zfmkih5jomgrb.py", line 51, in <module>
2025-12-04T12:15:06.4252067Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.4252183Z     kernel.precompile(
2025-12-04T12:15:06.4252748Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.4252870Z     self._precompile_worker()
2025-12-04T12:15:06.4253470Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.4253662Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.4254258Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.4254467Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.4254954Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.4255202Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.4255660Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.4255994Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.4256277Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T12:15:06.4256706Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4256833Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.4256992Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.4257107Z     xmask = xindex < xnumel
2025-12-04T12:15:06.4257218Z     x0 = xindex
2025-12-04T12:15:06.4257356Z     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.4257483Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.4257578Z            ^
2025-12-04T12:15:06.4257981Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.4257987Z 
2025-12-04T12:15:06.4258708Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.4258716Z 
2025-12-04T12:15:06.4258721Z 
2025-12-04T12:15:06.4258961Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.4259602Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32
2025-12-04T12:15:06.4259608Z 
2025-12-04T12:15:06.4259892Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.4260120Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.4260231Z frames [('total', 1)]
2025-12-04T12:15:06.4260367Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.4260836Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.4261059Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.4261181Z graph_break []
2025-12-04T12:15:06.4261506Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 _
2025-12-04T12:15:06.4261767Z Traceback (most recent call last):
2025-12-04T12:15:06.4262146Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T12:15:06.4262274Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T12:15:06.4262782Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.4263035Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.4263571Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.4263769Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.4264282Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.4264448Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.4264986Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.4265310Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.4265846Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.4265996Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.4266525Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.4266651Z     return self._compile_to_module()
2025-12-04T12:15:06.4267136Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.4267319Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.4267904Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.4268057Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.4268556Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.4268789Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.4269393Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.4269527Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.4270024Z   File "/tmp/tmp7la5352y/lb/clb77x6yqs7hsb3hmtygkrjoinvimp542n54darrcwxcz5koa3yw.py", line 51, in <module>
2025-12-04T12:15:06.4270501Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.4270619Z     kernel.precompile(
2025-12-04T12:15:06.4271367Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.4271488Z     self._precompile_worker()
2025-12-04T12:15:06.4272087Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.4272284Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.4272881Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.4273095Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.4273546Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.4273791Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.4274331Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.4274671Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.4274901Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T12:15:06.4275235Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4275360Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.4275517Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.4275627Z     xmask = xindex < xnumel
2025-12-04T12:15:06.4275722Z     x0 = xindex
2025-12-04T12:15:06.4275860Z     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.4275979Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.4276072Z            ^
2025-12-04T12:15:06.4276475Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.4276484Z 
2025-12-04T12:15:06.4277200Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.4277206Z 
2025-12-04T12:15:06.4277211Z 
2025-12-04T12:15:06.4277440Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.4278076Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32
2025-12-04T12:15:06.4278151Z 
2025-12-04T12:15:06.4278437Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.4278662Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.4278767Z frames [('total', 1)]
2025-12-04T12:15:06.4278900Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.4279366Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.4279676Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.4279794Z graph_break []
2025-12-04T12:15:06.4280015Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.4280121Z frames [('total', 1)]
2025-12-04T12:15:06.4280251Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.4280470Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.4280945Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.4281049Z graph_break []
2025-12-04T12:15:06.4281199Z =================================== FAILURES ===================================
2025-12-04T12:15:06.4281534Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 _
2025-12-04T12:15:06.4281659Z Traceback (most recent call last):
2025-12-04T12:15:06.4282036Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T12:15:06.4282182Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T12:15:06.4282670Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.4282931Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.4283448Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.4283643Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.4284167Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.4284314Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.4284859Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.4285213Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.4285735Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.4285898Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.4286378Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.4286505Z     return self._compile_to_module()
2025-12-04T12:15:06.4287003Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.4287167Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.4287699Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.4287832Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.4288333Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.4288578Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.4289162Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.4289304Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.4289841Z   File "/tmp/tmpsn1tewe1/ke/ckevxfko2vbgshwtfltjb26qlatadjfaue2iv7hl7ulz465sdadk.py", line 51, in <module>
2025-12-04T12:15:06.4290304Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.4290432Z     kernel.precompile(
2025-12-04T12:15:06.4290988Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.4291134Z     self._precompile_worker()
2025-12-04T12:15:06.4291776Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.4291956Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.4292717Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.4292924Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.4293381Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.4293641Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.4294084Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.4294433Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.4294672Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T12:15:06.4294996Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4295136Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.4295278Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.4295392Z     xmask = xindex < xnumel
2025-12-04T12:15:06.4295506Z     x0 = xindex
2025-12-04T12:15:06.4295629Z     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.4295752Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.4295858Z            ^
2025-12-04T12:15:06.4296247Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.4296253Z 
2025-12-04T12:15:06.4297043Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.4297057Z 
2025-12-04T12:15:06.4297062Z 
2025-12-04T12:15:06.4297336Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.4297992Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32
2025-12-04T12:15:06.4297998Z 
2025-12-04T12:15:06.4298267Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.4298491Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.4298612Z frames [('total', 1)]
2025-12-04T12:15:06.4298730Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.4299193Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.4299431Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.4299535Z graph_break []
2025-12-04T12:15:06.4299771Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.4299878Z frames [('total', 1)]
2025-12-04T12:15:06.4299994Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.4300225Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.4300691Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.4300791Z graph_break []
2025-12-04T12:15:06.4301053Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.4301157Z frames [('total', 1)]
2025-12-04T12:15:06.4301284Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.4301506Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.4301965Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.4302114Z graph_break []
2025-12-04T12:15:06.4302811Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ebbb316cfb6210df.xml -
2025-12-04T12:15:06.4302988Z =========================== short test summary info ============================
2025-12-04T12:15:06.4303793Z FAILED [0.4329s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 - torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T12:15:06.4304115Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4304257Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.4304396Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.4304507Z     xmask = xindex < xnumel
2025-12-04T12:15:06.4304615Z     x0 = xindex
2025-12-04T12:15:06.4304737Z     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.4304856Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.4304965Z            ^
2025-12-04T12:15:06.4305357Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.4305363Z 
2025-12-04T12:15:06.4306087Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.4306093Z 
2025-12-04T12:15:06.4306097Z 
2025-12-04T12:15:06.4306314Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.4306957Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32
2025-12-04T12:15:06.4306977Z 
2025-12-04T12:15:06.4307245Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.4307427Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:06.4307645Z ================== 1 failed, 64 deselected, 2 rerun in 4.39s ===================
2025-12-04T12:15:06.4307787Z Got exit code 1
2025-12-04T12:15:06.4307902Z Retrying single test...
2025-12-04T12:15:06.4308386Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6f1b13e751374b5d.xml
2025-12-04T12:15:06.4308552Z ============================= test session starts ==============================
2025-12-04T12:15:06.4308917Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:06.4309034Z cachedir: .pytest_cache
2025-12-04T12:15:06.4309554Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:06.4309692Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:06.4309803Z configfile: pytest.ini
2025-12-04T12:15:06.4310397Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:06.4310638Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:06.4311359Z stepcurrent: skipping 64 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32
2025-12-04T12:15:06.4311487Z Running 1 items in this shard
2025-12-04T12:15:06.4311492Z 
2025-12-04T12:15:06.4312608Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T12:15:06.4313429Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4313976Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.4314625Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.4315139Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.4315571Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.4316128Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.4316651Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.4317158Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T12:15:06.4317689Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T12:15:06.4318193Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T12:15:06.4318752Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T12:15:06.4319297Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T12:15:06.4319664Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.4321547Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.4322089Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.4322969Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.4323481Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.4324336Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T12:15:06.4325056Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T12:15:06.4325927Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.4326472Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.4327319Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T12:15:06.4327978Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T12:15:06.4328918Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T12:15:06.4329752Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T12:15:06.4330604Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T12:15:06.4331318Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T12:15:06.4332172Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T12:15:06.4332856Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T12:15:06.4333867Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.4334237Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.4334932Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T12:15:06.4335296Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.4335877Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.4336996Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.4337634Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.4338545Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.4339225Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.4340123Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.4340895Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.4341564Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T12:15:06.4342326Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4342908Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.4343518Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.4344012Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.4344457Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.4345002Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.4345531Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.4345959Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T12:15:06.4346791Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.4346944Z ('RERUN', {'yellow': True}) [3.4575s] [100%]
2025-12-04T12:15:06.4348053Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T12:15:06.4348831Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4349372Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.4349972Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.4350482Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.4350913Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.4351464Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.4351990Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.4352493Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T12:15:06.4353020Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T12:15:06.4353527Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T12:15:06.4354091Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T12:15:06.4354638Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T12:15:06.4355036Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.4356895Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.4357457Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.4358335Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.4358845Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.4359704Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T12:15:06.4360422Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T12:15:06.4361290Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.4361793Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.4362637Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T12:15:06.4363287Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T12:15:06.4364192Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T12:15:06.4365019Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T12:15:06.4365867Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T12:15:06.4366575Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T12:15:06.4367423Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T12:15:06.4368109Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T12:15:06.4369003Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.4369395Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.4370084Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T12:15:06.4370445Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.4371270Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.4372328Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.4372953Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.4373866Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.4374547Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.4375447Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.4376215Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.4376901Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T12:15:06.4377665Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4378209Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.4378844Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.4379343Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.4379789Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.4380334Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.4380859Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.4381292Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T12:15:06.4382122Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.4382272Z ('RERUN', {'yellow': True}) [0.4342s] [100%]
2025-12-04T12:15:06.4383396Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T12:15:06.4384214Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4384758Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.4385359Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.4385893Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.4386327Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.4386883Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.4387408Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.4387916Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T12:15:06.4388436Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T12:15:06.4388946Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T12:15:06.4389503Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T12:15:06.4390047Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T12:15:06.4390409Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.4392267Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.4392805Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.4393682Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.4394189Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.4395037Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T12:15:06.4395753Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T12:15:06.4396617Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.4397121Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.4398003Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T12:15:06.4398648Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T12:15:06.4399576Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T12:15:06.4400409Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T12:15:06.4401248Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T12:15:06.4401957Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T12:15:06.4402799Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T12:15:06.4403489Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T12:15:06.4404381Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.4404749Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.4405442Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T12:15:06.4405802Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.4406384Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.4407447Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.4408076Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.4408983Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.4409663Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.4410561Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.4411331Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.4411950Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T12:15:06.4412744Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4413285Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.4413926Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.4414418Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.4414865Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.4415406Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.4415934Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.4416425Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T12:15:06.4417262Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.4417382Z FAILED [0.4356s] [100%]
2025-12-04T12:15:06.4417389Z 
2025-12-04T12:15:06.4417535Z ==================================== RERUNS ====================================
2025-12-04T12:15:06.4417866Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 _
2025-12-04T12:15:06.4418011Z Traceback (most recent call last):
2025-12-04T12:15:06.4418388Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T12:15:06.4418537Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T12:15:06.4419028Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.4419279Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.4419847Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.4420050Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.4420563Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.4420724Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.4421261Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.4421596Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.4422120Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.4422270Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.4422764Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.4422893Z     return self._compile_to_module()
2025-12-04T12:15:06.4423393Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.4423560Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.4424076Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.4424265Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.4424759Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.4424991Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.4425593Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.4425776Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.4426295Z   File "/tmp/tmp_s67pl2n/37/c37zt7inya7rhhvfferv4dpwirmtgkdiincjagrzebkvrbfcgs5a.py", line 51, in <module>
2025-12-04T12:15:06.4426760Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.4426874Z     kernel.precompile(
2025-12-04T12:15:06.4427444Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.4427584Z     self._precompile_worker()
2025-12-04T12:15:06.4428191Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.4428373Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.4428968Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.4429190Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.4429645Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.4429893Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.4430352Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.4430689Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.4430936Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T12:15:06.4431261Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4431386Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.4431543Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.4431658Z     xmask = xindex < xnumel
2025-12-04T12:15:06.4431758Z     x0 = xindex
2025-12-04T12:15:06.4431937Z     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.4432064Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.4432173Z            ^
2025-12-04T12:15:06.4432563Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.4432569Z 
2025-12-04T12:15:06.4433287Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.4433298Z 
2025-12-04T12:15:06.4433321Z 
2025-12-04T12:15:06.4433626Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.4434286Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32
2025-12-04T12:15:06.4434293Z 
2025-12-04T12:15:06.4434582Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.4434817Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.4434926Z frames [('total', 1)]
2025-12-04T12:15:06.4435061Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.4435531Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.4435770Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.4435874Z graph_break []
2025-12-04T12:15:06.4436243Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 _
2025-12-04T12:15:06.4436383Z Traceback (most recent call last):
2025-12-04T12:15:06.4436757Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T12:15:06.4436886Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T12:15:06.4437392Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.4437710Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.4438315Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.4438515Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.4439027Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.4439193Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.4439730Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.4440066Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.4440588Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.4440741Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.4441242Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.4441368Z     return self._compile_to_module()
2025-12-04T12:15:06.4441856Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.4442035Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.4442555Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.4442700Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.4443195Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.4443426Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.4444072Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.4444201Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.4444711Z   File "/tmp/tmpk0rfnz8k/rp/crpzruxu65lnw77wuerkdeuslfqi2plfmve4o4xjes3d43w2maf3.py", line 51, in <module>
2025-12-04T12:15:06.4445176Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.4445291Z     kernel.precompile(
2025-12-04T12:15:06.4445860Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.4445978Z     self._precompile_worker()
2025-12-04T12:15:06.4446575Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.4446770Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.4447372Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.4447582Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.4448032Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.4448276Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.4448770Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.4449105Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.4449351Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T12:15:06.4449672Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4449831Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.4450018Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.4450131Z     xmask = xindex < xnumel
2025-12-04T12:15:06.4450228Z     x0 = xindex
2025-12-04T12:15:06.4450368Z     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.4450488Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.4450583Z            ^
2025-12-04T12:15:06.4450988Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.4450997Z 
2025-12-04T12:15:06.4451713Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.4451719Z 
2025-12-04T12:15:06.4451723Z 
2025-12-04T12:15:06.4451957Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.4452595Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32
2025-12-04T12:15:06.4452606Z 
2025-12-04T12:15:06.4452887Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.4453114Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.4453219Z frames [('total', 1)]
2025-12-04T12:15:06.4453350Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.4453815Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.4454041Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.4454154Z graph_break []
2025-12-04T12:15:06.4454406Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.4454526Z frames [('total', 1)]
2025-12-04T12:15:06.4454642Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.4454864Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.4455373Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.4455476Z graph_break []
2025-12-04T12:15:06.4455628Z =================================== FAILURES ===================================
2025-12-04T12:15:06.4455965Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 _
2025-12-04T12:15:06.4456091Z Traceback (most recent call last):
2025-12-04T12:15:06.4456585Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T12:15:06.4456717Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T12:15:06.4457209Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.4457474Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.4457992Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.4458191Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.4458719Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.4458872Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.4459424Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.4459795Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.4460317Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.4460478Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.4460961Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.4461132Z     return self._compile_to_module()
2025-12-04T12:15:06.4461724Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.4461891Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.4462426Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.4462588Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.4463083Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.4463330Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.4463917Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.4464066Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.4464563Z   File "/tmp/tmptpjwmlj1/na/cnao57g2ulgri32xj4l73z4laikimden227m6rybxodp4tkm4j57.py", line 51, in <module>
2025-12-04T12:15:06.4465026Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.4465158Z     kernel.precompile(
2025-12-04T12:15:06.4465713Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.4465849Z     self._precompile_worker()
2025-12-04T12:15:06.4466444Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.4466624Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.4467230Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.4467471Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.4467926Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.4468185Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.4468625Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.4468973Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.4469206Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T12:15:06.4469526Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4469667Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.4469805Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.4469917Z     xmask = xindex < xnumel
2025-12-04T12:15:06.4470027Z     x0 = xindex
2025-12-04T12:15:06.4470152Z     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.4470286Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.4470379Z            ^
2025-12-04T12:15:06.4470768Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.4470774Z 
2025-12-04T12:15:06.4471697Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.4471801Z 
2025-12-04T12:15:06.4471806Z 
2025-12-04T12:15:06.4472027Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.4472679Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32
2025-12-04T12:15:06.4472685Z 
2025-12-04T12:15:06.4473004Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.4473271Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.4473393Z frames [('total', 1)]
2025-12-04T12:15:06.4473509Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.4473989Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.4474213Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.4474317Z graph_break []
2025-12-04T12:15:06.4474550Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.4474655Z frames [('total', 1)]
2025-12-04T12:15:06.4474771Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.4475005Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.4475465Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.4475581Z graph_break []
2025-12-04T12:15:06.4475806Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.4475912Z frames [('total', 1)]
2025-12-04T12:15:06.4476041Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.4476264Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.4476719Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.4476834Z graph_break []
2025-12-04T12:15:06.4477482Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6f1b13e751374b5d.xml -
2025-12-04T12:15:06.4477670Z =========================== short test summary info ============================
2025-12-04T12:15:06.4478462Z FAILED [0.4356s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 - torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T12:15:06.4478833Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4478980Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.4479122Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.4479233Z     xmask = xindex < xnumel
2025-12-04T12:15:06.4479344Z     x0 = xindex
2025-12-04T12:15:06.4479469Z     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.4479589Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.4479697Z            ^
2025-12-04T12:15:06.4487345Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.4487362Z 
2025-12-04T12:15:06.4488180Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.4488197Z 
2025-12-04T12:15:06.4488202Z 
2025-12-04T12:15:06.4488426Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.4489077Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32
2025-12-04T12:15:06.4489096Z 
2025-12-04T12:15:06.4489370Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.4489556Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:06.4489863Z ================== 1 failed, 187 deselected, 2 rerun in 4.37s ==================
2025-12-04T12:15:06.4489966Z Got exit code 1
2025-12-04T12:15:06.4490077Z Retrying single test...
2025-12-04T12:15:06.4490564Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f2c09e3279cd971a.xml
2025-12-04T12:15:06.4490731Z ============================= test session starts ==============================
2025-12-04T12:15:06.4491135Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:06.4491286Z cachedir: .pytest_cache
2025-12-04T12:15:06.4491813Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:06.4491951Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:06.4492062Z configfile: pytest.ini
2025-12-04T12:15:06.4492654Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:06.4492892Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:06.4493617Z stepcurrent: skipping 64 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32
2025-12-04T12:15:06.4493748Z Running 1 items in this shard
2025-12-04T12:15:06.4493756Z 
2025-12-04T12:15:06.4494878Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T12:15:06.4495645Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4496209Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.4496896Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.4497401Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.4497835Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.4498432Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.4498957Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.4499462Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T12:15:06.4499984Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T12:15:06.4500486Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T12:15:06.4501043Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T12:15:06.4501591Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T12:15:06.4501956Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.4503772Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.4504342Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.4505287Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.4505793Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.4506641Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T12:15:06.4507350Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T12:15:06.4508211Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.4508719Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.4509559Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T12:15:06.4510209Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T12:15:06.4511079Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T12:15:06.4511929Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T12:15:06.4512778Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T12:15:06.4513482Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T12:15:06.4514323Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T12:15:06.4515004Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T12:15:06.4515898Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.4516259Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.4516947Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T12:15:06.4517339Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.4517871Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.4518921Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.4519612Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.4520519Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.4521192Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.4522089Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.4522857Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.4523481Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T12:15:06.4524238Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4524779Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.4525347Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.4525836Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.4526319Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.4526860Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.4527379Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.4527805Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T12:15:06.4528632Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.4528780Z ('RERUN', {'yellow': True}) [3.4628s] [100%]
2025-12-04T12:15:06.4529893Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T12:15:06.4530653Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4531210Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.4532174Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.4532680Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.4533110Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.4533763Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.4534285Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.4534786Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T12:15:06.4535309Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T12:15:06.4535805Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T12:15:06.4536421Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T12:15:06.4536999Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T12:15:06.4537384Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.4539239Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.4539817Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.4540741Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.4541252Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.4542092Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T12:15:06.4542809Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T12:15:06.4543671Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.4544182Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.4545026Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T12:15:06.4545673Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T12:15:06.4546574Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T12:15:06.4547397Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T12:15:06.4548295Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T12:15:06.4549001Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T12:15:06.4549842Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T12:15:06.4550531Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T12:15:06.4551420Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.4551789Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.4552475Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T12:15:06.4552837Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.4553369Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.4554417Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.4555080Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.4555981Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.4556655Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.4557544Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.4558313Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.4558942Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T12:15:06.4559701Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4560242Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.4560850Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.4561343Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.4561822Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.4562394Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.4562920Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.4563350Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T12:15:06.4564168Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.4564315Z ('RERUN', {'yellow': True}) [0.4335s] [100%]
2025-12-04T12:15:06.4565433Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T12:15:06.4566196Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4566747Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.4567304Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.4567812Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.4568244Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.4568839Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.4569365Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.4569866Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T12:15:06.4570392Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T12:15:06.4570896Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T12:15:06.4571640Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T12:15:06.4572187Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T12:15:06.4572554Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.4574376Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.4575001Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.4575925Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.4576543Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.4577395Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T12:15:06.4578108Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T12:15:06.4578975Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T12:15:06.4579486Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T12:15:06.4580333Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T12:15:06.4580982Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T12:15:06.4581852Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T12:15:06.4582688Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T12:15:06.4583592Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T12:15:06.4584302Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T12:15:06.4585149Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T12:15:06.4585844Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T12:15:06.4586741Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.4587104Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.4587791Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T12:15:06.4588149Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.4588725Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.4589786Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.4590464Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.4591396Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.4592078Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.4592969Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.4593750Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.4594377Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T12:15:06.4595136Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4595679Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.4596258Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.4596747Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T12:15:06.4597189Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T12:15:06.4597775Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.4598300Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.4598732Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T12:15:06.4599556Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.4599673Z FAILED [0.4451s] [100%]
2025-12-04T12:15:06.4599680Z 
2025-12-04T12:15:06.4599826Z ==================================== RERUNS ====================================
2025-12-04T12:15:06.4600152Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 _
2025-12-04T12:15:06.4600294Z Traceback (most recent call last):
2025-12-04T12:15:06.4600676Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T12:15:06.4600819Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T12:15:06.4601315Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.4601566Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.4602123Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.4602324Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.4602833Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.4602995Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.4603559Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.4603925Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.4604448Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.4604600Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.4605099Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.4605227Z     return self._compile_to_module()
2025-12-04T12:15:06.4605727Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.4605893Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.4606412Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.4606561Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.4607060Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.4607293Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.4607890Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.4608021Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.4608533Z   File "/tmp/tmpxvkbrrj1/3z/c3zzs6t6rgzarktvabvrrs5jnnzy7ol6rncfz5zgmc56h7mvt5lf.py", line 51, in <module>
2025-12-04T12:15:06.4608997Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.4609111Z     kernel.precompile(
2025-12-04T12:15:06.4609689Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.4609844Z     self._precompile_worker()
2025-12-04T12:15:06.4610453Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.4610633Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.4611230Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.4611440Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.4611890Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.4612134Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.4612587Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.4612930Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.4613173Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T12:15:06.4613496Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4613623Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.4613777Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.4613885Z     xmask = xindex < xnumel
2025-12-04T12:15:06.4614013Z     x0 = xindex
2025-12-04T12:15:06.4614149Z     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.4614265Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.4614371Z            ^
2025-12-04T12:15:06.4614759Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.4614766Z 
2025-12-04T12:15:06.4615483Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.4615550Z 
2025-12-04T12:15:06.4615556Z 
2025-12-04T12:15:06.4615785Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.4616487Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32
2025-12-04T12:15:06.4616494Z 
2025-12-04T12:15:06.4616776Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.4617004Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.4617110Z frames [('total', 1)]
2025-12-04T12:15:06.4617238Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.4617697Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.4617934Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.4618033Z graph_break []
2025-12-04T12:15:06.4618358Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 _
2025-12-04T12:15:06.4618494Z Traceback (most recent call last):
2025-12-04T12:15:06.4618865Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T12:15:06.4618994Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T12:15:06.4619493Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.4619742Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.4620268Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.4620460Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.4620969Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.4621723Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.4622267Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.4622598Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.4623119Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.4623272Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.4623763Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.4623884Z     return self._compile_to_module()
2025-12-04T12:15:06.4624366Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.4624545Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.4625061Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.4625200Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.4625696Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.4625927Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.4626557Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.4626683Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.4627196Z   File "/tmp/tmpul8swxe9/ce/cce4pvrydatjw3qpxy4xb24dbqsr55og3qzmdmbbyytgzqtvtg6l.py", line 51, in <module>
2025-12-04T12:15:06.4627687Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.4627799Z     kernel.precompile(
2025-12-04T12:15:06.4628398Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.4628513Z     self._precompile_worker()
2025-12-04T12:15:06.4629109Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.4629298Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.4629889Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.4630097Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.4630545Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.4630791Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.4631249Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.4631581Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.4631828Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T12:15:06.4632144Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4632272Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.4632420Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.4632530Z     xmask = xindex < xnumel
2025-12-04T12:15:06.4632625Z     x0 = xindex
2025-12-04T12:15:06.4632755Z     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.4632873Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.4632963Z            ^
2025-12-04T12:15:06.4633357Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.4633401Z 
2025-12-04T12:15:06.4634121Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.4634127Z 
2025-12-04T12:15:06.4634132Z 
2025-12-04T12:15:06.4634360Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.4634997Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32
2025-12-04T12:15:06.4635005Z 
2025-12-04T12:15:06.4635281Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.4635508Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.4635613Z frames [('total', 1)]
2025-12-04T12:15:06.4635739Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.4636212Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.4636431Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.4636540Z graph_break []
2025-12-04T12:15:06.4636758Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.4636872Z frames [('total', 1)]
2025-12-04T12:15:06.4636986Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.4637270Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.4637741Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.4637838Z graph_break []
2025-12-04T12:15:06.4637983Z =================================== FAILURES ===================================
2025-12-04T12:15:06.4638318Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 _
2025-12-04T12:15:06.4638473Z Traceback (most recent call last):
2025-12-04T12:15:06.4638892Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T12:15:06.4639018Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T12:15:06.4639506Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.4639763Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.4640277Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.4640470Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.4640987Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.4641136Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.4641685Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.4642007Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.4642524Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.4642679Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.4643157Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.4643295Z     return self._compile_to_module()
2025-12-04T12:15:06.4643779Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.4643942Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.4644468Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.4644639Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.4645134Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.4645378Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.4645961Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.4646102Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.4646603Z   File "/tmp/tmpkbo8j2vw/qc/cqczjpqx6jq4biiqt3bcyhc7vnaq5gqca23io6r2sgd24l6qln7a.py", line 51, in <module>
2025-12-04T12:15:06.4647061Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.4647187Z     kernel.precompile(
2025-12-04T12:15:06.4647745Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.4647875Z     self._precompile_worker()
2025-12-04T12:15:06.4648472Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.4648652Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.4649254Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.4649489Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.4649937Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.4650194Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.4650636Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.4651047Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.4651281Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T12:15:06.4651595Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4651731Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.4651868Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.4651981Z     xmask = xindex < xnumel
2025-12-04T12:15:06.4652091Z     x0 = xindex
2025-12-04T12:15:06.4652212Z     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.4652341Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.4652433Z            ^
2025-12-04T12:15:06.4652819Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.4652825Z 
2025-12-04T12:15:06.4653554Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.4653561Z 
2025-12-04T12:15:06.4653566Z 
2025-12-04T12:15:06.4653781Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.4654428Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32
2025-12-04T12:15:06.4654434Z 
2025-12-04T12:15:06.4654706Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.4654927Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.4655044Z frames [('total', 1)]
2025-12-04T12:15:06.4655158Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.4655638Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.4655865Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.4655997Z graph_break []
2025-12-04T12:15:06.4656230Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.4656458Z frames [('total', 1)]
2025-12-04T12:15:06.4656579Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.4656824Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.4657288Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.4657409Z graph_break []
2025-12-04T12:15:06.4657629Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.4657735Z frames [('total', 1)]
2025-12-04T12:15:06.4657864Z stats [('calls_captured', 4)]
2025-12-04T12:15:06.4658083Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.4658545Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.4658663Z graph_break []
2025-12-04T12:15:06.4659315Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f2c09e3279cd971a.xml -
2025-12-04T12:15:06.4659504Z =========================== short test summary info ============================
2025-12-04T12:15:06.4660299Z FAILED [0.4451s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 - torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T12:15:06.4660664Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4660803Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.4660945Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.4661057Z     xmask = xindex < xnumel
2025-12-04T12:15:06.4661203Z     x0 = xindex
2025-12-04T12:15:06.4661329Z     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T12:15:06.4661455Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T12:15:06.4661598Z            ^
2025-12-04T12:15:06.4661988Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T12:15:06.4661994Z 
2025-12-04T12:15:06.4662723Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.4662733Z 
2025-12-04T12:15:06.4662737Z 
2025-12-04T12:15:06.4662955Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.4663603Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32
2025-12-04T12:15:06.4663610Z 
2025-12-04T12:15:06.4663882Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.4664069Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:06.4664296Z ================== 1 failed, 187 deselected, 2 rerun in 4.38s ==================
2025-12-04T12:15:06.4664399Z Got exit code 1
2025-12-04T12:15:06.4664955Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32
2025-12-04T12:15:06.4665383Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:15:06.4665858Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5c0e3bac2edd6805.xml
2025-12-04T12:15:06.4666044Z ============================= test session starts ==============================
2025-12-04T12:15:06.4666398Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:06.4666510Z cachedir: .pytest_cache
2025-12-04T12:15:06.4667056Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:06.4667218Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:06.4667347Z configfile: pytest.ini
2025-12-04T12:15:06.4667936Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:06.4668168Z collecting ... collected 188 items / 65 deselected / 123 selected
2025-12-04T12:15:06.4668334Z stepcurrent: skipping 65 already run items.
2025-12-04T12:15:06.4668454Z Running 123 items in this shard
2025-12-04T12:15:06.4668459Z 
2025-12-04T12:15:06.4669525Z inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda E1204 12:14:00.809000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T12:15:06.4670258Z E1204 12:14:00.809000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4670697Z E1204 12:14:00.809000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:06.4671419Z E1204 12:14:00.809000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.4671980Z E1204 12:14:00.809000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.4672627Z E1204 12:14:00.809000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T12:15:06.4673147Z E1204 12:14:00.809000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (0))
2025-12-04T12:15:06.4673693Z E1204 12:14:00.809000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK])
2025-12-04T12:15:06.4674387Z E1204 12:14:00.809000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float8e4nv)
2025-12-04T12:15:06.4675112Z E1204 12:14:00.809000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (tl.full([XBLOCK], 0, tl.int32).broadcast_to(XBLOCK)), tmp2, None)
2025-12-04T12:15:06.4675484Z E1204 12:14:00.809000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.4677219Z E1204 12:14:00.809000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'constexpr', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 1, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.4677772Z E1204 12:14:00.809000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.4678816Z E1204 12:14:00.809000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.4679442Z E1204 12:14:00.809000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.4680346Z E1204 12:14:00.809000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.4681082Z E1204 12:14:00.809000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.4681979Z E1204 12:14:00.809000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.4682747Z E1204 12:14:00.809000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.4683370Z E1204 12:14:00.809000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.4684084Z E1204 12:14:00.809000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4684463Z E1204 12:14:00.809000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:06.4685363Z E1204 12:14:00.809000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.4685496Z ('RERUN', {'yellow': True}) [3.0549s] [  0%]
2025-12-04T12:15:06.4686545Z inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda E1204 12:14:01.319000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T12:15:06.4687299Z E1204 12:14:01.319000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4687740Z E1204 12:14:01.319000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:06.4688342Z E1204 12:14:01.319000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.4688903Z E1204 12:14:01.319000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.4689476Z E1204 12:14:01.319000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T12:15:06.4689999Z E1204 12:14:01.319000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (0))
2025-12-04T12:15:06.4690561Z E1204 12:14:01.319000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK])
2025-12-04T12:15:06.4691080Z E1204 12:14:01.319000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float8e4nv)
2025-12-04T12:15:06.4691812Z E1204 12:14:01.319000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (tl.full([XBLOCK], 0, tl.int32).broadcast_to(XBLOCK)), tmp2, None)
2025-12-04T12:15:06.4692185Z E1204 12:14:01.319000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.4693915Z E1204 12:14:01.319000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'constexpr', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 1, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.4694465Z E1204 12:14:01.319000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.4695538Z E1204 12:14:01.319000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.4696181Z E1204 12:14:01.319000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.4697135Z E1204 12:14:01.319000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.4697831Z E1204 12:14:01.319000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.4698710Z E1204 12:14:01.319000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.4699491Z E1204 12:14:01.319000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.4700109Z E1204 12:14:01.319000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.4700826Z E1204 12:14:01.319000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4701243Z E1204 12:14:01.319000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:06.4702290Z E1204 12:14:01.319000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.4702488Z ('RERUN', {'yellow': True}) [0.2827s] [  0%]
2025-12-04T12:15:06.4703559Z inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda E1204 12:14:01.602000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T12:15:06.4704283Z E1204 12:14:01.602000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4704734Z E1204 12:14:01.602000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:06.4705281Z E1204 12:14:01.602000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.4705859Z E1204 12:14:01.602000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.4706432Z E1204 12:14:01.602000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T12:15:06.4706954Z E1204 12:14:01.602000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (0))
2025-12-04T12:15:06.4707518Z E1204 12:14:01.602000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK])
2025-12-04T12:15:06.4708039Z E1204 12:14:01.602000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float8e4nv)
2025-12-04T12:15:06.4708775Z E1204 12:14:01.602000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (tl.full([XBLOCK], 0, tl.int32).broadcast_to(XBLOCK)), tmp2, None)
2025-12-04T12:15:06.4709140Z E1204 12:14:01.602000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.4710928Z E1204 12:14:01.602000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'constexpr', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 1, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.4711466Z E1204 12:14:01.602000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.4712523Z E1204 12:14:01.602000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.4713157Z E1204 12:14:01.602000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.4714061Z E1204 12:14:01.602000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.4714760Z E1204 12:14:01.602000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.4715681Z E1204 12:14:01.602000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.4716519Z E1204 12:14:01.602000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.4717205Z E1204 12:14:01.602000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.4717939Z E1204 12:14:01.602000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4718307Z E1204 12:14:01.602000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:06.4719211Z E1204 12:14:01.602000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.4719319Z FAILED [0.2805s] [  0%]
2025-12-04T12:15:06.4719326Z 
2025-12-04T12:15:06.4719474Z ==================================== RERUNS ====================================
2025-12-04T12:15:06.4719782Z _______ TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda ________
2025-12-04T12:15:06.4719914Z Traceback (most recent call last):
2025-12-04T12:15:06.4720366Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 91, in test_xblock_for_small_numel
2025-12-04T12:15:06.4720487Z     actual = torch.compile(f)(x)
2025-12-04T12:15:06.4720980Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.4721249Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.4721768Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.4721969Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.4722495Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.4722646Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.4723234Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.4723562Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.4724085Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.4724250Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.4724733Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.4724868Z     return self._compile_to_module()
2025-12-04T12:15:06.4725359Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.4725524Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.4726070Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.4726208Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.4726720Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.4726955Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.4727544Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.4727725Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.4728234Z   File "/tmp/tmpvpk3crkk/dk/cdk5vi2ofixapffkl7vn54ayvwq6vxbrvzhgvnornrpgq27ef3tw.py", line 45, in <module>
2025-12-04T12:15:06.4728701Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.4728860Z     kernel.precompile(
2025-12-04T12:15:06.4729471Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.4729607Z     self._precompile_worker()
2025-12-04T12:15:06.4730210Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.4730392Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.4731006Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.4731211Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.4731681Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.4731931Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.4732379Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.4732727Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.4732956Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.4733245Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4733352Z ^
2025-12-04T12:15:06.4733815Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.4733823Z 
2025-12-04T12:15:06.4734550Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.4734557Z 
2025-12-04T12:15:06.4734561Z 
2025-12-04T12:15:06.4734780Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.4735393Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda
2025-12-04T12:15:06.4735399Z 
2025-12-04T12:15:06.4735673Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.4735898Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.4736019Z frames [('total', 1)]
2025-12-04T12:15:06.4736140Z stats [('calls_captured', 1)]
2025-12-04T12:15:06.4736456Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.4736699Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.4736803Z graph_break []
2025-12-04T12:15:06.4737108Z _______ TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda ________
2025-12-04T12:15:06.4737235Z Traceback (most recent call last):
2025-12-04T12:15:06.4737675Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 91, in test_xblock_for_small_numel
2025-12-04T12:15:06.4737814Z     actual = torch.compile(f)(x)
2025-12-04T12:15:06.4738310Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.4738562Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.4739096Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.4739292Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.4739862Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.4740010Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.4740547Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.4740919Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.4741472Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.4741639Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.4742123Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.4742249Z     return self._compile_to_module()
2025-12-04T12:15:06.4742749Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.4742916Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.4743537Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.4743683Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.4744184Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.4744436Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.4745022Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.4745151Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.4745668Z   File "/tmp/tmpjdeqtmqg/ru/cruj6wlj737fngl4mvq23ncz5u5wlnjubfm6kkwnyijtpdbpa3z7.py", line 45, in <module>
2025-12-04T12:15:06.4746132Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.4746260Z     kernel.precompile(
2025-12-04T12:15:06.4746815Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.4746934Z     self._precompile_worker()
2025-12-04T12:15:06.4747603Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.4747788Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.4748384Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.4748597Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.4749049Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.4749311Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.4749761Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.4750103Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.4750348Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.4750644Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4750736Z ^
2025-12-04T12:15:06.4751212Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.4751218Z 
2025-12-04T12:15:06.4751931Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.4751974Z 
2025-12-04T12:15:06.4751979Z 
2025-12-04T12:15:06.4752215Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.4752766Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda
2025-12-04T12:15:06.4752771Z 
2025-12-04T12:15:06.4753052Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.4753309Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.4753446Z frames [('total', 1)]
2025-12-04T12:15:06.4753580Z stats [('calls_captured', 1)]
2025-12-04T12:15:06.4753820Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.4754042Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.4754156Z graph_break []
2025-12-04T12:15:06.4754374Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.4754492Z frames [('total', 1)]
2025-12-04T12:15:06.4754608Z stats [('calls_captured', 1)]
2025-12-04T12:15:06.4754825Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.4755076Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.4755175Z graph_break []
2025-12-04T12:15:06.4755322Z =================================== FAILURES ===================================
2025-12-04T12:15:06.4755624Z _______ TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda ________
2025-12-04T12:15:06.4755752Z Traceback (most recent call last):
2025-12-04T12:15:06.4756197Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 91, in test_xblock_for_small_numel
2025-12-04T12:15:06.4756320Z     actual = torch.compile(f)(x)
2025-12-04T12:15:06.4756813Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.4757079Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.4757599Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.4757798Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.4758329Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.4758477Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.4759058Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.4759384Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.4759905Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.4760071Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.4760553Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.4760689Z     return self._compile_to_module()
2025-12-04T12:15:06.4761171Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.4761339Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.4761872Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.4762003Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.4762502Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.4762751Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.4763336Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.4763510Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.4764003Z   File "/tmp/tmpe23i4x0q/m4/cm453kf6uooz34mn6h4mfgw3bzyev2ivt6ojijffnlcoepqgwz4c.py", line 45, in <module>
2025-12-04T12:15:06.4764462Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.4764618Z     kernel.precompile(
2025-12-04T12:15:06.4765207Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.4765338Z     self._precompile_worker()
2025-12-04T12:15:06.4765935Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.4766115Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.4766723Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.4766922Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.4767371Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.4767631Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.4768084Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.4768431Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.4768660Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.4768946Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4769053Z ^
2025-12-04T12:15:06.4769510Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.4769519Z 
2025-12-04T12:15:06.4770245Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.4770251Z 
2025-12-04T12:15:06.4770256Z 
2025-12-04T12:15:06.4770473Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.4771330Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda
2025-12-04T12:15:06.4771352Z 
2025-12-04T12:15:06.4771629Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.4771855Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.4771975Z frames [('total', 1)]
2025-12-04T12:15:06.4772095Z stats [('calls_captured', 1)]
2025-12-04T12:15:06.4772341Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.4772581Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.4772683Z graph_break []
2025-12-04T12:15:06.4772906Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.4773033Z frames [('total', 1)]
2025-12-04T12:15:06.4773150Z stats [('calls_captured', 1)]
2025-12-04T12:15:06.4773384Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.4773628Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.4773730Z graph_break []
2025-12-04T12:15:06.4773958Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.4774065Z frames [('total', 1)]
2025-12-04T12:15:06.4774183Z stats [('calls_captured', 1)]
2025-12-04T12:15:06.4774412Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.4774693Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.4774794Z graph_break []
2025-12-04T12:15:06.4775460Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5c0e3bac2edd6805.xml -
2025-12-04T12:15:06.4775636Z =========================== short test summary info ============================
2025-12-04T12:15:06.4776493Z FAILED [0.2805s] inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.4776843Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4776935Z ^
2025-12-04T12:15:06.4777409Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.4777415Z 
2025-12-04T12:15:06.4778127Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.4778135Z 
2025-12-04T12:15:06.4778140Z 
2025-12-04T12:15:06.4778375Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.4778934Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda
2025-12-04T12:15:06.4778942Z 
2025-12-04T12:15:06.4779223Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.4779408Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:06.4779611Z ================== 1 failed, 65 deselected, 2 rerun in 3.66s ===================
2025-12-04T12:15:06.4779728Z Got exit code 1
2025-12-04T12:15:06.4779839Z Retrying single test...
2025-12-04T12:15:06.4780307Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1c004f486086cbb5.xml
2025-12-04T12:15:06.4780487Z ============================= test session starts ==============================
2025-12-04T12:15:06.4780841Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:06.4780965Z cachedir: .pytest_cache
2025-12-04T12:15:06.4781484Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:06.4781613Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:06.4781735Z configfile: pytest.ini
2025-12-04T12:15:06.4782362Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:06.4783098Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:06.4783834Z stepcurrent: skipping 65 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda
2025-12-04T12:15:06.4784103Z Running 1 items in this shard
2025-12-04T12:15:06.4784110Z 
2025-12-04T12:15:06.4785851Z inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda E1204 12:14:20.164000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T12:15:06.4787555Z E1204 12:14:20.164000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4788011Z E1204 12:14:20.164000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:06.4789102Z E1204 12:14:20.164000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.4789669Z E1204 12:14:20.164000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.4790312Z E1204 12:14:20.164000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T12:15:06.4790835Z E1204 12:14:20.164000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (0))
2025-12-04T12:15:06.4792274Z E1204 12:14:20.164000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK])
2025-12-04T12:15:06.4793805Z E1204 12:14:20.164000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float8e4nv)
2025-12-04T12:15:06.4794543Z E1204 12:14:20.164000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (tl.full([XBLOCK], 0, tl.int32).broadcast_to(XBLOCK)), tmp2, None)
2025-12-04T12:15:06.4794921Z E1204 12:14:20.164000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.4797098Z E1204 12:14:20.164000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'constexpr', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 1, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.4797745Z E1204 12:14:20.164000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.4798952Z E1204 12:14:20.164000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.4799656Z E1204 12:14:20.164000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.4800856Z E1204 12:14:20.164000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.4801655Z E1204 12:14:20.164000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.4802743Z E1204 12:14:20.164000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.4803517Z E1204 12:14:20.164000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.4804141Z E1204 12:14:20.164000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.4804956Z E1204 12:14:20.164000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4805398Z E1204 12:14:20.164000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:06.4806613Z E1204 12:14:20.164000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.4806764Z ('RERUN', {'yellow': True}) [3.0389s] [100%]
2025-12-04T12:15:06.4808056Z inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda E1204 12:14:20.668000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T12:15:06.4808944Z E1204 12:14:20.668000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4809491Z E1204 12:14:20.668000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:06.4810075Z E1204 12:14:20.668000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.4810767Z E1204 12:14:20.668000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.4811404Z E1204 12:14:20.668000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T12:15:06.4811989Z E1204 12:14:20.668000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (0))
2025-12-04T12:15:06.4812707Z E1204 12:14:20.668000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK])
2025-12-04T12:15:06.4813337Z E1204 12:14:20.668000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float8e4nv)
2025-12-04T12:15:06.4814238Z E1204 12:14:20.668000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (tl.full([XBLOCK], 0, tl.int32).broadcast_to(XBLOCK)), tmp2, None)
2025-12-04T12:15:06.4814605Z E1204 12:14:20.668000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.4816837Z E1204 12:14:20.668000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'constexpr', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 1, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.4817381Z E1204 12:14:20.668000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.4818474Z E1204 12:14:20.668000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.4819120Z E1204 12:14:20.668000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.4820010Z E1204 12:14:20.668000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.4820707Z E1204 12:14:20.668000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.4821593Z E1204 12:14:20.668000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.4822382Z E1204 12:14:20.668000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.4822989Z E1204 12:14:20.668000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.4823707Z E1204 12:14:20.668000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4824129Z E1204 12:14:20.668000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:06.4825025Z E1204 12:14:20.668000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.4825210Z ('RERUN', {'yellow': True}) [0.2795s] [100%]
2025-12-04T12:15:06.4826279Z inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda E1204 12:14:20.948000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T12:15:06.4827013Z E1204 12:14:20.948000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4827445Z E1204 12:14:20.948000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:06.4827988Z E1204 12:14:20.948000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.4828564Z E1204 12:14:20.948000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.4829134Z E1204 12:14:20.948000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T12:15:06.4829671Z E1204 12:14:20.948000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (0))
2025-12-04T12:15:06.4830219Z E1204 12:14:20.948000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK])
2025-12-04T12:15:06.4830745Z E1204 12:14:20.948000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float8e4nv)
2025-12-04T12:15:06.4831488Z E1204 12:14:20.948000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (tl.full([XBLOCK], 0, tl.int32).broadcast_to(XBLOCK)), tmp2, None)
2025-12-04T12:15:06.4831854Z E1204 12:14:20.948000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.4833630Z E1204 12:14:20.948000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'constexpr', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 1, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.4834171Z E1204 12:14:20.948000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.4835228Z E1204 12:14:20.948000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.4835867Z E1204 12:14:20.948000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.4836770Z E1204 12:14:20.948000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.4837450Z E1204 12:14:20.948000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.4838376Z E1204 12:14:20.948000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.4839237Z E1204 12:14:20.948000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.4840095Z E1204 12:14:20.948000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.4840972Z E1204 12:14:20.948000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4841343Z E1204 12:14:20.948000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:06.4842445Z E1204 12:14:20.948000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.4842558Z FAILED [0.2777s] [100%]
2025-12-04T12:15:06.4842565Z 
2025-12-04T12:15:06.4842711Z ==================================== RERUNS ====================================
2025-12-04T12:15:06.4843078Z _______ TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda ________
2025-12-04T12:15:06.4843210Z Traceback (most recent call last):
2025-12-04T12:15:06.4843652Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 91, in test_xblock_for_small_numel
2025-12-04T12:15:06.4843857Z     actual = torch.compile(f)(x)
2025-12-04T12:15:06.4844347Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.4844613Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.4845126Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.4845386Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.4845934Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.4846137Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.4846741Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.4847066Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.4847586Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.4847752Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.4848337Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.4848462Z     return self._compile_to_module()
2025-12-04T12:15:06.4848964Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.4849126Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.4849661Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.4849797Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.4850366Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.4850619Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.4851210Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.4851441Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.4851949Z   File "/tmp/tmp5up7ijr4/e6/ce62zz5vmvcbyvithppljypvzklyai4oveyk3awto4eqffvev77d.py", line 45, in <module>
2025-12-04T12:15:06.4852416Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.4852593Z     kernel.precompile(
2025-12-04T12:15:06.4853284Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.4853416Z     self._precompile_worker()
2025-12-04T12:15:06.4854030Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.4854211Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.4854822Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.4855027Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.4855565Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.4855829Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.4856449Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.4856860Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.4857091Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.4857475Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4857632Z ^
2025-12-04T12:15:06.4858143Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.4858154Z 
2025-12-04T12:15:06.4858932Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.4858940Z 
2025-12-04T12:15:06.4858945Z 
2025-12-04T12:15:06.4859164Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.4859765Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda
2025-12-04T12:15:06.4859774Z 
2025-12-04T12:15:06.4860056Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.4860282Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.4860402Z frames [('total', 1)]
2025-12-04T12:15:06.4860520Z stats [('calls_captured', 1)]
2025-12-04T12:15:06.4860762Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.4861001Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.4861103Z graph_break []
2025-12-04T12:15:06.4861393Z _______ TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda ________
2025-12-04T12:15:06.4861533Z Traceback (most recent call last):
2025-12-04T12:15:06.4861973Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 91, in test_xblock_for_small_numel
2025-12-04T12:15:06.4862111Z     actual = torch.compile(f)(x)
2025-12-04T12:15:06.4862823Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.4863133Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.4863718Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.4863917Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.4864471Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.4864633Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.4865170Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.4865502Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.4866094Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.4866245Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.4866744Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.4866868Z     return self._compile_to_module()
2025-12-04T12:15:06.4867366Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.4867535Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.4868052Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.4868199Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.4868696Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.4868936Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.4869535Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.4869663Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.4870160Z   File "/tmp/tmpbe076_mg/xb/cxbzh4vedyzoq2dfxsdpfge2fyym77n4croircujachkuzlmvsjc.py", line 45, in <module>
2025-12-04T12:15:06.4870623Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.4870737Z     kernel.precompile(
2025-12-04T12:15:06.4871495Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.4871616Z     self._precompile_worker()
2025-12-04T12:15:06.4872308Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.4872494Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.4873091Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.4873303Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.4873755Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.4874002Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.4874460Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.4874800Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.4875045Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.4875336Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4875426Z ^
2025-12-04T12:15:06.4875899Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.4875905Z 
2025-12-04T12:15:06.4876615Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.4876676Z 
2025-12-04T12:15:06.4876682Z 
2025-12-04T12:15:06.4876913Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.4877468Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda
2025-12-04T12:15:06.4877473Z 
2025-12-04T12:15:06.4877754Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.4878020Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.4878171Z frames [('total', 1)]
2025-12-04T12:15:06.4878301Z stats [('calls_captured', 1)]
2025-12-04T12:15:06.4878543Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.4878764Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.4878880Z graph_break []
2025-12-04T12:15:06.4879099Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.4879207Z frames [('total', 1)]
2025-12-04T12:15:06.4879337Z stats [('calls_captured', 1)]
2025-12-04T12:15:06.4879555Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.4879802Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.4879904Z graph_break []
2025-12-04T12:15:06.4880053Z =================================== FAILURES ===================================
2025-12-04T12:15:06.4880360Z _______ TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda ________
2025-12-04T12:15:06.4880490Z Traceback (most recent call last):
2025-12-04T12:15:06.4880927Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 91, in test_xblock_for_small_numel
2025-12-04T12:15:06.4881060Z     actual = torch.compile(f)(x)
2025-12-04T12:15:06.4881554Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.4881826Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.4882340Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.4882535Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.4883058Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.4883209Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.4883800Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.4884139Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.4884656Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.4884817Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.4885303Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.4885428Z     return self._compile_to_module()
2025-12-04T12:15:06.4885925Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.4886090Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.4886625Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.4886755Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.4887251Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.4887500Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.4888084Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.4888323Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.4888837Z   File "/tmp/tmpf7nuk2b2/hf/chfizvyhutfvs77r7rwygxw3wl3n7zeg7z5aci55mrorbhxncngz.py", line 45, in <module>
2025-12-04T12:15:06.4889298Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.4889458Z     kernel.precompile(
2025-12-04T12:15:06.4890047Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.4890169Z     self._precompile_worker()
2025-12-04T12:15:06.4890787Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.4890970Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.4891575Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.4891780Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.4892232Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.4892496Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.4892947Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.4893299Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.4893528Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.4893815Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4893926Z ^
2025-12-04T12:15:06.4894380Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.4894389Z 
2025-12-04T12:15:06.4895102Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.4895122Z 
2025-12-04T12:15:06.4895127Z 
2025-12-04T12:15:06.4895347Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.4895935Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda
2025-12-04T12:15:06.4895941Z 
2025-12-04T12:15:06.4896224Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.4896519Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.4896643Z frames [('total', 1)]
2025-12-04T12:15:06.4896762Z stats [('calls_captured', 1)]
2025-12-04T12:15:06.4897005Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.4897247Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.4897351Z graph_break []
2025-12-04T12:15:06.4897572Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.4897692Z frames [('total', 1)]
2025-12-04T12:15:06.4897809Z stats [('calls_captured', 1)]
2025-12-04T12:15:06.4898031Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.4898289Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.4898391Z graph_break []
2025-12-04T12:15:06.4898627Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.4898737Z frames [('total', 1)]
2025-12-04T12:15:06.4898853Z stats [('calls_captured', 1)]
2025-12-04T12:15:06.4899085Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.4899321Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.4899461Z graph_break []
2025-12-04T12:15:06.4900133Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1c004f486086cbb5.xml -
2025-12-04T12:15:06.4900308Z =========================== short test summary info ============================
2025-12-04T12:15:06.4901033Z FAILED [0.2777s] inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.4901379Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4901471Z ^
2025-12-04T12:15:06.4901941Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.4901947Z 
2025-12-04T12:15:06.4902659Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.4902668Z 
2025-12-04T12:15:06.4902672Z 
2025-12-04T12:15:06.4902903Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.4903461Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda
2025-12-04T12:15:06.4903466Z 
2025-12-04T12:15:06.4903737Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.4903938Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:06.4904142Z ================== 1 failed, 187 deselected, 2 rerun in 3.64s ==================
2025-12-04T12:15:06.4904257Z Got exit code 1
2025-12-04T12:15:06.4904369Z Retrying single test...
2025-12-04T12:15:06.4904839Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5e05e7b060f911b0.xml
2025-12-04T12:15:06.4905021Z ============================= test session starts ==============================
2025-12-04T12:15:06.4905376Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:06.4905487Z cachedir: .pytest_cache
2025-12-04T12:15:06.4906020Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:06.4906150Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:06.4906275Z configfile: pytest.ini
2025-12-04T12:15:06.4906902Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:06.4907131Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:15:06.4907775Z stepcurrent: skipping 65 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda
2025-12-04T12:15:06.4907894Z Running 1 items in this shard
2025-12-04T12:15:06.4907899Z 
2025-12-04T12:15:06.4908954Z inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda E1204 12:14:39.614000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T12:15:06.4909683Z E1204 12:14:39.614000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4910120Z E1204 12:14:39.614000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:06.4910679Z E1204 12:14:39.614000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.4911237Z E1204 12:14:39.614000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.4911844Z E1204 12:14:39.614000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T12:15:06.4912365Z E1204 12:14:39.614000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (0))
2025-12-04T12:15:06.4912913Z E1204 12:14:39.614000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK])
2025-12-04T12:15:06.4913511Z E1204 12:14:39.614000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float8e4nv)
2025-12-04T12:15:06.4914239Z E1204 12:14:39.614000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (tl.full([XBLOCK], 0, tl.int32).broadcast_to(XBLOCK)), tmp2, None)
2025-12-04T12:15:06.4914615Z E1204 12:14:39.614000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.4916357Z E1204 12:14:39.614000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'constexpr', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 1, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.4916914Z E1204 12:14:39.614000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.4917958Z E1204 12:14:39.614000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.4918604Z E1204 12:14:39.614000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.4919501Z E1204 12:14:39.614000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.4920182Z E1204 12:14:39.614000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.4921113Z E1204 12:14:39.614000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.4921892Z E1204 12:14:39.614000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.4922516Z E1204 12:14:39.614000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.4923233Z E1204 12:14:39.614000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4923616Z E1204 12:14:39.614000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:06.4924519Z E1204 12:14:39.614000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.4924654Z ('RERUN', {'yellow': True}) [3.0556s] [100%]
2025-12-04T12:15:06.4925699Z inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda E1204 12:14:40.122000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T12:15:06.4926459Z E1204 12:14:40.122000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4926902Z E1204 12:14:40.122000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:06.4927478Z E1204 12:14:40.122000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.4928089Z E1204 12:14:40.122000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.4928665Z E1204 12:14:40.122000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T12:15:06.4929188Z E1204 12:14:40.122000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (0))
2025-12-04T12:15:06.4929755Z E1204 12:14:40.122000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK])
2025-12-04T12:15:06.4930276Z E1204 12:14:40.122000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float8e4nv)
2025-12-04T12:15:06.4931017Z E1204 12:14:40.122000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (tl.full([XBLOCK], 0, tl.int32).broadcast_to(XBLOCK)), tmp2, None)
2025-12-04T12:15:06.4931383Z E1204 12:14:40.122000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.4933113Z E1204 12:14:40.122000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'constexpr', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 1, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.4933662Z E1204 12:14:40.122000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.4934760Z E1204 12:14:40.122000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.4935415Z E1204 12:14:40.122000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.4936364Z E1204 12:14:40.122000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.4937063Z E1204 12:14:40.122000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.4937943Z E1204 12:14:40.122000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.4938730Z E1204 12:14:40.122000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.4939340Z E1204 12:14:40.122000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.4940068Z E1204 12:14:40.122000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4940498Z E1204 12:14:40.122000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:06.4941397Z E1204 12:14:40.122000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.4941588Z ('RERUN', {'yellow': True}) [0.2822s] [100%]
2025-12-04T12:15:06.4942655Z inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda E1204 12:14:40.401000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T12:15:06.4943374Z E1204 12:14:40.401000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4943826Z E1204 12:14:40.401000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T12:15:06.4944365Z E1204 12:14:40.401000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T12:15:06.4944946Z E1204 12:14:40.401000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T12:15:06.4945522Z E1204 12:14:40.401000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T12:15:06.4946067Z E1204 12:14:40.401000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (0))
2025-12-04T12:15:06.4946618Z E1204 12:14:40.401000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK])
2025-12-04T12:15:06.4947149Z E1204 12:14:40.401000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float8e4nv)
2025-12-04T12:15:06.4947890Z E1204 12:14:40.401000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (tl.full([XBLOCK], 0, tl.int32).broadcast_to(XBLOCK)), tmp2, None)
2025-12-04T12:15:06.4948252Z E1204 12:14:40.401000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T12:15:06.4950046Z E1204 12:14:40.401000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'constexpr', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 1, 'num_stages': 1, 'debug': True, 'cc': 75}
2025-12-04T12:15:06.4950591Z E1204 12:14:40.401000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T12:15:06.4951644Z E1204 12:14:40.401000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.4952278Z E1204 12:14:40.401000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.4953169Z E1204 12:14:40.401000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.4953865Z E1204 12:14:40.401000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.4954787Z E1204 12:14:40.401000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.4955576Z E1204 12:14:40.401000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.4956249Z E1204 12:14:40.401000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T12:15:06.4956983Z E1204 12:14:40.401000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4957355Z E1204 12:14:40.401000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T12:15:06.4958244Z E1204 12:14:40.401000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.4958365Z FAILED [0.2767s] [100%]
2025-12-04T12:15:06.4958371Z 
2025-12-04T12:15:06.4958516Z ==================================== RERUNS ====================================
2025-12-04T12:15:06.4958825Z _______ TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda ________
2025-12-04T12:15:06.4958955Z Traceback (most recent call last):
2025-12-04T12:15:06.4959395Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 91, in test_xblock_for_small_numel
2025-12-04T12:15:06.4959533Z     actual = torch.compile(f)(x)
2025-12-04T12:15:06.4960022Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.4960287Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.4960803Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.4960999Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.4961523Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.4961674Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.4962244Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.4962583Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.4963107Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.4963270Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.4963756Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.4963880Z     return self._compile_to_module()
2025-12-04T12:15:06.4964376Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.4964542Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.4965077Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.4965214Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.4965711Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.4965959Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.4966545Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.4966707Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.4967223Z   File "/tmp/tmplrulxepv/bw/cbw6pd55lttopoe2rolibfuhv7sndqskb5qwhnzhq5yl7an2klzj.py", line 45, in <module>
2025-12-04T12:15:06.4967684Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.4967816Z     kernel.precompile(
2025-12-04T12:15:06.4968406Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.4968557Z     self._precompile_worker()
2025-12-04T12:15:06.4969177Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.4969360Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.4969968Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.4970171Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.4970622Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.4970886Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.4971509Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.4971854Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.4972102Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.4972390Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4972498Z ^
2025-12-04T12:15:06.4972955Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.4972962Z 
2025-12-04T12:15:06.4973674Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.4973693Z 
2025-12-04T12:15:06.4973697Z 
2025-12-04T12:15:06.4973918Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.4974474Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda
2025-12-04T12:15:06.4974558Z 
2025-12-04T12:15:06.4974843Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.4975067Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.4975173Z frames [('total', 1)]
2025-12-04T12:15:06.4975306Z stats [('calls_captured', 1)]
2025-12-04T12:15:06.4975548Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.4975786Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.4975889Z graph_break []
2025-12-04T12:15:06.4976195Z _______ TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda ________
2025-12-04T12:15:06.4976390Z Traceback (most recent call last):
2025-12-04T12:15:06.4976829Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 91, in test_xblock_for_small_numel
2025-12-04T12:15:06.4976953Z     actual = torch.compile(f)(x)
2025-12-04T12:15:06.4977466Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.4977717Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.4978246Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.4978443Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.4979009Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.4979177Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.4979713Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.4980051Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.4980684Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.4980839Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.4981337Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.4981462Z     return self._compile_to_module()
2025-12-04T12:15:06.4981951Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.4982137Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.4982656Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.4982802Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.4983300Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.4983546Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.4984149Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.4984280Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.4984796Z   File "/tmp/tmpfug849ze/7d/c7dq77infxlvw22t4ophndfmtu3hjgapvvilsj5yp43sqze7dum4.py", line 45, in <module>
2025-12-04T12:15:06.4985266Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.4985381Z     kernel.precompile(
2025-12-04T12:15:06.4985952Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.4986074Z     self._precompile_worker()
2025-12-04T12:15:06.4986670Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.4986910Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.4987510Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.4987722Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.4988176Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.4988427Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.4988884Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.4989224Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.4989466Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.4989786Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.4989881Z ^
2025-12-04T12:15:06.4990355Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.4990360Z 
2025-12-04T12:15:06.4991072Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.4991114Z 
2025-12-04T12:15:06.4991118Z 
2025-12-04T12:15:06.4991349Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.4991903Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda
2025-12-04T12:15:06.4991908Z 
2025-12-04T12:15:06.4992176Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.4992448Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.4992556Z frames [('total', 1)]
2025-12-04T12:15:06.4992717Z stats [('calls_captured', 1)]
2025-12-04T12:15:06.4992957Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.4993179Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.4993295Z graph_break []
2025-12-04T12:15:06.4993515Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.4993626Z frames [('total', 1)]
2025-12-04T12:15:06.4993754Z stats [('calls_captured', 1)]
2025-12-04T12:15:06.4993971Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.4994206Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.4994322Z graph_break []
2025-12-04T12:15:06.4994470Z =================================== FAILURES ===================================
2025-12-04T12:15:06.4994779Z _______ TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda ________
2025-12-04T12:15:06.4994908Z Traceback (most recent call last):
2025-12-04T12:15:06.4995346Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 91, in test_xblock_for_small_numel
2025-12-04T12:15:06.4995487Z     actual = torch.compile(f)(x)
2025-12-04T12:15:06.4995976Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:15:06.4996254Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:15:06.4996782Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T12:15:06.4996976Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:15:06.4997501Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T12:15:06.4997654Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:15:06.4998231Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:15:06.4998568Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:15:06.4999087Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T12:15:06.4999248Z     compiled_module = graph.compile_to_module()
2025-12-04T12:15:06.4999734Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T12:15:06.4999857Z     return self._compile_to_module()
2025-12-04T12:15:06.5000360Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T12:15:06.5000525Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T12:15:06.5001046Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T12:15:06.5001192Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T12:15:06.5001688Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T12:15:06.5001935Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T12:15:06.5002519Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T12:15:06.5002681Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T12:15:06.5003191Z   File "/tmp/tmpyjm1cd0y/qq/cqq53c7t2khz6m3yi4fjlkv76anwivkukfxnsjohara7iurydepe.py", line 45, in <module>
2025-12-04T12:15:06.5003654Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:06.5003815Z     kernel.precompile(
2025-12-04T12:15:06.5004403Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T12:15:06.5004521Z     self._precompile_worker()
2025-12-04T12:15:06.5005132Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T12:15:06.5005310Z     compile_results.append(self._precompile_config(c))
2025-12-04T12:15:06.5005904Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T12:15:06.5006118Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T12:15:06.5006567Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T12:15:06.5006825Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T12:15:06.5007272Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T12:15:06.5007611Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T12:15:06.5007855Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.5008144Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.5008249Z ^
2025-12-04T12:15:06.5008707Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.5008717Z 
2025-12-04T12:15:06.5009425Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.5009432Z 
2025-12-04T12:15:06.5009437Z 
2025-12-04T12:15:06.5009668Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.5010263Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda
2025-12-04T12:15:06.5010272Z 
2025-12-04T12:15:06.5010556Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.5010780Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.5010887Z frames [('total', 1)]
2025-12-04T12:15:06.5011017Z stats [('calls_captured', 1)]
2025-12-04T12:15:06.5011255Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.5011489Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.5011589Z graph_break []
2025-12-04T12:15:06.5011807Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.5011924Z frames [('total', 1)]
2025-12-04T12:15:06.5012040Z stats [('calls_captured', 1)]
2025-12-04T12:15:06.5012260Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.5012517Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.5012620Z graph_break []
2025-12-04T12:15:06.5012840Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:15:06.5012959Z frames [('total', 1)]
2025-12-04T12:15:06.5013075Z stats [('calls_captured', 1)]
2025-12-04T12:15:06.5013305Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:15:06.5013538Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T12:15:06.5013673Z graph_break []
2025-12-04T12:15:06.5014337Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5e05e7b060f911b0.xml -
2025-12-04T12:15:06.5014512Z =========================== short test summary info ============================
2025-12-04T12:15:06.5015229Z FAILED [0.2767s] inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T12:15:06.5015588Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T12:15:06.5015680Z ^
2025-12-04T12:15:06.5016151Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T12:15:06.5016156Z 
2025-12-04T12:15:06.5016949Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:15:06.5016958Z 
2025-12-04T12:15:06.5016962Z 
2025-12-04T12:15:06.5017193Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:06.5017749Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda
2025-12-04T12:15:06.5017755Z 
2025-12-04T12:15:06.5018024Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:06.5018227Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:06.5018434Z ================== 1 failed, 187 deselected, 2 rerun in 3.66s ==================
2025-12-04T12:15:06.5018539Z Got exit code 1
2025-12-04T12:15:06.5019030Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda
2025-12-04T12:15:06.5019443Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:15:06.5019931Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4d9849432c7f5caf.xml
2025-12-04T12:15:06.5020099Z ============================= test session starts ==============================
2025-12-04T12:15:06.5020453Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:06.5020578Z cachedir: .pytest_cache
2025-12-04T12:15:06.5021146Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:06.5021292Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:06.5021403Z configfile: pytest.ini
2025-12-04T12:15:06.5021996Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:06.5022240Z collecting ... collected 188 items / 66 deselected / 122 selected
2025-12-04T12:15:06.5022389Z stepcurrent: skipping 66 already run items.
2025-12-04T12:15:06.5022507Z Running 122 items in this shard
2025-12-04T12:15:06.5022513Z 
2025-12-04T12:15:06.5022960Z inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e5m2_cuda PASSED [2.9121s] [  0%]
2025-12-04T12:15:06.5023848Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_False_scaling_block_sizes0_cuda SKIPPED [0.0004s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [  1%]
2025-12-04T12:15:06.5024746Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_False_scaling_block_sizes1_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [  2%]
2025-12-04T12:15:06.5025619Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_True_scaling_block_sizes0_cuda SKIPPED [0.0004s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [  3%]
2025-12-04T12:15:06.5026553Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_True_scaling_block_sizes1_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [  4%]
2025-12-04T12:15:06.5027426Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_False_scaling_block_sizes0_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [  4%]
2025-12-04T12:15:06.5028365Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_False_scaling_block_sizes1_cuda SKIPPED [0.0003s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [  5%]
2025-12-04T12:15:06.5029253Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_True_scaling_block_sizes0_cuda SKIPPED [0.0003s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [  6%]
2025-12-04T12:15:06.5030118Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_True_scaling_block_sizes1_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [  7%]
2025-12-04T12:15:06.5030643Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_mx_fp8_max_autotune_cuda SKIPPED [0.0002s] (Not supported on non B200) [  8%]
2025-12-04T12:15:06.5031261Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_mx_fusion_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [  9%]
2025-12-04T12:15:06.5032223Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [  9%]
2025-12-04T12:15:06.5033164Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 10%]
2025-12-04T12:15:06.5034101Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 11%]
2025-12-04T12:15:06.5035038Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 12%]
2025-12-04T12:15:06.5036005Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 13%]
2025-12-04T12:15:06.5036958Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 13%]
2025-12-04T12:15:06.5037877Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 14%]
2025-12-04T12:15:06.5038825Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 15%]
2025-12-04T12:15:06.5039740Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 16%]
2025-12-04T12:15:06.5040676Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 17%]
2025-12-04T12:15:06.5041614Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 18%]
2025-12-04T12:15:06.5042547Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 18%]
2025-12-04T12:15:06.5043540Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 19%]
2025-12-04T12:15:06.5044494Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 20%]
2025-12-04T12:15:06.5045415Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda SKIPPED [0.0004s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 21%]
2025-12-04T12:15:06.5046339Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 22%]
2025-12-04T12:15:06.5047273Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 22%]
2025-12-04T12:15:06.5048196Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 23%]
2025-12-04T12:15:06.5049130Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 24%]
2025-12-04T12:15:06.5050066Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 25%]
2025-12-04T12:15:06.5051023Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 26%]
2025-12-04T12:15:06.5051946Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 27%]
2025-12-04T12:15:06.5052863Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 27%]
2025-12-04T12:15:06.5053784Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 28%]
2025-12-04T12:15:06.5054710Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 29%]
2025-12-04T12:15:06.5055658Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 30%]
2025-12-04T12:15:06.5056646Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 31%]
2025-12-04T12:15:06.5057650Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 31%]
2025-12-04T12:15:06.5058630Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 32%]
2025-12-04T12:15:06.5059563Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda SKIPPED [0.0004s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 33%]
2025-12-04T12:15:06.5060570Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 34%]
2025-12-04T12:15:06.5061584Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 35%]
2025-12-04T12:15:06.5062590Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 36%]
2025-12-04T12:15:06.5063599Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 36%]
2025-12-04T12:15:06.5064571Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 37%]
2025-12-04T12:15:06.5065536Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 38%]
2025-12-04T12:15:06.5066547Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 39%]
2025-12-04T12:15:06.5067506Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 40%]
2025-12-04T12:15:06.5068494Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 40%]
2025-12-04T12:15:06.5069456Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 41%]
2025-12-04T12:15:06.5070439Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 42%]
2025-12-04T12:15:06.5071589Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 43%]
2025-12-04T12:15:06.5072572Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_1024,1024,512_use_fast_accum_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 44%]
2025-12-04T12:15:06.5073452Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_1024,1024,512_use_fast_accum_True_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 45%]
2025-12-04T12:15:06.5074411Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_16,32,32_use_fast_accum_False_cuda SKIPPED [0.0004s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 45%]
2025-12-04T12:15:06.5075262Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_16,32,32_use_fast_accum_True_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 46%]
2025-12-04T12:15:06.5075971Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_scaled_mm_preserves_strides_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 47%]
2025-12-04T12:15:06.5076950Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 48%]
2025-12-04T12:15:06.5077924Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 49%]
2025-12-04T12:15:06.5078888Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 50%]
2025-12-04T12:15:06.5079842Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 50%]
2025-12-04T12:15:06.5080802Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 51%]
2025-12-04T12:15:06.5081816Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 52%]
2025-12-04T12:15:06.5082775Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 53%]
2025-12-04T12:15:06.5083730Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 54%]
2025-12-04T12:15:06.5084675Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 54%]
2025-12-04T12:15:06.5085626Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 55%]
2025-12-04T12:15:06.5086560Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 56%]
2025-12-04T12:15:06.5087544Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 57%]
2025-12-04T12:15:06.5088492Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda SKIPPED [0.0004s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 58%]
2025-12-04T12:15:06.5089521Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 59%]
2025-12-04T12:15:06.5090456Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 59%]
2025-12-04T12:15:06.5091413Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 60%]
2025-12-04T12:15:06.5092687Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 61%]
2025-12-04T12:15:06.5093652Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 62%]
2025-12-04T12:15:06.5094592Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 63%]
2025-12-04T12:15:06.5095582Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 63%]
2025-12-04T12:15:06.5096591Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 64%]
2025-12-04T12:15:06.5097667Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 65%]
2025-12-04T12:15:06.5098674Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 66%]
2025-12-04T12:15:06.5099647Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 67%]
2025-12-04T12:15:06.5100623Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 68%]
2025-12-04T12:15:06.5101579Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 68%]
2025-12-04T12:15:06.5102523Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 69%]
2025-12-04T12:15:06.5103500Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda SKIPPED [0.0004s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 70%]
2025-12-04T12:15:06.5104452Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 71%]
2025-12-04T12:15:06.5105485Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 72%]
2025-12-04T12:15:06.5106614Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 72%]
2025-12-04T12:15:06.5107714Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 73%]
2025-12-04T12:15:06.5108827Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 74%]
2025-12-04T12:15:06.5109926Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 75%]
2025-12-04T12:15:06.5111025Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 76%]
2025-12-04T12:15:06.5112092Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 77%]
2025-12-04T12:15:06.5113215Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 77%]
2025-12-04T12:15:06.5114287Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 78%]
2025-12-04T12:15:06.5115366Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 79%]
2025-12-04T12:15:06.5116458Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 80%]
2025-12-04T12:15:06.5117533Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 81%]
2025-12-04T12:15:06.5118612Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 81%]
2025-12-04T12:15:06.5119732Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_float32 SKIPPED [0.0004s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 82%]
2025-12-04T12:15:06.5120866Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 83%]
2025-12-04T12:15:06.5121985Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 84%]
2025-12-04T12:15:06.5123080Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 85%]
2025-12-04T12:15:06.5124144Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 86%]
2025-12-04T12:15:06.5125212Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 86%]
2025-12-04T12:15:06.5126267Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 87%]
2025-12-04T12:15:06.5127335Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 88%]
2025-12-04T12:15:06.5128439Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 89%]
2025-12-04T12:15:06.5129514Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 90%]
2025-12-04T12:15:06.5130576Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 90%]
2025-12-04T12:15:06.5131632Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 91%]
2025-12-04T12:15:06.5132647Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_1024,1024,512_use_fast_accum_False_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 92%]
2025-12-04T12:15:06.5133628Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_1024,1024,512_use_fast_accum_True_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 93%]
2025-12-04T12:15:06.5134628Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_16,32,32_use_fast_accum_False_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 94%]
2025-12-04T12:15:06.5135572Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_16,32,32_use_fast_accum_True_cuda_bfloat16 SKIPPED [0.0004s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 95%]
2025-12-04T12:15:06.5136704Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_1024,1024,512_use_fast_accum_False_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 95%]
2025-12-04T12:15:06.5137676Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_1024,1024,512_use_fast_accum_True_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 96%]
2025-12-04T12:15:06.5138631Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_16,32,32_use_fast_accum_False_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 97%]
2025-12-04T12:15:06.5139571Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_16,32,32_use_fast_accum_True_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 98%]
2025-12-04T12:15:06.5140292Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_unacceptable_input_dims_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 99%]
2025-12-04T12:15:06.5141065Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_unacceptable_scale_dims_rowwise_scaling_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [100%]
2025-12-04T12:15:06.5141072Z 
2025-12-04T12:15:06.5141732Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4d9849432c7f5caf.xml -
2025-12-04T12:15:06.5141967Z ================ 1 passed, 121 skipped, 66 deselected in 3.35s =================
2025-12-04T12:15:06.5157844Z The following tests failed consistently: ['test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda', 
'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda']
2025-12-04T12:15:06.5157939Z 
2025-12-04T12:15:06.5158395Z FINISHED PRINTING LOG FILE of inductor/test_fp8 1/1 (test/test-reports/inductor.test_fp8_1.1_5b24deb545871ee8_.log)
2025-12-04T12:15:06.5158400Z 
2025-12-04T12:15:06.5158720Z Finished inductor/test_fp8 1/1 ... [2025-12-04 12:15:04.795060][10933.17794738], took 32.13min
2025-12-04T12:15:06.5159434Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-dff864e79f1bf91b.xml
2025-12-04T12:15:06.5160198Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-053a0e10a178eff6.xml
2025-12-04T12:15:06.5160893Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-966288eeb3fe785e.xml
2025-12-04T12:15:06.5161631Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-47dd8058babbbd0d.xml
2025-12-04T12:15:06.5162330Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e92e228ccdafe934.xml
2025-12-04T12:15:06.5163021Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0328fb4bc2fb022d.xml
2025-12-04T12:15:06.5163730Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4ecceae3d20d3515.xml
2025-12-04T12:15:06.5164431Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-af3f0411f43ffff1.xml
2025-12-04T12:15:06.5165148Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5f646abfecfc34db.xml
2025-12-04T12:15:06.5165836Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2f71fa45f6063b14.xml
2025-12-04T12:15:06.5166540Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3d881319a967678f.xml
2025-12-04T12:15:06.5167257Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7b45d70025cf6016.xml
2025-12-04T12:15:06.5167952Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b4a285d41fdad5fc.xml
2025-12-04T12:15:06.5168653Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-9b24822b6f23300e.xml
2025-12-04T12:15:06.5169395Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-642548938a706c13.xml
2025-12-04T12:15:06.5170101Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3087aa3d89d0a96b.xml
2025-12-04T12:15:06.5170792Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5fb4c628c04a1cdc.xml
2025-12-04T12:15:06.5171747Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0d752e0bfa5071ea.xml
2025-12-04T12:15:06.5172462Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-fedbc7df4b1c2869.xml
2025-12-04T12:15:06.5173151Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3cf6d62b643bfad7.xml
2025-12-04T12:15:06.5173862Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e2e744a24cd2751e.xml
2025-12-04T12:15:06.5174554Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6de5a411a3f65f82.xml
2025-12-04T12:15:06.5175259Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d2f3621583fff098.xml
2025-12-04T12:15:06.5175952Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-319cee3df6121e1a.xml
2025-12-04T12:15:06.5176700Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-452be63c68b4eb35.xml
2025-12-04T12:15:06.5177490Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5a49841d6a2b730b.xml
2025-12-04T12:15:06.5178182Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f1313a025d30dc09.xml
2025-12-04T12:15:06.5178883Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-03aedafc0832726c.xml
2025-12-04T12:15:06.5179572Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-89171bcc48f05a69.xml
2025-12-04T12:15:06.5180262Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6450e334481f0131.xml
2025-12-04T12:15:06.5180974Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f7999da795e3cf34.xml
2025-12-04T12:15:06.5181679Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ad7a38726bbc8b50.xml
2025-12-04T12:15:06.5182387Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b434424093647de3.xml
2025-12-04T12:15:06.5183080Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ccd966f4e119e833.xml
2025-12-04T12:15:06.5183819Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d16f18ba4de45d90.xml
2025-12-04T12:15:06.5184525Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4078dca354f1c797.xml
2025-12-04T12:15:06.5185221Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7591ded94ad5fda9.xml
2025-12-04T12:15:06.5186018Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4955a88ef6b89264.xml
2025-12-04T12:15:06.5186724Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2fae1650dec37ec0.xml
2025-12-04T12:15:06.5434234Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0893388d06071d35.xml
2025-12-04T12:15:06.5858983Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b62e3abe6013e6ef.xml
2025-12-04T12:15:06.6232208Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0fe50dbde6f69754.xml
2025-12-04T12:15:06.6550598Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1381d94cd6abec18.xml
2025-12-04T12:15:06.6851519Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2b9aebe063e8f7ef.xml
2025-12-04T12:15:06.7229676Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-649ba93d0ac5919c.xml
2025-12-04T12:15:06.7520729Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-df60bd1ca7e6baab.xml
2025-12-04T12:15:06.7900531Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-927fdf8f8ff6280c.xml
2025-12-04T12:15:06.8240529Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2f380c761dc75570.xml
2025-12-04T12:15:06.8592102Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-db3aa4c2f1c0f2c1.xml
2025-12-04T12:15:06.8928127Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7c20e7902388541e.xml
2025-12-04T12:15:06.9248981Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-43cf13c151388d8e.xml
2025-12-04T12:15:06.9615328Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-27661fe34019a4f8.xml
2025-12-04T12:15:06.9965411Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-63ef36c446edecf7.xml
2025-12-04T12:15:07.0296222Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-818cc5e6f257d295.xml
2025-12-04T12:15:07.0615558Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b552d5ebf2a766dc.xml
2025-12-04T12:15:07.1070189Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-08c28ac73e77007a.xml
2025-12-04T12:15:07.1423173Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-df1b42bf8f6cd06e.xml
2025-12-04T12:15:07.1753947Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-97d6c66aee44b097.xml
2025-12-04T12:15:07.2069015Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-232f2d4b09cdec77.xml
2025-12-04T12:15:07.2376703Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6add3d31a0a55a66.xml
2025-12-04T12:15:07.2682053Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-fa52f41f0c0be4e5.xml
2025-12-04T12:15:07.2972832Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-38b24c1b21208356.xml
2025-12-04T12:15:07.3301206Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b1ae24833396f782.xml
2025-12-04T12:15:07.3610568Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-80996ba6b8c32f81.xml
2025-12-04T12:15:07.3949454Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8b26ba548538abde.xml
2025-12-04T12:15:07.4411141Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d73817a3e5f02a06.xml
2025-12-04T12:15:07.4726515Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-90e37d7f0968dad1.xml
2025-12-04T12:15:07.5065459Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6ba281452d587f38.xml
2025-12-04T12:15:07.5397488Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-85d1d6e9267cc116.xml
2025-12-04T12:15:07.5677019Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7a610c26dd7fa0e9.xml
2025-12-04T12:15:07.6093904Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-269f6089cafc9f3b.xml
2025-12-04T12:15:07.6437017Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f11fe18ee197cc1f.xml
2025-12-04T12:15:07.6737533Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0b8acd36d7258295.xml
2025-12-04T12:15:07.7065512Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-babe12520ea62fea.xml
2025-12-04T12:15:07.7400551Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-08a6bb29b776e6ca.xml
2025-12-04T12:15:07.8292760Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ef8db3fa00c6c1d7.xml
2025-12-04T12:15:07.8890874Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7a3ac84fc91fa02b.xml
2025-12-04T12:15:07.9217786Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e162f70cb76e49ff.xml
2025-12-04T12:15:07.9532821Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e70a5c274fb86b8e.xml
2025-12-04T12:15:07.9884953Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0c17434f07767682.xml
2025-12-04T12:15:08.0184498Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3815c1aa47a06d85.xml
2025-12-04T12:15:08.0485855Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-69850f25ab7699fd.xml
2025-12-04T12:15:08.0813995Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-da23a1d59c747be6.xml
2025-12-04T12:15:08.1152659Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-36993cd4956a89fe.xml
2025-12-04T12:15:08.1476568Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-78153e5fcd212bc6.xml
2025-12-04T12:15:08.1821092Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-04b538cf09549803.xml
2025-12-04T12:15:08.2384722Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d91f9f6b0d5ec125.xml
2025-12-04T12:15:08.2709424Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ebbb316cfb6210df.xml
2025-12-04T12:15:08.3024354Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6f1b13e751374b5d.xml
2025-12-04T12:15:08.3312772Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f2c09e3279cd971a.xml
2025-12-04T12:15:08.3631810Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5c0e3bac2edd6805.xml
2025-12-04T12:15:08.3959108Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1c004f486086cbb5.xml
2025-12-04T12:15:08.4346055Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5e05e7b060f911b0.xml
2025-12-04T12:15:08.4646140Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4d9849432c7f5caf.xml
2025-12-04T12:15:08.9568734Z Uploading logs for 57119749248 to S3
2025-12-04T12:15:09.1308848Z Uploading artifacts took 0.63 seconds
2025-12-04T12:15:09.1313118Z inductor/test_fp8 1/1 failed!
2025-12-04T12:15:09.1313652Z Running dynamo/test_model_output 1/1 ... [2025-12-04 12:15:09.131154][10937.514047253]
2025-12-04T12:15:09.1314233Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:15:09.1318455Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_model_output.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:15:09.131618]
2025-12-04T12:15:45.3925080Z 
2025-12-04T12:15:45.3926052Z PRINTING LOG FILE of dynamo/test_model_output 1/1 (test/test-reports/dynamo.test_model_output_1.1_9f288500c4a144e5_.log)
2025-12-04T12:15:45.3927496Z Test results will be stored in test-reports/python-pytest/dynamo.test_model_output/dynamo.test_model_output-d1e6e78eb0372411.xml
2025-12-04T12:15:45.3928602Z ============================= test session starts ==============================
2025-12-04T12:15:45.3929359Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:45.3929972Z cachedir: .pytest_cache
2025-12-04T12:15:45.3930680Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:45.3931688Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:45.3932047Z configfile: pytest.ini
2025-12-04T12:15:45.3932816Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:45.3933675Z collecting ... collected 18 items
2025-12-04T12:15:45.3934094Z stepcurrent: Cannot find last run test, not skipping
2025-12-04T12:15:45.3940886Z Running 18 items in this shard: test/dynamo/test_model_output.py::TestHFPretrained::test_pretrained, test/dynamo/test_model_output.py::TestHFPretrained::test_pretrained_non_const_attr, test/dynamo/test_model_output.py::TestModelOutput::test_mo_assign, test/dynamo/test_model_output.py::TestModelOutput::test_mo_create, test/dynamo/test_model_output.py::TestModelOutput::test_mo_from_outside, test/dynamo/test_model_output.py::TestModelOutput::test_mo_getattr, test/dynamo/test_model_output.py::TestModelOutput::test_mo_getattr_missing, test/dynamo/test_model_output.py::TestModelOutput::test_mo_getitem, test/dynamo/test_model_output.py::TestModelOutput::test_mo_index, test/dynamo/test_model_output.py::TestModelOutput::test_mo_init, test/dynamo/test_model_output.py::TestModelOutput::test_mo_init2, test/dynamo/test_model_output.py::TestModelOutput::test_mo_init_with_disable, test/dynamo/test_model_output.py::TestModelOutput::test_mo_newkey, test/dynamo/test_model_output.py::TestModelOutput::test_mo_reconstruct_bytecode, test/dynamo/test_model_output.py::TestModelOutput::test_mo_tuple, test/dynamo/test_model_output.py::TestModelOutput::test_none, test/dynamo/test_model_output.py::TestModelOutput::test_reconstruction, test/dynamo/test_model_output.py::TestModelOutputBertCUDA::test_HF_bert_model_output_cuda
2025-12-04T12:15:45.3947510Z 
2025-12-04T12:15:45.3947942Z dynamo/test_model_output.py::TestHFPretrained::test_pretrained ('RERUN', {'yellow': True}) [0.0298s] [  5%]
2025-12-04T12:15:45.3948944Z dynamo/test_model_output.py::TestHFPretrained::test_pretrained ('RERUN', {'yellow': True}) [0.0017s] [  5%]
2025-12-04T12:15:45.3949842Z dynamo/test_model_output.py::TestHFPretrained::test_pretrained FAILED [0.0015s] [  5%]
2025-12-04T12:15:45.3950307Z 
2025-12-04T12:15:45.3950467Z ==================================== RERUNS ====================================
2025-12-04T12:15:45.3950994Z _______________________ TestHFPretrained.test_pretrained _______________________
2025-12-04T12:15:45.3951508Z Traceback (most recent call last):
2025-12-04T12:15:45.3952223Z   File "/var/lib/jenkins/workspace/test/dynamo/test_model_output.py", line 45, in test_pretrained
2025-12-04T12:15:45.3952867Z     ref = fn(x, tmp)
2025-12-04T12:15:45.3953382Z   File "/var/lib/jenkins/workspace/test/dynamo/test_model_output.py", line 40, in fn
2025-12-04T12:15:45.3954004Z     return a + torch.ones(2) * tmp.max_length
2025-12-04T12:15:45.3954848Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/transformers/configuration_utils.py", line 198, in __getattribute__
2025-12-04T12:15:45.3955676Z     return super().__getattribute__(key)
2025-12-04T12:15:45.3956213Z AttributeError: 'PreTrainedConfig' object has no attribute 'max_length'
2025-12-04T12:15:45.3956622Z 
2025-12-04T12:15:45.3956856Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:45.3957637Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/dynamo/test_model_output.py TestHFPretrained.test_pretrained
2025-12-04T12:15:45.3958224Z 
2025-12-04T12:15:45.3958496Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:45.3959155Z _______________________ TestHFPretrained.test_pretrained _______________________
2025-12-04T12:15:45.3959669Z Traceback (most recent call last):
2025-12-04T12:15:45.3960299Z   File "/var/lib/jenkins/workspace/test/dynamo/test_model_output.py", line 45, in test_pretrained
2025-12-04T12:15:45.3960943Z     ref = fn(x, tmp)
2025-12-04T12:15:45.3961455Z   File "/var/lib/jenkins/workspace/test/dynamo/test_model_output.py", line 40, in fn
2025-12-04T12:15:45.3962096Z     return a + torch.ones(2) * tmp.max_length
2025-12-04T12:15:45.3962926Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/transformers/configuration_utils.py", line 198, in __getattribute__
2025-12-04T12:15:45.3963752Z     return super().__getattribute__(key)
2025-12-04T12:15:45.3964284Z AttributeError: 'PreTrainedConfig' object has no attribute 'max_length'
2025-12-04T12:15:45.3964692Z 
2025-12-04T12:15:45.3964942Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:45.3965772Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/dynamo/test_model_output.py TestHFPretrained.test_pretrained
2025-12-04T12:15:45.3966362Z 
2025-12-04T12:15:45.3966629Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:45.3967193Z =================================== FAILURES ===================================
2025-12-04T12:15:45.3967721Z _______________________ TestHFPretrained.test_pretrained _______________________
2025-12-04T12:15:45.3968240Z Traceback (most recent call last):
2025-12-04T12:15:45.3968885Z   File "/var/lib/jenkins/workspace/test/dynamo/test_model_output.py", line 45, in test_pretrained
2025-12-04T12:15:45.3969517Z     ref = fn(x, tmp)
2025-12-04T12:15:45.3970029Z   File "/var/lib/jenkins/workspace/test/dynamo/test_model_output.py", line 40, in fn
2025-12-04T12:15:45.3970648Z     return a + torch.ones(2) * tmp.max_length
2025-12-04T12:15:45.3971701Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/transformers/configuration_utils.py", line 198, in __getattribute__
2025-12-04T12:15:45.3972528Z     return super().__getattribute__(key)
2025-12-04T12:15:45.3973065Z AttributeError: 'PreTrainedConfig' object has no attribute 'max_length'
2025-12-04T12:15:45.3973477Z 
2025-12-04T12:15:45.3973709Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:45.3974505Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/dynamo/test_model_output.py TestHFPretrained.test_pretrained
2025-12-04T12:15:45.3975078Z 
2025-12-04T12:15:45.3975344Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:45.3976567Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_model_output/dynamo.test_model_output-d1e6e78eb0372411.xml -
2025-12-04T12:15:45.3977620Z =========================== short test summary info ============================
2025-12-04T12:15:45.3978690Z FAILED [0.0015s] dynamo/test_model_output.py::TestHFPretrained::test_pretrained - AttributeError: 'PreTrainedConfig' object has no attribute 'max_length'
2025-12-04T12:15:45.3979482Z 
2025-12-04T12:15:45.3979700Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:45.3980495Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/dynamo/test_model_output.py TestHFPretrained.test_pretrained
2025-12-04T12:15:45.3981068Z 
2025-12-04T12:15:45.3981351Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:45.3981942Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:45.3982423Z ========================== 1 failed, 2 rerun in 0.08s ==========================
2025-12-04T12:15:45.3982836Z Got exit code 1
2025-12-04T12:15:45.3983109Z Retrying single test...
2025-12-04T12:15:45.3983825Z Test results will be stored in test-reports/python-pytest/dynamo.test_model_output/dynamo.test_model_output-0e0432c8246f889e.xml
2025-12-04T12:15:45.3984669Z ============================= test session starts ==============================
2025-12-04T12:15:45.3985337Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:45.3985936Z cachedir: .pytest_cache
2025-12-04T12:15:45.3986631Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:45.3987413Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:45.3987772Z configfile: pytest.ini
2025-12-04T12:15:45.3988600Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:45.3989546Z collecting ... collected 18 items / 17 deselected / 1 selected
2025-12-04T12:15:45.3990418Z stepcurrent: skipping 0 already run items. Running only test/dynamo/test_model_output.py::TestHFPretrained::test_pretrained
2025-12-04T12:15:45.3991195Z Running 1 items in this shard
2025-12-04T12:15:45.3991458Z 
2025-12-04T12:15:45.3991938Z dynamo/test_model_output.py::TestHFPretrained::test_pretrained ('RERUN', {'yellow': True}) [0.0298s] [100%]
2025-12-04T12:15:45.3992936Z dynamo/test_model_output.py::TestHFPretrained::test_pretrained ('RERUN', {'yellow': True}) [0.0018s] [100%]
2025-12-04T12:15:45.3993827Z dynamo/test_model_output.py::TestHFPretrained::test_pretrained FAILED [0.0015s] [100%]
2025-12-04T12:15:45.3994294Z 
2025-12-04T12:15:45.3994455Z ==================================== RERUNS ====================================
2025-12-04T12:15:45.3994984Z _______________________ TestHFPretrained.test_pretrained _______________________
2025-12-04T12:15:45.3995497Z Traceback (most recent call last):
2025-12-04T12:15:45.3996145Z   File "/var/lib/jenkins/workspace/test/dynamo/test_model_output.py", line 45, in test_pretrained
2025-12-04T12:15:45.3996779Z     ref = fn(x, tmp)
2025-12-04T12:15:45.3997295Z   File "/var/lib/jenkins/workspace/test/dynamo/test_model_output.py", line 40, in fn
2025-12-04T12:15:45.3997917Z     return a + torch.ones(2) * tmp.max_length
2025-12-04T12:15:45.3998763Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/transformers/configuration_utils.py", line 198, in __getattribute__
2025-12-04T12:15:45.3999580Z     return super().__getattribute__(key)
2025-12-04T12:15:45.4000117Z AttributeError: 'PreTrainedConfig' object has no attribute 'max_length'
2025-12-04T12:15:45.4000529Z 
2025-12-04T12:15:45.4000759Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:45.4001540Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/dynamo/test_model_output.py TestHFPretrained.test_pretrained
2025-12-04T12:15:45.4002126Z 
2025-12-04T12:15:45.4002393Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:45.4003049Z _______________________ TestHFPretrained.test_pretrained _______________________
2025-12-04T12:15:45.4003560Z Traceback (most recent call last):
2025-12-04T12:15:45.4004192Z   File "/var/lib/jenkins/workspace/test/dynamo/test_model_output.py", line 45, in test_pretrained
2025-12-04T12:15:45.4004892Z     ref = fn(x, tmp)
2025-12-04T12:15:45.4005407Z   File "/var/lib/jenkins/workspace/test/dynamo/test_model_output.py", line 40, in fn
2025-12-04T12:15:45.4006010Z     return a + torch.ones(2) * tmp.max_length
2025-12-04T12:15:45.4006837Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/transformers/configuration_utils.py", line 198, in __getattribute__
2025-12-04T12:15:45.4007662Z     return super().__getattribute__(key)
2025-12-04T12:15:45.4008200Z AttributeError: 'PreTrainedConfig' object has no attribute 'max_length'
2025-12-04T12:15:45.4008607Z 
2025-12-04T12:15:45.4008826Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:45.4009613Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/dynamo/test_model_output.py TestHFPretrained.test_pretrained
2025-12-04T12:15:45.4010203Z 
2025-12-04T12:15:45.4010468Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:45.4011033Z =================================== FAILURES ===================================
2025-12-04T12:15:45.4011566Z _______________________ TestHFPretrained.test_pretrained _______________________
2025-12-04T12:15:45.4012081Z Traceback (most recent call last):
2025-12-04T12:15:45.4012729Z   File "/var/lib/jenkins/workspace/test/dynamo/test_model_output.py", line 45, in test_pretrained
2025-12-04T12:15:45.4013363Z     ref = fn(x, tmp)
2025-12-04T12:15:45.4013871Z   File "/var/lib/jenkins/workspace/test/dynamo/test_model_output.py", line 40, in fn
2025-12-04T12:15:45.4014525Z     return a + torch.ones(2) * tmp.max_length
2025-12-04T12:15:45.4015351Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/transformers/configuration_utils.py", line 198, in __getattribute__
2025-12-04T12:15:45.4016165Z     return super().__getattribute__(key)
2025-12-04T12:15:45.4016783Z AttributeError: 'PreTrainedConfig' object has no attribute 'max_length'
2025-12-04T12:15:45.4017240Z 
2025-12-04T12:15:45.4017469Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:45.4018306Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/dynamo/test_model_output.py TestHFPretrained.test_pretrained
2025-12-04T12:15:45.4018899Z 
2025-12-04T12:15:45.4019171Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:45.4020305Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_model_output/dynamo.test_model_output-0e0432c8246f889e.xml -
2025-12-04T12:15:45.4021347Z =========================== short test summary info ============================
2025-12-04T12:15:45.4022332Z FAILED [0.0015s] dynamo/test_model_output.py::TestHFPretrained::test_pretrained - AttributeError: 'PreTrainedConfig' object has no attribute 'max_length'
2025-12-04T12:15:45.4023126Z 
2025-12-04T12:15:45.4023343Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:45.4024141Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/dynamo/test_model_output.py TestHFPretrained.test_pretrained
2025-12-04T12:15:45.4024716Z 
2025-12-04T12:15:45.4024998Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:45.4025591Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:45.4026106Z ================== 1 failed, 17 deselected, 2 rerun in 0.08s ===================
2025-12-04T12:15:45.4026549Z Got exit code 1
2025-12-04T12:15:45.4026827Z Retrying single test...
2025-12-04T12:15:45.4027547Z Test results will be stored in test-reports/python-pytest/dynamo.test_model_output/dynamo.test_model_output-1d824658578ee605.xml
2025-12-04T12:15:45.4028392Z ============================= test session starts ==============================
2025-12-04T12:15:45.4029058Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:45.4029663Z cachedir: .pytest_cache
2025-12-04T12:15:45.4030412Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:45.4031201Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:45.4031562Z configfile: pytest.ini
2025-12-04T12:15:45.4032329Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:45.4033281Z collecting ... collected 18 items / 17 deselected / 1 selected
2025-12-04T12:15:45.4034153Z stepcurrent: skipping 0 already run items. Running only test/dynamo/test_model_output.py::TestHFPretrained::test_pretrained
2025-12-04T12:15:45.4034931Z Running 1 items in this shard
2025-12-04T12:15:45.4035145Z 
2025-12-04T12:15:45.4035571Z dynamo/test_model_output.py::TestHFPretrained::test_pretrained ('RERUN', {'yellow': True}) [0.0298s] [100%]
2025-12-04T12:15:45.4036562Z dynamo/test_model_output.py::TestHFPretrained::test_pretrained ('RERUN', {'yellow': True}) [0.0017s] [100%]
2025-12-04T12:15:45.4037462Z dynamo/test_model_output.py::TestHFPretrained::test_pretrained FAILED [0.0015s] [100%]
2025-12-04T12:15:45.4037933Z 
2025-12-04T12:15:45.4038095Z ==================================== RERUNS ====================================
2025-12-04T12:15:45.4038626Z _______________________ TestHFPretrained.test_pretrained _______________________
2025-12-04T12:15:45.4039140Z Traceback (most recent call last):
2025-12-04T12:15:45.4039789Z   File "/var/lib/jenkins/workspace/test/dynamo/test_model_output.py", line 45, in test_pretrained
2025-12-04T12:15:45.4040475Z     ref = fn(x, tmp)
2025-12-04T12:15:45.4040986Z   File "/var/lib/jenkins/workspace/test/dynamo/test_model_output.py", line 40, in fn
2025-12-04T12:15:45.4041609Z     return a + torch.ones(2) * tmp.max_length
2025-12-04T12:15:45.4042445Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/transformers/configuration_utils.py", line 198, in __getattribute__
2025-12-04T12:15:45.4043264Z     return super().__getattribute__(key)
2025-12-04T12:15:45.4043838Z AttributeError: 'PreTrainedConfig' object has no attribute 'max_length'
2025-12-04T12:15:45.4044249Z 
2025-12-04T12:15:45.4044520Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:45.4045301Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/dynamo/test_model_output.py TestHFPretrained.test_pretrained
2025-12-04T12:15:45.4045887Z 
2025-12-04T12:15:45.4046155Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:45.4046812Z _______________________ TestHFPretrained.test_pretrained _______________________
2025-12-04T12:15:45.4047334Z Traceback (most recent call last):
2025-12-04T12:15:45.4047992Z   File "/var/lib/jenkins/workspace/test/dynamo/test_model_output.py", line 45, in test_pretrained
2025-12-04T12:15:45.4048628Z     ref = fn(x, tmp)
2025-12-04T12:15:45.4049149Z   File "/var/lib/jenkins/workspace/test/dynamo/test_model_output.py", line 40, in fn
2025-12-04T12:15:45.4049771Z     return a + torch.ones(2) * tmp.max_length
2025-12-04T12:15:45.4050602Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/transformers/configuration_utils.py", line 198, in __getattribute__
2025-12-04T12:15:45.4051437Z     return super().__getattribute__(key)
2025-12-04T12:15:45.4051980Z AttributeError: 'PreTrainedConfig' object has no attribute 'max_length'
2025-12-04T12:15:45.4052392Z 
2025-12-04T12:15:45.4052624Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:45.4053404Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/dynamo/test_model_output.py TestHFPretrained.test_pretrained
2025-12-04T12:15:45.4053996Z 
2025-12-04T12:15:45.4054265Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:45.4054832Z =================================== FAILURES ===================================
2025-12-04T12:15:45.4055379Z _______________________ TestHFPretrained.test_pretrained _______________________
2025-12-04T12:15:45.4055888Z Traceback (most recent call last):
2025-12-04T12:15:45.4056694Z   File "/var/lib/jenkins/workspace/test/dynamo/test_model_output.py", line 45, in test_pretrained
2025-12-04T12:15:45.4057354Z     ref = fn(x, tmp)
2025-12-04T12:15:45.4057858Z   File "/var/lib/jenkins/workspace/test/dynamo/test_model_output.py", line 40, in fn
2025-12-04T12:15:45.4058484Z     return a + torch.ones(2) * tmp.max_length
2025-12-04T12:15:45.4059317Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/transformers/configuration_utils.py", line 198, in __getattribute__
2025-12-04T12:15:45.4060148Z     return super().__getattribute__(key)
2025-12-04T12:15:45.4060675Z AttributeError: 'PreTrainedConfig' object has no attribute 'max_length'
2025-12-04T12:15:45.4061102Z 
2025-12-04T12:15:45.4061321Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:45.4062114Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/dynamo/test_model_output.py TestHFPretrained.test_pretrained
2025-12-04T12:15:45.4062689Z 
2025-12-04T12:15:45.4062972Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:45.4064096Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_model_output/dynamo.test_model_output-1d824658578ee605.xml -
2025-12-04T12:15:45.4065125Z =========================== short test summary info ============================
2025-12-04T12:15:45.4066111Z FAILED [0.0015s] dynamo/test_model_output.py::TestHFPretrained::test_pretrained - AttributeError: 'PreTrainedConfig' object has no attribute 'max_length'
2025-12-04T12:15:45.4066974Z 
2025-12-04T12:15:45.4067208Z To execute this test, run the following from the base repo dir:
2025-12-04T12:15:45.4067986Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/dynamo/test_model_output.py TestHFPretrained.test_pretrained
2025-12-04T12:15:45.4068573Z 
2025-12-04T12:15:45.4068840Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:15:45.4069434Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:15:45.4069997Z ================== 1 failed, 17 deselected, 2 rerun in 0.08s ===================
2025-12-04T12:15:45.4070477Z Got exit code 1
2025-12-04T12:15:45.4071156Z FAILED CONSISTENTLY: test/dynamo/test_model_output.py::TestHFPretrained::test_pretrained
2025-12-04T12:15:45.4072070Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:15:45.4073151Z Test results will be stored in test-reports/python-pytest/dynamo.test_model_output/dynamo.test_model_output-a0141b45c0b55065.xml
2025-12-04T12:15:45.4073999Z ============================= test session starts ==============================
2025-12-04T12:15:45.4074668Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:15:45.4075277Z cachedir: .pytest_cache
2025-12-04T12:15:45.4075975Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:15:45.4076763Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:15:45.4077121Z configfile: pytest.ini
2025-12-04T12:15:45.4077888Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:15:45.4078831Z collecting ... collected 18 items / 1 deselected / 17 selected
2025-12-04T12:15:45.4079328Z stepcurrent: skipping 1 already run items.
2025-12-04T12:15:45.4079716Z Running 17 items in this shard
2025-12-04T12:15:45.4079929Z 
2025-12-04T12:15:45.4081695Z dynamo/test_model_output.py::TestHFPretrained::test_pretrained_non_const_attr SKIPPED [0.0008s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/169481 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [  5%]
2025-12-04T12:15:45.4083930Z dynamo/test_model_output.py::TestModelOutput::test_mo_assign PASSED [0.3789s] [ 11%]
2025-12-04T12:15:45.4084813Z dynamo/test_model_output.py::TestModelOutput::test_mo_create PASSED [0.0655s] [ 17%]
2025-12-04T12:15:45.4085626Z dynamo/test_model_output.py::TestModelOutput::test_mo_from_outside PASSED [0.0463s] [ 23%]
2025-12-04T12:15:45.4086427Z dynamo/test_model_output.py::TestModelOutput::test_mo_getattr PASSED [0.0475s] [ 29%]
2025-12-04T12:15:45.4087258Z dynamo/test_model_output.py::TestModelOutput::test_mo_getattr_missing PASSED [0.0444s] [ 35%]
2025-12-04T12:15:45.4088091Z dynamo/test_model_output.py::TestModelOutput::test_mo_getitem PASSED [0.0575s] [ 41%]
2025-12-04T12:15:45.4088867Z dynamo/test_model_output.py::TestModelOutput::test_mo_index PASSED [0.0676s] [ 47%]
2025-12-04T12:15:45.4089616Z dynamo/test_model_output.py::TestModelOutput::test_mo_init PASSED [0.0647s] [ 52%]
2025-12-04T12:15:45.4090379Z dynamo/test_model_output.py::TestModelOutput::test_mo_init2 PASSED [0.0671s] [ 58%]
2025-12-04T12:15:45.4091205Z dynamo/test_model_output.py::TestModelOutput::test_mo_init_with_disable PASSED [0.1121s] [ 64%]
2025-12-04T12:15:45.4092026Z dynamo/test_model_output.py::TestModelOutput::test_mo_newkey PASSED [0.0497s] [ 70%]
2025-12-04T12:15:45.4092868Z dynamo/test_model_output.py::TestModelOutput::test_mo_reconstruct_bytecode PASSED [0.0607s] [ 76%]
2025-12-04T12:15:45.4093704Z dynamo/test_model_output.py::TestModelOutput::test_mo_tuple PASSED [0.0580s] [ 82%]
2025-12-04T12:15:45.4094443Z dynamo/test_model_output.py::TestModelOutput::test_none PASSED [0.0702s] [ 88%]
2025-12-04T12:15:45.4095267Z dynamo/test_model_output.py::TestModelOutput::test_reconstruction PASSED [0.0616s] [ 94%]
2025-12-04T12:15:45.4096200Z dynamo/test_model_output.py::TestModelOutputBertCUDA::test_HF_bert_model_output_cuda PASSED [1.0125s] [100%]
2025-12-04T12:15:45.4096868Z 
2025-12-04T12:15:45.4097586Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_model_output/dynamo.test_model_output-a0141b45c0b55065.xml -
2025-12-04T12:15:45.4098726Z ================= 16 passed, 1 skipped, 1 deselected in 2.32s ==================
2025-12-04T12:15:45.4100259Z The following tests failed consistently: ['test/dynamo/test_model_output.py::TestHFPretrained::test_pretrained']
2025-12-04T12:15:45.4100898Z 
2025-12-04T12:15:45.4101425Z FINISHED PRINTING LOG FILE of dynamo/test_model_output 1/1 (test/test-reports/dynamo.test_model_output_1.1_9f288500c4a144e5_.log)
2025-12-04T12:15:45.4102097Z 
2025-12-04T12:15:45.4102439Z Finished dynamo/test_model_output 1/1 ... [2025-12-04 12:15:45.392447][10973.775342194], took 0.60min
2025-12-04T12:15:45.4148121Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_model_output/dynamo.test_model_output-d1e6e78eb0372411.xml
2025-12-04T12:15:45.4916816Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_model_output/dynamo.test_model_output-0e0432c8246f889e.xml
2025-12-04T12:15:45.5206648Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_model_output/dynamo.test_model_output-1d824658578ee605.xml
2025-12-04T12:15:45.5519792Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_model_output/dynamo.test_model_output-a0141b45c0b55065.xml
2025-12-04T12:15:46.0352438Z Uploading logs for 57119749248 to S3
2025-12-04T12:15:46.2388718Z Uploading artifacts took 0.65 seconds
2025-12-04T12:15:46.2389140Z dynamo/test_model_output 1/1 failed!
2025-12-04T12:15:46.2393742Z Running inductor/test_triton_kernels 1/1 ... [2025-12-04 12:15:46.239183][10974.622077718]
2025-12-04T12:15:46.2394337Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:15:46.2398919Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_triton_kernels.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:15:46.239654]
2025-12-04T12:18:45.3518341Z 
2025-12-04T12:18:45.3522289Z inductor/test_triton_kernels 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_triton_kernels_1.1_80e8269e9d3330b3_.log
2025-12-04T12:18:45.3732084Z Running 366 items in this shard: test/inductor/test_triton_kernels.py::KernelTests::test_constexpr_dynamic_shapes_wrapped_False_autotune_False, test/inductor/test_triton_kernels.py::KernelTests::test_constexpr_dynamic_shapes_wrapped_False_autotune_True, test/inductor/test_triton_kernels.py::KernelTests::test_constexpr_dynamic_shapes_wrapped_True_autotune_False, test/inductor/test_triton_kernels.py::KernelTests::test_constexpr_dynamic_shapes_wrapped_True_autotune_True, test/inductor/test_triton_kernels.py::KernelTests::test_i64_input, test/inductor/test_triton_kernels.py::KernelTests::test_kernel_inline_asm_quotes_double, test/inductor/test_triton_kernels.py::KernelTests::test_kernel_inline_asm_quotes_single, test/inductor/test_triton_kernels.py::KernelTests::test_kernel_with_docstring_quotes_double, test/inductor/test_triton_kernels.py::KernelTests::test_kernel_with_docstring_quotes_single, test/inductor/test_triton_kernels.py::KernelTests::test_layout_constraint_needs_fixed_stride_order, test/inductor/test_triton_kernels.py::KernelTests::test_no_nan_kernels, test/inductor/test_triton_kernels.py::KernelTests::test_on_device_tma_dynamic_False_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_on_device_tma_dynamic_False_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_on_device_tma_dynamic_True_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_on_device_tma_dynamic_True_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_capture_and_functionalize_dynamic_False_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_capture_and_functionalize_dynamic_False_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_capture_and_functionalize_dynamic_True_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_capture_and_functionalize_dynamic_True_tma_version_old, 
test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_False_backend_aot_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_False_backend_aot_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_False_backend_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_False_backend_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_False_backend_inductor_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_False_backend_inductor_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_True_backend_aot_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_True_backend_aot_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_True_backend_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_True_backend_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_True_backend_inductor_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_True_backend_inductor_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_False_backend_aot_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_False_backend_aot_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_False_backend_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_False_backend_eager_tma_version_old, 
test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_True_backend_aot_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_True_backend_aot_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_True_backend_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_True_backend_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_dedup_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_dedup_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_False_after_create_desc_False_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_False_after_create_desc_False_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_False_after_create_desc_True_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_False_after_create_desc_True_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_True_after_create_desc_False_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_True_after_create_desc_False_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_True_after_create_desc_True_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_True_after_create_desc_True_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_triton_attrs_dict_equal_1_None_format, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_1_tdlp_0, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_eager_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_eager_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_eager_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_eager_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_eager_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_inductor_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_inductor_grid_type_1_tdlp_1, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_inductor_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_inductor_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_inductor_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_inductor_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_eager_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_eager_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_eager_grid_type_2_tdlp_1, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_eager_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_eager_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_inductor_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_inductor_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_inductor_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_inductor_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_inductor_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_inductor_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_3_tdlp_0, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_eager_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_eager_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_eager_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_eager_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_eager_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_inductor_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_inductor_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_inductor_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_inductor_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_inductor_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_inductor_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_1_tdlp_0, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_eager_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_eager_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_eager_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_eager_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_eager_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_inductor_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_inductor_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_inductor_grid_type_2_tdlp_0, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_inductor_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_inductor_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_inductor_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_eager_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_inductor_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_inductor_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_inductor_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_2, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_eager_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_inductor_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_inductor_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_inductor_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_eager_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_inductor_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_inductor_grid_type_2, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_inductor_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_eager_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_inductor_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_inductor_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_inductor_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_with_unsupported_args_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_with_unsupported_args_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_with_unsupported_args_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_caching, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_caching_duplicate, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_constants, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_dependancies, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_different_shapes_size_16_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_different_shapes_size_16_dynamic_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_different_shapes_size_4_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_different_shapes_size_4_dynamic_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_dtype_view_cfg_cpp_wrapper, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_dtype_view_cfg_normal, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_empty_autotune_config_dict_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_empty_autotune_config_dict_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_empty_autotune_config_dict_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_emulate_precision_mm_kernels_do_not_change, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_emulate_precision_unaffected, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_equal_to_1_arg_dump_launch_params_0_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_equal_to_1_arg_dump_launch_params_0_dynamic_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_equal_to_1_arg_dump_launch_params_1_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_equal_to_1_arg_dump_launch_params_1_dynamic_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_equal_to_1_float_arg_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_equal_to_1_float_arg_dynamic_True, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_fallback, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_float64_constant_float16, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_float64_constant_float32, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_float64_constant_float64, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_functionalize, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_global_constexpr, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_higher_order_func, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_inner_triton_function_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_inner_triton_function_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_inner_triton_function_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_inputs_buffer_reuse, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_matmul_tracking, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multi_kernel_grad_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multi_kernel_grad_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multiple_outputs_dynamic_False_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multiple_outputs_dynamic_False_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multiple_outputs_dynamic_False_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multiple_outputs_dynamic_True_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multiple_outputs_dynamic_True_backend_eager, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multiple_outputs_dynamic_True_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_mutation_not_mark_dirty, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_mutation_type, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_False_dynamic_False_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_False_dynamic_False_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_False_dynamic_False_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_False_dynamic_True_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_False_dynamic_True_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_False_dynamic_True_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_True_dynamic_False_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_True_dynamic_False_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_True_dynamic_False_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_True_dynamic_True_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_True_dynamic_True_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_True_dynamic_True_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_no_clones_grad_False_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_no_clones_grad_False_dynamic_True, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_no_clones_grad_True_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_no_clones_grad_True_dynamic_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_none_args, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_num_ctas_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_num_ctas_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_num_ctas_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_out_of_order, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_reinplace_inplaceable_pass, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_restore_value_backend_aot_eager_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_restore_value_backend_aot_eager_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_restore_value_backend_eager_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_restore_value_backend_eager_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_restore_value_backend_inductor_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_restore_value_backend_inductor_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_slice_and_view_input, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_kwargs_with_autotune_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_kwargs_with_autotune_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_kwargs_with_autotune_backend_inductor, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_kwargs_without_autotune_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_kwargs_without_autotune_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_kwargs_without_autotune_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_params_autotune_False_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_params_autotune_False_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_params_autotune_False_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_params_autotune_True_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_params_autotune_True_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_params_autotune_True_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_strided_input, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_strided_input_nonzero_offset, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_to_cpu, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_tracing_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_tracing_dynamic_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_triton_dtype_dynamic_False_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_triton_dtype_dynamic_False_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_triton_dtype_dynamic_False_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_triton_dtype_dynamic_True_backend_aot_eager, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_triton_dtype_dynamic_True_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_triton_dtype_dynamic_True_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_unbacked_shape_tensor_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_unbacked_shape_tensor_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_unbacked_shape_tensor_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_various_args, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_constexpr_function, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_grad_option_grad_fn0_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_grad_option_grad_fn0_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_grad_option_grad_fn0_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_grad_option_grad_fn1_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_grad_option_grad_fn1_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_grad_option_grad_fn1_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_imported_symbol, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_imported_symbol_with_custom_name, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_kernel_param, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_views_dynamic_False_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_views_dynamic_False_backend_eager, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_views_dynamic_False_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_views_dynamic_True_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_views_dynamic_True_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_views_dynamic_True_backend_inductor, test/inductor/test_triton_kernels.py::MutationTests::test_add_for_loop, test/inductor/test_triton_kernels.py::MutationTests::test_add_for_loop2, test/inductor/test_triton_kernels.py::MutationTests::test_add_kernel_on_device_tma_new_api, test/inductor/test_triton_kernels.py::MutationTests::test_add_kernel_on_device_tma_old_api, test/inductor/test_triton_kernels.py::MutationTests::test_add_nested_for_loop, test/inductor/test_triton_kernels.py::MutationTests::test_add_nested_for_loop_multi_return, test/inductor/test_triton_kernels.py::MutationTests::test_argmax, test/inductor/test_triton_kernels.py::MutationTests::test_branch_with_multiple_yield_args, test/inductor/test_triton_kernels.py::MutationTests::test_cumsum, test/inductor/test_triton_kernels.py::MutationTests::test_fn_call_multi_return, test/inductor/test_triton_kernels.py::MutationTests::test_fn_call_one_return, test/inductor/test_triton_kernels.py::MutationTests::test_for_loop_arg, test/inductor/test_triton_kernels.py::MutationTests::test_for_loop_arg_2, test/inductor/test_triton_kernels.py::MutationTests::test_get_tma_stores, test/inductor/test_triton_kernels.py::MutationTests::test_labels, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_add_4_times_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_add_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_add_kernel_2d_autotuned, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_add_kernel_with_block_ptr, 
test/inductor/test_triton_kernels.py::MutationTests::test_mutations_add_kernel_with_import, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_atomic_add_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_cond_op_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_indirection_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_indirection_kernel1, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_inline_asm_kernel_is_pure_false, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_inline_asm_kernel_is_pure_true, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_kernel_with_block_ptr_2d, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_mul2_inplace_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_nested_cond_op_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_out_of_order_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_out_of_order_kernel_call, test/inductor/test_triton_kernels.py::MutationTests::test_reduce_sum, test/inductor/test_triton_kernels.py::MutationTests::test_triton_kernel_inference_mode, test/inductor/test_triton_kernels.py::MutationTests::test_while_loop, test/inductor/test_triton_kernels.py::CustomOpTests::test_add_kernel_autotuned_False_dynamic_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_add_kernel_autotuned_False_dynamic_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_add_kernel_autotuned_True_dynamic_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_add_kernel_autotuned_True_dynamic_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_autotune_no_pre_or_post_hook_user_defined, test/inductor/test_triton_kernels.py::CustomOpTests::test_autotune_unbacked, test/inductor/test_triton_kernels.py::CustomOpTests::test_capture_triton_meta, 
test/inductor/test_triton_kernels.py::CustomOpTests::test_capture_triton_special_kwargs_dynamic_False_autotune_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_capture_triton_special_kwargs_dynamic_False_autotune_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_capture_triton_special_kwargs_dynamic_True_autotune_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_capture_triton_special_kwargs_dynamic_True_autotune_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_preserves_strides_variant_custom_op, test/inductor/test_triton_kernels.py::CustomOpTests::test_preserves_strides_variant_mutable_custom_op, test/inductor/test_triton_kernels.py::CustomOpTests::test_preserves_strides_variant_triton_kernel, test/inductor/test_triton_kernels.py::CustomOpTests::test_subclass, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_dynamic_grid_no_recompile, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_False_backend_aot_eager_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_False_backend_aot_eager_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_False_backend_eager_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_False_backend_eager_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_False_backend_inductor_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_False_backend_inductor_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_True_backend_aot_eager_autotune_at_compile_time_False, 
test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_True_backend_aot_eager_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_True_backend_eager_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_True_backend_eager_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_True_backend_inductor_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_True_backend_inductor_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_False_backend_aot_eager_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_False_backend_aot_eager_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_False_backend_eager_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_False_backend_eager_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_False_backend_inductor_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_False_backend_inductor_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_True_backend_aot_eager_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_True_backend_aot_eager_with_perf_model_True, 
test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_True_backend_eager_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_True_backend_eager_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_True_backend_inductor_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_True_backend_inductor_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_recompile_backend_aot_eager_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_recompile_backend_aot_eager_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_recompile_backend_eager_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_recompile_backend_eager_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_recompile_backend_inductor_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_recompile_backend_inductor_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_reset_to_zero_backend_aot_eager_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_reset_to_zero_backend_aot_eager_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_reset_to_zero_backend_eager_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_reset_to_zero_backend_eager_autotune_at_compile_time_True, 
test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_reset_to_zero_backend_inductor_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_reset_to_zero_backend_inductor_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_single_autotune_backend_aot_eager, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_single_autotune_backend_eager, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_single_autotune_backend_inductor, test/inductor/test_triton_kernels.py::CustomOpTests::test_wrap_triton_disabled_in_triton_op
2025-12-04T12:18:45.3945351Z 
2025-12-04T12:18:45.3945762Z Finished inductor/test_triton_kernels 1/1 ... [2025-12-04 12:18:45.352389][11153.735282205], took 2.99min
2025-12-04T12:18:45.3947141Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_triton_kernels/inductor.test_triton_kernels-498ce8e3e7c25595.xml
2025-12-04T12:18:45.4664857Z Running inductor/test_loop_ordering 1/1 ... [2025-12-04 12:18:45.466178][11153.849072214]
2025-12-04T12:18:45.4665458Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:18:45.4668352Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_loop_ordering.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:18:45.466602]
2025-12-04T12:20:32.2915492Z 
2025-12-04T12:20:32.2916758Z PRINTING LOG FILE of inductor/test_loop_ordering 1/1 (test/test-reports/inductor.test_loop_ordering_1.1_ca0aee6babe9c71a_.log)
2025-12-04T12:20:32.2918326Z W1204 12:18:54.697000 137513 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T12:20:32.2920105Z Test results will be stored in test-reports/python-pytest/inductor.test_loop_ordering/inductor.test_loop_ordering-264346bf50f4314b.xml
2025-12-04T12:20:32.2921310Z ============================= test session starts ==============================
2025-12-04T12:20:32.2922425Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:20:32.2923073Z cachedir: .pytest_cache
2025-12-04T12:20:32.2923903Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:20:32.2924690Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:20:32.2925054Z configfile: pytest.ini
2025-12-04T12:20:32.2925849Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:20:32.2926691Z collecting ... collected 53 items
2025-12-04T12:20:32.2927118Z stepcurrent: Cannot find last run test, not skipping
2025-12-04T12:20:32.2948987Z Running 53 items in this shard: test/inductor/test_loop_ordering.py::ImplDetailTest::test_merge_loops_invalidate_pw_dep_cache, test/inductor/test_loop_ordering.py::ImplDetailTest::test_reorder_and_merge_loops, test/inductor/test_loop_ordering.py::ImplDetailTest::test_reorder_modular_indexing, test/inductor/test_loop_ordering.py::ImplDetailTest::test_reorder_twice, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_3dred_pw_2d_outer_red, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_apbt_realize, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_different_broadcast_shapes, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_different_reduction_order, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_for_reordering_reindex, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_fp8_cast_and_t, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_fp8_pattern_2, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_fuse_reduction_with_tiled_pw, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_fuse_with_scalar_shared_memory, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_interaction_with_multi_template, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_interaction_with_triton_template, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_keep_fake_dep, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_outer_dimension_softmax, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_outer_dimension_sum_fuse_with_pw, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_pw_outer_red, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_pw_outer_red_2, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_sum_and_t, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_view, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_coalescing, 
test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_induced_fused_tiling, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_inferred_splits_inps0, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_inferred_splits_inps1, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_inferred_splits_inps2, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_inferred_splits_inps3, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_reduction_no_pointwise, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_reduction_pointwise, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_remapped_reads, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_remapped_reads_split, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_solve_for_tiling, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_solve_for_zero, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_tiled_coalesce_analysis_downcast_transposed_v_False, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_tiled_coalesce_analysis_downcast_transposed_v_True, test/inductor/test_loop_ordering.py::TestTiling::test_3d_pointwise, test/inductor/test_loop_ordering.py::TestTiling::test_cat, test/inductor/test_loop_ordering.py::TestTiling::test_find_broadcast_var, test/inductor/test_loop_ordering.py::TestTiling::test_mutation_deps, test/inductor/test_loop_ordering.py::TestTiling::test_penalized_small_dim, test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_NHWC_b_NHWC, test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_NHWC_b_T, test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_NHWC_b_cont, test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_T_b_NHWC, test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_T_b_T, test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_T_b_cont, 
test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_cont_b_NHWC, test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_cont_b_T, test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_cont_b_cont, test/inductor/test_loop_ordering.py::TestTiling::test_tiled_reduction, test/inductor/test_loop_ordering.py::TestIndexInversion::test_inversion_cases, test/inductor/test_loop_ordering.py::TestIndexInversion::test_original_complex_expression
2025-12-04T12:20:32.2970495Z 
2025-12-04T12:20:32.2971441Z inductor/test_loop_ordering.py::ImplDetailTest::test_merge_loops_invalidate_pw_dep_cache PASSED [0.0475s] [  1%]
2025-12-04T12:20:32.2972605Z inductor/test_loop_ordering.py::ImplDetailTest::test_reorder_and_merge_loops PASSED [0.0100s] [  3%]
2025-12-04T12:20:32.2973536Z inductor/test_loop_ordering.py::ImplDetailTest::test_reorder_modular_indexing PASSED [0.0643s] [  5%]
2025-12-04T12:20:32.2974520Z inductor/test_loop_ordering.py::ImplDetailTest::test_reorder_twice PASSED [0.0134s] [  7%]
2025-12-04T12:20:32.2975838Z inductor/test_loop_ordering.py::LoopOrderingTest::test_3dred_pw_2d_outer_red I1204 12:19:00.156000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics:
2025-12-04T12:20:32.2978414Z I1204 12:19:00.156000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 1081600, 'nodes_num_elem': [(FusedSchedulerNode(nodes=op0_op1), 266240), (SchedulerNode(name='op2'), 4160)], 'node_runtimes': [(FusedSchedulerNode(nodes=op0_op1), 0.003327334533093381), (SchedulerNode(name='op2'), 5.198960207958408e-05)]}
2025-12-04T12:20:32.2980305Z PASSED [5.2497s] [  9%]
2025-12-04T12:20:32.2981500Z inductor/test_loop_ordering.py::LoopOrderingTest::test_apbt_realize I1204 12:19:00.965000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics:
2025-12-04T12:20:32.2983883Z I1204 12:19:00.965000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 25165824, 'nodes_num_elem': [(FusedSchedulerNode(nodes=op1_op0_op2), 6291456)], 'node_runtimes': [(FusedSchedulerNode(nodes=op1_op0_op2), 0.07862747450509898)]}
2025-12-04T12:20:32.2985355Z PASSED [0.9374s] [ 11%]
2025-12-04T12:20:32.2986426Z inductor/test_loop_ordering.py::LoopOrderingTest::test_different_broadcast_shapes I1204 12:19:01.552000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics:
2025-12-04T12:20:32.2988821Z I1204 12:19:01.552000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 16785408, 'nodes_num_elem': [(SchedulerNode(name='op0'), 2098176), (SchedulerNode(name='op1'), 2098176)], 'node_runtimes': [(SchedulerNode(name='op0'), 0.026221955608878224), (SchedulerNode(name='op1'), 0.026221955608878224)]}
2025-12-04T12:20:32.2990434Z PASSED [0.4485s] [ 13%]
2025-12-04T12:20:32.2991482Z inductor/test_loop_ordering.py::LoopOrderingTest::test_different_reduction_order I1204 12:19:01.970000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics:
2025-12-04T12:20:32.2993955Z I1204 12:19:01.970000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 16789504, 'nodes_num_elem': [(SchedulerNode(name='op0'), 2099200), (SchedulerNode(name='op1'), 2098176)], 'node_runtimes': [(SchedulerNode(name='op0'), 0.02623475304939012), (SchedulerNode(name='op1'), 0.026221955608878224)]}
2025-12-04T12:20:32.2995584Z PASSED [0.3668s] [ 15%]
2025-12-04T12:20:32.2996633Z inductor/test_loop_ordering.py::LoopOrderingTest::test_for_reordering_reindex W1204 12:19:02.964000 137513 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T12:20:32.2998065Z I1204 12:19:03.341000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics:
2025-12-04T12:20:32.3000289Z I1204 12:19:03.341000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 54400, 'nodes_num_elem': [(FusedSchedulerNode(nodes=op0_op1), 12400), (ExternKernelSchedulerNode(name='op2'), 1200)], 'node_runtimes': [(FusedSchedulerNode(nodes=op0_op1), 0.00015496900619876023), (ExternKernelSchedulerNode(name='op2'), 0.0019654088050314465)]}
2025-12-04T12:20:32.3002147Z PASSED [1.4408s] [ 16%]
2025-12-04T12:20:32.3002857Z inductor/test_loop_ordering.py::LoopOrderingTest::test_fp8_cast_and_t SKIPPED [0.0003s] (FP8 requires H100+ and MI300+) [ 18%]
2025-12-04T12:20:32.3004016Z inductor/test_loop_ordering.py::LoopOrderingTest::test_fp8_pattern_2 SKIPPED [0.0003s] (FP8 requires H100+ and MI300+) [ 20%]
2025-12-04T12:20:32.3005528Z inductor/test_loop_ordering.py::LoopOrderingTest::test_fuse_reduction_with_tiled_pw I1204 12:19:04.677000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics:
2025-12-04T12:20:32.3008060Z I1204 12:19:04.677000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 241604, 'nodes_num_elem': [(FusedSchedulerNode(nodes=op0_op2_op3), 60200), (SchedulerNode(name='op1'), 201)], 'node_runtimes': [(FusedSchedulerNode(nodes=op0_op2_op3), 0.0007523495300939811), (SchedulerNode(name='op1'), 2.511997600479904e-06)]}
2025-12-04T12:20:32.3009823Z PASSED [1.3263s] [ 22%]
2025-12-04T12:20:32.3010931Z inductor/test_loop_ordering.py::LoopOrderingTest::test_fuse_with_scalar_shared_memory I1204 12:19:05.376000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics:
2025-12-04T12:20:32.3013017Z I1204 12:19:05.376000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 104, 'nodes_num_elem': [(FusedSchedulerNode(nodes=op0_op1), 26)], 'node_runtimes': [(FusedSchedulerNode(nodes=op0_op1), 3.249350129974005e-07)]}
2025-12-04T12:20:32.3014323Z PASSED [0.6280s] [ 24%]
2025-12-04T12:20:32.3015127Z inductor/test_loop_ordering.py::LoopOrderingTest::test_interaction_with_multi_template SKIPPED [0.0003s] (Need big gpu for max-autotune) [ 26%]
2025-12-04T12:20:32.3016639Z inductor/test_loop_ordering.py::LoopOrderingTest::test_interaction_with_triton_template SKIPPED [0.0002s] (Need big gpu for max-autotune) [ 28%]
2025-12-04T12:20:32.3018213Z inductor/test_loop_ordering.py::LoopOrderingTest::test_keep_fake_dep I1204 12:19:07.029000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0_1] [__inductor_metrics] Graph Metrics:
2025-12-04T12:20:32.3021069Z I1204 12:19:07.029000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0_1] [__inductor_metrics] {'num_bytes_accessed': 2068996, 'nodes_num_elem': [(SchedulerNode(name='op1'), 66560), (FusedSchedulerNode(nodes=op0_op3_op5_op2_op6), 446592), (SchedulerNode(name='op7'), 4097)], 'node_runtimes': [(SchedulerNode(name='op1'), 0.0008318336332733453), (FusedSchedulerNode(nodes=op0_op3_op5_op2_op6), 0.00558128374325135), (SchedulerNode(name='op7'), 5.120225954809038e-05)]}
2025-12-04T12:20:32.3023591Z I1204 12:19:08.524000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0_1] [__inductor_metrics] Graph Metrics:
2025-12-04T12:20:32.3027740Z I1204 12:19:08.524000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0_1] [__inductor_metrics] {'num_bytes_accessed': 6542092, 'nodes_num_elem': [(SchedulerNode(name='op9'), 131072), (FusedSchedulerNode(nodes=op0_op1_op2_op10), 835649), (SchedulerNode(name='op3'), 305153), (SchedulerNode(name='op4'), 2112), (SchedulerNode(name='op5'), 65), (SchedulerNode(name='op7'), 32768), (FusedSchedulerNode(nodes=op6_op8), 328704)], 'node_runtimes': [(SchedulerNode(name='op9'), 0.0016380723855228953), (FusedSchedulerNode(nodes=op0_op1_op2_op10), 0.010443523795240951), (SchedulerNode(name='op3'), 0.00381364977004599), (SchedulerNode(name='op4'), 2.639472105578884e-05), (SchedulerNode(name='op5'), 8.123375324935012e-07), (SchedulerNode(name='op7'), 0.0004095180963807238), (FusedSchedulerNode(nodes=op6_op8), 0.0041079784043191354)]}
2025-12-04T12:20:32.3031390Z PASSED [3.2679s] [ 30%]
2025-12-04T12:20:32.3032422Z inductor/test_loop_ordering.py::LoopOrderingTest::test_outer_dimension_softmax I1204 12:19:09.876000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics:
2025-12-04T12:20:32.3034580Z I1204 12:19:09.876000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 536870912, 'nodes_num_elem': [(FusedSchedulerNode(nodes=op0_op1_op2), 134217728)], 'node_runtimes': [(FusedSchedulerNode(nodes=op0_op1_op2), 1.6773861227754447)]}
2025-12-04T12:20:32.3035981Z PASSED [1.6114s] [ 32%]
2025-12-04T12:20:32.3037044Z inductor/test_loop_ordering.py::LoopOrderingTest::test_outer_dimension_sum_fuse_with_pw I1204 12:19:11.094000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics:
2025-12-04T12:20:32.3039249Z I1204 12:19:11.094000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 536870912, 'nodes_num_elem': [(FusedSchedulerNode(nodes=op0_op1), 134217728)], 'node_runtimes': [(FusedSchedulerNode(nodes=op0_op1), 1.6773861227754447)]}
2025-12-04T12:20:32.3040594Z PASSED [1.0912s] [ 33%]
2025-12-04T12:20:32.3041562Z inductor/test_loop_ordering.py::LoopOrderingTest::test_pw_outer_red I1204 12:19:12.222000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics:
2025-12-04T12:20:32.3043681Z I1204 12:19:12.222000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 18432, 'nodes_num_elem': [(FusedSchedulerNode(nodes=op0_op1), 4608)], 'node_runtimes': [(FusedSchedulerNode(nodes=op0_op1), 5.758848230353929e-05)]}
2025-12-04T12:20:32.3045017Z PASSED [0.9092s] [ 35%]
2025-12-04T12:20:32.3045975Z inductor/test_loop_ordering.py::LoopOrderingTest::test_pw_outer_red_2 I1204 12:19:12.773000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics:
2025-12-04T12:20:32.3048063Z I1204 12:19:12.773000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 18432, 'nodes_num_elem': [(FusedSchedulerNode(nodes=op0_op1_op2_op3), 4608)], 'node_runtimes': [(FusedSchedulerNode(nodes=op0_op1_op2_op3), 5.758848230353929e-05)]}
2025-12-04T12:20:32.3049474Z PASSED [0.5478s] [ 37%]
2025-12-04T12:20:32.3050429Z inductor/test_loop_ordering.py::LoopOrderingTest::test_sum_and_t I1204 12:19:13.579000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics:
2025-12-04T12:20:32.3052446Z I1204 12:19:13.579000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 16781312, 'nodes_num_elem': [(FusedSchedulerNode(nodes=op0_op1), 4195328)], 'node_runtimes': [(FusedSchedulerNode(nodes=op0_op1), 0.05243111377724455)]}
2025-12-04T12:20:32.3053791Z PASSED [0.9096s] [ 39%]
2025-12-04T12:20:32.3054736Z inductor/test_loop_ordering.py::LoopOrderingTest::test_view I1204 12:19:14.015000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics:
2025-12-04T12:20:32.3056827Z I1204 12:19:14.015000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 1200, 'nodes_num_elem': [(FusedSchedulerNode(nodes=op0_op1), 300)], 'node_runtimes': [(FusedSchedulerNode(nodes=op0_op1), 3.7492501499700058e-06)]}
2025-12-04T12:20:32.3058171Z PASSED [0.2883s] [ 41%]
2025-12-04T12:20:32.3058724Z inductor/test_loop_ordering.py::MemoryCoalescingTest::test_coalescing PASSED [0.0090s] [ 43%]
2025-12-04T12:20:32.3060071Z inductor/test_loop_ordering.py::MemoryCoalescingTest::test_induced_fused_tiling W1204 12:19:14.132000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key
2025-12-04T12:20:32.3061478Z W1204 12:19:14.132000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last):
2025-12-04T12:20:32.3062876Z W1204 12:19:14.132000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps
2025-12-04T12:20:32.3064170Z W1204 12:19:14.132000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0]     self.dump(obj)
2025-12-04T12:20:32.3065447Z W1204 12:19:14.132000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_induced_fused_tiling.<locals>.fn'
2025-12-04T12:20:32.3066767Z W1204 12:19:14.161000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key
2025-12-04T12:20:32.3067767Z W1204 12:19:14.161000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last):
2025-12-04T12:20:32.3069149Z W1204 12:19:14.161000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps
2025-12-04T12:20:32.3070485Z W1204 12:19:14.161000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0]     self.dump(obj)
2025-12-04T12:20:32.3072481Z W1204 12:19:14.161000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_induced_fused_tiling.<locals>.fn'
2025-12-04T12:20:32.3073636Z ('RERUN', {'yellow': True}) [0.1358s] [ 45%]
2025-12-04T12:20:32.3074807Z inductor/test_loop_ordering.py::MemoryCoalescingTest::test_induced_fused_tiling W1204 12:19:14.265000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key
2025-12-04T12:20:32.3076198Z W1204 12:19:14.265000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last):
2025-12-04T12:20:32.3078145Z W1204 12:19:14.265000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps
2025-12-04T12:20:32.3080105Z W1204 12:19:14.265000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0]     self.dump(obj)
2025-12-04T12:20:32.3081970Z W1204 12:19:14.265000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_induced_fused_tiling.<locals>.fn'
2025-12-04T12:20:32.3083919Z W1204 12:19:14.290000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key
2025-12-04T12:20:32.3085483Z W1204 12:19:14.290000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last):
2025-12-04T12:20:32.3087712Z W1204 12:19:14.290000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps
2025-12-04T12:20:32.3090044Z W1204 12:19:14.290000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0]     self.dump(obj)
2025-12-04T12:20:32.3092330Z W1204 12:19:14.290000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_induced_fused_tiling.<locals>.fn'
2025-12-04T12:20:32.3094177Z ('RERUN', {'yellow': True}) [0.0967s] [ 45%]
2025-12-04T12:20:32.3096497Z inductor/test_loop_ordering.py::MemoryCoalescingTest::test_induced_fused_tiling W1204 12:19:14.362000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key
2025-12-04T12:20:32.3099575Z W1204 12:19:14.362000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last):
2025-12-04T12:20:32.3102613Z W1204 12:19:14.362000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps
2025-12-04T12:20:32.3105547Z W1204 12:19:14.362000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0]     self.dump(obj)
2025-12-04T12:20:32.3108059Z W1204 12:19:14.362000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_induced_fused_tiling.<locals>.fn'
2025-12-04T12:20:32.3110784Z W1204 12:19:14.387000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key
2025-12-04T12:20:32.3112734Z W1204 12:19:14.387000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last):
2025-12-04T12:20:32.3115157Z W1204 12:19:14.387000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps
2025-12-04T12:20:32.3116658Z W1204 12:19:14.387000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0]     self.dump(obj)
2025-12-04T12:20:32.3118146Z W1204 12:19:14.387000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_induced_fused_tiling.<locals>.fn'
2025-12-04T12:20:32.3119499Z FAILED [0.0939s] [ 45%]
2025-12-04T12:20:32.3119710Z 
2025-12-04T12:20:32.3119879Z ==================================== RERUNS ====================================
2025-12-04T12:20:32.3120532Z ________________ MemoryCoalescingTest.test_induced_fused_tiling ________________
2025-12-04T12:20:32.3121213Z Traceback (most recent call last):
2025-12-04T12:20:32.3122107Z   File "/var/lib/jenkins/workspace/test/inductor/test_loop_ordering.py", line 1042, in test_induced_fused_tiling
2025-12-04T12:20:32.3122951Z     out, code = run_and_get_code(torch.compile(forward), (permute))
2025-12-04T12:20:32.3123786Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code
2025-12-04T12:20:32.3124518Z     result = fn(*args, **kwargs)
2025-12-04T12:20:32.3125227Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:20:32.3126089Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:20:32.3126989Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T12:20:32.3128000Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:20:32.3128830Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T12:20:32.3129631Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:20:32.3130456Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:20:32.3131459Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:20:32.3132431Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T12:20:32.3133225Z     _check_triton_bf16_support(graph)
2025-12-04T12:20:32.3134029Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T12:20:32.3134845Z     warn_and_skip(node.get_device())
2025-12-04T12:20:32.3135568Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T12:20:32.3136490Z     raise SkipFrame("BF16 is not supported")
2025-12-04T12:20:32.3137033Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T12:20:32.3137424Z 
2025-12-04T12:20:32.3138228Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:20:32.3139172Z 
2025-12-04T12:20:32.3139178Z 
2025-12-04T12:20:32.3139399Z To execute this test, run the following from the base repo dir:
2025-12-04T12:20:32.3140287Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_loop_ordering.py MemoryCoalescingTest.test_induced_fused_tiling
2025-12-04T12:20:32.3140961Z 
2025-12-04T12:20:32.3141234Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:20:32.3141877Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:20:32.3142346Z frames [('total', 1)]
2025-12-04T12:20:32.3142649Z stats [('calls_captured', 3)]
2025-12-04T12:20:32.3143105Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)]
2025-12-04T12:20:32.3143816Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_bypass', 1)]
2025-12-04T12:20:32.3144414Z graph_break []
2025-12-04T12:20:32.3144797Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:20:32.3145895Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:20:32.3146913Z   warnings.warn(
2025-12-04T12:20:32.3147443Z ________________ MemoryCoalescingTest.test_induced_fused_tiling ________________
2025-12-04T12:20:32.3148174Z Traceback (most recent call last):
2025-12-04T12:20:32.3148893Z   File "/var/lib/jenkins/workspace/test/inductor/test_loop_ordering.py", line 1042, in test_induced_fused_tiling
2025-12-04T12:20:32.3149794Z     out, code = run_and_get_code(torch.compile(forward), (permute))
2025-12-04T12:20:32.3150674Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code
2025-12-04T12:20:32.3151414Z     result = fn(*args, **kwargs)
2025-12-04T12:20:32.3152287Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:20:32.3153172Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:20:32.3154078Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T12:20:32.3154913Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:20:32.3155771Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T12:20:32.3156572Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:20:32.3157398Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:20:32.3158383Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:20:32.3159379Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T12:20:32.3160174Z     _check_triton_bf16_support(graph)
2025-12-04T12:20:32.3160981Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T12:20:32.3161785Z     warn_and_skip(node.get_device())
2025-12-04T12:20:32.3162520Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T12:20:32.3163293Z     raise SkipFrame("BF16 is not supported")
2025-12-04T12:20:32.3163811Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T12:20:32.3164206Z 
2025-12-04T12:20:32.3164969Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:20:32.3165834Z 
2025-12-04T12:20:32.3165839Z 
2025-12-04T12:20:32.3166057Z To execute this test, run the following from the base repo dir:
2025-12-04T12:20:32.3166932Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_loop_ordering.py MemoryCoalescingTest.test_induced_fused_tiling
2025-12-04T12:20:32.3167593Z 
2025-12-04T12:20:32.3167878Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:20:32.3168510Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:20:32.3168985Z frames [('total', 1)]
2025-12-04T12:20:32.3169291Z stats [('calls_captured', 3)]
2025-12-04T12:20:32.3169724Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)]
2025-12-04T12:20:32.3170451Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_bypass', 1)]
2025-12-04T12:20:32.3171257Z graph_break []
2025-12-04T12:20:32.3171644Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:20:32.3172731Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:20:32.3173710Z   warnings.warn(
2025-12-04T12:20:32.3174233Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:20:32.3174700Z frames [('total', 1)]
2025-12-04T12:20:32.3175007Z stats [('calls_captured', 3)]
2025-12-04T12:20:32.3175463Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)]
2025-12-04T12:20:32.3176191Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_bypass', 1)]
2025-12-04T12:20:32.3176907Z graph_break []
2025-12-04T12:20:32.3177293Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:20:32.3178447Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:20:32.3179407Z   warnings.warn(
2025-12-04T12:20:32.3179724Z =================================== FAILURES ===================================
2025-12-04T12:20:32.3180298Z ________________ MemoryCoalescingTest.test_induced_fused_tiling ________________
2025-12-04T12:20:32.3180838Z Traceback (most recent call last):
2025-12-04T12:20:32.3181648Z   File "/var/lib/jenkins/workspace/test/inductor/test_loop_ordering.py", line 1042, in test_induced_fused_tiling
2025-12-04T12:20:32.3182497Z     out, code = run_and_get_code(torch.compile(forward), (permute))
2025-12-04T12:20:32.3183318Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code
2025-12-04T12:20:32.3184061Z     result = fn(*args, **kwargs)
2025-12-04T12:20:32.3184781Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:20:32.3185660Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:20:32.3186550Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T12:20:32.3187399Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:20:32.3188247Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T12:20:32.3189034Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:20:32.3189864Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:20:32.3190870Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:20:32.3191922Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T12:20:32.3192708Z     _check_triton_bf16_support(graph)
2025-12-04T12:20:32.3193520Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T12:20:32.3194341Z     warn_and_skip(node.get_device())
2025-12-04T12:20:32.3195080Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T12:20:32.3195843Z     raise SkipFrame("BF16 is not supported")
2025-12-04T12:20:32.3196377Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T12:20:32.3196766Z 
2025-12-04T12:20:32.3197491Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:20:32.3198340Z 
2025-12-04T12:20:32.3198345Z 
2025-12-04T12:20:32.3198585Z To execute this test, run the following from the base repo dir:
2025-12-04T12:20:32.3199455Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_loop_ordering.py MemoryCoalescingTest.test_induced_fused_tiling
2025-12-04T12:20:32.3200125Z 
2025-12-04T12:20:32.3200395Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:20:32.3201023Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:20:32.3201532Z frames [('total', 1)]
2025-12-04T12:20:32.3201815Z stats [('calls_captured', 3)]
2025-12-04T12:20:32.3202261Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)]
2025-12-04T12:20:32.3202975Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_bypass', 1)]
2025-12-04T12:20:32.3203556Z graph_break []
2025-12-04T12:20:32.3203971Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:20:32.3205107Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:20:32.3206077Z   warnings.warn(
2025-12-04T12:20:32.3206450Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:20:32.3206915Z frames [('total', 1)]
2025-12-04T12:20:32.3207219Z stats [('calls_captured', 3)]
2025-12-04T12:20:32.3207658Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)]
2025-12-04T12:20:32.3208379Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_bypass', 1)]
2025-12-04T12:20:32.3208979Z graph_break []
2025-12-04T12:20:32.3209339Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:20:32.3210433Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:20:32.3211404Z   warnings.warn(
2025-12-04T12:20:32.3211788Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:20:32.3212242Z frames [('total', 1)]
2025-12-04T12:20:32.3212541Z stats [('calls_captured', 3)]
2025-12-04T12:20:32.3212987Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)]
2025-12-04T12:20:32.3213716Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_bypass', 1)]
2025-12-04T12:20:32.3214319Z graph_break []
2025-12-04T12:20:32.3214697Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:20:32.3215794Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:20:32.3216857Z   warnings.warn(
2025-12-04T12:20:32.3217846Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_loop_ordering/inductor.test_loop_ordering-264346bf50f4314b.xml -
2025-12-04T12:20:32.3218916Z =========================== short test summary info ============================
2025-12-04T12:20:32.3219986Z FAILED [0.0939s] inductor/test_loop_ordering.py::MemoryCoalescingTest::test_induced_fused_tiling - torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T12:20:32.3220840Z 
2025-12-04T12:20:32.3221547Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:20:32.3222405Z 
2025-12-04T12:20:32.3222409Z 
2025-12-04T12:20:32.3222633Z To execute this test, run the following from the base repo dir:
2025-12-04T12:20:32.3223515Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_loop_ordering.py MemoryCoalescingTest.test_induced_fused_tiling
2025-12-04T12:20:32.3224174Z 
2025-12-04T12:20:32.3224458Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:20:32.3225040Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:20:32.3225579Z ============== 1 failed, 19 passed, 4 skipped, 2 rerun in 19.61s ===============
2025-12-04T12:20:32.3226035Z Got exit code 1
2025-12-04T12:20:32.3226313Z Retrying single test...
2025-12-04T12:20:32.3226944Z W1204 12:19:26.073000 138524 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T12:20:32.3228156Z Test results will be stored in test-reports/python-pytest/inductor.test_loop_ordering/inductor.test_loop_ordering-f4f4ac9590e83730.xml
2025-12-04T12:20:32.3229050Z ============================= test session starts ==============================
2025-12-04T12:20:32.3229704Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:20:32.3230354Z cachedir: .pytest_cache
2025-12-04T12:20:32.3231106Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:20:32.3231895Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:20:32.3232235Z configfile: pytest.ini
2025-12-04T12:20:32.3233008Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:20:32.3233948Z collecting ... collected 53 items / 52 deselected / 1 selected
2025-12-04T12:20:32.3234900Z stepcurrent: skipping 23 already run items. Running only test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_induced_fused_tiling
2025-12-04T12:20:32.3235761Z Running 1 items in this shard
2025-12-04T12:20:32.3235989Z 
2025-12-04T12:20:32.3236807Z inductor/test_loop_ordering.py::MemoryCoalescingTest::test_induced_fused_tiling W1204 12:19:30.302000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key
2025-12-04T12:20:32.3238214Z W1204 12:19:30.302000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last):
2025-12-04T12:20:32.3239607Z W1204 12:19:30.302000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps
2025-12-04T12:20:32.3240901Z W1204 12:19:30.302000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0]     self.dump(obj)
2025-12-04T12:20:32.3242190Z W1204 12:19:30.302000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_induced_fused_tiling.<locals>.fn'
2025-12-04T12:20:32.3243514Z W1204 12:19:30.545000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key
2025-12-04T12:20:32.3244513Z W1204 12:19:30.545000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last):
2025-12-04T12:20:32.3245931Z W1204 12:19:30.545000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps
2025-12-04T12:20:32.3247233Z W1204 12:19:30.545000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0]     self.dump(obj)
2025-12-04T12:20:32.3248516Z W1204 12:19:30.545000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_induced_fused_tiling.<locals>.fn'
2025-12-04T12:20:32.3249562Z ('RERUN', {'yellow': True}) [4.4104s] [100%]
2025-12-04T12:20:32.3250632Z inductor/test_loop_ordering.py::MemoryCoalescingTest::test_induced_fused_tiling W1204 12:19:30.896000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key
2025-12-04T12:20:32.3252030Z W1204 12:19:30.896000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last):
2025-12-04T12:20:32.3253527Z W1204 12:19:30.896000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps
2025-12-04T12:20:32.3254817Z W1204 12:19:30.896000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0]     self.dump(obj)
2025-12-04T12:20:32.3256098Z W1204 12:19:30.896000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_induced_fused_tiling.<locals>.fn'
2025-12-04T12:20:32.3257528Z W1204 12:19:30.922000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key
2025-12-04T12:20:32.3258530Z W1204 12:19:30.922000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last):
2025-12-04T12:20:32.3259921Z W1204 12:19:30.922000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps
2025-12-04T12:20:32.3261290Z W1204 12:19:30.922000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0]     self.dump(obj)
2025-12-04T12:20:32.3262556Z W1204 12:19:30.922000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_induced_fused_tiling.<locals>.fn'
2025-12-04T12:20:32.3263607Z ('RERUN', {'yellow': True}) [0.2820s] [100%]
2025-12-04T12:20:32.3264699Z inductor/test_loop_ordering.py::MemoryCoalescingTest::test_induced_fused_tiling W1204 12:19:30.995000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key
2025-12-04T12:20:32.3266109Z W1204 12:19:30.995000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last):
2025-12-04T12:20:32.3267501Z W1204 12:19:30.995000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps
2025-12-04T12:20:32.3268813Z W1204 12:19:30.995000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0]     self.dump(obj)
2025-12-04T12:20:32.3270099Z W1204 12:19:30.995000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_induced_fused_tiling.<locals>.fn'
2025-12-04T12:20:32.3271620Z W1204 12:19:31.020000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key
2025-12-04T12:20:32.3272610Z W1204 12:19:31.020000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last):
2025-12-04T12:20:32.3273999Z W1204 12:19:31.020000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps
2025-12-04T12:20:32.3275293Z W1204 12:19:31.020000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0]     self.dump(obj)
2025-12-04T12:20:32.3276718Z W1204 12:19:31.020000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_induced_fused_tiling.<locals>.fn'
2025-12-04T12:20:32.3277753Z FAILED [0.0955s] [100%]
2025-12-04T12:20:32.3277936Z 
2025-12-04T12:20:32.3278083Z ==================================== RERUNS ====================================
2025-12-04T12:20:32.3278658Z ________________ MemoryCoalescingTest.test_induced_fused_tiling ________________
2025-12-04T12:20:32.3279206Z Traceback (most recent call last):
2025-12-04T12:20:32.3279932Z   File "/var/lib/jenkins/workspace/test/inductor/test_loop_ordering.py", line 1042, in test_induced_fused_tiling
2025-12-04T12:20:32.3280782Z     out, code = run_and_get_code(torch.compile(forward), (permute))
2025-12-04T12:20:32.3281620Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code
2025-12-04T12:20:32.3282353Z     result = fn(*args, **kwargs)
2025-12-04T12:20:32.3283064Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:20:32.3283949Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:20:32.3284854Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T12:20:32.3285700Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:20:32.3286522Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T12:20:32.3287386Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:20:32.3288210Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:20:32.3289197Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:20:32.3290246Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T12:20:32.3291086Z     _check_triton_bf16_support(graph)
2025-12-04T12:20:32.3291884Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T12:20:32.3292690Z     warn_and_skip(node.get_device())
2025-12-04T12:20:32.3293427Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T12:20:32.3294201Z     raise SkipFrame("BF16 is not supported")
2025-12-04T12:20:32.3294716Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T12:20:32.3295121Z 
2025-12-04T12:20:32.3295835Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:20:32.3296766Z 
2025-12-04T12:20:32.3296771Z 
2025-12-04T12:20:32.3296992Z To execute this test, run the following from the base repo dir:
2025-12-04T12:20:32.3297877Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_loop_ordering.py MemoryCoalescingTest.test_induced_fused_tiling
2025-12-04T12:20:32.3298534Z 
2025-12-04T12:20:32.3298817Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:20:32.3299440Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:20:32.3299913Z frames [('total', 1)]
2025-12-04T12:20:32.3300216Z stats [('calls_captured', 3)]
2025-12-04T12:20:32.3300777Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_bypass', 1)]
2025-12-04T12:20:32.3301504Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)]
2025-12-04T12:20:32.3301974Z graph_break []
2025-12-04T12:20:32.3302360Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:20:32.3303506Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:20:32.3304481Z   warnings.warn(
2025-12-04T12:20:32.3304926Z ________________ MemoryCoalescingTest.test_induced_fused_tiling ________________
2025-12-04T12:20:32.3305455Z Traceback (most recent call last):
2025-12-04T12:20:32.3306196Z   File "/var/lib/jenkins/workspace/test/inductor/test_loop_ordering.py", line 1042, in test_induced_fused_tiling
2025-12-04T12:20:32.3307049Z     out, code = run_and_get_code(torch.compile(forward), (permute))
2025-12-04T12:20:32.3307872Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code
2025-12-04T12:20:32.3308590Z     result = fn(*args, **kwargs)
2025-12-04T12:20:32.3309300Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:20:32.3310184Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:20:32.3311087Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T12:20:32.3311920Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:20:32.3312757Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T12:20:32.3313554Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:20:32.3314400Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:20:32.3315395Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:20:32.3316391Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T12:20:32.3317216Z     _check_triton_bf16_support(graph)
2025-12-04T12:20:32.3318047Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T12:20:32.3318866Z     warn_and_skip(node.get_device())
2025-12-04T12:20:32.3319596Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T12:20:32.3320363Z     raise SkipFrame("BF16 is not supported")
2025-12-04T12:20:32.3320872Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T12:20:32.3321277Z 
2025-12-04T12:20:32.3321987Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:20:32.3322829Z 
2025-12-04T12:20:32.3322851Z 
2025-12-04T12:20:32.3323072Z To execute this test, run the following from the base repo dir:
2025-12-04T12:20:32.3323950Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_loop_ordering.py MemoryCoalescingTest.test_induced_fused_tiling
2025-12-04T12:20:32.3324611Z 
2025-12-04T12:20:32.3324879Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:20:32.3325512Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:20:32.3325981Z frames [('total', 1)]
2025-12-04T12:20:32.3326270Z stats [('calls_captured', 3)]
2025-12-04T12:20:32.3326842Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_bypass', 1)]
2025-12-04T12:20:32.3327572Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)]
2025-12-04T12:20:32.3328048Z graph_break []
2025-12-04T12:20:32.3328420Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:20:32.3329514Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:20:32.3330490Z   warnings.warn(
2025-12-04T12:20:32.3330904Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:20:32.3331378Z frames [('total', 1)]
2025-12-04T12:20:32.3331682Z stats [('calls_captured', 3)]
2025-12-04T12:20:32.3332130Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)]
2025-12-04T12:20:32.3332837Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_bypass', 1)]
2025-12-04T12:20:32.3333433Z graph_break []
2025-12-04T12:20:32.3333814Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:20:32.3334888Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:20:32.3335853Z   warnings.warn(
2025-12-04T12:20:32.3336170Z =================================== FAILURES ===================================
2025-12-04T12:20:32.3336835Z ________________ MemoryCoalescingTest.test_induced_fused_tiling ________________
2025-12-04T12:20:32.3337384Z Traceback (most recent call last):
2025-12-04T12:20:32.3338123Z   File "/var/lib/jenkins/workspace/test/inductor/test_loop_ordering.py", line 1042, in test_induced_fused_tiling
2025-12-04T12:20:32.3338969Z     out, code = run_and_get_code(torch.compile(forward), (permute))
2025-12-04T12:20:32.3339796Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code
2025-12-04T12:20:32.3340583Z     result = fn(*args, **kwargs)
2025-12-04T12:20:32.3341294Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:20:32.3342175Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:20:32.3343069Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T12:20:32.3343955Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:20:32.3344835Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T12:20:32.3345638Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:20:32.3346451Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:20:32.3347455Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:20:32.3348455Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T12:20:32.3349250Z     _check_triton_bf16_support(graph)
2025-12-04T12:20:32.3350043Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T12:20:32.3350872Z     warn_and_skip(node.get_device())
2025-12-04T12:20:32.3351613Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T12:20:32.3352371Z     raise SkipFrame("BF16 is not supported")
2025-12-04T12:20:32.3352901Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T12:20:32.3353301Z 
2025-12-04T12:20:32.3354015Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:20:32.3354858Z 
2025-12-04T12:20:32.3354863Z 
2025-12-04T12:20:32.3355096Z To execute this test, run the following from the base repo dir:
2025-12-04T12:20:32.3355968Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_loop_ordering.py MemoryCoalescingTest.test_induced_fused_tiling
2025-12-04T12:20:32.3356634Z 
2025-12-04T12:20:32.3356901Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:20:32.3357539Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:20:32.3358047Z frames [('total', 1)]
2025-12-04T12:20:32.3358333Z stats [('calls_captured', 3)]
2025-12-04T12:20:32.3358903Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_bypass', 1)]
2025-12-04T12:20:32.3359624Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)]
2025-12-04T12:20:32.3360080Z graph_break []
2025-12-04T12:20:32.3360459Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:20:32.3361556Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:20:32.3362523Z   warnings.warn(
2025-12-04T12:20:32.3362893Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:20:32.3363360Z frames [('total', 1)]
2025-12-04T12:20:32.3363666Z stats [('calls_captured', 3)]
2025-12-04T12:20:32.3364101Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)]
2025-12-04T12:20:32.3364834Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_bypass', 1)]
2025-12-04T12:20:32.3365432Z graph_break []
2025-12-04T12:20:32.3365810Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:20:32.3366882Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:20:32.3367895Z   warnings.warn(
2025-12-04T12:20:32.3368279Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:20:32.3368733Z frames [('total', 1)]
2025-12-04T12:20:32.3369034Z stats [('calls_captured', 3)]
2025-12-04T12:20:32.3369481Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)]
2025-12-04T12:20:32.3370241Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_bypass', 1)]
2025-12-04T12:20:32.3370834Z graph_break []
2025-12-04T12:20:32.3371592Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:20:32.3372690Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:20:32.3373651Z   warnings.warn(
2025-12-04T12:20:32.3374587Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_loop_ordering/inductor.test_loop_ordering-f4f4ac9590e83730.xml -
2025-12-04T12:20:32.3375668Z =========================== short test summary info ============================
2025-12-04T12:20:32.3376791Z FAILED [0.0955s] inductor/test_loop_ordering.py::MemoryCoalescingTest::test_induced_fused_tiling - torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T12:20:32.3377652Z 
2025-12-04T12:20:32.3378388Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:20:32.3379238Z 
2025-12-04T12:20:32.3379243Z 
2025-12-04T12:20:32.3379465Z To execute this test, run the following from the base repo dir:
2025-12-04T12:20:32.3380347Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_loop_ordering.py MemoryCoalescingTest.test_induced_fused_tiling
2025-12-04T12:20:32.3381016Z 
2025-12-04T12:20:32.3381288Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:20:32.3381885Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:20:32.3382404Z ================== 1 failed, 52 deselected, 2 rerun in 4.86s ===================
2025-12-04T12:20:32.3382850Z Got exit code 1
2025-12-04T12:20:32.3383126Z Retrying single test...
2025-12-04T12:20:32.3383762Z W1204 12:19:45.879000 138693 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T12:20:32.3384999Z Test results will be stored in test-reports/python-pytest/inductor.test_loop_ordering/inductor.test_loop_ordering-193da166d9e268ac.xml
2025-12-04T12:20:32.3385888Z ============================= test session starts ==============================
2025-12-04T12:20:32.3386558Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:20:32.3387149Z cachedir: .pytest_cache
2025-12-04T12:20:32.3387863Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:20:32.3388655Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:20:32.3388998Z configfile: pytest.ini
2025-12-04T12:20:32.3389774Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:20:32.3390730Z collecting ... collected 53 items / 52 deselected / 1 selected
2025-12-04T12:20:32.3391690Z stepcurrent: skipping 23 already run items. Running only test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_induced_fused_tiling
2025-12-04T12:20:32.3392531Z Running 1 items in this shard
2025-12-04T12:20:32.3392757Z 
2025-12-04T12:20:32.3393574Z inductor/test_loop_ordering.py::MemoryCoalescingTest::test_induced_fused_tiling W1204 12:19:50.070000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key
2025-12-04T12:20:32.3395035Z W1204 12:19:50.070000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last):
2025-12-04T12:20:32.3396421Z W1204 12:19:50.070000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps
2025-12-04T12:20:32.3397700Z W1204 12:19:50.070000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0]     self.dump(obj)
2025-12-04T12:20:32.3399076Z W1204 12:19:50.070000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_induced_fused_tiling.<locals>.fn'
2025-12-04T12:20:32.3400396Z W1204 12:19:50.307000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key
2025-12-04T12:20:32.3401394Z W1204 12:19:50.307000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last):
2025-12-04T12:20:32.3402774Z W1204 12:19:50.307000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps
2025-12-04T12:20:32.3404052Z W1204 12:19:50.307000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0]     self.dump(obj)
2025-12-04T12:20:32.3405329Z W1204 12:19:50.307000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_induced_fused_tiling.<locals>.fn'
2025-12-04T12:20:32.3406379Z ('RERUN', {'yellow': True}) [4.3697s] [100%]
2025-12-04T12:20:32.3407460Z inductor/test_loop_ordering.py::MemoryCoalescingTest::test_induced_fused_tiling W1204 12:19:50.660000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key
2025-12-04T12:20:32.3408844Z W1204 12:19:50.660000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last):
2025-12-04T12:20:32.3410224Z W1204 12:19:50.660000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps
2025-12-04T12:20:32.3426794Z W1204 12:19:50.660000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0]     self.dump(obj)
2025-12-04T12:20:32.3428117Z W1204 12:19:50.660000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_induced_fused_tiling.<locals>.fn'
2025-12-04T12:20:32.3429563Z W1204 12:19:50.686000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key
2025-12-04T12:20:32.3430576Z W1204 12:19:50.686000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last):
2025-12-04T12:20:32.3431972Z W1204 12:19:50.686000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps
2025-12-04T12:20:32.3433270Z W1204 12:19:50.686000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0]     self.dump(obj)
2025-12-04T12:20:32.3434550Z W1204 12:19:50.686000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_induced_fused_tiling.<locals>.fn'
2025-12-04T12:20:32.3435597Z ('RERUN', {'yellow': True}) [0.2848s] [100%]
2025-12-04T12:20:32.3436703Z inductor/test_loop_ordering.py::MemoryCoalescingTest::test_induced_fused_tiling W1204 12:19:50.757000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key
2025-12-04T12:20:32.3438105Z W1204 12:19:50.757000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last):
2025-12-04T12:20:32.3439483Z W1204 12:19:50.757000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps
2025-12-04T12:20:32.3440823Z W1204 12:19:50.757000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0]     self.dump(obj)
2025-12-04T12:20:32.3442106Z W1204 12:19:50.757000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_induced_fused_tiling.<locals>.fn'
2025-12-04T12:20:32.3443423Z W1204 12:19:50.782000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key
2025-12-04T12:20:32.3444527Z W1204 12:19:50.782000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last):
2025-12-04T12:20:32.3445907Z W1204 12:19:50.782000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps
2025-12-04T12:20:32.3447193Z W1204 12:19:50.782000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0]     self.dump(obj)
2025-12-04T12:20:32.3448483Z W1204 12:19:50.782000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_induced_fused_tiling.<locals>.fn'
2025-12-04T12:20:32.3449503Z FAILED [0.0949s] [100%]
2025-12-04T12:20:32.3449689Z 
2025-12-04T12:20:32.3449838Z ==================================== RERUNS ====================================
2025-12-04T12:20:32.3450418Z ________________ MemoryCoalescingTest.test_induced_fused_tiling ________________
2025-12-04T12:20:32.3450972Z Traceback (most recent call last):
2025-12-04T12:20:32.3451698Z   File "/var/lib/jenkins/workspace/test/inductor/test_loop_ordering.py", line 1042, in test_induced_fused_tiling
2025-12-04T12:20:32.3452533Z     out, code = run_and_get_code(torch.compile(forward), (permute))
2025-12-04T12:20:32.3453361Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code
2025-12-04T12:20:32.3454106Z     result = fn(*args, **kwargs)
2025-12-04T12:20:32.3454798Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:20:32.3455671Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:20:32.3456693Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T12:20:32.3457547Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:20:32.3458419Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T12:20:32.3459216Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:20:32.3460037Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:20:32.3461021Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:20:32.3462013Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T12:20:32.3462812Z     _check_triton_bf16_support(graph)
2025-12-04T12:20:32.3463611Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T12:20:32.3464414Z     warn_and_skip(node.get_device())
2025-12-04T12:20:32.3465147Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T12:20:32.3465923Z     raise SkipFrame("BF16 is not supported")
2025-12-04T12:20:32.3466447Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T12:20:32.3466830Z 
2025-12-04T12:20:32.3467541Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:20:32.3468394Z 
2025-12-04T12:20:32.3468399Z 
2025-12-04T12:20:32.3468657Z To execute this test, run the following from the base repo dir:
2025-12-04T12:20:32.3469535Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_loop_ordering.py MemoryCoalescingTest.test_induced_fused_tiling
2025-12-04T12:20:32.3470197Z 
2025-12-04T12:20:32.3470477Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:20:32.3473092Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:20:32.3473888Z frames [('total', 1)]
2025-12-04T12:20:32.3474202Z stats [('calls_captured', 3)]
2025-12-04T12:20:32.3474830Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_bypass', 1)]
2025-12-04T12:20:32.3475569Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)]
2025-12-04T12:20:32.3476044Z graph_break []
2025-12-04T12:20:32.3476432Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:20:32.3477532Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:20:32.3478511Z   warnings.warn(
2025-12-04T12:20:32.3478952Z ________________ MemoryCoalescingTest.test_induced_fused_tiling ________________
2025-12-04T12:20:32.3479483Z Traceback (most recent call last):
2025-12-04T12:20:32.3480213Z   File "/var/lib/jenkins/workspace/test/inductor/test_loop_ordering.py", line 1042, in test_induced_fused_tiling
2025-12-04T12:20:32.3481062Z     out, code = run_and_get_code(torch.compile(forward), (permute))
2025-12-04T12:20:32.3481893Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code
2025-12-04T12:20:32.3482619Z     result = fn(*args, **kwargs)
2025-12-04T12:20:32.3483327Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:20:32.3484207Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:20:32.3485110Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T12:20:32.3485939Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:20:32.3486778Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T12:20:32.3487578Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:20:32.3488439Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:20:32.3489438Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:20:32.3490422Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T12:20:32.3491220Z     _check_triton_bf16_support(graph)
2025-12-04T12:20:32.3492007Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T12:20:32.3492823Z     warn_and_skip(node.get_device())
2025-12-04T12:20:32.3493549Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T12:20:32.3494319Z     raise SkipFrame("BF16 is not supported")
2025-12-04T12:20:32.3494835Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T12:20:32.3495232Z 
2025-12-04T12:20:32.3495960Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:20:32.3496902Z 
2025-12-04T12:20:32.3496908Z 
2025-12-04T12:20:32.3497130Z To execute this test, run the following from the base repo dir:
2025-12-04T12:20:32.3498010Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_loop_ordering.py MemoryCoalescingTest.test_induced_fused_tiling
2025-12-04T12:20:32.3498734Z 
2025-12-04T12:20:32.3499003Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:20:32.3499642Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:20:32.3500113Z frames [('total', 1)]
2025-12-04T12:20:32.3500417Z stats [('calls_captured', 3)]
2025-12-04T12:20:32.3501013Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_bypass', 1)]
2025-12-04T12:20:32.3501778Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)]
2025-12-04T12:20:32.3502258Z graph_break []
2025-12-04T12:20:32.3502624Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:20:32.3503725Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:20:32.3504695Z   warnings.warn(
2025-12-04T12:20:32.3505076Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:20:32.3505531Z frames [('total', 1)]
2025-12-04T12:20:32.3505831Z stats [('calls_captured', 3)]
2025-12-04T12:20:32.3506275Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)]
2025-12-04T12:20:32.3506980Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_bypass', 1)]
2025-12-04T12:20:32.3507579Z graph_break []
2025-12-04T12:20:32.3507967Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:20:32.3509050Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:20:32.3510033Z   warnings.warn(
2025-12-04T12:20:32.3510354Z =================================== FAILURES ===================================
2025-12-04T12:20:32.3510930Z ________________ MemoryCoalescingTest.test_induced_fused_tiling ________________
2025-12-04T12:20:32.3511467Z Traceback (most recent call last):
2025-12-04T12:20:32.3512200Z   File "/var/lib/jenkins/workspace/test/inductor/test_loop_ordering.py", line 1042, in test_induced_fused_tiling
2025-12-04T12:20:32.3513044Z     out, code = run_and_get_code(torch.compile(forward), (permute))
2025-12-04T12:20:32.3513877Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code
2025-12-04T12:20:32.3514605Z     result = fn(*args, **kwargs)
2025-12-04T12:20:32.3515358Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T12:20:32.3516244Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T12:20:32.3517134Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T12:20:32.3517984Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:20:32.3518826Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T12:20:32.3519621Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:20:32.3520430Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:20:32.3521437Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:20:32.3522432Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T12:20:32.3523230Z     _check_triton_bf16_support(graph)
2025-12-04T12:20:32.3524012Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T12:20:32.3524827Z     warn_and_skip(node.get_device())
2025-12-04T12:20:32.3525652Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T12:20:32.3526406Z     raise SkipFrame("BF16 is not supported")
2025-12-04T12:20:32.3526933Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T12:20:32.3527331Z 
2025-12-04T12:20:32.3528041Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:20:32.3528943Z 
2025-12-04T12:20:32.3528951Z 
2025-12-04T12:20:32.3529214Z To execute this test, run the following from the base repo dir:
2025-12-04T12:20:32.3530094Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_loop_ordering.py MemoryCoalescingTest.test_induced_fused_tiling
2025-12-04T12:20:32.3530752Z 
2025-12-04T12:20:32.3531020Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:20:32.3531651Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:20:32.3532121Z frames [('total', 1)]
2025-12-04T12:20:32.3532406Z stats [('calls_captured', 3)]
2025-12-04T12:20:32.3532974Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_bypass', 1)]
2025-12-04T12:20:32.3533698Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)]
2025-12-04T12:20:32.3534173Z graph_break []
2025-12-04T12:20:32.3534538Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:20:32.3535637Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:20:32.3536696Z   warnings.warn(
2025-12-04T12:20:32.3537066Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:20:32.3537536Z frames [('total', 1)]
2025-12-04T12:20:32.3537835Z stats [('calls_captured', 3)]
2025-12-04T12:20:32.3538271Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)]
2025-12-04T12:20:32.3538990Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_bypass', 1)]
2025-12-04T12:20:32.3539587Z graph_break []
2025-12-04T12:20:32.3539964Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:20:32.3541090Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:20:32.3542066Z   warnings.warn(
2025-12-04T12:20:32.3542448Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:20:32.3542905Z frames [('total', 1)]
2025-12-04T12:20:32.3543205Z stats [('calls_captured', 3)]
2025-12-04T12:20:32.3543647Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)]
2025-12-04T12:20:32.3544368Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_bypass', 1)]
2025-12-04T12:20:32.3544957Z graph_break []
2025-12-04T12:20:32.3545338Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:20:32.3546423Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:20:32.3547383Z   warnings.warn(
2025-12-04T12:20:32.3548312Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_loop_ordering/inductor.test_loop_ordering-193da166d9e268ac.xml -
2025-12-04T12:20:32.3549381Z =========================== short test summary info ============================
2025-12-04T12:20:32.3550448Z FAILED [0.0949s] inductor/test_loop_ordering.py::MemoryCoalescingTest::test_induced_fused_tiling - torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T12:20:32.3551307Z 
2025-12-04T12:20:32.3552033Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T12:20:32.3552913Z 
2025-12-04T12:20:32.3552918Z 
2025-12-04T12:20:32.3553134Z To execute this test, run the following from the base repo dir:
2025-12-04T12:20:32.3554010Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_loop_ordering.py MemoryCoalescingTest.test_induced_fused_tiling
2025-12-04T12:20:32.3554714Z 
2025-12-04T12:20:32.3554985Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:20:32.3555616Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:20:32.3556133Z ================== 1 failed, 52 deselected, 2 rerun in 4.82s ===================
2025-12-04T12:20:32.3556582Z Got exit code 1
2025-12-04T12:20:32.3557190Z FAILED CONSISTENTLY: test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_induced_fused_tiling
2025-12-04T12:20:32.3558169Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:20:32.3559177Z W1204 12:20:05.719000 138862 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T12:20:32.3560343Z Test results will be stored in test-reports/python-pytest/inductor.test_loop_ordering/inductor.test_loop_ordering-d4eac70f931f6c8b.xml
2025-12-04T12:20:32.3561230Z ============================= test session starts ==============================
2025-12-04T12:20:32.3562119Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:20:32.3562726Z cachedir: .pytest_cache
2025-12-04T12:20:32.3563441Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:20:32.3564229Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:20:32.3564572Z configfile: pytest.ini
2025-12-04T12:20:32.3565354Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:20:32.3566309Z collecting ... collected 53 items / 24 deselected / 29 selected
2025-12-04T12:20:32.3566803Z stepcurrent: skipping 24 already run items.
2025-12-04T12:20:32.3567198Z Running 29 items in this shard
2025-12-04T12:20:32.3567422Z 
2025-12-04T12:20:32.3567861Z inductor/test_loop_ordering.py::MemoryCoalescingTest::test_inferred_splits_inps0 PASSED [0.0675s] [  3%]
2025-12-04T12:20:32.3568951Z inductor/test_loop_ordering.py::MemoryCoalescingTest::test_inferred_splits_inps1 PASSED [0.0190s] [  6%]
2025-12-04T12:20:32.3569935Z inductor/test_loop_ordering.py::MemoryCoalescingTest::test_inferred_splits_inps2 PASSED [0.0233s] [ 10%]
2025-12-04T12:20:32.3570926Z inductor/test_loop_ordering.py::MemoryCoalescingTest::test_inferred_splits_inps3 PASSED [0.0188s] [ 13%]
2025-12-04T12:20:32.3572544Z inductor/test_loop_ordering.py::MemoryCoalescingTest::test_reduction_no_pointwise W1204 12:20:10.051000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key
2025-12-04T12:20:32.3573972Z W1204 12:20:10.051000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last):
2025-12-04T12:20:32.3575351Z W1204 12:20:10.051000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps
2025-12-04T12:20:32.3576718Z W1204 12:20:10.051000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0]     self.dump(obj)
2025-12-04T12:20:32.3578012Z W1204 12:20:10.051000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_reduction_no_pointwise.<locals>.fn'
2025-12-04T12:20:32.3579344Z W1204 12:20:10.284000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key
2025-12-04T12:20:32.3580436Z W1204 12:20:10.284000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last):
2025-12-04T12:20:32.3581824Z W1204 12:20:10.284000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps
2025-12-04T12:20:32.3583134Z W1204 12:20:10.284000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0]     self.dump(obj)
2025-12-04T12:20:32.3584538Z W1204 12:20:10.284000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_reduction_no_pointwise.<locals>.fn'
2025-12-04T12:20:32.3585919Z I1204 12:20:10.908000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics:
2025-12-04T12:20:32.3587494Z I1204 12:20:10.908000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 4100, 'nodes_num_elem': [(SchedulerNode(name='op0'), 1025)], 'node_runtimes': [(SchedulerNode(name='op0'), 1.280993801239752e-05)]}
2025-12-04T12:20:32.3588762Z PASSED [4.9165s] [ 17%]
2025-12-04T12:20:32.3589779Z inductor/test_loop_ordering.py::MemoryCoalescingTest::test_reduction_pointwise W1204 12:20:11.009000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key
2025-12-04T12:20:32.3591188Z W1204 12:20:11.009000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last):
2025-12-04T12:20:32.3592569Z W1204 12:20:11.009000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps
2025-12-04T12:20:32.3593866Z W1204 12:20:11.009000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0]     self.dump(obj)
2025-12-04T12:20:32.3595151Z W1204 12:20:11.009000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_reduction_pointwise.<locals>.fn'
2025-12-04T12:20:32.3596476Z W1204 12:20:11.037000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key
2025-12-04T12:20:32.3597469Z W1204 12:20:11.037000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last):
2025-12-04T12:20:32.3598899Z W1204 12:20:11.037000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps
2025-12-04T12:20:32.3600199Z W1204 12:20:11.037000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0]     self.dump(obj)
2025-12-04T12:20:32.3601481Z W1204 12:20:11.037000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_reduction_pointwise.<locals>.fn'
2025-12-04T12:20:32.3602841Z I1204 12:20:11.164000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics:
2025-12-04T12:20:32.3604535Z I1204 12:20:11.164000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 525312, 'nodes_num_elem': [(FusedSchedulerNode(nodes=op0_op1), 131328)], 'node_runtimes': [(FusedSchedulerNode(nodes=op0_op1), 0.0016412717456508697)]}
2025-12-04T12:20:32.3605893Z PASSED [0.2533s] [ 20%]
2025-12-04T12:20:32.3606890Z inductor/test_loop_ordering.py::MemoryCoalescingTest::test_remapped_reads W1204 12:20:11.243000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key
2025-12-04T12:20:32.3608268Z W1204 12:20:11.243000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last):
2025-12-04T12:20:32.3609640Z W1204 12:20:11.243000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps
2025-12-04T12:20:32.3610974Z W1204 12:20:11.243000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0]     self.dump(obj)
2025-12-04T12:20:32.3612228Z W1204 12:20:11.243000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_remapped_reads.<locals>.fn'
2025-12-04T12:20:32.3613520Z W1204 12:20:11.266000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key
2025-12-04T12:20:32.3614552Z W1204 12:20:11.266000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last):
2025-12-04T12:20:32.3615955Z W1204 12:20:11.266000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps
2025-12-04T12:20:32.3617335Z W1204 12:20:11.266000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0]     self.dump(obj)
2025-12-04T12:20:32.3618586Z W1204 12:20:11.266000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_remapped_reads.<locals>.fn'
2025-12-04T12:20:32.3619916Z I1204 12:20:11.358000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics:
2025-12-04T12:20:32.3621467Z I1204 12:20:11.358000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 192, 'nodes_num_elem': [(SchedulerNode(name='op0'), 48)], 'node_runtimes': [(SchedulerNode(name='op0'), 5.99880023995201e-07)]}
2025-12-04T12:20:32.3622684Z PASSED [0.1907s] [ 24%]
2025-12-04T12:20:32.3623695Z inductor/test_loop_ordering.py::MemoryCoalescingTest::test_remapped_reads_split W1204 12:20:11.479000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key
2025-12-04T12:20:32.3625092Z W1204 12:20:11.479000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last):
2025-12-04T12:20:32.3626466Z W1204 12:20:11.479000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps
2025-12-04T12:20:32.3627759Z W1204 12:20:11.479000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0]     self.dump(obj)
2025-12-04T12:20:32.3629040Z W1204 12:20:11.479000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_remapped_reads_split.<locals>.fn'
2025-12-04T12:20:32.3630417Z W1204 12:20:11.518000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key
2025-12-04T12:20:32.3631419Z W1204 12:20:11.518000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last):
2025-12-04T12:20:32.3632781Z W1204 12:20:11.518000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps
2025-12-04T12:20:32.3634076Z W1204 12:20:11.518000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0]     self.dump(obj)
2025-12-04T12:20:32.3635356Z W1204 12:20:11.518000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_remapped_reads_split.<locals>.fn'
2025-12-04T12:20:32.3636717Z I1204 12:20:11.702000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics:
2025-12-04T12:20:32.3638392Z I1204 12:20:11.702000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 432, 'nodes_num_elem': [(FusedSchedulerNode(nodes=op0_op1), 108)], 'node_runtimes': [(FusedSchedulerNode(nodes=op0_op1), 1.3497300539892022e-06)]}
2025-12-04T12:20:32.3639720Z PASSED [0.3441s] [ 27%]
2025-12-04T12:20:32.3640315Z inductor/test_loop_ordering.py::MemoryCoalescingTest::test_solve_for_tiling PASSED [0.1454s] [ 31%]
2025-12-04T12:20:32.3641291Z inductor/test_loop_ordering.py::MemoryCoalescingTest::test_solve_for_zero PASSED [0.1614s] [ 34%]
2025-12-04T12:20:32.3642787Z inductor/test_loop_ordering.py::MemoryCoalescingTest::test_tiled_coalesce_analysis_downcast_transposed_v_False W1204 12:20:12.098000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key
2025-12-04T12:20:32.3644341Z W1204 12:20:12.098000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last):
2025-12-04T12:20:32.3645793Z W1204 12:20:12.098000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps
2025-12-04T12:20:32.3647085Z W1204 12:20:12.098000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0]     self.dump(obj)
2025-12-04T12:20:32.3648371Z W1204 12:20:12.098000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_tiled_coalesce_analysis.<locals>.fn'
2025-12-04T12:20:32.3649719Z W1204 12:20:12.122000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key
2025-12-04T12:20:32.3650721Z W1204 12:20:12.122000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last):
2025-12-04T12:20:32.3652104Z W1204 12:20:12.122000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps
2025-12-04T12:20:32.3653398Z W1204 12:20:12.122000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0]     self.dump(obj)
2025-12-04T12:20:32.3654676Z W1204 12:20:12.122000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_tiled_coalesce_analysis.<locals>.fn'
2025-12-04T12:20:32.3656049Z I1204 12:20:12.736000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics:
2025-12-04T12:20:32.3657730Z I1204 12:20:12.736000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 786432, 'nodes_num_elem': [(SchedulerNode(name='op0'), 196608)], 'node_runtimes': [(SchedulerNode(name='op0'), 0.002457108578284343)]}
2025-12-04T12:20:32.3658993Z PASSED [0.7862s] [ 37%]
2025-12-04T12:20:32.3660182Z inductor/test_loop_ordering.py::MemoryCoalescingTest::test_tiled_coalesce_analysis_downcast_transposed_v_True W1204 12:20:12.887000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key
2025-12-04T12:20:32.3661751Z W1204 12:20:12.887000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last):
2025-12-04T12:20:32.3663238Z W1204 12:20:12.887000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps
2025-12-04T12:20:32.3664530Z W1204 12:20:12.887000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0]     self.dump(obj)
2025-12-04T12:20:32.3665823Z W1204 12:20:12.887000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_tiled_coalesce_analysis.<locals>.fn'
2025-12-04T12:20:32.3667141Z W1204 12:20:12.912000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key
2025-12-04T12:20:32.3668146Z W1204 12:20:12.912000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last):
2025-12-04T12:20:32.3669520Z W1204 12:20:12.912000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps
2025-12-04T12:20:32.3670807Z W1204 12:20:12.912000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0]     self.dump(obj)
2025-12-04T12:20:32.3672278Z W1204 12:20:12.912000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_tiled_coalesce_analysis.<locals>.fn'
2025-12-04T12:20:32.3673755Z I1204 12:20:13.536000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics:
2025-12-04T12:20:32.3675377Z I1204 12:20:13.536000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 1048576, 'nodes_num_elem': [(SchedulerNode(name='op0'), 262144)], 'node_runtimes': [(SchedulerNode(name='op0'), 0.0032761447710457905)]}
2025-12-04T12:20:32.3676756Z PASSED [0.7959s] [ 41%]
2025-12-04T12:20:32.3677685Z inductor/test_loop_ordering.py::TestTiling::test_3d_pointwise I1204 12:20:14.911000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics:
2025-12-04T12:20:32.3679596Z I1204 12:20:14.911000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 268435456, 'nodes_num_elem': [(SchedulerNode(name='op0'), 67108864)], 'node_runtimes': [(SchedulerNode(name='op0'), 0.8386930613877224)]}
2025-12-04T12:20:32.3680863Z PASSED [1.7333s] [ 44%]
2025-12-04T12:20:32.3681756Z inductor/test_loop_ordering.py::TestTiling::test_cat I1204 12:20:16.617000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics:
2025-12-04T12:20:32.3683629Z I1204 12:20:16.617000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 268435456, 'nodes_num_elem': [(SchedulerNode(name='op0'), 67108864)], 'node_runtimes': [(SchedulerNode(name='op0'), 0.8386930613877224)]}
2025-12-04T12:20:32.3684889Z PASSED [1.5437s]    [ 48%]
2025-12-04T12:20:32.3685453Z inductor/test_loop_ordering.py::TestTiling::test_find_broadcast_var PASSED [0.0053s] [ 51%]
2025-12-04T12:20:32.3686665Z inductor/test_loop_ordering.py::TestTiling::test_mutation_deps W1204 12:20:16.964000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key
2025-12-04T12:20:32.3687969Z W1204 12:20:16.964000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last):
2025-12-04T12:20:32.3689344Z W1204 12:20:16.964000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps
2025-12-04T12:20:32.3690719Z W1204 12:20:16.964000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0]     self.dump(obj)
2025-12-04T12:20:32.3691975Z W1204 12:20:16.964000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'TestTiling.test_mutation_deps.<locals>.fn'
2025-12-04T12:20:32.3693201Z W1204 12:20:16.990000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key
2025-12-04T12:20:32.3694187Z W1204 12:20:16.990000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last):
2025-12-04T12:20:32.3695570Z W1204 12:20:16.990000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps
2025-12-04T12:20:32.3696923Z W1204 12:20:16.990000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0]     self.dump(obj)
2025-12-04T12:20:32.3698105Z W1204 12:20:16.990000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'TestTiling.test_mutation_deps.<locals>.fn'
2025-12-04T12:20:32.3699379Z I1204 12:20:17.425000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics:
2025-12-04T12:20:32.3701081Z I1204 12:20:17.425000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 134217728, 'nodes_num_elem': [(FusedSchedulerNode(nodes=op0_op1), 33554432)], 'node_runtimes': [(FusedSchedulerNode(nodes=op0_op1), 0.4193465306938612)]}
2025-12-04T12:20:32.3704845Z PASSED [0.6148s] [ 55%]
2025-12-04T12:20:32.3705839Z inductor/test_loop_ordering.py::TestTiling::test_penalized_small_dim I1204 12:20:18.061000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics:
2025-12-04T12:20:32.3707769Z I1204 12:20:18.061000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 40016, 'nodes_num_elem': [(SchedulerNode(name='op0'), 10004)], 'node_runtimes': [(SchedulerNode(name='op0'), 0.0001250249950009998)]}
2025-12-04T12:20:32.3709102Z PASSED [0.5818s] [ 58%]
2025-12-04T12:20:32.3710092Z inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_NHWC_b_NHWC I1204 12:20:18.631000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics:
2025-12-04T12:20:32.3712053Z I1204 12:20:18.631000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 201326592, 'nodes_num_elem': [(SchedulerNode(name='op0'), 50331648)], 'node_runtimes': [(SchedulerNode(name='op0'), 0.6290197960407918)]}
2025-12-04T12:20:32.3713331Z PASSED [0.6256s] [ 62%]
2025-12-04T12:20:32.3714302Z inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_NHWC_b_T I1204 12:20:19.778000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics:
2025-12-04T12:20:32.3716242Z I1204 12:20:19.778000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 201326592, 'nodes_num_elem': [(SchedulerNode(name='op0'), 50331648)], 'node_runtimes': [(SchedulerNode(name='op0'), 0.6290197960407918)]}
2025-12-04T12:20:32.3717523Z PASSED [1.3383s] [ 65%]
2025-12-04T12:20:32.3718520Z inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_NHWC_b_cont I1204 12:20:21.111000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics:
2025-12-04T12:20:32.3720477Z I1204 12:20:21.111000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 201326592, 'nodes_num_elem': [(SchedulerNode(name='op0'), 50331648)], 'node_runtimes': [(SchedulerNode(name='op0'), 0.6290197960407918)]}
2025-12-04T12:20:32.3721737Z PASSED [1.2863s] [ 68%]
2025-12-04T12:20:32.3722705Z inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_T_b_NHWC I1204 12:20:22.409000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics:
2025-12-04T12:20:32.3724718Z I1204 12:20:22.409000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 201326592, 'nodes_num_elem': [(SchedulerNode(name='op0'), 50331648)], 'node_runtimes': [(SchedulerNode(name='op0'), 0.6290197960407918)]}
2025-12-04T12:20:32.3725986Z PASSED [1.3411s] [ 72%]
2025-12-04T12:20:32.3726937Z inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_T_b_T I1204 12:20:23.226000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics:
2025-12-04T12:20:32.3728851Z I1204 12:20:23.226000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 201326592, 'nodes_num_elem': [(SchedulerNode(name='op0'), 50331648)], 'node_runtimes': [(SchedulerNode(name='op0'), 0.6290197960407918)]}
2025-12-04T12:20:32.3730130Z PASSED [0.6231s] [ 75%]
2025-12-04T12:20:32.3731093Z inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_T_b_cont I1204 12:20:24.410000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics:
2025-12-04T12:20:32.3733039Z I1204 12:20:24.410000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 201326592, 'nodes_num_elem': [(SchedulerNode(name='op0'), 50331648)], 'node_runtimes': [(SchedulerNode(name='op0'), 0.6290197960407918)]}
2025-12-04T12:20:32.3734305Z PASSED [1.3315s] [ 79%]
2025-12-04T12:20:32.3735277Z inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_cont_b_NHWC I1204 12:20:25.670000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics:
2025-12-04T12:20:32.3737363Z I1204 12:20:25.670000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 201326592, 'nodes_num_elem': [(SchedulerNode(name='op0'), 50331648)], 'node_runtimes': [(SchedulerNode(name='op0'), 0.6290197960407918)]}
2025-12-04T12:20:32.3738682Z PASSED [1.2890s] [ 82%]
2025-12-04T12:20:32.3739682Z inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_cont_b_T I1204 12:20:26.977000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics:
2025-12-04T12:20:32.3741605Z I1204 12:20:26.977000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 201326592, 'nodes_num_elem': [(SchedulerNode(name='op0'), 50331648)], 'node_runtimes': [(SchedulerNode(name='op0'), 0.6290197960407918)]}
2025-12-04T12:20:32.3742870Z PASSED [1.2719s] [ 86%]
2025-12-04T12:20:32.3743853Z inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_cont_b_cont I1204 12:20:27.746000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics:
2025-12-04T12:20:32.3745815Z I1204 12:20:27.746000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 201326592, 'nodes_num_elem': [(SchedulerNode(name='op0'), 50331648)], 'node_runtimes': [(SchedulerNode(name='op0'), 0.6290197960407918)]}
2025-12-04T12:20:32.3747075Z PASSED [0.6211s] [ 89%]
2025-12-04T12:20:32.3748013Z inductor/test_loop_ordering.py::TestTiling::test_tiled_reduction I1204 12:20:28.802000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics:
2025-12-04T12:20:32.3749950Z I1204 12:20:28.802000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 1074790400, 'nodes_num_elem': [(SchedulerNode(name='op0'), 268697600)], 'node_runtimes': [(SchedulerNode(name='op0'), 3.3580483903219354)]}
2025-12-04T12:20:32.3751226Z PASSED [1.2841s] [ 93%]
2025-12-04T12:20:32.3751801Z inductor/test_loop_ordering.py::TestIndexInversion::test_inversion_cases PASSED [0.0535s] [ 96%]
2025-12-04T12:20:32.3752767Z inductor/test_loop_ordering.py::TestIndexInversion::test_original_complex_expression PASSED [0.7354s] [100%]
2025-12-04T12:20:32.3753368Z 
2025-12-04T12:20:32.3754182Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_loop_ordering/inductor.test_loop_ordering-d4eac70f931f6c8b.xml -
2025-12-04T12:20:32.3755275Z ====================== 29 passed, 24 deselected in 24.12s ======================
2025-12-04T12:20:32.3756161Z The following tests failed consistently: ['test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_induced_fused_tiling']
2025-12-04T12:20:32.3756843Z 
2025-12-04T12:20:32.3757411Z FINISHED PRINTING LOG FILE of inductor/test_loop_ordering 1/1 (test/test-reports/inductor.test_loop_ordering_1.1_ca0aee6babe9c71a_.log)
2025-12-04T12:20:32.3758124Z 
2025-12-04T12:20:32.3758485Z Finished inductor/test_loop_ordering 1/1 ... [2025-12-04 12:20:32.292243][11260.675134844], took 1.78min
2025-12-04T12:20:32.3759833Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_loop_ordering/inductor.test_loop_ordering-264346bf50f4314b.xml
2025-12-04T12:20:32.3986285Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_loop_ordering/inductor.test_loop_ordering-f4f4ac9590e83730.xml
2025-12-04T12:20:32.4300570Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_loop_ordering/inductor.test_loop_ordering-193da166d9e268ac.xml
2025-12-04T12:20:32.4630007Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_loop_ordering/inductor.test_loop_ordering-d4eac70f931f6c8b.xml
2025-12-04T12:20:33.0174111Z Uploading logs for 57119749248 to S3
2025-12-04T12:20:33.1920718Z Uploading artifacts took 0.69 seconds
2025-12-04T12:20:33.1921133Z inductor/test_loop_ordering 1/1 failed!
2025-12-04T12:20:33.1925865Z Running export/test_serdes 1/1 ... [2025-12-04 12:20:33.192392][11261.575285415]
2025-12-04T12:20:33.1926415Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:20:33.1931199Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_serdes.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:20:33.192866]
2025-12-04T12:24:41.5856113Z 
2025-12-04T12:24:41.5859178Z export/test_serdes 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_serdes_1.1_d6753111c4d56d4f_.log
2025-12-04T12:24:41.6337746Z Running 880 items in this shard: test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_export_assume_static_by_default_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_export_constraints_error_not_in_range_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_export_constraints_error_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_export_inline_constraints_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_export_slice_maxsize_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_export_slice_unbacked_dim1_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_export_strict_narrow_unbacked_expr_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_no_grad_param_inplace_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_reshape_view_backed_size_oblivious_serdes_strict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_export_assume_static_by_default_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_export_constraints_error_not_in_range_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_export_constraints_error_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_export_inline_constraints_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_export_slice_maxsize_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_export_slice_unbacked_dim1_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_export_strict_narrow_unbacked_expr_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_no_grad_param_inplace_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_reshape_view_backed_size_oblivious_serdes_nonstrict, test/export/test_serdes.py::SerDesExportTestExport::test__scaled_dot_product_flash_attention_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_additional_inputs_constants_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_allow_explicit_guards_as_runtime_asserts_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_annotate_on_assert_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_args_type_checked_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_aten_lift_fresh_copy_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_attention_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_attr_assignment_extra_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_automatic_constrain_size_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_automatic_dynamic_shapes_constant_relation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_automatic_dynamic_shapes_linear_relation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_automatic_dynamic_shapes_simple_equality_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_baddbmm_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_basic_non_strict_fake_tensor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_basic_non_strict_real_tensor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_basic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_bincount_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_buffer_util_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_capture_subclass_constructor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_capture_subclass_constructor_torch_ir_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_capture_subclass_wrong_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_ccode_python_mod_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cdist_forward_compute_mode_zero_export_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_check_specialized_int_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_checks_to_constrain_range_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cleanup_dynamic_markers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_colin_unbacked_backed_vr_sub_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_colon_parameter_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_compiling_state_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_access_identical_symint_closure_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_branches_return_constant_int_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_branches_return_same_int_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_buffers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_contains_unbacked_no_escape_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_int_closure_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_unflatten_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_with_module_stack_export_with_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_with_module_stack_export_with_unflatten_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_constant_aliasing_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_input_naming_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_no_user_inp_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_output_dup_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_output_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_requires_grad_const_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_return_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_tensor_mutation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_tensor_with_non_functional_nested_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_tensor_with_non_functional_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constrain_decomp_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constrain_size_in_eager_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constrain_size_with_constrain_value_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constrain_size_with_various_cases_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_conv_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_crop_like_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cse_for_symint_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_custom_op_auto_functionalize_pre_dispatch_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_custom_op_auto_functionalize_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_custom_op_auto_warn_pre_dispatch_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_custom_op_preserve_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_custom_pytree_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_custom_tag_metadata_re_export_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_decomp_batch_norm_functional_predispatch_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_decomp_item_in_prim_after_decomposition_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_decomp_item_in_prim_before_decomposition_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_default_decomposition_core_cia_ops_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_1_2_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_basic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_integer_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_nested_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_out_of_order_repeat_derived_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_out_of_order_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_out_of_order_simplified_repeat_non_derived_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_out_of_order_simplified_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_repeat_derived_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_detect_leak_nonstrict_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_detect_leak_nonstrict_with_stacktrace_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_detect_leak_strict_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_device_to_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_device_to_gpu_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_device_to_mutation_float_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_device_to_mutation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_device_to_static_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dim_1_2_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dim_auto_and_dim_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dim_dynamic_divisibility_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dim_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dim_dynamic_specialization_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dim_hint_range_violations_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dim_hint_ranges_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_disable_forced_specializations_errors_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_disable_forced_specializations_ok_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_distributed_all_gather_into_tensor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_distributed_all_gather_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_distributed_all_reduce_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_distributed_all_to_all_single_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_distributed_reduce_scatter_tensor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dont_duck_size_for_auto_dynamic_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_double_lifted_constants_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_draft_export_checks_aliasing_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_draft_export_checks_mutation_list_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_draft_export_checks_mutation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_draft_export_checks_mutation_with_nan_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_draft_export_fake_kernel_inference_errors_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_draft_export_infers_fake_kernel_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_duplicate_modules_with_non_persistent_buffers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_lr_shift_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_bounds_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_builder_basic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_builder_kwargs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_builder_pytree_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_dataclass_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_inferred_basic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_serdes_generic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_serdes_user_errors_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_serdes_various_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_spec_with_pytree_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_wrapped_with_shape_guards_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_sym_round_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_ends_of_bounds_oblivious_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_enum_str_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_error_does_not_reference_eager_fallback_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_error_when_passing_mutating_primitive_op_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_exception_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_expand_copy_export_handles_implicit_true_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_api_with_dynamic_shapes_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_as_backend_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_associative_scan_lifted_buffers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_associative_scan_symbol_dim_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_associative_scan_symbol_scandim_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_aten_to_unflatten_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_aten_to_unflatten_subclass_pre_dispatch_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_aten_to_unflatten_subclass_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_cond_preserve_torch_fn_for_subgraphs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_cond_symbool_pred_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_cond_warns_constant_pred_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_export_custom_decomp_table_basic_pop_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_custom_decomp_table_container_methods_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_custom_op_lib_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_custom_triton_kernel_mutable_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_custom_triton_kernel_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_cyclic_reference_leak_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_decomp_torture_case_1_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_decomp_torture_case_2_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_decomps_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_decomps_simple_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_dynamo_config_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_for_training_run_decomp_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_for_training_with_container_type_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_for_training_with_dynamic_shapes_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_for_training_with_mutation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_for_training_with_state_dict_hooks_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_func_with_default_kwargs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_func_with_keyword_only_args_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_func_with_kwargs_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_export_func_with_pytree_kwargs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_func_with_var_keyword_args_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_func_with_var_keyword_pytree_args_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_func_with_var_postional_args_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_function_schema_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_graph_with_no_inputs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_input_mutation_bug_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_input_mutation_dynamic_shape_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_input_mutation_static_shape_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_leak_compile_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_linear_preserve_dynamic_shape_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_max_nonstrict_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_max_onnx_reported_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_method_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_mod_constraints_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_module_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_preserve_linear_at_aot_level_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_preserve_linear_but_not_custom_op_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_rnn_variants_with_warning_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_export_scan_pytree_output_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_script_module_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_statically_known_true_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_then_compile_tensor_ctor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_with_autocast_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_with_fake_tensor_inputs_on_cuda_devices_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_with_fake_tensor_inputs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_with_inline_constraints_complex_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_with_inline_constraints_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_with_set_grad_enabled_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_with_wrong_inputs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_external_call_non_strict_real_tensor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_fake_inputs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_fake_weights_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_filter_traceback_frames_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_flex_attention_export_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_float_conversion_from_int_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_float_conversion_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_fqn_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_from_node_metadata_export_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_full_on_scalar_tensor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_function_holding_tensor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_hints_wrapper_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_hoo_inline_users_issue_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_if_functional_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_if_post_autograd_op_preserved_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_inductor_backend_inside_nonstrict_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_inline_script_class_method_recursive_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_inline_script_class_method_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_inline_script_function_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_inline_script_method_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_int_shape_specialization_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_intermediate_shape_comp_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_invalid_pytree_dynamo_graph_capture_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_is_exporting_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_is_nonzero_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_isnonzero_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_issue_113041_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_issue_157289_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_issue_161902_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_istft_op_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_keep_composite_ops_invalid_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_keep_composite_ops_linear_convd_for_training_ir_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_keep_composite_ops_linear_convd_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_kwarg_dynamic_shapes_diff_order_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_kwargs_reorder_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_layer_norm_unbacked_normalized_shape_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_layer_sharing_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_lazy_module_kwargs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_lifted_constants_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_linear_conv_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_malformed_fqn_from_source_name_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_map_buffers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_map_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_mask_nonzero_static_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_masked_select_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_math_pow_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_mismatched_dynamic_shapes_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_mixed_input_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_module_dict_key_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_module_input_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_module_input_subclasses_parameterization_nested_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_module_list_slice_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_module_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_module_with_dict_container_inp_out_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_modules_access_for_deleted_submodule_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_more_multidimensional_slicing_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_multidimensional_slicing_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_multinomial_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_multiple_definitions_same_name_dim_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_namedtuple_input_export_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_native_multi_attention_head_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nested_dynamic_shapes_spec_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nested_module_fake_tensor_leak_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nested_module_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nested_module_with_constant_buffer_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nested_module_with_init_buffer_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nested_module_with_parameter_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nn_module_stack_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nn_module_stack_shared_submodule_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_no_check_is_size_error_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_no_suggested_fixes_for_data_dependent_errors_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_no_tensor_computation_2_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_no_tensor_computation_3_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_no_tensor_computation_4_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_no_tensor_computation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_non_arg_name_dynamic_shapes_api_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_non_arg_name_dynamic_shapes_api_with_container_type_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_non_arg_name_dynamic_shapes_api_with_kwarg_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_non_persistent_buffer_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_non_strict_dynamic_shapes_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_non_strict_dynamic_shapes_suggested_fixes_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_none_buffers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nonstrict_retrace_preserves_metadata_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nonzero_2_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nonzero_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_not_registered_parameter_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_operator_aten_tensor_mode_variant_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_output_node_name_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_pad_sequence_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_param_util_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_partial_patched_forward_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_placeholder_naming_collisions_hoo_subgraphs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_placeholder_naming_collisions_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_placeholder_naming_order_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_placeholder_naming_order_variadic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_placeholder_update_preserving_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_predispatch_cond_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_predispatch_grad_wrappers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_preserve_annotation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_preserve_module_call_signature_unflatten_specialization_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_preserve_requires_grad_placeholders_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_preserve_shape_dynamism_for_unused_inputs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_profiling_code_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_python_asserts_with_sym_int_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_pytree_register_data_class_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_pytree_register_nested_data_class_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_raise_user_error_when_guard_on_data_dependent_operation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_range_constraints_with_replacement_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_real_tensor_alias_dtype_mismatch_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_real_tensor_bool_cast_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_real_tensor_errors_on_aliasing_custom_op_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_real_tensor_for_max_op_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_real_tensor_size_mismatch_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_redundant_assert_max_upper_bound_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_redundant_asserts_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_refine_dynamic_shapes_from_suggested_fixes_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_register_constant_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_repeat_interleave_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_replace_unbacked_with_very_large_upperbound_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_replaced_unbacked_bindings_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_reshape_view_helper_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_retracable_ep_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_retrace_pre_autograd_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_run_decomposition_supports_user_input_mutation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_run_decompositions_keep_metadata_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_run_decompositions_keep_tensor_constant_metadata_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_runtime_assert_for_prim_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_runtime_assert_for_prm_str_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_runtime_assert_with_size_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_sdpa_gqa_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_sequential_slicing_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_set_example_inputs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_set_grad_as_side_effect_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_set_grad_empty_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_set_grad_unflatten_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_setgrad_lifted_tensor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_shared_submodule_nn_module_stack_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_simple_export_for_training_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_simple_unbacked_view_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_size_input_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_slice_nn_module_stack_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_solver_unsupported_sympy_function_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_specialize_derived_dim_roots_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_split_const_gm_with_lifted_constants_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_stack_trace_make_fx_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_stack_trace_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_state_primitives_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_state_shape_attribute_assignment_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_state_tensors_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_static_dim_constraints_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclass_context_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_subclass_nested_attr_access_complicated_metadata_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclass_nested_attr_access_const_metadata_not_top_level_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclass_nested_attr_access_const_metadata_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclass_nested_attr_access_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclass_nested_attr_access_submodule_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclasses_parameterization_nested_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclasses_parameterization_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_suggest_torch_checks_with_non_negative_check_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_suggest_torch_checks_with_regular_check_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_suggested_fixes_for_data_dependent_errors_basic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_suggested_fixes_for_data_dependent_errors_puzzlers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_suggested_fixes_new_roots_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_sym_float_operators_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_sym_or_sym_and_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_sym_sqrt_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symbool_item_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symfloat_item_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_input_additional_inputs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_input_basic_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_symint_input_ranges_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_input_shapes_collection_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_input_specialization_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_item_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_output_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_tensor_return_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_tag_ac_export_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_tensor_attribute_zero_args_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_tensor_constant_aten_to_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_tensor_constant_with_wrapped_method_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_to_module_with_mutated_buffer_multiple_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_to_module_with_mutated_buffer_multiple_update_sub_later_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_to_module_with_mutated_buffer_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_tolist_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_torch_check_eq_commutativity_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_torch_fn_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_trace_under_fake_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_train_eval_on_exported_preautograd_module_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_tril_dynamic_diagonal_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_triu_dynamic_diagonal_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_3d_matmul_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_bincount_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_bindings_for_divisible_u_symint_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_deferred_runtime_retrace_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_expand_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_infer_size_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_kth_value_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_linear_layer_norm_input_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_noncontig_lin_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_pad_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_scalar_constructor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_slice_forward_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_slice_simple_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_stack_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_to_cond_passthrough_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_to_cond_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_unsqueeze_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_asserts_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_buffer_update_child2parent_swap_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_closure_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_isinstance_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_multiple_graphs_dispatch_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_multiple_graphs_preserve_signature_no_error_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_multiple_graphs_shared_submodule_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_multiple_graphs_state_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_no_unroll_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_placeholder_update_child2parent_swap_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_placeholder_update_grandchild2cousin_swap_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_5_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_6_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_buf_8_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_const_preserving_3_1_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_const_preserving_3_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_4_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_6_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_9_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_preserving_10_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_preserving_4_1_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_preserving_4_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_preserving_5_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_preserving_7_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_preserving_4_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unused_aliases_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unused_constant_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_uplift_common_custom_meta_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_uplift_common_custom_meta_with_multiple_calls_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_use_embedding_twice_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_user_input_and_buffer_mutation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_vmap_custom_autograd_function_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_vmap_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_vmap_to_assert_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_where_decomp_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_while_loop_assert_separation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_while_loop_index_assertions_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_while_loop_simple_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_while_loop_tensor_constant_idx_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_wrapper_module_serdes_strict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test__scaled_dot_product_flash_attention_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_additional_inputs_constants_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_allow_explicit_guards_as_runtime_asserts_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_annotate_on_assert_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_args_type_checked_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_aten_lift_fresh_copy_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_attention_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_attr_assignment_extra_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_automatic_constrain_size_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_automatic_dynamic_shapes_constant_relation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_automatic_dynamic_shapes_linear_relation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_automatic_dynamic_shapes_simple_equality_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_baddbmm_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_basic_non_strict_fake_tensor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_basic_non_strict_real_tensor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_basic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_bincount_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_buffer_util_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_capture_subclass_constructor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_capture_subclass_constructor_torch_ir_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_capture_subclass_wrong_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_ccode_python_mod_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cdist_forward_compute_mode_zero_export_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_check_specialized_int_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_checks_to_constrain_range_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cleanup_dynamic_markers_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_colin_unbacked_backed_vr_sub_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_colon_parameter_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_compiling_state_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_access_identical_symint_closure_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_branches_return_constant_int_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_branches_return_same_int_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_buffers_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_contains_unbacked_no_escape_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_int_closure_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_unflatten_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_with_module_stack_export_with_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_with_module_stack_export_with_unflatten_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_aliasing_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_input_naming_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_no_user_inp_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_output_dup_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_output_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_requires_grad_const_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_return_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_tensor_mutation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_tensor_with_non_functional_nested_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_tensor_with_non_functional_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constrain_decomp_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constrain_size_in_eager_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constrain_size_with_constrain_value_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constrain_size_with_various_cases_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_conv_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_crop_like_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cse_for_symint_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_custom_op_auto_functionalize_pre_dispatch_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_custom_op_auto_functionalize_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_custom_op_auto_warn_pre_dispatch_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_custom_op_preserve_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_custom_pytree_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_custom_tag_metadata_re_export_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_decomp_batch_norm_functional_predispatch_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_decomp_item_in_prim_after_decomposition_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_decomp_item_in_prim_before_decomposition_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_default_decomposition_core_cia_ops_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_1_2_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_basic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_integer_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_nested_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_out_of_order_repeat_derived_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_out_of_order_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_out_of_order_simplified_repeat_non_derived_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_out_of_order_simplified_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_repeat_derived_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_detect_leak_nonstrict_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_detect_leak_nonstrict_with_stacktrace_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_detect_leak_strict_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_device_to_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_device_to_gpu_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_device_to_mutation_float_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_device_to_mutation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_device_to_static_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dim_1_2_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dim_auto_and_dim_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dim_dynamic_divisibility_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dim_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dim_dynamic_specialization_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dim_hint_range_violations_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dim_hint_ranges_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_disable_forced_specializations_errors_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_disable_forced_specializations_ok_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_distributed_all_gather_into_tensor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_distributed_all_gather_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_distributed_all_reduce_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_distributed_all_to_all_single_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_distributed_reduce_scatter_tensor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dont_duck_size_for_auto_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_double_lifted_constants_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_draft_export_checks_aliasing_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_draft_export_checks_mutation_list_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_draft_export_checks_mutation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_draft_export_checks_mutation_with_nan_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_draft_export_fake_kernel_inference_errors_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_draft_export_infers_fake_kernel_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_duplicate_modules_with_non_persistent_buffers_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_lr_shift_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_bounds_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_builder_basic_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_builder_kwargs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_builder_pytree_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_dataclass_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_inferred_basic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_serdes_generic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_serdes_user_errors_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_serdes_various_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_spec_with_pytree_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_wrapped_with_shape_guards_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_sym_round_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_ends_of_bounds_oblivious_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_enum_str_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_error_does_not_reference_eager_fallback_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_error_when_passing_mutating_primitive_op_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_exception_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_expand_copy_export_handles_implicit_true_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_api_with_dynamic_shapes_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_as_backend_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_associative_scan_lifted_buffers_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_associative_scan_symbol_dim_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_associative_scan_symbol_scandim_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_aten_to_unflatten_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_aten_to_unflatten_subclass_pre_dispatch_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_aten_to_unflatten_subclass_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_cond_preserve_torch_fn_for_subgraphs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_cond_symbool_pred_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_cond_warns_constant_pred_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_custom_decomp_table_basic_pop_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_custom_decomp_table_container_methods_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_custom_op_lib_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_custom_triton_kernel_mutable_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_custom_triton_kernel_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_cyclic_reference_leak_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_decomp_torture_case_1_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_decomp_torture_case_2_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_decomps_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_decomps_simple_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_dynamo_config_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_for_training_run_decomp_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_for_training_with_container_type_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_for_training_with_dynamic_shapes_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_for_training_with_mutation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_for_training_with_state_dict_hooks_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_func_with_default_kwargs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_func_with_keyword_only_args_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_func_with_kwargs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_func_with_pytree_kwargs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_func_with_var_keyword_args_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_func_with_var_keyword_pytree_args_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_func_with_var_postional_args_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_function_schema_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_graph_with_no_inputs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_input_mutation_bug_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_input_mutation_dynamic_shape_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_input_mutation_static_shape_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_leak_compile_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_linear_preserve_dynamic_shape_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_max_nonstrict_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_max_onnx_reported_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_method_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_mod_constraints_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_module_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_preserve_linear_at_aot_level_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_preserve_linear_but_not_custom_op_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_rnn_variants_with_warning_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_scan_pytree_output_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_script_module_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_statically_known_true_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_then_compile_tensor_ctor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_with_autocast_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_with_fake_tensor_inputs_on_cuda_devices_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_with_fake_tensor_inputs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_with_inline_constraints_complex_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_with_inline_constraints_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_with_set_grad_enabled_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_with_wrong_inputs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_external_call_non_strict_real_tensor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_fake_inputs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_fake_weights_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_filter_traceback_frames_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_flex_attention_export_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_float_conversion_from_int_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_float_conversion_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_fqn_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_from_node_metadata_export_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_full_on_scalar_tensor_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_function_holding_tensor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_hints_wrapper_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_hoo_inline_users_issue_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_if_functional_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_if_post_autograd_op_preserved_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_inductor_backend_inside_nonstrict_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_inline_script_class_method_recursive_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_inline_script_class_method_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_inline_script_function_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_inline_script_method_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_int_shape_specialization_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_intermediate_shape_comp_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_invalid_pytree_dynamo_graph_capture_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_is_exporting_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_is_nonzero_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_isnonzero_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_issue_113041_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_issue_157289_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_issue_161902_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_istft_op_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_keep_composite_ops_invalid_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_keep_composite_ops_linear_convd_for_training_ir_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_keep_composite_ops_linear_convd_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_kwarg_dynamic_shapes_diff_order_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_kwargs_reorder_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_layer_norm_unbacked_normalized_shape_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_layer_sharing_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_lazy_module_kwargs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_lifted_constants_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_linear_conv_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_malformed_fqn_from_source_name_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_map_buffers_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_map_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_mask_nonzero_static_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_masked_select_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_math_pow_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_mismatched_dynamic_shapes_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_mixed_input_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_module_dict_key_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_module_input_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_module_input_subclasses_parameterization_nested_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_module_list_slice_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_module_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_module_with_dict_container_inp_out_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_modules_access_for_deleted_submodule_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_more_multidimensional_slicing_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_multidimensional_slicing_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_multinomial_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_multiple_definitions_same_name_dim_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_namedtuple_input_export_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_native_multi_attention_head_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nested_dynamic_shapes_spec_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nested_module_fake_tensor_leak_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nested_module_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nested_module_with_constant_buffer_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nested_module_with_init_buffer_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nested_module_with_parameter_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nn_module_stack_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nn_module_stack_shared_submodule_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_no_check_is_size_error_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_no_suggested_fixes_for_data_dependent_errors_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_no_tensor_computation_2_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_no_tensor_computation_3_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_no_tensor_computation_4_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_no_tensor_computation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_non_arg_name_dynamic_shapes_api_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_non_arg_name_dynamic_shapes_api_with_container_type_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_non_arg_name_dynamic_shapes_api_with_kwarg_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_non_persistent_buffer_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_non_strict_dynamic_shapes_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_non_strict_dynamic_shapes_suggested_fixes_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_none_buffers_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nonstrict_retrace_preserves_metadata_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nonzero_2_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nonzero_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_not_registered_parameter_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_operator_aten_tensor_mode_variant_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_output_node_name_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_pad_sequence_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_param_util_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_partial_patched_forward_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_placeholder_naming_collisions_hoo_subgraphs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_placeholder_naming_collisions_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_placeholder_naming_order_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_placeholder_naming_order_variadic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_placeholder_update_preserving_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_predispatch_cond_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_predispatch_grad_wrappers_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_preserve_annotation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_preserve_module_call_signature_unflatten_specialization_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_preserve_requires_grad_placeholders_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_preserve_shape_dynamism_for_unused_inputs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_profiling_code_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_python_asserts_with_sym_int_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_pytree_register_data_class_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_pytree_register_nested_data_class_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_raise_user_error_when_guard_on_data_dependent_operation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_range_constraints_with_replacement_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_real_tensor_alias_dtype_mismatch_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_real_tensor_bool_cast_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_real_tensor_errors_on_aliasing_custom_op_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_real_tensor_for_max_op_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_real_tensor_size_mismatch_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_redundant_assert_max_upper_bound_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_redundant_asserts_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_refine_dynamic_shapes_from_suggested_fixes_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_register_constant_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_repeat_interleave_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_replace_unbacked_with_very_large_upperbound_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_replaced_unbacked_bindings_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_reshape_view_helper_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_retracable_ep_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_retrace_pre_autograd_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_run_decomposition_supports_user_input_mutation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_run_decompositions_keep_metadata_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_run_decompositions_keep_tensor_constant_metadata_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_runtime_assert_for_prim_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_runtime_assert_for_prm_str_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_runtime_assert_with_size_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_sdpa_gqa_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_sequential_slicing_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_set_example_inputs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_set_grad_as_side_effect_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_set_grad_empty_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_set_grad_unflatten_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_setgrad_lifted_tensor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_shared_submodule_nn_module_stack_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_simple_export_for_training_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_simple_unbacked_view_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_size_input_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_slice_nn_module_stack_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_solver_unsupported_sympy_function_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_specialize_derived_dim_roots_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_split_const_gm_with_lifted_constants_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_stack_trace_make_fx_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_stack_trace_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_state_primitives_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_state_shape_attribute_assignment_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_state_tensors_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_static_dim_constraints_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclass_context_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclass_nested_attr_access_complicated_metadata_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclass_nested_attr_access_const_metadata_not_top_level_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclass_nested_attr_access_const_metadata_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclass_nested_attr_access_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclass_nested_attr_access_submodule_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclasses_parameterization_nested_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclasses_parameterization_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_suggest_torch_checks_with_non_negative_check_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_suggest_torch_checks_with_regular_check_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_suggested_fixes_for_data_dependent_errors_basic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_suggested_fixes_for_data_dependent_errors_puzzlers_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_suggested_fixes_new_roots_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_sym_float_operators_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_sym_or_sym_and_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_sym_sqrt_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symbool_item_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symfloat_item_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_input_additional_inputs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_input_basic_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_input_ranges_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_input_shapes_collection_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_input_specialization_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_item_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_output_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_tensor_return_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_tag_ac_export_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_tensor_attribute_zero_args_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_tensor_constant_aten_to_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_tensor_constant_with_wrapped_method_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_to_module_with_mutated_buffer_multiple_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_to_module_with_mutated_buffer_multiple_update_sub_later_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_to_module_with_mutated_buffer_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_tolist_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_torch_check_eq_commutativity_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_torch_fn_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_trace_under_fake_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_train_eval_on_exported_preautograd_module_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_tril_dynamic_diagonal_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_triu_dynamic_diagonal_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_3d_matmul_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_bincount_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_bindings_for_divisible_u_symint_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_deferred_runtime_retrace_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_expand_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_infer_size_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_kth_value_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_linear_layer_norm_input_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_noncontig_lin_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_pad_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_scalar_constructor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_slice_forward_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_slice_simple_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_stack_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_to_cond_passthrough_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_to_cond_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_unsqueeze_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_asserts_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_buffer_update_child2parent_swap_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_closure_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_isinstance_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_multiple_graphs_dispatch_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_multiple_graphs_preserve_signature_no_error_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_multiple_graphs_shared_submodule_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_multiple_graphs_state_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_no_unroll_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_placeholder_update_child2parent_swap_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_placeholder_update_grandchild2cousin_swap_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_5_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_6_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_buf_8_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_const_preserving_3_1_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_const_preserving_3_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_4_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_6_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_9_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_preserving_10_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_preserving_4_1_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_preserving_4_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_preserving_5_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_preserving_7_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_preserving_4_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unused_aliases_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unused_constant_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_uplift_common_custom_meta_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_uplift_common_custom_meta_with_multiple_calls_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_use_embedding_twice_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_user_input_and_buffer_mutation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_vmap_custom_autograd_function_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_vmap_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_vmap_to_assert_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_where_decomp_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_while_loop_assert_separation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_while_loop_index_assertions_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_while_loop_simple_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_while_loop_tensor_constant_idx_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_wrapper_module_serdes_nonstrict
2025-12-04T12:24:41.6805875Z 
2025-12-04T12:24:41.6806251Z Finished export/test_serdes 1/1 ... [2025-12-04 12:24:41.587580][11509.970472875], took 4.14min
2025-12-04T12:24:41.6807427Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_serdes/export.test_serdes-191fd84c43c29743.xml
2025-12-04T12:24:41.7399320Z Running dynamo/test_backends 1/1 ... [2025-12-04 12:24:41.739605][11510.122497627]
2025-12-04T12:24:41.7400103Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:24:41.7403031Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_backends.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:24:41.740065]
2025-12-04T12:25:50.2175311Z 
2025-12-04T12:25:50.2176451Z PRINTING LOG FILE of dynamo/test_backends 1/1 (test/test-reports/dynamo.test_backends_1.1_0248c6271c37d6dd_.log)
2025-12-04T12:25:50.2178039Z Test results will be stored in test-reports/python-pytest/dynamo.test_backends/dynamo.test_backends-7c3220f8bc842d2f.xml
2025-12-04T12:25:50.2178867Z ============================= test session starts ==============================
2025-12-04T12:25:50.2179538Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:25:50.2180194Z cachedir: .pytest_cache
2025-12-04T12:25:50.2180952Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:25:50.2181750Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:25:50.2182112Z configfile: pytest.ini
2025-12-04T12:25:50.2182895Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:25:50.2183746Z collecting ... collected 21 items
2025-12-04T12:25:50.2184170Z stepcurrent: Cannot find last run test, not skipping
2025-12-04T12:25:50.2192715Z Running 21 items in this shard: test/dynamo/test_backends.py::NormalizeIRTests::test_inplace_normalize, test/dynamo/test_backends.py::MPSSupportedTest::test_mps_supported, test/dynamo/test_backends.py::TestExplainWithBackend::test_explain_with_backend, test/dynamo/test_backends.py::TestCustomBackendAPI::test_aot_autograd_api, test/dynamo/test_backends.py::TestCustomBackendAPI::test_backend_graph_freeze, test/dynamo/test_backends.py::TestCustomBackendAPI::test_backend_recompilation, test/dynamo/test_backends.py::TestCustomBackendAPI::test_lookup_backend, test/dynamo/test_backends.py::TestCustomBackendAPI::test_lookup_custom_backend, test/dynamo/test_backends.py::TestCustomBackendAPI::test_register_backend_api, test/dynamo/test_backends.py::TestOptimizationsCUDA::test_aot_cudagraphs_cuda, test/dynamo/test_backends.py::TestOptimizationsCUDA::test_aot_eager_cuda, test/dynamo/test_backends.py::TestOptimizationsCUDA::test_aot_eager_decomp_partition_cuda, test/dynamo/test_backends.py::TestOptimizationsCUDA::test_aot_ts_cuda, test/dynamo/test_backends.py::TestOptimizationsCUDA::test_eager_cuda, test/dynamo/test_backends.py::TestOptimizationsCUDA::test_eager_noexcept_cuda, test/dynamo/test_backends.py::TestOptimizationsCUDA::test_example_inputs_cuda, test/dynamo/test_backends.py::TestOptimizationsCUDA::test_example_inputs_runtime_use_cuda, test/dynamo/test_backends.py::TestOptimizationsCUDA::test_intel_gaudi_backend_cuda, test/dynamo/test_backends.py::TestOptimizationsCUDA::test_list_backends_cuda, test/dynamo/test_backends.py::TestOptimizationsCUDA::test_torchscript_cuda, test/dynamo/test_backends.py::TestOptimizationsCUDA::test_tvm_cuda
2025-12-04T12:25:50.2201357Z 
2025-12-04T12:25:50.2201718Z dynamo/test_backends.py::NormalizeIRTests::test_inplace_normalize PASSED [0.2319s] [  4%]
2025-12-04T12:25:50.2202625Z dynamo/test_backends.py::MPSSupportedTest::test_mps_supported SKIPPED [0.0003s] (requires mps) [  9%]
2025-12-04T12:25:50.2203579Z dynamo/test_backends.py::TestExplainWithBackend::test_explain_with_backend PASSED [7.1635s] [ 14%]
2025-12-04T12:25:50.2204500Z dynamo/test_backends.py::TestCustomBackendAPI::test_aot_autograd_api PASSED [0.0603s] [ 19%]
2025-12-04T12:25:50.2205410Z dynamo/test_backends.py::TestCustomBackendAPI::test_backend_graph_freeze PASSED [0.1014s] [ 23%]
2025-12-04T12:25:50.2206339Z dynamo/test_backends.py::TestCustomBackendAPI::test_backend_recompilation PASSED [0.7360s] [ 28%]
2025-12-04T12:25:50.2207220Z dynamo/test_backends.py::TestCustomBackendAPI::test_lookup_backend PASSED [0.7438s] [ 33%]
2025-12-04T12:25:50.2208195Z dynamo/test_backends.py::TestCustomBackendAPI::test_lookup_custom_backend PASSED [0.0034s] [ 38%]
2025-12-04T12:25:50.2209123Z dynamo/test_backends.py::TestCustomBackendAPI::test_register_backend_api PASSED [0.0433s] [ 42%]
2025-12-04T12:25:50.2210138Z dynamo/test_backends.py::TestOptimizationsCUDA::test_aot_cudagraphs_cuda ('RERUN', {'yellow': True}) [1.2270s] [ 47%]
2025-12-04T12:25:50.2211277Z dynamo/test_backends.py::TestOptimizationsCUDA::test_aot_cudagraphs_cuda ('RERUN', {'yellow': True}) [0.5088s] [ 47%]
2025-12-04T12:25:50.2212323Z dynamo/test_backends.py::TestOptimizationsCUDA::test_aot_cudagraphs_cuda FAILED [0.5170s] [ 47%]
2025-12-04T12:25:50.2212853Z 
2025-12-04T12:25:50.2213016Z ==================================== RERUNS ====================================
2025-12-04T12:25:50.2213594Z ________________ TestOptimizationsCUDA.test_aot_cudagraphs_cuda ________________
2025-12-04T12:25:50.2214135Z Traceback (most recent call last):
2025-12-04T12:25:50.2214923Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:25:50.2215705Z     method(*args, **kwargs)
2025-12-04T12:25:50.2216504Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:25:50.2217283Z     method(*args, **kwargs)
2025-12-04T12:25:50.2218018Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:25:50.2218791Z     with policy():
2025-12-04T12:25:50.2219471Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:25:50.2220249Z     raise RuntimeError(msg)
2025-12-04T12:25:50.2221540Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestOptimizationsCUDA.test_aot_cudagraphs_cuda! Caching allocator allocated memory was 0 and is now reported as 3584 on device 0. CUDA driver allocated memory was 680460288 and is now 867106816.
2025-12-04T12:25:50.2222756Z 
2025-12-04T12:25:50.2222989Z To execute this test, run the following from the base repo dir:
2025-12-04T12:25:50.2223837Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/dynamo/test_backends.py TestOptimizationsCUDA.test_aot_cudagraphs_cuda
2025-12-04T12:25:50.2224480Z 
2025-12-04T12:25:50.2224756Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:25:50.2225455Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:25:50.2225942Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:25:50.2226314Z stats [('calls_captured', 4), ('unique_graphs', 1)]
2025-12-04T12:25:50.2226737Z aot_autograd [('total', 1), ('ok', 1)]
2025-12-04T12:25:50.2227263Z ________________ TestOptimizationsCUDA.test_aot_cudagraphs_cuda ________________
2025-12-04T12:25:50.2227794Z Traceback (most recent call last):
2025-12-04T12:25:50.2228568Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:25:50.2229341Z     method(*args, **kwargs)
2025-12-04T12:25:50.2230068Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:25:50.2230823Z     method(*args, **kwargs)
2025-12-04T12:25:50.2231549Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:25:50.2232314Z     with policy():
2025-12-04T12:25:50.2232995Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:25:50.2233773Z     raise RuntimeError(msg)
2025-12-04T12:25:50.2235070Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestOptimizationsCUDA.test_aot_cudagraphs_cuda! Caching allocator allocated memory was 0 and is now reported as 3584 on device 0. CUDA driver allocated memory was 839843840 and is now 867106816.
2025-12-04T12:25:50.2236334Z 
2025-12-04T12:25:50.2236565Z To execute this test, run the following from the base repo dir:
2025-12-04T12:25:50.2237406Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/dynamo/test_backends.py TestOptimizationsCUDA.test_aot_cudagraphs_cuda
2025-12-04T12:25:50.2238051Z 
2025-12-04T12:25:50.2238323Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:25:50.2239002Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:25:50.2239525Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:25:50.2239897Z stats [('calls_captured', 4), ('unique_graphs', 1)]
2025-12-04T12:25:50.2240326Z aot_autograd [('total', 1), ('ok', 1)]
2025-12-04T12:25:50.2240795Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:25:50.2241272Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:25:50.2241642Z stats [('calls_captured', 4), ('unique_graphs', 1)]
2025-12-04T12:25:50.2242061Z aot_autograd [('total', 1), ('ok', 1)]
2025-12-04T12:25:50.2242454Z =================================== FAILURES ===================================
2025-12-04T12:25:50.2243016Z ________________ TestOptimizationsCUDA.test_aot_cudagraphs_cuda ________________
2025-12-04T12:25:50.2243561Z Traceback (most recent call last):
2025-12-04T12:25:50.2244330Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:25:50.2245101Z     method(*args, **kwargs)
2025-12-04T12:25:50.2245825Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:25:50.2246596Z     method(*args, **kwargs)
2025-12-04T12:25:50.2247314Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:25:50.2248068Z     with policy():
2025-12-04T12:25:50.2248762Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:25:50.2249541Z     raise RuntimeError(msg)
2025-12-04T12:25:50.2250841Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestOptimizationsCUDA.test_aot_cudagraphs_cuda! Caching allocator allocated memory was 0 and is now reported as 3584 on device 0. CUDA driver allocated memory was 839843840 and is now 867106816.
2025-12-04T12:25:50.2252056Z 
2025-12-04T12:25:50.2252314Z To execute this test, run the following from the base repo dir:
2025-12-04T12:25:50.2253169Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/dynamo/test_backends.py TestOptimizationsCUDA.test_aot_cudagraphs_cuda
2025-12-04T12:25:50.2253795Z 
2025-12-04T12:25:50.2254077Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:25:50.2254709Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:25:50.2255186Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:25:50.2255573Z stats [('calls_captured', 4), ('unique_graphs', 1)]
2025-12-04T12:25:50.2255998Z aot_autograd [('total', 1), ('ok', 1)]
2025-12-04T12:25:50.2256534Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:25:50.2257015Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:25:50.2257400Z stats [('calls_captured', 4), ('unique_graphs', 1)]
2025-12-04T12:25:50.2257815Z aot_autograd [('total', 1), ('ok', 1)]
2025-12-04T12:25:50.2258286Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:25:50.2258760Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:25:50.2259140Z stats [('calls_captured', 4), ('unique_graphs', 1)]
2025-12-04T12:25:50.2259543Z aot_autograd [('total', 1), ('ok', 1)]
2025-12-04T12:25:50.2260475Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_backends/dynamo.test_backends-7c3220f8bc842d2f.xml -
2025-12-04T12:25:50.2261541Z =========================== short test summary info ============================
2025-12-04T12:25:50.2263401Z FAILED [0.5170s] dynamo/test_backends.py::TestOptimizationsCUDA::test_aot_cudagraphs_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestOptimizationsCUDA.test_aot_cudagraphs_cuda! Caching allocator allocated memory was 0 and is now reported as 3584 on device 0. CUDA driver allocated memory was 839843840 and is now 867106816.
2025-12-04T12:25:50.2265093Z 
2025-12-04T12:25:50.2265331Z To execute this test, run the following from the base repo dir:
2025-12-04T12:25:50.2266214Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/dynamo/test_backends.py TestOptimizationsCUDA.test_aot_cudagraphs_cuda
2025-12-04T12:25:50.2266865Z 
2025-12-04T12:25:50.2267138Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:25:50.2267745Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:25:50.2268291Z =============== 1 failed, 8 passed, 1 skipped, 2 rerun in 11.39s ===============
2025-12-04T12:25:50.2268738Z Got exit code 1
2025-12-04T12:25:50.2269020Z Retrying single test...
2025-12-04T12:25:50.2269715Z Test results will be stored in test-reports/python-pytest/dynamo.test_backends/dynamo.test_backends-7281b232f81c7a26.xml
2025-12-04T12:25:50.2270518Z ============================= test session starts ==============================
2025-12-04T12:25:50.2271428Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:25:50.2272046Z cachedir: .pytest_cache
2025-12-04T12:25:50.2272765Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:25:50.2273547Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:25:50.2273917Z configfile: pytest.ini
2025-12-04T12:25:50.2274699Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:25:50.2275640Z collecting ... collected 21 items / 20 deselected / 1 selected
2025-12-04T12:25:50.2276570Z stepcurrent: skipping 9 already run items. Running only test/dynamo/test_backends.py::TestOptimizationsCUDA::test_aot_cudagraphs_cuda
2025-12-04T12:25:50.2277398Z Running 1 items in this shard
2025-12-04T12:25:50.2277613Z 
2025-12-04T12:25:50.2278109Z dynamo/test_backends.py::TestOptimizationsCUDA::test_aot_cudagraphs_cuda ('RERUN', {'yellow': True}) [1.3827s] [100%]
2025-12-04T12:25:50.2279303Z dynamo/test_backends.py::TestOptimizationsCUDA::test_aot_cudagraphs_cuda ('RERUN', {'yellow': True}) [0.4729s] [100%]
2025-12-04T12:25:50.2280318Z dynamo/test_backends.py::TestOptimizationsCUDA::test_aot_cudagraphs_cuda FAILED [0.4714s] [100%]
2025-12-04T12:25:50.2280858Z 
2025-12-04T12:25:50.2281007Z ==================================== RERUNS ====================================
2025-12-04T12:25:50.2281583Z ________________ TestOptimizationsCUDA.test_aot_cudagraphs_cuda ________________
2025-12-04T12:25:50.2282114Z Traceback (most recent call last):
2025-12-04T12:25:50.2282891Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:25:50.2283665Z     method(*args, **kwargs)
2025-12-04T12:25:50.2284386Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:25:50.2285143Z     method(*args, **kwargs)
2025-12-04T12:25:50.2285869Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:25:50.2286629Z     with policy():
2025-12-04T12:25:50.2287309Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:25:50.2288092Z     raise RuntimeError(msg)
2025-12-04T12:25:50.2289389Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestOptimizationsCUDA.test_aot_cudagraphs_cuda! Caching allocator allocated memory was 0 and is now reported as 3584 on device 0. CUDA driver allocated memory was 680460288 and is now 867106816.
2025-12-04T12:25:50.2290738Z 
2025-12-04T12:25:50.2290962Z To execute this test, run the following from the base repo dir:
2025-12-04T12:25:50.2291814Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/dynamo/test_backends.py TestOptimizationsCUDA.test_aot_cudagraphs_cuda
2025-12-04T12:25:50.2292493Z 
2025-12-04T12:25:50.2292821Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:25:50.2293449Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:25:50.2293931Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:25:50.2294312Z stats [('calls_captured', 4), ('unique_graphs', 1)]
2025-12-04T12:25:50.2294726Z aot_autograd [('total', 1), ('ok', 1)]
2025-12-04T12:25:50.2295238Z ________________ TestOptimizationsCUDA.test_aot_cudagraphs_cuda ________________
2025-12-04T12:25:50.2295786Z Traceback (most recent call last):
2025-12-04T12:25:50.2296634Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:25:50.2297402Z     method(*args, **kwargs)
2025-12-04T12:25:50.2298130Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:25:50.2298909Z     method(*args, **kwargs)
2025-12-04T12:25:50.2299643Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:25:50.2300397Z     with policy():
2025-12-04T12:25:50.2301092Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:25:50.2301873Z     raise RuntimeError(msg)
2025-12-04T12:25:50.2303153Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestOptimizationsCUDA.test_aot_cudagraphs_cuda! Caching allocator allocated memory was 0 and is now reported as 3584 on device 0. CUDA driver allocated memory was 839843840 and is now 867106816.
2025-12-04T12:25:50.2304373Z 
2025-12-04T12:25:50.2304589Z To execute this test, run the following from the base repo dir:
2025-12-04T12:25:50.2305443Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/dynamo/test_backends.py TestOptimizationsCUDA.test_aot_cudagraphs_cuda
2025-12-04T12:25:50.2306081Z 
2025-12-04T12:25:50.2306414Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:25:50.2307049Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:25:50.2307521Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:25:50.2307899Z stats [('calls_captured', 4), ('unique_graphs', 1)]
2025-12-04T12:25:50.2308319Z aot_autograd [('total', 1), ('ok', 1)]
2025-12-04T12:25:50.2308768Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:25:50.2309251Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:25:50.2309626Z stats [('calls_captured', 4), ('unique_graphs', 1)]
2025-12-04T12:25:50.2310031Z aot_autograd [('total', 1), ('ok', 1)]
2025-12-04T12:25:50.2310427Z =================================== FAILURES ===================================
2025-12-04T12:25:50.2310996Z ________________ TestOptimizationsCUDA.test_aot_cudagraphs_cuda ________________
2025-12-04T12:25:50.2311549Z Traceback (most recent call last):
2025-12-04T12:25:50.2312311Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:25:50.2313084Z     method(*args, **kwargs)
2025-12-04T12:25:50.2313804Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:25:50.2314558Z     method(*args, **kwargs)
2025-12-04T12:25:50.2315274Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:25:50.2316074Z     with policy():
2025-12-04T12:25:50.2316766Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:25:50.2317530Z     raise RuntimeError(msg)
2025-12-04T12:25:50.2318826Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestOptimizationsCUDA.test_aot_cudagraphs_cuda! Caching allocator allocated memory was 0 and is now reported as 3584 on device 0. CUDA driver allocated memory was 839843840 and is now 867106816.
2025-12-04T12:25:50.2320124Z 
2025-12-04T12:25:50.2320345Z To execute this test, run the following from the base repo dir:
2025-12-04T12:25:50.2321198Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/dynamo/test_backends.py TestOptimizationsCUDA.test_aot_cudagraphs_cuda
2025-12-04T12:25:50.2321827Z 
2025-12-04T12:25:50.2322101Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:25:50.2322748Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:25:50.2323231Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:25:50.2323617Z stats [('calls_captured', 4), ('unique_graphs', 1)]
2025-12-04T12:25:50.2324026Z aot_autograd [('total', 1), ('ok', 1)]
2025-12-04T12:25:50.2324493Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:25:50.2324972Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:25:50.2325342Z stats [('calls_captured', 4), ('unique_graphs', 1)]
2025-12-04T12:25:50.2325763Z aot_autograd [('total', 1), ('ok', 1)]
2025-12-04T12:25:50.2326227Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:25:50.2326687Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:25:50.2327063Z stats [('calls_captured', 4), ('unique_graphs', 1)]
2025-12-04T12:25:50.2327481Z aot_autograd [('total', 1), ('ok', 1)]
2025-12-04T12:25:50.2328404Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_backends/dynamo.test_backends-7281b232f81c7a26.xml -
2025-12-04T12:25:50.2329389Z =========================== short test summary info ============================
2025-12-04T12:25:50.2331245Z FAILED [0.4714s] dynamo/test_backends.py::TestOptimizationsCUDA::test_aot_cudagraphs_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestOptimizationsCUDA.test_aot_cudagraphs_cuda! Caching allocator allocated memory was 0 and is now reported as 3584 on device 0. CUDA driver allocated memory was 839843840 and is now 867106816.
2025-12-04T12:25:50.2332969Z 
2025-12-04T12:25:50.2333194Z To execute this test, run the following from the base repo dir:
2025-12-04T12:25:50.2334045Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/dynamo/test_backends.py TestOptimizationsCUDA.test_aot_cudagraphs_cuda
2025-12-04T12:25:50.2334673Z 
2025-12-04T12:25:50.2334944Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:25:50.2335545Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:25:50.2336083Z ================== 1 failed, 20 deselected, 2 rerun in 2.36s ===================
2025-12-04T12:25:50.2336648Z Got exit code 1
2025-12-04T12:25:50.2336914Z Retrying single test...
2025-12-04T12:25:50.2337615Z Test results will be stored in test-reports/python-pytest/dynamo.test_backends/dynamo.test_backends-0c21d337a20b3a01.xml
2025-12-04T12:25:50.2338432Z ============================= test session starts ==============================
2025-12-04T12:25:50.2339094Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:25:50.2339706Z cachedir: .pytest_cache
2025-12-04T12:25:50.2340433Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:25:50.2341223Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:25:50.2341571Z configfile: pytest.ini
2025-12-04T12:25:50.2342401Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:25:50.2343350Z collecting ... collected 21 items / 20 deselected / 1 selected
2025-12-04T12:25:50.2344274Z stepcurrent: skipping 9 already run items. Running only test/dynamo/test_backends.py::TestOptimizationsCUDA::test_aot_cudagraphs_cuda
2025-12-04T12:25:50.2345102Z Running 1 items in this shard
2025-12-04T12:25:50.2345364Z 
2025-12-04T12:25:50.2345887Z dynamo/test_backends.py::TestOptimizationsCUDA::test_aot_cudagraphs_cuda ('RERUN', {'yellow': True}) [1.3703s] [100%]
2025-12-04T12:25:50.2347000Z dynamo/test_backends.py::TestOptimizationsCUDA::test_aot_cudagraphs_cuda ('RERUN', {'yellow': True}) [0.4663s] [100%]
2025-12-04T12:25:50.2348001Z dynamo/test_backends.py::TestOptimizationsCUDA::test_aot_cudagraphs_cuda FAILED [0.4503s] [100%]
2025-12-04T12:25:50.2348542Z 
2025-12-04T12:25:50.2348689Z ==================================== RERUNS ====================================
2025-12-04T12:25:50.2349267Z ________________ TestOptimizationsCUDA.test_aot_cudagraphs_cuda ________________
2025-12-04T12:25:50.2349811Z Traceback (most recent call last):
2025-12-04T12:25:50.2350568Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:25:50.2351342Z     method(*args, **kwargs)
2025-12-04T12:25:50.2352067Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:25:50.2352842Z     method(*args, **kwargs)
2025-12-04T12:25:50.2353554Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:25:50.2354316Z     with policy():
2025-12-04T12:25:50.2355008Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:25:50.2355766Z     raise RuntimeError(msg)
2025-12-04T12:25:50.2357064Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestOptimizationsCUDA.test_aot_cudagraphs_cuda! Caching allocator allocated memory was 0 and is now reported as 3584 on device 0. CUDA driver allocated memory was 680460288 and is now 867106816.
2025-12-04T12:25:50.2358289Z 
2025-12-04T12:25:50.2358508Z To execute this test, run the following from the base repo dir:
2025-12-04T12:25:50.2359365Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/dynamo/test_backends.py TestOptimizationsCUDA.test_aot_cudagraphs_cuda
2025-12-04T12:25:50.2360034Z 
2025-12-04T12:25:50.2360318Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:25:50.2360950Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:25:50.2361432Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:25:50.2361813Z stats [('calls_captured', 4), ('unique_graphs', 1)]
2025-12-04T12:25:50.2362222Z aot_autograd [('total', 1), ('ok', 1)]
2025-12-04T12:25:50.2362749Z ________________ TestOptimizationsCUDA.test_aot_cudagraphs_cuda ________________
2025-12-04T12:25:50.2363298Z Traceback (most recent call last):
2025-12-04T12:25:50.2364048Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:25:50.2364823Z     method(*args, **kwargs)
2025-12-04T12:25:50.2365545Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:25:50.2366309Z     method(*args, **kwargs)
2025-12-04T12:25:50.2367024Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:25:50.2367782Z     with policy():
2025-12-04T12:25:50.2368474Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:25:50.2369240Z     raise RuntimeError(msg)
2025-12-04T12:25:50.2370571Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestOptimizationsCUDA.test_aot_cudagraphs_cuda! Caching allocator allocated memory was 0 and is now reported as 3584 on device 0. CUDA driver allocated memory was 839843840 and is now 867106816.
2025-12-04T12:25:50.2371991Z 
2025-12-04T12:25:50.2372213Z To execute this test, run the following from the base repo dir:
2025-12-04T12:25:50.2373081Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/dynamo/test_backends.py TestOptimizationsCUDA.test_aot_cudagraphs_cuda
2025-12-04T12:25:50.2373798Z 
2025-12-04T12:25:50.2374159Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:25:50.2374781Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:25:50.2375270Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:25:50.2375656Z stats [('calls_captured', 4), ('unique_graphs', 1)]
2025-12-04T12:25:50.2376067Z aot_autograd [('total', 1), ('ok', 1)]
2025-12-04T12:25:50.2376610Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:25:50.2377093Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:25:50.2377461Z stats [('calls_captured', 4), ('unique_graphs', 1)]
2025-12-04T12:25:50.2377884Z aot_autograd [('total', 1), ('ok', 1)]
2025-12-04T12:25:50.2378279Z =================================== FAILURES ===================================
2025-12-04T12:25:50.2378851Z ________________ TestOptimizationsCUDA.test_aot_cudagraphs_cuda ________________
2025-12-04T12:25:50.2379385Z Traceback (most recent call last):
2025-12-04T12:25:50.2380158Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:25:50.2380931Z     method(*args, **kwargs)
2025-12-04T12:25:50.2381644Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:25:50.2382412Z     method(*args, **kwargs)
2025-12-04T12:25:50.2383130Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:25:50.2383887Z     with policy():
2025-12-04T12:25:50.2384565Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:25:50.2385341Z     raise RuntimeError(msg)
2025-12-04T12:25:50.2386701Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestOptimizationsCUDA.test_aot_cudagraphs_cuda! Caching allocator allocated memory was 0 and is now reported as 3584 on device 0. CUDA driver allocated memory was 839843840 and is now 867106816.
2025-12-04T12:25:50.2387916Z 
2025-12-04T12:25:50.2388146Z To execute this test, run the following from the base repo dir:
2025-12-04T12:25:50.2388985Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/dynamo/test_backends.py TestOptimizationsCUDA.test_aot_cudagraphs_cuda
2025-12-04T12:25:50.2389628Z 
2025-12-04T12:25:50.2389897Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:25:50.2390540Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:25:50.2391024Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:25:50.2391398Z stats [('calls_captured', 4), ('unique_graphs', 1)]
2025-12-04T12:25:50.2391823Z aot_autograd [('total', 1), ('ok', 1)]
2025-12-04T12:25:50.2392293Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:25:50.2392763Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:25:50.2393151Z stats [('calls_captured', 4), ('unique_graphs', 1)]
2025-12-04T12:25:50.2393576Z aot_autograd [('total', 1), ('ok', 1)]
2025-12-04T12:25:50.2394023Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:25:50.2394493Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:25:50.2394875Z stats [('calls_captured', 4), ('unique_graphs', 1)]
2025-12-04T12:25:50.2395294Z aot_autograd [('total', 1), ('ok', 1)]
2025-12-04T12:25:50.2396274Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_backends/dynamo.test_backends-0c21d337a20b3a01.xml -
2025-12-04T12:25:50.2397268Z =========================== short test summary info ============================
2025-12-04T12:25:50.2400416Z FAILED [0.4503s] dynamo/test_backends.py::TestOptimizationsCUDA::test_aot_cudagraphs_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestOptimizationsCUDA.test_aot_cudagraphs_cuda! Caching allocator allocated memory was 0 and is now reported as 3584 on device 0. CUDA driver allocated memory was 839843840 and is now 867106816.
2025-12-04T12:25:50.2402149Z 
2025-12-04T12:25:50.2402387Z To execute this test, run the following from the base repo dir:
2025-12-04T12:25:50.2403249Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/dynamo/test_backends.py TestOptimizationsCUDA.test_aot_cudagraphs_cuda
2025-12-04T12:25:50.2403970Z 
2025-12-04T12:25:50.2404350Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:25:50.2405091Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:25:50.2405629Z ================== 1 failed, 20 deselected, 2 rerun in 2.32s ===================
2025-12-04T12:25:50.2406071Z Got exit code 1
2025-12-04T12:25:50.2406657Z FAILED CONSISTENTLY: test/dynamo/test_backends.py::TestOptimizationsCUDA::test_aot_cudagraphs_cuda
2025-12-04T12:25:50.2407634Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:25:50.2408700Z Test results will be stored in test-reports/python-pytest/dynamo.test_backends/dynamo.test_backends-62a9a8755ec319d6.xml
2025-12-04T12:25:50.2409504Z ============================= test session starts ==============================
2025-12-04T12:25:50.2410179Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:25:50.2410792Z cachedir: .pytest_cache
2025-12-04T12:25:50.2411508Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:25:50.2412302Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:25:50.2412669Z configfile: pytest.ini
2025-12-04T12:25:50.2413450Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:25:50.2414395Z collecting ... collected 21 items / 10 deselected / 11 selected
2025-12-04T12:25:50.2414980Z stepcurrent: skipping 10 already run items.
2025-12-04T12:25:50.2415387Z Running 11 items in this shard
2025-12-04T12:25:50.2415599Z 
2025-12-04T12:25:50.2415973Z dynamo/test_backends.py::TestOptimizationsCUDA::test_aot_eager_cuda PASSED [1.1914s] [  9%]
2025-12-04T12:25:50.2417029Z dynamo/test_backends.py::TestOptimizationsCUDA::test_aot_eager_decomp_partition_cuda PASSED [0.1564s] [ 18%]
2025-12-04T12:25:50.2417981Z dynamo/test_backends.py::TestOptimizationsCUDA::test_aot_ts_cuda PASSED [0.2164s] [ 27%]
2025-12-04T12:25:50.2418821Z dynamo/test_backends.py::TestOptimizationsCUDA::test_eager_cuda PASSED [0.0739s] [ 36%]
2025-12-04T12:25:50.2419686Z dynamo/test_backends.py::TestOptimizationsCUDA::test_eager_noexcept_cuda PASSED [0.0738s] [ 45%]
2025-12-04T12:25:50.2420614Z dynamo/test_backends.py::TestOptimizationsCUDA::test_example_inputs_cuda PASSED [0.0506s] [ 54%]
2025-12-04T12:25:50.2421596Z dynamo/test_backends.py::TestOptimizationsCUDA::test_example_inputs_runtime_use_cuda PASSED [0.0470s] [ 63%]
2025-12-04T12:25:50.2422692Z dynamo/test_backends.py::TestOptimizationsCUDA::test_intel_gaudi_backend_cuda SKIPPED [0.0019s] (Only runs on hpu) [ 72%]
2025-12-04T12:25:50.2423709Z dynamo/test_backends.py::TestOptimizationsCUDA::test_list_backends_cuda PASSED [0.0124s] [ 81%]
2025-12-04T12:25:50.2424612Z dynamo/test_backends.py::TestOptimizationsCUDA::test_torchscript_cuda PASSED [0.1006s] [ 90%]
2025-12-04T12:25:50.2425538Z dynamo/test_backends.py::TestOptimizationsCUDA::test_tvm_cuda SKIPPED [0.0003s] (requires tvm) [100%]
2025-12-04T12:25:50.2426137Z 
2025-12-04T12:25:50.2426827Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_backends/dynamo.test_backends-62a9a8755ec319d6.xml -
2025-12-04T12:25:50.2427850Z ================= 9 passed, 2 skipped, 10 deselected in 1.97s ==================
2025-12-04T12:25:50.2428726Z The following tests failed consistently: ['test/dynamo/test_backends.py::TestOptimizationsCUDA::test_aot_cudagraphs_cuda']
2025-12-04T12:25:50.2429424Z 
2025-12-04T12:25:50.2429944Z FINISHED PRINTING LOG FILE of dynamo/test_backends 1/1 (test/test-reports/dynamo.test_backends_1.1_0248c6271c37d6dd_.log)
2025-12-04T12:25:50.2430563Z 
2025-12-04T12:25:50.2431055Z Finished dynamo/test_backends 1/1 ... [2025-12-04 12:25:50.217435][11578.600331701], took 1.14min
2025-12-04T12:25:50.2432272Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_backends/dynamo.test_backends-7c3220f8bc842d2f.xml
2025-12-04T12:25:50.3324527Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_backends/dynamo.test_backends-7281b232f81c7a26.xml
2025-12-04T12:25:50.3636853Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_backends/dynamo.test_backends-0c21d337a20b3a01.xml
2025-12-04T12:25:50.3921843Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_backends/dynamo.test_backends-62a9a8755ec319d6.xml
2025-12-04T12:25:50.9799529Z Uploading logs for 57119749248 to S3
2025-12-04T12:25:51.1640450Z Uploading artifacts took 0.74 seconds
2025-12-04T12:25:51.1640858Z dynamo/test_backends 1/1 failed!
2025-12-04T12:25:51.1645719Z Running inductor/test_aot_inductor_package 1/1 ... [2025-12-04 12:25:51.164389][11579.547283555]
2025-12-04T12:25:51.1646342Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:25:51.1651216Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_aot_inductor_package.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:25:51.164855]
2025-12-04T12:35:04.2505411Z 
2025-12-04T12:35:04.2506517Z PRINTING LOG FILE of inductor/test_aot_inductor_package 1/1 (test/test-reports/inductor.test_aot_inductor_package_1.1_5509f9f54e762912_.log)
2025-12-04T12:35:04.2519097Z Test results will be stored in test-reports/python-pytest/inductor.test_aot_inductor_package/inductor.test_aot_inductor_package-b1ca468dab29d0d8.xml
2025-12-04T12:35:04.2520747Z ============================= test session starts ==============================
2025-12-04T12:35:04.2521892Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:35:04.2522672Z cachedir: .pytest_cache
2025-12-04T12:35:04.2523678Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:35:04.2524951Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:35:04.2525452Z configfile: pytest.ini
2025-12-04T12:35:04.2526754Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:35:04.2527833Z collecting ... collected 88 items
2025-12-04T12:35:04.2528252Z stepcurrent: Cannot find last run test, not skipping
2025-12-04T12:35:04.2597615Z Running 88 items in this shard: test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_add, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_bool_input, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_compile_after_package, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_compile_after_package_multi_arch, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_compile_after_package_static, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_compile_standalone_cos, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_compile_with_exporter, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_compile_with_exporter_weights, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_deepcopy_compiled_model, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_duplicate_calls, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_linear, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_loading_wrong_model, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_metadata, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_multiple_methods, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_package_shared_weights, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_package_user_managed_weight, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_package_weights_on_disk_nested_module, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_package_without_weight, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_remove_intermediate_files, 
test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_save_buffer, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_specified_output_dir, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_update_weights, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_add, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_bool_input, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_compile_after_package, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_compile_after_package_multi_arch, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_compile_after_package_static, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_compile_standalone_cos, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_compile_with_exporter, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_compile_with_exporter_weights, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_deepcopy_compiled_model, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_duplicate_calls, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_linear, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_loading_wrong_model, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_metadata, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_multiple_methods, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_package_shared_weights, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_package_user_managed_weight, 
test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_package_weights_on_disk_nested_module, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_package_without_weight, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_remove_intermediate_files, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_save_buffer, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_specified_output_dir, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_update_weights, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_add, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_bool_input, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_compile_after_package, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_compile_after_package_multi_arch, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_compile_after_package_static, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_compile_standalone_cos, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_compile_with_exporter, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_compile_with_exporter_weights, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_deepcopy_compiled_model, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_duplicate_calls, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_linear, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_loading_wrong_model, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_metadata, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_multiple_methods, 
test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_package_shared_weights, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_package_user_managed_weight, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_package_weights_on_disk_nested_module, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_package_without_weight, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_remove_intermediate_files, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_save_buffer, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_specified_output_dir, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_update_weights, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_add, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_bool_input, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_after_package, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_after_package_multi_arch, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_after_package_static, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_standalone_cos, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_with_exporter, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_with_exporter_weights, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_deepcopy_compiled_model, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_duplicate_calls, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_linear, 
test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_loading_wrong_model, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_metadata, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_multiple_methods, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_package_shared_weights, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_package_user_managed_weight, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_package_weights_on_disk_nested_module, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_package_without_weight, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_remove_intermediate_files, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_save_buffer, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_specified_output_dir, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_update_weights
2025-12-04T12:35:04.2669111Z 
2025-12-04T12:35:04.2669919Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_add PASSED [7.6318s] [  1%]
2025-12-04T12:35:04.2672074Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_bool_input PASSED [5.0570s] [  2%]
2025-12-04T12:35:04.2674766Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_compile_after_package SKIPPED [0.0004s] (Test is only supported on CUDA 12.6+) [  3%]
2025-12-04T12:35:04.2677519Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_compile_after_package_multi_arch SKIPPED [0.0002s] (Test is only supported on CUDA 12.8+) [  4%]
2025-12-04T12:35:04.2680538Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_compile_after_package_static SKIPPED [0.0002s] (Test is only supported on CUDA 12.6+) [  5%]
2025-12-04T12:35:04.2683266Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_compile_standalone_cos SKIPPED [0.0031s] (Only meant to test cpp package) [  6%]
2025-12-04T12:35:04.2685616Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_compile_with_exporter SKIPPED [0.0002s] (Test is only supported on CUDA 12.6+) [  7%]
2025-12-04T12:35:04.2688427Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_compile_with_exporter_weights SKIPPED [0.0006s] (Test is only supported on CUDA 12.6+) [  9%]
2025-12-04T12:35:04.2692139Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_deepcopy_compiled_model W1204 12:26:18.236000 140836 site-packages/torch/export/pt2_archive/_package.py:763] AOTICompiledModel deepcopy warning: AOTICompiledModel.loader is not deepcopied.
2025-12-04T12:35:04.2694840Z PASSED [5.1506s] [ 10%]
2025-12-04T12:35:04.2695995Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_duplicate_calls PASSED [20.3835s] [ 11%]
2025-12-04T12:35:04.2697577Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_linear PASSED [5.1766s] [ 12%]
2025-12-04T12:35:04.2700017Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_loading_wrong_model W1204 12:26:49.002000 140836 site-packages/torch/_inductor/package/package.py:120] Loading outdated pt2 file. Please regenerate your package.
2025-12-04T12:35:04.2702036Z PASSED [5.2006s] [ 13%]
2025-12-04T12:35:04.2703050Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_metadata PASSED [5.1577s] [ 14%]
2025-12-04T12:35:04.2704774Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_multiple_methods PASSED [10.4660s] [ 15%]
2025-12-04T12:35:04.2706305Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_package_shared_weights PASSED [2.1586s] [ 17%]
2025-12-04T12:35:04.2707486Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_package_user_managed_weight PASSED [6.3455s] [ 18%]
2025-12-04T12:35:04.2708833Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_package_weights_on_disk_nested_module PASSED [5.2032s] [ 19%]
2025-12-04T12:35:04.2710042Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_package_without_weight PASSED [5.3058s] [ 20%]
2025-12-04T12:35:04.2711186Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_remove_intermediate_files PASSED [5.1781s] [ 21%]
2025-12-04T12:35:04.2712291Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_save_buffer PASSED [5.2636s] [ 22%]
2025-12-04T12:35:04.2713374Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_specified_output_dir PASSED [5.2034s] [ 23%]
2025-12-04T12:35:04.2714468Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_update_weights PASSED [5.6854s] [ 25%]
2025-12-04T12:35:04.2716021Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_add In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_float.h:12,
2025-12-04T12:35:04.2717649Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:11,
2025-12-04T12:35:04.2718603Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5,
2025-12-04T12:35:04.2719668Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7,
2025-12-04T12:35:04.2720660Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4,
2025-12-04T12:35:04.2721663Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45,
2025-12-04T12:35:04.2722931Z                  from /tmp/ROfn0q/tmpejwtemx7/data/aotinductor/model/cn3k2mlnpdktb5d42n3gbws3qpzrim5w2lb6w5t7cv3mzl7dq3b5.wrapper.cpp:723:
2025-12-04T12:35:04.2724345Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/sleef.h:192:10: warning: ISO C++ prohibits anonymous structs [-Wpedantic]
2025-12-04T12:35:04.2727525Z   192 |   struct {
2025-12-04T12:35:04.2727780Z       |          ^
2025-12-04T12:35:04.2728446Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:15,
2025-12-04T12:35:04.2729472Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5,
2025-12-04T12:35:04.2730417Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7,
2025-12-04T12:35:04.2731414Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4,
2025-12-04T12:35:04.2732433Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45,
2025-12-04T12:35:04.2733702Z                  from /tmp/ROfn0q/tmpejwtemx7/data/aotinductor/model/cn3k2mlnpdktb5d42n3gbws3qpzrim5w2lb6w5t7cv3mzl7dq3b5.wrapper.cpp:723:
2025-12-04T12:35:04.2737421Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::blendv(const at::vec::CPU_CAPABILITY::Vectorized<short int>&, const at::vec::CPU_CAPABILITY::Vectorized<short int>&, const at::vec::CPU_CAPABILITY::Vectorized<short int>&)’:
2025-12-04T12:35:04.2740607Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:544:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.2741873Z   544 |     auto msb_one = _mm512_set1_epi16(0xFFFF);
2025-12-04T12:35:04.2742294Z       |                                      ^~~~~~
2025-12-04T12:35:04.2743056Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:15,
2025-12-04T12:35:04.2744053Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5,
2025-12-04T12:35:04.2745011Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7,
2025-12-04T12:35:04.2746017Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4,
2025-12-04T12:35:04.2747032Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45,
2025-12-04T12:35:04.2748275Z                  from /tmp/ROfn0q/tmpejwtemx7/data/aotinductor/model/cn3k2mlnpdktb5d42n3gbws3qpzrim5w2lb6w5t7cv3mzl7dq3b5.wrapper.cpp:723:
2025-12-04T12:35:04.2750720Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator==(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.2753396Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:697:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.2754754Z   697 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.2755276Z       |                                                      ^~~~~~
2025-12-04T12:35:04.2757171Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator!=(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.2759769Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:701:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.2761073Z   701 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.2761546Z       |                                                      ^~~~~~
2025-12-04T12:35:04.2763434Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator<(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.2766028Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:705:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.2767494Z   705 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.2767973Z       |                                                      ^~~~~~
2025-12-04T12:35:04.2769864Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator<=(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.2772789Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:709:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.2774116Z   709 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.2774595Z       |                                                      ^~~~~~
2025-12-04T12:35:04.2776552Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator>(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.2779174Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:713:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.2780472Z   713 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.2780959Z       |                                                      ^~~~~~
2025-12-04T12:35:04.2782849Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator>=(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.2785428Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:717:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.2786792Z   717 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.2787262Z       |                                                      ^~~~~~
2025-12-04T12:35:04.2789871Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized<signed char> at::vec::CPU_CAPABILITY::Vectorized<signed char>::blendv(const at::vec::CPU_CAPABILITY::Vectorized<signed char>&, const at::vec::CPU_CAPABILITY::Vectorized<signed char>&, const at::vec::CPU_CAPABILITY::Vectorized<signed char>&)’:
2025-12-04T12:35:04.2793104Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1153:37: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.2794407Z  1153 |     auto msb_one = _mm512_set1_epi8(0xFF);
2025-12-04T12:35:04.2794816Z       |                                     ^~~~
2025-12-04T12:35:04.2796739Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<signed char> at::vec::CPU_CAPABILITY::Vectorized<signed char>::operator==(const at::vec::CPU_CAPABILITY::Vectorized<signed char>&) const’:
2025-12-04T12:35:04.2799406Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1166:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.2800740Z  1166 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.2801216Z       |                                                     ^~~~
2025-12-04T12:35:04.2803153Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<signed char> at::vec::CPU_CAPABILITY::Vectorized<signed char>::operator!=(const at::vec::CPU_CAPABILITY::Vectorized<signed char>&) const’:
2025-12-04T12:35:04.2805822Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1170:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.2807201Z  1170 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.2807681Z       |                                                     ^~~~
2025-12-04T12:35:04.2809622Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<signed char> at::vec::CPU_CAPABILITY::Vectorized<signed char>::operator<(const at::vec::CPU_CAPABILITY::Vectorized<signed char>&) const’:
2025-12-04T12:35:04.2812299Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1174:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.2813614Z  1174 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.2814089Z       |                                                     ^~~~
2025-12-04T12:35:04.2816040Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<signed char> at::vec::CPU_CAPABILITY::Vectorized<signed char>::operator<=(const at::vec::CPU_CAPABILITY::Vectorized<signed char>&) const’:
2025-12-04T12:35:04.2818781Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1178:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.2820164Z  1178 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.2820622Z       |                                                     ^~~~
2025-12-04T12:35:04.2823274Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized<unsigned char> at::vec::CPU_CAPABILITY::Vectorized<unsigned char>::blendv(const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&, const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&, const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&)’:
2025-12-04T12:35:04.2826531Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1207:37: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.2827810Z  1207 |     auto msb_one = _mm512_set1_epi8(0xFF);
2025-12-04T12:35:04.2828213Z       |                                     ^~~~
2025-12-04T12:35:04.2830155Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<unsigned char> at::vec::CPU_CAPABILITY::Vectorized<unsigned char>::operator==(const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&) const’:
2025-12-04T12:35:04.2832851Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1220:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.2834180Z  1220 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.2834643Z       |                                                     ^~~~
2025-12-04T12:35:04.2836584Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<unsigned char> at::vec::CPU_CAPABILITY::Vectorized<unsigned char>::operator!=(const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&) const’:
2025-12-04T12:35:04.2839279Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1224:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.2840597Z  1224 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.2841071Z       |                                                     ^~~~
2025-12-04T12:35:04.2843072Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<unsigned char> at::vec::CPU_CAPABILITY::Vectorized<unsigned char>::operator<(const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&) const’:
2025-12-04T12:35:04.2845749Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1228:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.2847073Z  1228 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.2847541Z       |                                                     ^~~~
2025-12-04T12:35:04.2849506Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<unsigned char> at::vec::CPU_CAPABILITY::Vectorized<unsigned char>::operator<=(const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&) const’:
2025-12-04T12:35:04.2852201Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1232:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.2853520Z  1232 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.2853987Z       |                                                     ^~~~
2025-12-04T12:35:04.2856767Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized<T> at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized<T>&, const at::vec::CPU_CAPABILITY::Vectorized<T>&) [with bool left_shift = true; T = signed char; typename std::enable_if<(is_same_v<T, signed char> || is_same_v<T, unsigned char>), int>::type <anonymous> = 0]’:
2025-12-04T12:35:04.2859477Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2074:27:   required from here
2025-12-04T12:35:04.2861404Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.2862609Z  1866 |       0x80,
2025-12-04T12:35:04.2862873Z       |       ^~~~
2025-12-04T12:35:04.2864230Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.2865428Z  1868 |       0x80,
2025-12-04T12:35:04.2865689Z       |       ^~~~
2025-12-04T12:35:04.2867035Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.2868258Z  1870 |       0x80,
2025-12-04T12:35:04.2868503Z       |       ^~~~
2025-12-04T12:35:04.2869849Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.2871306Z  1872 |       0x80,
2025-12-04T12:35:04.2871581Z       |       ^~~~
2025-12-04T12:35:04.2872928Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.2874140Z  1874 |       0x80,
2025-12-04T12:35:04.2874397Z       |       ^~~~
2025-12-04T12:35:04.2875812Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.2877046Z  1876 |       0x80,
2025-12-04T12:35:04.2877303Z       |       ^~~~
2025-12-04T12:35:04.2878646Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.2879839Z  1878 |       0x80,
2025-12-04T12:35:04.2880105Z       |       ^~~~
2025-12-04T12:35:04.2881439Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.2882654Z  1880 |       0x80,
2025-12-04T12:35:04.2882896Z       |       ^~~~
2025-12-04T12:35:04.2884244Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.2885456Z  1882 |       0x80,
2025-12-04T12:35:04.2885697Z       |       ^~~~
2025-12-04T12:35:04.2887035Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.2888246Z  1884 |       0x80,
2025-12-04T12:35:04.2888565Z       |       ^~~~
2025-12-04T12:35:04.2889896Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.2891103Z  1886 |       0x80,
2025-12-04T12:35:04.2891358Z       |       ^~~~
2025-12-04T12:35:04.2892744Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.2894002Z  1888 |       0x80,
2025-12-04T12:35:04.2894265Z       |       ^~~~
2025-12-04T12:35:04.2895604Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.2896868Z  1890 |       0x80,
2025-12-04T12:35:04.2897132Z       |       ^~~~
2025-12-04T12:35:04.2898486Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.2899702Z  1892 |       0x80,
2025-12-04T12:35:04.2899944Z       |       ^~~~
2025-12-04T12:35:04.2901302Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.2902526Z  1894 |       0x80,
2025-12-04T12:35:04.2902766Z       |       ^~~~
2025-12-04T12:35:04.2904106Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.2905319Z  1896 |       0x80,
2025-12-04T12:35:04.2905571Z       |       ^~~~
2025-12-04T12:35:04.2906897Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.2908105Z  1898 |       0x80,
2025-12-04T12:35:04.2908358Z       |       ^~~~
2025-12-04T12:35:04.2909752Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.2910958Z  1900 |       0x80,
2025-12-04T12:35:04.2911217Z       |       ^~~~
2025-12-04T12:35:04.2912565Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.2913776Z  1902 |       0x80,
2025-12-04T12:35:04.2914037Z       |       ^~~~
2025-12-04T12:35:04.2915390Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.2916602Z  1904 |       0x80,
2025-12-04T12:35:04.2916849Z       |       ^~~~
2025-12-04T12:35:04.2918211Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.2919435Z  1906 |       0x80,
2025-12-04T12:35:04.2919696Z       |       ^~~~
2025-12-04T12:35:04.2921028Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.2922291Z  1908 |       0x80,
2025-12-04T12:35:04.2922553Z       |       ^~~~
2025-12-04T12:35:04.2923890Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.2925111Z  1910 |       0x80,
2025-12-04T12:35:04.2925372Z       |       ^~~~
2025-12-04T12:35:04.2926796Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.2928000Z  1912 |       0x80,
2025-12-04T12:35:04.2928263Z       |       ^~~~
2025-12-04T12:35:04.2929610Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.2930831Z  1914 |       0x80,
2025-12-04T12:35:04.2931075Z       |       ^~~~
2025-12-04T12:35:04.2932416Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.2933629Z  1916 |       0x80,
2025-12-04T12:35:04.2933866Z       |       ^~~~
2025-12-04T12:35:04.2935219Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.2936497Z  1918 |       0x80,
2025-12-04T12:35:04.2936752Z       |       ^~~~
2025-12-04T12:35:04.2938079Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.2939293Z  1920 |       0x80,
2025-12-04T12:35:04.2939548Z       |       ^~~~
2025-12-04T12:35:04.2941107Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.2942336Z  1922 |       0x80,
2025-12-04T12:35:04.2942593Z       |       ^~~~
2025-12-04T12:35:04.2944040Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.2945238Z  1924 |       0x80,
2025-12-04T12:35:04.2945496Z       |       ^~~~
2025-12-04T12:35:04.2946842Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.2948056Z  1926 |       0x80,
2025-12-04T12:35:04.2948297Z       |       ^~~~
2025-12-04T12:35:04.2949635Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.2950850Z  1928 |       0x80);
2025-12-04T12:35:04.2951112Z       |       ^~~~
2025-12-04T12:35:04.2952460Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.2953682Z  1930 |       0x80,
2025-12-04T12:35:04.2953937Z       |       ^~~~
2025-12-04T12:35:04.2955268Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.2956530Z  1932 |       0x80,
2025-12-04T12:35:04.2956788Z       |       ^~~~
2025-12-04T12:35:04.2958132Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.2959327Z  1934 |       0x80,
2025-12-04T12:35:04.2959622Z       |       ^~~~
2025-12-04T12:35:04.2961013Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.2962225Z  1936 |       0x80,
2025-12-04T12:35:04.2962467Z       |       ^~~~
2025-12-04T12:35:04.2963815Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.2965034Z  1938 |       0x80,
2025-12-04T12:35:04.2965278Z       |       ^~~~
2025-12-04T12:35:04.2966623Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.2967833Z  1940 |       0x80,
2025-12-04T12:35:04.2968096Z       |       ^~~~
2025-12-04T12:35:04.2969434Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.2970649Z  1942 |       0x80,
2025-12-04T12:35:04.2970903Z       |       ^~~~
2025-12-04T12:35:04.2972476Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.2973701Z  1944 |       0x80,
2025-12-04T12:35:04.2973957Z       |       ^~~~
2025-12-04T12:35:04.2975299Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.2976560Z  1946 |       0x80,
2025-12-04T12:35:04.2976822Z       |       ^~~~
2025-12-04T12:35:04.2978310Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.2979524Z  1948 |       0x80,
2025-12-04T12:35:04.2979765Z       |       ^~~~
2025-12-04T12:35:04.2981109Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.2982324Z  1950 |       0x80,
2025-12-04T12:35:04.2982568Z       |       ^~~~
2025-12-04T12:35:04.2983915Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.2985192Z  1952 |       0x80,
2025-12-04T12:35:04.2985452Z       |       ^~~~
2025-12-04T12:35:04.2986850Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.2988073Z  1954 |       0x80,
2025-12-04T12:35:04.2988335Z       |       ^~~~
2025-12-04T12:35:04.2989689Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.2990945Z  1956 |       0x80,
2025-12-04T12:35:04.2991201Z       |       ^~~~
2025-12-04T12:35:04.2992551Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.2993750Z  1958 |       0x80,
2025-12-04T12:35:04.2994008Z       |       ^~~~
2025-12-04T12:35:04.2995353Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.2996567Z  1960 |       0x80,
2025-12-04T12:35:04.2996808Z       |       ^~~~
2025-12-04T12:35:04.2998157Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.2999370Z  1962 |       0x80,
2025-12-04T12:35:04.2999629Z       |       ^~~~
2025-12-04T12:35:04.3000977Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3002203Z  1964 |       0x80,
2025-12-04T12:35:04.3002461Z       |       ^~~~
2025-12-04T12:35:04.3003842Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3005062Z  1966 |       0x80,
2025-12-04T12:35:04.3005321Z       |       ^~~~
2025-12-04T12:35:04.3006667Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3007869Z  1968 |       0x80,
2025-12-04T12:35:04.3008131Z       |       ^~~~
2025-12-04T12:35:04.3009478Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3010705Z  1970 |       0x80,
2025-12-04T12:35:04.3010950Z       |       ^~~~
2025-12-04T12:35:04.3012309Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3013523Z  1972 |       0x80,
2025-12-04T12:35:04.3013768Z       |       ^~~~
2025-12-04T12:35:04.3015123Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3016446Z  1974 |       0x80,
2025-12-04T12:35:04.3016702Z       |       ^~~~
2025-12-04T12:35:04.3018045Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3020138Z  1976 |       0x80,
2025-12-04T12:35:04.3020409Z       |       ^~~~
2025-12-04T12:35:04.3021849Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3023067Z  1978 |       0x80,
2025-12-04T12:35:04.3023321Z       |       ^~~~
2025-12-04T12:35:04.3024665Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3025877Z  1980 |       0x80,
2025-12-04T12:35:04.3026129Z       |       ^~~~
2025-12-04T12:35:04.3027473Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3028691Z  1982 |       0x80,
2025-12-04T12:35:04.3028933Z       |       ^~~~
2025-12-04T12:35:04.3030295Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3031496Z  1984 |       0x80,
2025-12-04T12:35:04.3031738Z       |       ^~~~
2025-12-04T12:35:04.3033074Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3034290Z  1986 |       0x80,
2025-12-04T12:35:04.3034548Z       |       ^~~~
2025-12-04T12:35:04.3035870Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3037086Z  1988 |       0x80,
2025-12-04T12:35:04.3037342Z       |       ^~~~
2025-12-04T12:35:04.3038729Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3039927Z  1990 |       0x80,
2025-12-04T12:35:04.3040180Z       |       ^~~~
2025-12-04T12:35:04.3041522Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3042719Z  1992 |       0x80,
2025-12-04T12:35:04.3042975Z       |       ^~~~
2025-12-04T12:35:04.3044318Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow]
2025-12-04T12:35:04.3045596Z  2002 |   __m512i keep_1 = _mm512_set1_epi16(0xFF00);
2025-12-04T12:35:04.3046002Z       |                                      ^~~~~~
2025-12-04T12:35:04.3048713Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized<T> at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized<T>&, const at::vec::CPU_CAPABILITY::Vectorized<T>&) [with bool left_shift = true; T = unsigned char; typename std::enable_if<(is_same_v<T, signed char> || is_same_v<T, unsigned char>), int>::type <anonymous> = 0]’:
2025-12-04T12:35:04.3051403Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2081:27:   required from here
2025-12-04T12:35:04.3053318Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3054575Z  1866 |       0x80,
2025-12-04T12:35:04.3054831Z       |       ^~~~
2025-12-04T12:35:04.3056210Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3057511Z  1868 |       0x80,
2025-12-04T12:35:04.3057756Z       |       ^~~~
2025-12-04T12:35:04.3059106Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3060328Z  1870 |       0x80,
2025-12-04T12:35:04.3060584Z       |       ^~~~
2025-12-04T12:35:04.3061909Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3063122Z  1872 |       0x80,
2025-12-04T12:35:04.3063391Z       |       ^~~~
2025-12-04T12:35:04.3064725Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3065922Z  1874 |       0x80,
2025-12-04T12:35:04.3066174Z       |       ^~~~
2025-12-04T12:35:04.3067508Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3068721Z  1876 |       0x80,
2025-12-04T12:35:04.3068963Z       |       ^~~~
2025-12-04T12:35:04.3070304Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3071822Z  1878 |       0x80,
2025-12-04T12:35:04.3072075Z       |       ^~~~
2025-12-04T12:35:04.3073444Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3074659Z  1880 |       0x80,
2025-12-04T12:35:04.3074915Z       |       ^~~~
2025-12-04T12:35:04.3076247Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3077464Z  1882 |       0x80,
2025-12-04T12:35:04.3077724Z       |       ^~~~
2025-12-04T12:35:04.3079058Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3080281Z  1884 |       0x80,
2025-12-04T12:35:04.3080549Z       |       ^~~~
2025-12-04T12:35:04.3081889Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3083089Z  1886 |       0x80,
2025-12-04T12:35:04.3083346Z       |       ^~~~
2025-12-04T12:35:04.3084756Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3085966Z  1888 |       0x80,
2025-12-04T12:35:04.3086206Z       |       ^~~~
2025-12-04T12:35:04.3087547Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3088817Z  1890 |       0x80,
2025-12-04T12:35:04.3089130Z       |       ^~~~
2025-12-04T12:35:04.3090480Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3091689Z  1892 |       0x80,
2025-12-04T12:35:04.3091942Z       |       ^~~~
2025-12-04T12:35:04.3093279Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3094495Z  1894 |       0x80,
2025-12-04T12:35:04.3094754Z       |       ^~~~
2025-12-04T12:35:04.3096097Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3097411Z  1896 |       0x80,
2025-12-04T12:35:04.3097676Z       |       ^~~~
2025-12-04T12:35:04.3099028Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3100221Z  1898 |       0x80,
2025-12-04T12:35:04.3100482Z       |       ^~~~
2025-12-04T12:35:04.3101840Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3103052Z  1900 |       0x80,
2025-12-04T12:35:04.3103298Z       |       ^~~~
2025-12-04T12:35:04.3104696Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3105928Z  1902 |       0x80,
2025-12-04T12:35:04.3106190Z       |       ^~~~
2025-12-04T12:35:04.3107529Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3108745Z  1904 |       0x80,
2025-12-04T12:35:04.3109010Z       |       ^~~~
2025-12-04T12:35:04.3110351Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3111569Z  1906 |       0x80,
2025-12-04T12:35:04.3111827Z       |       ^~~~
2025-12-04T12:35:04.3113177Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3114390Z  1908 |       0x80,
2025-12-04T12:35:04.3114653Z       |       ^~~~
2025-12-04T12:35:04.3115993Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3117216Z  1910 |       0x80,
2025-12-04T12:35:04.3117455Z       |       ^~~~
2025-12-04T12:35:04.3118845Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3120057Z  1912 |       0x80,
2025-12-04T12:35:04.3120299Z       |       ^~~~
2025-12-04T12:35:04.3121642Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3123479Z  1914 |       0x80,
2025-12-04T12:35:04.3123740Z       |       ^~~~
2025-12-04T12:35:04.3125877Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3127103Z  1916 |       0x80,
2025-12-04T12:35:04.3127372Z       |       ^~~~
2025-12-04T12:35:04.3128718Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3129932Z  1918 |       0x80,
2025-12-04T12:35:04.3130195Z       |       ^~~~
2025-12-04T12:35:04.3131550Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3132757Z  1920 |       0x80,
2025-12-04T12:35:04.3133010Z       |       ^~~~
2025-12-04T12:35:04.3134356Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3135565Z  1922 |       0x80,
2025-12-04T12:35:04.3135813Z       |       ^~~~
2025-12-04T12:35:04.3137226Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3138444Z  1924 |       0x80,
2025-12-04T12:35:04.3138691Z       |       ^~~~
2025-12-04T12:35:04.3140138Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3141365Z  1926 |       0x80,
2025-12-04T12:35:04.3141628Z       |       ^~~~
2025-12-04T12:35:04.3142959Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3144625Z  1928 |       0x80);
2025-12-04T12:35:04.3144947Z       |       ^~~~
2025-12-04T12:35:04.3146310Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3147506Z  1930 |       0x80,
2025-12-04T12:35:04.3147763Z       |       ^~~~
2025-12-04T12:35:04.3149118Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3150453Z  1932 |       0x80,
2025-12-04T12:35:04.3150715Z       |       ^~~~
2025-12-04T12:35:04.3152062Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3153277Z  1934 |       0x80,
2025-12-04T12:35:04.3153595Z       |       ^~~~
2025-12-04T12:35:04.3155158Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3156370Z  1936 |       0x80,
2025-12-04T12:35:04.3156634Z       |       ^~~~
2025-12-04T12:35:04.3158036Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3159283Z  1938 |       0x80,
2025-12-04T12:35:04.3159542Z       |       ^~~~
2025-12-04T12:35:04.3160872Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3162089Z  1940 |       0x80,
2025-12-04T12:35:04.3162349Z       |       ^~~~
2025-12-04T12:35:04.3163692Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3164903Z  1942 |       0x80,
2025-12-04T12:35:04.3165161Z       |       ^~~~
2025-12-04T12:35:04.3166521Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3167739Z  1944 |       0x80,
2025-12-04T12:35:04.3167985Z       |       ^~~~
2025-12-04T12:35:04.3169322Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3170541Z  1946 |       0x80,
2025-12-04T12:35:04.3170787Z       |       ^~~~
2025-12-04T12:35:04.3172416Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3173636Z  1948 |       0x80,
2025-12-04T12:35:04.3173896Z       |       ^~~~
2025-12-04T12:35:04.3175345Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3176629Z  1950 |       0x80,
2025-12-04T12:35:04.3176886Z       |       ^~~~
2025-12-04T12:35:04.3178221Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3179491Z  1952 |       0x80,
2025-12-04T12:35:04.3179749Z       |       ^~~~
2025-12-04T12:35:04.3181091Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3182292Z  1954 |       0x80,
2025-12-04T12:35:04.3182548Z       |       ^~~~
2025-12-04T12:35:04.3183999Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3185236Z  1956 |       0x80,
2025-12-04T12:35:04.3185482Z       |       ^~~~
2025-12-04T12:35:04.3186825Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3188052Z  1958 |       0x80,
2025-12-04T12:35:04.3188299Z       |       ^~~~
2025-12-04T12:35:04.3189648Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3190864Z  1960 |       0x80,
2025-12-04T12:35:04.3191124Z       |       ^~~~
2025-12-04T12:35:04.3192474Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3193693Z  1962 |       0x80,
2025-12-04T12:35:04.3193957Z       |       ^~~~
2025-12-04T12:35:04.3195305Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3196515Z  1964 |       0x80,
2025-12-04T12:35:04.3196777Z       |       ^~~~
2025-12-04T12:35:04.3198124Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3199337Z  1966 |       0x80,
2025-12-04T12:35:04.3199585Z       |       ^~~~
2025-12-04T12:35:04.3200989Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3202206Z  1968 |       0x80,
2025-12-04T12:35:04.3202456Z       |       ^~~~
2025-12-04T12:35:04.3203802Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3205019Z  1970 |       0x80,
2025-12-04T12:35:04.3205276Z       |       ^~~~
2025-12-04T12:35:04.3206600Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3207806Z  1972 |       0x80,
2025-12-04T12:35:04.3208068Z       |       ^~~~
2025-12-04T12:35:04.3209408Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3210615Z  1974 |       0x80,
2025-12-04T12:35:04.3210867Z       |       ^~~~
2025-12-04T12:35:04.3212210Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3213472Z  1976 |       0x80,
2025-12-04T12:35:04.3213733Z       |       ^~~~
2025-12-04T12:35:04.3215078Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3216347Z  1978 |       0x80,
2025-12-04T12:35:04.3216670Z       |       ^~~~
2025-12-04T12:35:04.3218097Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3219311Z  1980 |       0x80,
2025-12-04T12:35:04.3219560Z       |       ^~~~
2025-12-04T12:35:04.3220908Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3222130Z  1982 |       0x80,
2025-12-04T12:35:04.3222392Z       |       ^~~~
2025-12-04T12:35:04.3223715Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3224927Z  1984 |       0x80,
2025-12-04T12:35:04.3225191Z       |       ^~~~
2025-12-04T12:35:04.3226544Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3227740Z  1986 |       0x80,
2025-12-04T12:35:04.3227994Z       |       ^~~~
2025-12-04T12:35:04.3229338Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3230536Z  1988 |       0x80,
2025-12-04T12:35:04.3230795Z       |       ^~~~
2025-12-04T12:35:04.3232138Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3233347Z  1990 |       0x80,
2025-12-04T12:35:04.3233595Z       |       ^~~~
2025-12-04T12:35:04.3234993Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3236201Z  1992 |       0x80,
2025-12-04T12:35:04.3236456Z       |       ^~~~
2025-12-04T12:35:04.3237785Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow]
2025-12-04T12:35:04.3239068Z  2002 |   __m512i keep_1 = _mm512_set1_epi16(0xFF00);
2025-12-04T12:35:04.3239481Z       |                                      ^~~~~~
2025-12-04T12:35:04.3242188Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized<T> at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized<T>&, const at::vec::CPU_CAPABILITY::Vectorized<T>&) [with bool left_shift = false; T = signed char; typename std::enable_if<(is_same_v<T, signed char> || is_same_v<T, unsigned char>), int>::type <anonymous> = 0]’:
2025-12-04T12:35:04.3244816Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2109:28:   required from here
2025-12-04T12:35:04.3246729Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3247989Z  1866 |       0x80,
2025-12-04T12:35:04.3248248Z       |       ^~~~
2025-12-04T12:35:04.3249581Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3250840Z  1868 |       0x80,
2025-12-04T12:35:04.3251094Z       |       ^~~~
2025-12-04T12:35:04.3252477Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3253674Z  1870 |       0x80,
2025-12-04T12:35:04.3253928Z       |       ^~~~
2025-12-04T12:35:04.3255270Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3256555Z  1872 |       0x80,
2025-12-04T12:35:04.3256795Z       |       ^~~~
2025-12-04T12:35:04.3258151Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3259375Z  1874 |       0x80,
2025-12-04T12:35:04.3259616Z       |       ^~~~
2025-12-04T12:35:04.3260972Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3262180Z  1876 |       0x80,
2025-12-04T12:35:04.3262435Z       |       ^~~~
2025-12-04T12:35:04.3263767Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3264982Z  1878 |       0x80,
2025-12-04T12:35:04.3265245Z       |       ^~~~
2025-12-04T12:35:04.3266566Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3267786Z  1880 |       0x80,
2025-12-04T12:35:04.3268041Z       |       ^~~~
2025-12-04T12:35:04.3269439Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3270637Z  1882 |       0x80,
2025-12-04T12:35:04.3304336Z       |       ^~~~
2025-12-04T12:35:04.3306131Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3307391Z  1884 |       0x80,
2025-12-04T12:35:04.3307662Z       |       ^~~~
2025-12-04T12:35:04.3309038Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3310272Z  1886 |       0x80,
2025-12-04T12:35:04.3310520Z       |       ^~~~
2025-12-04T12:35:04.3311898Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3313117Z  1888 |       0x80,
2025-12-04T12:35:04.3313379Z       |       ^~~~
2025-12-04T12:35:04.3314711Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3316122Z  1890 |       0x80,
2025-12-04T12:35:04.3316387Z       |       ^~~~
2025-12-04T12:35:04.3317742Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3319021Z  1892 |       0x80,
2025-12-04T12:35:04.3319299Z       |       ^~~~
2025-12-04T12:35:04.3320720Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3321917Z  1894 |       0x80,
2025-12-04T12:35:04.3322178Z       |       ^~~~
2025-12-04T12:35:04.3323525Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3324744Z  1896 |       0x80,
2025-12-04T12:35:04.3324988Z       |       ^~~~
2025-12-04T12:35:04.3326336Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3327554Z  1898 |       0x80,
2025-12-04T12:35:04.3327802Z       |       ^~~~
2025-12-04T12:35:04.3329148Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3330361Z  1900 |       0x80,
2025-12-04T12:35:04.3330618Z       |       ^~~~
2025-12-04T12:35:04.3331946Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3333163Z  1902 |       0x80,
2025-12-04T12:35:04.3333414Z       |       ^~~~
2025-12-04T12:35:04.3334742Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3335958Z  1904 |       0x80,
2025-12-04T12:35:04.3336398Z       |       ^~~~
2025-12-04T12:35:04.3337796Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3338999Z  1906 |       0x80,
2025-12-04T12:35:04.3339257Z       |       ^~~~
2025-12-04T12:35:04.3340599Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3341812Z  1908 |       0x80,
2025-12-04T12:35:04.3342055Z       |       ^~~~
2025-12-04T12:35:04.3343396Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3344613Z  1910 |       0x80,
2025-12-04T12:35:04.3344864Z       |       ^~~~
2025-12-04T12:35:04.3346209Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3347417Z  1912 |       0x80,
2025-12-04T12:35:04.3347671Z       |       ^~~~
2025-12-04T12:35:04.3348993Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3350252Z  1914 |       0x80,
2025-12-04T12:35:04.3350507Z       |       ^~~~
2025-12-04T12:35:04.3351848Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3353093Z  1916 |       0x80,
2025-12-04T12:35:04.3353401Z       |       ^~~~
2025-12-04T12:35:04.3354755Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3355970Z  1918 |       0x80,
2025-12-04T12:35:04.3356215Z       |       ^~~~
2025-12-04T12:35:04.3357547Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3358764Z  1920 |       0x80,
2025-12-04T12:35:04.3359010Z       |       ^~~~
2025-12-04T12:35:04.3360366Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3361596Z  1922 |       0x80,
2025-12-04T12:35:04.3361859Z       |       ^~~~
2025-12-04T12:35:04.3363183Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3364399Z  1924 |       0x80,
2025-12-04T12:35:04.3364651Z       |       ^~~~
2025-12-04T12:35:04.3365975Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3367190Z  1926 |       0x80,
2025-12-04T12:35:04.3367447Z       |       ^~~~
2025-12-04T12:35:04.3368786Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3370023Z  1928 |       0x80);
2025-12-04T12:35:04.3370298Z       |       ^~~~
2025-12-04T12:35:04.3371853Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3373068Z  1930 |       0x80,
2025-12-04T12:35:04.3373309Z       |       ^~~~
2025-12-04T12:35:04.3374667Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3375877Z  1932 |       0x80,
2025-12-04T12:35:04.3376117Z       |       ^~~~
2025-12-04T12:35:04.3377539Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3378762Z  1934 |       0x80,
2025-12-04T12:35:04.3379025Z       |       ^~~~
2025-12-04T12:35:04.3380356Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3381561Z  1936 |       0x80,
2025-12-04T12:35:04.3381818Z       |       ^~~~
2025-12-04T12:35:04.3383243Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3384439Z  1938 |       0x80,
2025-12-04T12:35:04.3384695Z       |       ^~~~
2025-12-04T12:35:04.3386036Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3387336Z  1940 |       0x80,
2025-12-04T12:35:04.3387593Z       |       ^~~~
2025-12-04T12:35:04.3388935Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3390139Z  1942 |       0x80,
2025-12-04T12:35:04.3390378Z       |       ^~~~
2025-12-04T12:35:04.3391726Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3392944Z  1944 |       0x80,
2025-12-04T12:35:04.3393189Z       |       ^~~~
2025-12-04T12:35:04.3394594Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3395811Z  1946 |       0x80,
2025-12-04T12:35:04.3396072Z       |       ^~~~
2025-12-04T12:35:04.3397405Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3398616Z  1948 |       0x80,
2025-12-04T12:35:04.3398874Z       |       ^~~~
2025-12-04T12:35:04.3400228Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3401426Z  1950 |       0x80,
2025-12-04T12:35:04.3401686Z       |       ^~~~
2025-12-04T12:35:04.3403039Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3404262Z  1952 |       0x80,
2025-12-04T12:35:04.3404510Z       |       ^~~~
2025-12-04T12:35:04.3405864Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3407091Z  1954 |       0x80,
2025-12-04T12:35:04.3407402Z       |       ^~~~
2025-12-04T12:35:04.3408748Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3409959Z  1956 |       0x80,
2025-12-04T12:35:04.3410217Z       |       ^~~~
2025-12-04T12:35:04.3411552Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3412852Z  1958 |       0x80,
2025-12-04T12:35:04.3413112Z       |       ^~~~
2025-12-04T12:35:04.3414447Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3415660Z  1960 |       0x80,
2025-12-04T12:35:04.3415929Z       |       ^~~~
2025-12-04T12:35:04.3417338Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3418536Z  1962 |       0x80,
2025-12-04T12:35:04.3418791Z       |       ^~~~
2025-12-04T12:35:04.3420146Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3421362Z  1964 |       0x80,
2025-12-04T12:35:04.3421603Z       |       ^~~~
2025-12-04T12:35:04.3422946Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3424150Z  1966 |       0x80,
2025-12-04T12:35:04.3424399Z       |       ^~~~
2025-12-04T12:35:04.3425744Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3426951Z  1968 |       0x80,
2025-12-04T12:35:04.3427210Z       |       ^~~~
2025-12-04T12:35:04.3428593Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3429811Z  1970 |       0x80,
2025-12-04T12:35:04.3430068Z       |       ^~~~
2025-12-04T12:35:04.3431408Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3432601Z  1972 |       0x80,
2025-12-04T12:35:04.3432864Z       |       ^~~~
2025-12-04T12:35:04.3434206Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3435402Z  1974 |       0x80,
2025-12-04T12:35:04.3435660Z       |       ^~~~
2025-12-04T12:35:04.3437014Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3438231Z  1976 |       0x80,
2025-12-04T12:35:04.3438476Z       |       ^~~~
2025-12-04T12:35:04.3439817Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3441025Z  1978 |       0x80,
2025-12-04T12:35:04.3441322Z       |       ^~~~
2025-12-04T12:35:04.3442652Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3443860Z  1980 |       0x80,
2025-12-04T12:35:04.3444114Z       |       ^~~~
2025-12-04T12:35:04.3445479Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3446741Z  1982 |       0x80,
2025-12-04T12:35:04.3446994Z       |       ^~~~
2025-12-04T12:35:04.3448330Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3449532Z  1984 |       0x80,
2025-12-04T12:35:04.3449794Z       |       ^~~~
2025-12-04T12:35:04.3451135Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3452340Z  1986 |       0x80,
2025-12-04T12:35:04.3452584Z       |       ^~~~
2025-12-04T12:35:04.3453936Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3455140Z  1988 |       0x80,
2025-12-04T12:35:04.3455379Z       |       ^~~~
2025-12-04T12:35:04.3456786Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3458004Z  1990 |       0x80,
2025-12-04T12:35:04.3458261Z       |       ^~~~
2025-12-04T12:35:04.3459594Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3460802Z  1992 |       0x80,
2025-12-04T12:35:04.3461058Z       |       ^~~~
2025-12-04T12:35:04.3462451Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow]
2025-12-04T12:35:04.3463724Z  2002 |   __m512i keep_1 = _mm512_set1_epi16(0xFF00);
2025-12-04T12:35:04.3464131Z       |                                      ^~~~~~
2025-12-04T12:35:04.3466850Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized<T> at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized<T>&, const at::vec::CPU_CAPABILITY::Vectorized<T>&) [with bool left_shift = false; T = unsigned char; typename std::enable_if<(is_same_v<T, signed char> || is_same_v<T, unsigned char>), int>::type <anonymous> = 0]’:
2025-12-04T12:35:04.3470195Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2116:28:   required from here
2025-12-04T12:35:04.3472336Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3473553Z  1866 |       0x80,
2025-12-04T12:35:04.3473821Z       |       ^~~~
2025-12-04T12:35:04.3475347Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3476665Z  1868 |       0x80,
2025-12-04T12:35:04.3476929Z       |       ^~~~
2025-12-04T12:35:04.3478292Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3479502Z  1870 |       0x80,
2025-12-04T12:35:04.3479803Z       |       ^~~~
2025-12-04T12:35:04.3481211Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3482428Z  1872 |       0x80,
2025-12-04T12:35:04.3482670Z       |       ^~~~
2025-12-04T12:35:04.3484007Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3485223Z  1874 |       0x80,
2025-12-04T12:35:04.3485482Z       |       ^~~~
2025-12-04T12:35:04.3486807Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3488026Z  1876 |       0x80,
2025-12-04T12:35:04.3488291Z       |       ^~~~
2025-12-04T12:35:04.3489653Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3490850Z  1878 |       0x80,
2025-12-04T12:35:04.3491108Z       |       ^~~~
2025-12-04T12:35:04.3492450Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3493656Z  1880 |       0x80,
2025-12-04T12:35:04.3493918Z       |       ^~~~
2025-12-04T12:35:04.3495267Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3496551Z  1882 |       0x80,
2025-12-04T12:35:04.3496804Z       |       ^~~~
2025-12-04T12:35:04.3498240Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3499453Z  1884 |       0x80,
2025-12-04T12:35:04.3499712Z       |       ^~~~
2025-12-04T12:35:04.3501039Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3502259Z  1886 |       0x80,
2025-12-04T12:35:04.3502516Z       |       ^~~~
2025-12-04T12:35:04.3503843Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3505059Z  1888 |       0x80,
2025-12-04T12:35:04.3505327Z       |       ^~~~
2025-12-04T12:35:04.3506687Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3507883Z  1890 |       0x80,
2025-12-04T12:35:04.3508146Z       |       ^~~~
2025-12-04T12:35:04.3509488Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3510745Z  1892 |       0x80,
2025-12-04T12:35:04.3510985Z       |       ^~~~
2025-12-04T12:35:04.3512330Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3513575Z  1894 |       0x80,
2025-12-04T12:35:04.3513814Z       |       ^~~~
2025-12-04T12:35:04.3515195Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3516400Z  1896 |       0x80,
2025-12-04T12:35:04.3516653Z       |       ^~~~
2025-12-04T12:35:04.3517976Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3519189Z  1898 |       0x80,
2025-12-04T12:35:04.3519447Z       |       ^~~~
2025-12-04T12:35:04.3520772Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3521990Z  1900 |       0x80,
2025-12-04T12:35:04.3522245Z       |       ^~~~
2025-12-04T12:35:04.3523598Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3524791Z  1902 |       0x80,
2025-12-04T12:35:04.3525047Z       |       ^~~~
2025-12-04T12:35:04.3526385Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3527598Z  1904 |       0x80,
2025-12-04T12:35:04.3527839Z       |       ^~~~
2025-12-04T12:35:04.3529179Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3530397Z  1906 |       0x80,
2025-12-04T12:35:04.3530638Z       |       ^~~~
2025-12-04T12:35:04.3532027Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3533240Z  1908 |       0x80,
2025-12-04T12:35:04.3533495Z       |       ^~~~
2025-12-04T12:35:04.3534819Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3536041Z  1910 |       0x80,
2025-12-04T12:35:04.3536361Z       |       ^~~~
2025-12-04T12:35:04.3537722Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3538925Z  1912 |       0x80,
2025-12-04T12:35:04.3539190Z       |       ^~~~
2025-12-04T12:35:04.3540542Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3541745Z  1914 |       0x80,
2025-12-04T12:35:04.3541998Z       |       ^~~~
2025-12-04T12:35:04.3543339Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3544594Z  1916 |       0x80,
2025-12-04T12:35:04.3544836Z       |       ^~~~
2025-12-04T12:35:04.3546181Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3547428Z  1918 |       0x80,
2025-12-04T12:35:04.3547687Z       |       ^~~~
2025-12-04T12:35:04.3549057Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3550263Z  1920 |       0x80,
2025-12-04T12:35:04.3550516Z       |       ^~~~
2025-12-04T12:35:04.3551847Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3553056Z  1922 |       0x80,
2025-12-04T12:35:04.3553310Z       |       ^~~~
2025-12-04T12:35:04.3554646Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3555844Z  1924 |       0x80,
2025-12-04T12:35:04.3556101Z       |       ^~~~
2025-12-04T12:35:04.3557448Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3558652Z  1926 |       0x80,
2025-12-04T12:35:04.3558896Z       |       ^~~~
2025-12-04T12:35:04.3560230Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3561449Z  1928 |       0x80);
2025-12-04T12:35:04.3561703Z       |       ^~~~
2025-12-04T12:35:04.3563048Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3564265Z  1930 |       0x80,
2025-12-04T12:35:04.3564526Z       |       ^~~~
2025-12-04T12:35:04.3565900Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3567112Z  1932 |       0x80,
2025-12-04T12:35:04.3567368Z       |       ^~~~
2025-12-04T12:35:04.3568698Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3569908Z  1934 |       0x80,
2025-12-04T12:35:04.3570163Z       |       ^~~~
2025-12-04T12:35:04.3571689Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3572893Z  1936 |       0x80,
2025-12-04T12:35:04.3573155Z       |       ^~~~
2025-12-04T12:35:04.3574505Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3575722Z  1938 |       0x80,
2025-12-04T12:35:04.3575966Z       |       ^~~~
2025-12-04T12:35:04.3577380Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3578684Z  1940 |       0x80,
2025-12-04T12:35:04.3578929Z       |       ^~~~
2025-12-04T12:35:04.3580283Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3581568Z  1942 |       0x80,
2025-12-04T12:35:04.3581835Z       |       ^~~~
2025-12-04T12:35:04.3583229Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3584457Z  1944 |       0x80,
2025-12-04T12:35:04.3584721Z       |       ^~~~
2025-12-04T12:35:04.3586085Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3587290Z  1946 |       0x80,
2025-12-04T12:35:04.3587551Z       |       ^~~~
2025-12-04T12:35:04.3588893Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3590110Z  1948 |       0x80,
2025-12-04T12:35:04.3590411Z       |       ^~~~
2025-12-04T12:35:04.3591765Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3592977Z  1950 |       0x80,
2025-12-04T12:35:04.3593224Z       |       ^~~~
2025-12-04T12:35:04.3594576Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3595801Z  1952 |       0x80,
2025-12-04T12:35:04.3596064Z       |       ^~~~
2025-12-04T12:35:04.3597393Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3598615Z  1954 |       0x80,
2025-12-04T12:35:04.3598879Z       |       ^~~~
2025-12-04T12:35:04.3600209Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3601411Z  1956 |       0x80,
2025-12-04T12:35:04.3601665Z       |       ^~~~
2025-12-04T12:35:04.3603001Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3604239Z  1958 |       0x80,
2025-12-04T12:35:04.3604493Z       |       ^~~~
2025-12-04T12:35:04.3605833Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3607085Z  1960 |       0x80,
2025-12-04T12:35:04.3607365Z       |       ^~~~
2025-12-04T12:35:04.3608709Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3609919Z  1962 |       0x80,
2025-12-04T12:35:04.3610160Z       |       ^~~~
2025-12-04T12:35:04.3611495Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3612710Z  1964 |       0x80,
2025-12-04T12:35:04.3612967Z       |       ^~~~
2025-12-04T12:35:04.3614299Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3615519Z  1966 |       0x80,
2025-12-04T12:35:04.3615781Z       |       ^~~~
2025-12-04T12:35:04.3617181Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3618383Z  1968 |       0x80,
2025-12-04T12:35:04.3618641Z       |       ^~~~
2025-12-04T12:35:04.3620000Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3621201Z  1970 |       0x80,
2025-12-04T12:35:04.3621462Z       |       ^~~~
2025-12-04T12:35:04.3622799Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3624075Z  1972 |       0x80,
2025-12-04T12:35:04.3624328Z       |       ^~~~
2025-12-04T12:35:04.3625680Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3626889Z  1974 |       0x80,
2025-12-04T12:35:04.3627133Z       |       ^~~~
2025-12-04T12:35:04.3628481Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3629700Z  1976 |       0x80,
2025-12-04T12:35:04.3629960Z       |       ^~~~
2025-12-04T12:35:04.3631282Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3632513Z  1978 |       0x80,
2025-12-04T12:35:04.3632765Z       |       ^~~~
2025-12-04T12:35:04.3634102Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3635315Z  1980 |       0x80,
2025-12-04T12:35:04.3635567Z       |       ^~~~
2025-12-04T12:35:04.3636947Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3638148Z  1982 |       0x80,
2025-12-04T12:35:04.3638391Z       |       ^~~~
2025-12-04T12:35:04.3639720Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3641002Z  1984 |       0x80,
2025-12-04T12:35:04.3641248Z       |       ^~~~
2025-12-04T12:35:04.3642584Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3643789Z  1986 |       0x80,
2025-12-04T12:35:04.3644045Z       |       ^~~~
2025-12-04T12:35:04.3645375Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3646587Z  1988 |       0x80,
2025-12-04T12:35:04.3646840Z       |       ^~~~
2025-12-04T12:35:04.3648175Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3649395Z  1990 |       0x80,
2025-12-04T12:35:04.3649651Z       |       ^~~~
2025-12-04T12:35:04.3650994Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3652187Z  1992 |       0x80,
2025-12-04T12:35:04.3652447Z       |       ^~~~
2025-12-04T12:35:04.3653798Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow]
2025-12-04T12:35:04.3655076Z  2002 |   __m512i keep_1 = _mm512_set1_epi16(0xFF00);
2025-12-04T12:35:04.3655469Z       |                                      ^~~~~~
2025-12-04T12:35:04.3656243Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:16,
2025-12-04T12:35:04.3657390Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5,
2025-12-04T12:35:04.3658353Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7,
2025-12-04T12:35:04.3659333Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4,
2025-12-04T12:35:04.3660358Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45,
2025-12-04T12:35:04.3661625Z                  from /tmp/ROfn0q/tmpejwtemx7/data/aotinductor/model/cn3k2mlnpdktb5d42n3gbws3qpzrim5w2lb6w5t7cv3mzl7dq3b5.wrapper.cpp:723:
2025-12-04T12:35:04.3663957Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h: In instantiation of ‘void at::vec::CPU_CAPABILITY::QuantizeAvx512(const float*, T*, int, float, int64_t) [with T = signed char; int64_t = long int]’:
2025-12-04T12:35:04.3665830Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:696:31:   required from here
2025-12-04T12:35:04.3667780Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3669072Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3669417Z       |       ^~~~
2025-12-04T12:35:04.3670770Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3672203Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3672547Z       |             ^~~~
2025-12-04T12:35:04.3674084Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3675315Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3675665Z       |                   ^~~~
2025-12-04T12:35:04.3677069Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3678315Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3678649Z       |                         ^~~~
2025-12-04T12:35:04.3680073Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3681319Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3681640Z       |       ^~~~
2025-12-04T12:35:04.3683009Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3684246Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3684588Z       |             ^~~~
2025-12-04T12:35:04.3685951Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3687194Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3687544Z       |                   ^~~~
2025-12-04T12:35:04.3688937Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3690231Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3690578Z       |                         ^~~~
2025-12-04T12:35:04.3691990Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3693215Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3693529Z       |       ^~~~
2025-12-04T12:35:04.3694893Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3696123Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3696533Z       |             ^~~~
2025-12-04T12:35:04.3697929Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3699178Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3699519Z       |                   ^~~~
2025-12-04T12:35:04.3700902Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3702227Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3702572Z       |                         ^~~~
2025-12-04T12:35:04.3703987Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3705201Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3705574Z       |       ^~~~
2025-12-04T12:35:04.3706976Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3708217Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3708540Z       |             ^~~~
2025-12-04T12:35:04.3709921Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3711159Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3711484Z       |                   ^~~~
2025-12-04T12:35:04.3712882Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3714117Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3714458Z       |                         ^~~~
2025-12-04T12:35:04.3715866Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3717104Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3717434Z       |       ^~~~
2025-12-04T12:35:04.3718783Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3720007Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3720343Z       |             ^~~~
2025-12-04T12:35:04.3721717Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3722990Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3723336Z       |                   ^~~~
2025-12-04T12:35:04.3724732Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3725961Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3726284Z       |                         ^~~~
2025-12-04T12:35:04.3727700Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3728945Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3729276Z       |       ^~~~
2025-12-04T12:35:04.3731342Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3732587Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3732930Z       |             ^~~~
2025-12-04T12:35:04.3734300Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3735595Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3735932Z       |                   ^~~~
2025-12-04T12:35:04.3737393Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3738619Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3739006Z       |                         ^~~~
2025-12-04T12:35:04.3740476Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3741710Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3742028Z       |       ^~~~
2025-12-04T12:35:04.3743381Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3744794Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3745131Z       |             ^~~~
2025-12-04T12:35:04.3746509Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3747752Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3748097Z       |                   ^~~~
2025-12-04T12:35:04.3749496Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3750729Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3751067Z       |                         ^~~~
2025-12-04T12:35:04.3752483Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3753705Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3754033Z       |       ^~~~
2025-12-04T12:35:04.3755385Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3756688Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3757019Z       |             ^~~~
2025-12-04T12:35:04.3758402Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3759637Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3759975Z       |                   ^~~~
2025-12-04T12:35:04.3761366Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3762606Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3762952Z       |                         ^~~~
2025-12-04T12:35:04.3764363Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3765608Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3765946Z       |       ^~~~
2025-12-04T12:35:04.3767304Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3768598Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3768938Z       |             ^~~~
2025-12-04T12:35:04.3770329Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3770448Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3770610Z       |                   ^~~~
2025-12-04T12:35:04.3772069Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3772186Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3772305Z       |                         ^~~~
2025-12-04T12:35:04.3773495Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3773633Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3773728Z       |       ^~~~
2025-12-04T12:35:04.3774917Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3775053Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3775152Z       |             ^~~~
2025-12-04T12:35:04.3776863Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3776980Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3777087Z       |                   ^~~~
2025-12-04T12:35:04.3778304Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3778424Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3778527Z       |                         ^~~~
2025-12-04T12:35:04.3779732Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3779861Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3779975Z       |       ^~~~
2025-12-04T12:35:04.3781156Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3781270Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3781379Z       |             ^~~~
2025-12-04T12:35:04.3782637Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3782764Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3782864Z       |                   ^~~~
2025-12-04T12:35:04.3784091Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3784266Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3784366Z       |                         ^~~~
2025-12-04T12:35:04.3785565Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3785685Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3785777Z       |       ^~~~
2025-12-04T12:35:04.3786971Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3787083Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3787187Z       |             ^~~~
2025-12-04T12:35:04.3788401Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3788515Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3788638Z       |                   ^~~~
2025-12-04T12:35:04.3789820Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3789940Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3790057Z       |                         ^~~~
2025-12-04T12:35:04.3791563Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h: In instantiation of ‘void at::vec::CPU_CAPABILITY::QuantizeAvx512(const float*, T*, int, float, int64_t) [with T = unsigned char; int64_t = long int]’:
2025-12-04T12:35:04.3792249Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:933:31:   required from here
2025-12-04T12:35:04.3793445Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3793560Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3793669Z       |       ^~~~
2025-12-04T12:35:04.3794859Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3794985Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3795087Z       |             ^~~~
2025-12-04T12:35:04.3796281Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3796423Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3796525Z       |                   ^~~~
2025-12-04T12:35:04.3797726Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3797840Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3797988Z       |                         ^~~~
2025-12-04T12:35:04.3799185Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3799298Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3799392Z       |       ^~~~
2025-12-04T12:35:04.3800666Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3800782Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3800893Z       |             ^~~~
2025-12-04T12:35:04.3802078Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3802197Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3802310Z       |                   ^~~~
2025-12-04T12:35:04.3803492Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3803625Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3803725Z       |                         ^~~~
2025-12-04T12:35:04.3804914Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3805043Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3805135Z       |       ^~~~
2025-12-04T12:35:04.3806335Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3806455Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3806554Z       |             ^~~~
2025-12-04T12:35:04.3807752Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3807912Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3808023Z       |                   ^~~~
2025-12-04T12:35:04.3809217Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3809332Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3809448Z       |                         ^~~~
2025-12-04T12:35:04.3810633Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3810745Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3810853Z       |       ^~~~
2025-12-04T12:35:04.3812043Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3812183Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3812280Z       |             ^~~~
2025-12-04T12:35:04.3813457Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3813584Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3813723Z       |                   ^~~~
2025-12-04T12:35:04.3814920Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3815034Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3815139Z       |                         ^~~~
2025-12-04T12:35:04.3816498Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3816615Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3816710Z       |       ^~~~
2025-12-04T12:35:04.3817920Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3818040Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3818151Z       |             ^~~~
2025-12-04T12:35:04.3819333Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3819452Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3819569Z       |                   ^~~~
2025-12-04T12:35:04.3820769Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3820897Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3821003Z       |                         ^~~~
2025-12-04T12:35:04.3822178Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3822310Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3822404Z       |       ^~~~
2025-12-04T12:35:04.3823601Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3823762Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3823867Z       |             ^~~~
2025-12-04T12:35:04.3825063Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3825176Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3825277Z       |                   ^~~~
2025-12-04T12:35:04.3826479Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3826590Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3826703Z       |                         ^~~~
2025-12-04T12:35:04.3827892Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3828011Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3828117Z       |       ^~~~
2025-12-04T12:35:04.3829298Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3829466Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3829566Z       |             ^~~~
2025-12-04T12:35:04.3830757Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3830882Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3830982Z       |                   ^~~~
2025-12-04T12:35:04.3832253Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3832368Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3832472Z       |                         ^~~~
2025-12-04T12:35:04.3833666Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3833783Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3833876Z       |       ^~~~
2025-12-04T12:35:04.3835077Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3835198Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3835308Z       |             ^~~~
2025-12-04T12:35:04.3836499Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3836611Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3836729Z       |                   ^~~~
2025-12-04T12:35:04.3837912Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3838047Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3838150Z       |                         ^~~~
2025-12-04T12:35:04.3839327Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3839507Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3839610Z       |       ^~~~
2025-12-04T12:35:04.3840804Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3840931Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3841028Z       |             ^~~~
2025-12-04T12:35:04.3842232Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3842345Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3842446Z       |                   ^~~~
2025-12-04T12:35:04.3843649Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3843769Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3843884Z       |                         ^~~~
2025-12-04T12:35:04.3845061Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3845217Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3845324Z       |       ^~~~
2025-12-04T12:35:04.3846504Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3846632Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3846763Z       |             ^~~~
2025-12-04T12:35:04.3848007Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3848135Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3848237Z       |                   ^~~~
2025-12-04T12:35:04.3849422Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3849555Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3849658Z       |                         ^~~~
2025-12-04T12:35:04.3850848Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3850986Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3851080Z       |       ^~~~
2025-12-04T12:35:04.3852286Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3852401Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3852515Z       |             ^~~~
2025-12-04T12:35:04.3853704Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3853825Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3853942Z       |                   ^~~~
2025-12-04T12:35:04.3855129Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3855314Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3855423Z       |                         ^~~~
2025-12-04T12:35:04.3856684Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3856815Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3856909Z       |       ^~~~
2025-12-04T12:35:04.3858108Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3858238Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3858337Z       |             ^~~~
2025-12-04T12:35:04.3859554Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3859677Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3859780Z       |                   ^~~~
2025-12-04T12:35:04.3860983Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3861147Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.3861267Z       |                         ^~~~
2025-12-04T12:35:04.3861374Z PASSED [9.4856s] [ 26%]
2025-12-04T12:35:04.3862365Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_bool_input In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_float.h:12,
2025-12-04T12:35:04.3862825Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:11,
2025-12-04T12:35:04.3863280Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5,
2025-12-04T12:35:04.3863739Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7,
2025-12-04T12:35:04.3864148Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4,
2025-12-04T12:35:04.3864618Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45,
2025-12-04T12:35:04.3865260Z                  from /tmp/Qt7dz2/tmpq_68ffd4/data/aotinductor/model/cad56iehvyrgd23725fkkazyktxy2vkmfdx2f6hgjyqe5hsp2q7e.wrapper.cpp:656:
2025-12-04T12:35:04.3865868Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/sleef.h:192:10: warning: ISO C++ prohibits anonymous structs [-Wpedantic]
2025-12-04T12:35:04.3865989Z   192 |   struct {
2025-12-04T12:35:04.3866085Z       |          ^
2025-12-04T12:35:04.3866593Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:15,
2025-12-04T12:35:04.3866973Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5,
2025-12-04T12:35:04.3867421Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7,
2025-12-04T12:35:04.3867827Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4,
2025-12-04T12:35:04.3868307Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45,
2025-12-04T12:35:04.3868927Z                  from /tmp/Qt7dz2/tmpq_68ffd4/data/aotinductor/model/cad56iehvyrgd23725fkkazyktxy2vkmfdx2f6hgjyqe5hsp2q7e.wrapper.cpp:656:
2025-12-04T12:35:04.3871468Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::blendv(const at::vec::CPU_CAPABILITY::Vectorized<short int>&, const at::vec::CPU_CAPABILITY::Vectorized<short int>&, const at::vec::CPU_CAPABILITY::Vectorized<short int>&)’:
2025-12-04T12:35:04.3872660Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:544:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.3872833Z   544 |     auto msb_one = _mm512_set1_epi16(0xFFFF);
2025-12-04T12:35:04.3872951Z       |                                      ^~~~~~
2025-12-04T12:35:04.3873454Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:15,
2025-12-04T12:35:04.3873842Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5,
2025-12-04T12:35:04.3874296Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7,
2025-12-04T12:35:04.3874715Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4,
2025-12-04T12:35:04.3875178Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45,
2025-12-04T12:35:04.3875848Z                  from /tmp/Qt7dz2/tmpq_68ffd4/data/aotinductor/model/cad56iehvyrgd23725fkkazyktxy2vkmfdx2f6hgjyqe5hsp2q7e.wrapper.cpp:656:
2025-12-04T12:35:04.3877492Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator==(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.3878761Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:697:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.3878988Z   697 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.3879114Z       |                                                      ^~~~~~
2025-12-04T12:35:04.3880748Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator!=(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.3881918Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:701:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.3882131Z   701 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.3882283Z       |                                                      ^~~~~~
2025-12-04T12:35:04.3883894Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator<(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.3885072Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:705:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.3885283Z   705 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.3885408Z       |                                                      ^~~~~~
2025-12-04T12:35:04.3887084Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator<=(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.3888256Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:709:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.3888471Z   709 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.3888635Z       |                                                      ^~~~~~
2025-12-04T12:35:04.3890731Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator>(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.3891997Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:713:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.3892205Z   713 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.3892346Z       |                                                      ^~~~~~
2025-12-04T12:35:04.3893961Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator>=(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.3895144Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:717:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.3895355Z   717 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.3895498Z       |                                                      ^~~~~~
2025-12-04T12:35:04.3898009Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized<signed char> at::vec::CPU_CAPABILITY::Vectorized<signed char>::blendv(const at::vec::CPU_CAPABILITY::Vectorized<signed char>&, const at::vec::CPU_CAPABILITY::Vectorized<signed char>&, const at::vec::CPU_CAPABILITY::Vectorized<signed char>&)’:
2025-12-04T12:35:04.3899233Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1153:37: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3899397Z  1153 |     auto msb_one = _mm512_set1_epi8(0xFF);
2025-12-04T12:35:04.3899513Z       |                                     ^~~~
2025-12-04T12:35:04.3901255Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<signed char> at::vec::CPU_CAPABILITY::Vectorized<signed char>::operator==(const at::vec::CPU_CAPABILITY::Vectorized<signed char>&) const’:
2025-12-04T12:35:04.3902452Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1166:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3902679Z  1166 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.3902803Z       |                                                     ^~~~
2025-12-04T12:35:04.3904455Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<signed char> at::vec::CPU_CAPABILITY::Vectorized<signed char>::operator!=(const at::vec::CPU_CAPABILITY::Vectorized<signed char>&) const’:
2025-12-04T12:35:04.3905676Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1170:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3905882Z  1170 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.3906021Z       |                                                     ^~~~
2025-12-04T12:35:04.3907675Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<signed char> at::vec::CPU_CAPABILITY::Vectorized<signed char>::operator<(const at::vec::CPU_CAPABILITY::Vectorized<signed char>&) const’:
2025-12-04T12:35:04.3908916Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1174:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3909174Z  1174 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.3909340Z       |                                                     ^~~~
2025-12-04T12:35:04.3911011Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<signed char> at::vec::CPU_CAPABILITY::Vectorized<signed char>::operator<=(const at::vec::CPU_CAPABILITY::Vectorized<signed char>&) const’:
2025-12-04T12:35:04.3912203Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1178:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3912420Z  1178 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.3912545Z       |                                                     ^~~~
2025-12-04T12:35:04.3914887Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized<unsigned char> at::vec::CPU_CAPABILITY::Vectorized<unsigned char>::blendv(const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&, const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&, const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&)’:
2025-12-04T12:35:04.3916093Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1207:37: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3916247Z  1207 |     auto msb_one = _mm512_set1_epi8(0xFF);
2025-12-04T12:35:04.3916378Z       |                                     ^~~~
2025-12-04T12:35:04.3918080Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<unsigned char> at::vec::CPU_CAPABILITY::Vectorized<unsigned char>::operator==(const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&) const’:
2025-12-04T12:35:04.3919345Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1220:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3919555Z  1220 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.3919679Z       |                                                     ^~~~
2025-12-04T12:35:04.3921389Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<unsigned char> at::vec::CPU_CAPABILITY::Vectorized<unsigned char>::operator!=(const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&) const’:
2025-12-04T12:35:04.3922573Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1224:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3922805Z  1224 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.3922929Z       |                                                     ^~~~
2025-12-04T12:35:04.3924632Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<unsigned char> at::vec::CPU_CAPABILITY::Vectorized<unsigned char>::operator<(const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&) const’:
2025-12-04T12:35:04.3925858Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1228:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3926058Z  1228 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.3926230Z       |                                                     ^~~~
2025-12-04T12:35:04.3927957Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<unsigned char> at::vec::CPU_CAPABILITY::Vectorized<unsigned char>::operator<=(const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&) const’:
2025-12-04T12:35:04.3929161Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1232:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.3929367Z  1232 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.3929489Z       |                                                     ^~~~
2025-12-04T12:35:04.3931883Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized<T> at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized<T>&, const at::vec::CPU_CAPABILITY::Vectorized<T>&) [with bool left_shift = true; T = signed char; typename std::enable_if<(is_same_v<T, signed char> || is_same_v<T, unsigned char>), int>::type <anonymous> = 0]’:
2025-12-04T12:35:04.3932471Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2074:27:   required from here
2025-12-04T12:35:04.3933672Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3933775Z  1866 |       0x80,
2025-12-04T12:35:04.3933885Z       |       ^~~~
2025-12-04T12:35:04.3935065Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3935167Z  1868 |       0x80,
2025-12-04T12:35:04.3935273Z       |       ^~~~
2025-12-04T12:35:04.3936591Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3936702Z  1870 |       0x80,
2025-12-04T12:35:04.3936796Z       |       ^~~~
2025-12-04T12:35:04.3937990Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3938108Z  1872 |       0x80,
2025-12-04T12:35:04.3938201Z       |       ^~~~
2025-12-04T12:35:04.3939375Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3939492Z  1874 |       0x80,
2025-12-04T12:35:04.3939584Z       |       ^~~~
2025-12-04T12:35:04.3940784Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3940879Z  1876 |       0x80,
2025-12-04T12:35:04.3940969Z       |       ^~~~
2025-12-04T12:35:04.3942155Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3942288Z  1878 |       0x80,
2025-12-04T12:35:04.3942381Z       |       ^~~~
2025-12-04T12:35:04.3943579Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3943713Z  1880 |       0x80,
2025-12-04T12:35:04.3943823Z       |       ^~~~
2025-12-04T12:35:04.3945040Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3945134Z  1882 |       0x80,
2025-12-04T12:35:04.3945239Z       |       ^~~~
2025-12-04T12:35:04.3946421Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3946535Z  1884 |       0x80,
2025-12-04T12:35:04.3946628Z       |       ^~~~
2025-12-04T12:35:04.3947804Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3947917Z  1886 |       0x80,
2025-12-04T12:35:04.3948018Z       |       ^~~~
2025-12-04T12:35:04.3949197Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3949303Z  1888 |       0x80,
2025-12-04T12:35:04.3949394Z       |       ^~~~
2025-12-04T12:35:04.3950577Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3950677Z  1890 |       0x80,
2025-12-04T12:35:04.3950769Z       |       ^~~~
2025-12-04T12:35:04.3951962Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3952065Z  1892 |       0x80,
2025-12-04T12:35:04.3952212Z       |       ^~~~
2025-12-04T12:35:04.3953396Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3953492Z  1894 |       0x80,
2025-12-04T12:35:04.3953604Z       |       ^~~~
2025-12-04T12:35:04.3954776Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3954876Z  1896 |       0x80,
2025-12-04T12:35:04.3954985Z       |       ^~~~
2025-12-04T12:35:04.3956160Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3956276Z  1898 |       0x80,
2025-12-04T12:35:04.3956380Z       |       ^~~~
2025-12-04T12:35:04.3957553Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3957660Z  1900 |       0x80,
2025-12-04T12:35:04.3957751Z       |       ^~~~
2025-12-04T12:35:04.3958941Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3959073Z  1902 |       0x80,
2025-12-04T12:35:04.3959165Z       |       ^~~~
2025-12-04T12:35:04.3960357Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3960493Z  1904 |       0x80,
2025-12-04T12:35:04.3960621Z       |       ^~~~
2025-12-04T12:35:04.3961813Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3961907Z  1906 |       0x80,
2025-12-04T12:35:04.3962012Z       |       ^~~~
2025-12-04T12:35:04.3963191Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3963290Z  1908 |       0x80,
2025-12-04T12:35:04.3963396Z       |       ^~~~
2025-12-04T12:35:04.3964571Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3964689Z  1910 |       0x80,
2025-12-04T12:35:04.3964788Z       |       ^~~~
2025-12-04T12:35:04.3965967Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3966077Z  1912 |       0x80,
2025-12-04T12:35:04.3966171Z       |       ^~~~
2025-12-04T12:35:04.3967352Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3967467Z  1914 |       0x80,
2025-12-04T12:35:04.3967562Z       |       ^~~~
2025-12-04T12:35:04.3968755Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3968898Z  1916 |       0x80,
2025-12-04T12:35:04.3968999Z       |       ^~~~
2025-12-04T12:35:04.3970190Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3970287Z  1918 |       0x80,
2025-12-04T12:35:04.3970395Z       |       ^~~~
2025-12-04T12:35:04.3971771Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3971868Z  1920 |       0x80,
2025-12-04T12:35:04.3971976Z       |       ^~~~
2025-12-04T12:35:04.3973155Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3973265Z  1922 |       0x80,
2025-12-04T12:35:04.3973381Z       |       ^~~~
2025-12-04T12:35:04.3974558Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3974669Z  1924 |       0x80,
2025-12-04T12:35:04.3974763Z       |       ^~~~
2025-12-04T12:35:04.3976059Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3976171Z  1926 |       0x80,
2025-12-04T12:35:04.3976265Z       |       ^~~~
2025-12-04T12:35:04.3977528Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3977742Z  1928 |       0x80);
2025-12-04T12:35:04.3977841Z       |       ^~~~
2025-12-04T12:35:04.3979037Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3979133Z  1930 |       0x80,
2025-12-04T12:35:04.3979226Z       |       ^~~~
2025-12-04T12:35:04.3980427Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3980522Z  1932 |       0x80,
2025-12-04T12:35:04.3980630Z       |       ^~~~
2025-12-04T12:35:04.3981813Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3981920Z  1934 |       0x80,
2025-12-04T12:35:04.3982030Z       |       ^~~~
2025-12-04T12:35:04.3983206Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3983315Z  1936 |       0x80,
2025-12-04T12:35:04.3983411Z       |       ^~~~
2025-12-04T12:35:04.3984588Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3984694Z  1938 |       0x80,
2025-12-04T12:35:04.3984788Z       |       ^~~~
2025-12-04T12:35:04.3986012Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3986133Z  1940 |       0x80,
2025-12-04T12:35:04.3986224Z       |       ^~~~
2025-12-04T12:35:04.3987412Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3987507Z  1942 |       0x80,
2025-12-04T12:35:04.3987604Z       |       ^~~~
2025-12-04T12:35:04.3988794Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3988889Z  1944 |       0x80,
2025-12-04T12:35:04.3988983Z       |       ^~~~
2025-12-04T12:35:04.3990171Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3990284Z  1946 |       0x80,
2025-12-04T12:35:04.3990391Z       |       ^~~~
2025-12-04T12:35:04.3991569Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3991662Z  1948 |       0x80,
2025-12-04T12:35:04.3991808Z       |       ^~~~
2025-12-04T12:35:04.3992983Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3993095Z  1950 |       0x80,
2025-12-04T12:35:04.3993188Z       |       ^~~~
2025-12-04T12:35:04.3994398Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3994541Z  1952 |       0x80,
2025-12-04T12:35:04.3994635Z       |       ^~~~
2025-12-04T12:35:04.3995811Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3995921Z  1954 |       0x80,
2025-12-04T12:35:04.3996021Z       |       ^~~~
2025-12-04T12:35:04.3997206Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3997301Z  1956 |       0x80,
2025-12-04T12:35:04.3997393Z       |       ^~~~
2025-12-04T12:35:04.3998595Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.3998695Z  1958 |       0x80,
2025-12-04T12:35:04.3998803Z       |       ^~~~
2025-12-04T12:35:04.3999974Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4000068Z  1960 |       0x80,
2025-12-04T12:35:04.4000180Z       |       ^~~~
2025-12-04T12:35:04.4001350Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4001443Z  1962 |       0x80,
2025-12-04T12:35:04.4001554Z       |       ^~~~
2025-12-04T12:35:04.4002767Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4002882Z  1964 |       0x80,
2025-12-04T12:35:04.4002975Z       |       ^~~~
2025-12-04T12:35:04.4004156Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4004300Z  1966 |       0x80,
2025-12-04T12:35:04.4004391Z       |       ^~~~
2025-12-04T12:35:04.4005580Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4005674Z  1968 |       0x80,
2025-12-04T12:35:04.4005766Z       |       ^~~~
2025-12-04T12:35:04.4006992Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4007120Z  1970 |       0x80,
2025-12-04T12:35:04.4007211Z       |       ^~~~
2025-12-04T12:35:04.4008400Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4008503Z  1972 |       0x80,
2025-12-04T12:35:04.4008609Z       |       ^~~~
2025-12-04T12:35:04.4009784Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4009880Z  1974 |       0x80,
2025-12-04T12:35:04.4009987Z       |       ^~~~
2025-12-04T12:35:04.4011177Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4011286Z  1976 |       0x80,
2025-12-04T12:35:04.4011380Z       |       ^~~~
2025-12-04T12:35:04.4012556Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4012670Z  1978 |       0x80,
2025-12-04T12:35:04.4012763Z       |       ^~~~
2025-12-04T12:35:04.4013937Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4014046Z  1980 |       0x80,
2025-12-04T12:35:04.4014140Z       |       ^~~~
2025-12-04T12:35:04.4015373Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4015468Z  1982 |       0x80,
2025-12-04T12:35:04.4015562Z       |       ^~~~
2025-12-04T12:35:04.4016822Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4016926Z  1984 |       0x80,
2025-12-04T12:35:04.4017022Z       |       ^~~~
2025-12-04T12:35:04.4018218Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4018315Z  1986 |       0x80,
2025-12-04T12:35:04.4018424Z       |       ^~~~
2025-12-04T12:35:04.4019618Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4019713Z  1988 |       0x80,
2025-12-04T12:35:04.4019822Z       |       ^~~~
2025-12-04T12:35:04.4020998Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4021152Z  1990 |       0x80,
2025-12-04T12:35:04.4021244Z       |       ^~~~
2025-12-04T12:35:04.4022426Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4022533Z  1992 |       0x80,
2025-12-04T12:35:04.4022626Z       |       ^~~~
2025-12-04T12:35:04.4023896Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow]
2025-12-04T12:35:04.4024056Z  2002 |   __m512i keep_1 = _mm512_set1_epi16(0xFF00);
2025-12-04T12:35:04.4024174Z       |                                      ^~~~~~
2025-12-04T12:35:04.4026595Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized<T> at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized<T>&, const at::vec::CPU_CAPABILITY::Vectorized<T>&) [with bool left_shift = true; T = unsigned char; typename std::enable_if<(is_same_v<T, signed char> || is_same_v<T, unsigned char>), int>::type <anonymous> = 0]’:
2025-12-04T12:35:04.4027185Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2081:27:   required from here
2025-12-04T12:35:04.4028399Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4028494Z  1866 |       0x80,
2025-12-04T12:35:04.4028586Z       |       ^~~~
2025-12-04T12:35:04.4029783Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4029883Z  1868 |       0x80,
2025-12-04T12:35:04.4029991Z       |       ^~~~
2025-12-04T12:35:04.4031166Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4031259Z  1870 |       0x80,
2025-12-04T12:35:04.4031371Z       |       ^~~~
2025-12-04T12:35:04.4032611Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4032719Z  1872 |       0x80,
2025-12-04T12:35:04.4032811Z       |       ^~~~
2025-12-04T12:35:04.4033986Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4034099Z  1874 |       0x80,
2025-12-04T12:35:04.4034192Z       |       ^~~~
2025-12-04T12:35:04.4035364Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4035470Z  1876 |       0x80,
2025-12-04T12:35:04.4035570Z       |       ^~~~
2025-12-04T12:35:04.4036770Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4036866Z  1878 |       0x80,
2025-12-04T12:35:04.4036958Z       |       ^~~~
2025-12-04T12:35:04.4038154Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4038283Z  1880 |       0x80,
2025-12-04T12:35:04.4038377Z       |       ^~~~
2025-12-04T12:35:04.4039577Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4039708Z  1882 |       0x80,
2025-12-04T12:35:04.4039813Z       |       ^~~~
2025-12-04T12:35:04.4041033Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4041133Z  1884 |       0x80,
2025-12-04T12:35:04.4041247Z       |       ^~~~
2025-12-04T12:35:04.4042423Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4042542Z  1886 |       0x80,
2025-12-04T12:35:04.4042636Z       |       ^~~~
2025-12-04T12:35:04.4043810Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4043925Z  1888 |       0x80,
2025-12-04T12:35:04.4044020Z       |       ^~~~
2025-12-04T12:35:04.4045199Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4045308Z  1890 |       0x80,
2025-12-04T12:35:04.4045400Z       |       ^~~~
2025-12-04T12:35:04.4046591Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4046690Z  1892 |       0x80,
2025-12-04T12:35:04.4046782Z       |       ^~~~
2025-12-04T12:35:04.4047972Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4048074Z  1894 |       0x80,
2025-12-04T12:35:04.4048181Z       |       ^~~~
2025-12-04T12:35:04.4049401Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4049495Z  1896 |       0x80,
2025-12-04T12:35:04.4049600Z       |       ^~~~
2025-12-04T12:35:04.4050774Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4050874Z  1898 |       0x80,
2025-12-04T12:35:04.4050981Z       |       ^~~~
2025-12-04T12:35:04.4052150Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4052265Z  1900 |       0x80,
2025-12-04T12:35:04.4052376Z       |       ^~~~
2025-12-04T12:35:04.4053575Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4053685Z  1902 |       0x80,
2025-12-04T12:35:04.4053779Z       |       ^~~~
2025-12-04T12:35:04.4054966Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4055101Z  1904 |       0x80,
2025-12-04T12:35:04.4055195Z       |       ^~~~
2025-12-04T12:35:04.4056447Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4056589Z  1906 |       0x80,
2025-12-04T12:35:04.4056685Z       |       ^~~~
2025-12-04T12:35:04.4057922Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4058021Z  1908 |       0x80,
2025-12-04T12:35:04.4058130Z       |       ^~~~
2025-12-04T12:35:04.4059308Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4059410Z  1910 |       0x80,
2025-12-04T12:35:04.4059522Z       |       ^~~~
2025-12-04T12:35:04.4060695Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4060812Z  1912 |       0x80,
2025-12-04T12:35:04.4060908Z       |       ^~~~
2025-12-04T12:35:04.4062107Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4062220Z  1914 |       0x80,
2025-12-04T12:35:04.4062314Z       |       ^~~~
2025-12-04T12:35:04.4063502Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4063619Z  1916 |       0x80,
2025-12-04T12:35:04.4063714Z       |       ^~~~
2025-12-04T12:35:04.4064903Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4065006Z  1918 |       0x80,
2025-12-04T12:35:04.4065102Z       |       ^~~~
2025-12-04T12:35:04.4066340Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4066438Z  1920 |       0x80,
2025-12-04T12:35:04.4066534Z       |       ^~~~
2025-12-04T12:35:04.4067722Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4067822Z  1922 |       0x80,
2025-12-04T12:35:04.4067929Z       |       ^~~~
2025-12-04T12:35:04.4069101Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4069202Z  1924 |       0x80,
2025-12-04T12:35:04.4069315Z       |       ^~~~
2025-12-04T12:35:04.4070495Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4070609Z  1926 |       0x80,
2025-12-04T12:35:04.4070703Z       |       ^~~~
2025-12-04T12:35:04.4072100Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4072303Z  1928 |       0x80);
2025-12-04T12:35:04.4072395Z       |       ^~~~
2025-12-04T12:35:04.4073589Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4073735Z  1930 |       0x80,
2025-12-04T12:35:04.4073835Z       |       ^~~~
2025-12-04T12:35:04.4075079Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4075175Z  1932 |       0x80,
2025-12-04T12:35:04.4075266Z       |       ^~~~
2025-12-04T12:35:04.4076463Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4076561Z  1934 |       0x80,
2025-12-04T12:35:04.4076666Z       |       ^~~~
2025-12-04T12:35:04.4077837Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4077939Z  1936 |       0x80,
2025-12-04T12:35:04.4078052Z       |       ^~~~
2025-12-04T12:35:04.4079240Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4079350Z  1938 |       0x80,
2025-12-04T12:35:04.4079444Z       |       ^~~~
2025-12-04T12:35:04.4080620Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4080738Z  1940 |       0x80,
2025-12-04T12:35:04.4080829Z       |       ^~~~
2025-12-04T12:35:04.4082004Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4082121Z  1942 |       0x80,
2025-12-04T12:35:04.4082272Z       |       ^~~~
2025-12-04T12:35:04.4083462Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4083557Z  1944 |       0x80,
2025-12-04T12:35:04.4083649Z       |       ^~~~
2025-12-04T12:35:04.4084835Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4084936Z  1946 |       0x80,
2025-12-04T12:35:04.4085028Z       |       ^~~~
2025-12-04T12:35:04.4086211Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4086317Z  1948 |       0x80,
2025-12-04T12:35:04.4086431Z       |       ^~~~
2025-12-04T12:35:04.4087608Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4087702Z  1950 |       0x80,
2025-12-04T12:35:04.4087811Z       |       ^~~~
2025-12-04T12:35:04.4088983Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4089128Z  1952 |       0x80,
2025-12-04T12:35:04.4089222Z       |       ^~~~
2025-12-04T12:35:04.4090393Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4090545Z  1954 |       0x80,
2025-12-04T12:35:04.4090688Z       |       ^~~~
2025-12-04T12:35:04.4091862Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4091969Z  1956 |       0x80,
2025-12-04T12:35:04.4092062Z       |       ^~~~
2025-12-04T12:35:04.4093250Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4093345Z  1958 |       0x80,
2025-12-04T12:35:04.4093439Z       |       ^~~~
2025-12-04T12:35:04.4094630Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4094762Z  1960 |       0x80,
2025-12-04T12:35:04.4094873Z       |       ^~~~
2025-12-04T12:35:04.4096048Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4096144Z  1962 |       0x80,
2025-12-04T12:35:04.4096251Z       |       ^~~~
2025-12-04T12:35:04.4097518Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4097616Z  1964 |       0x80,
2025-12-04T12:35:04.4097731Z       |       ^~~~
2025-12-04T12:35:04.4098907Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4099036Z  1966 |       0x80,
2025-12-04T12:35:04.4099131Z       |       ^~~~
2025-12-04T12:35:04.4100306Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4100419Z  1968 |       0x80,
2025-12-04T12:35:04.4100513Z       |       ^~~~
2025-12-04T12:35:04.4101763Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4101860Z  1970 |       0x80,
2025-12-04T12:35:04.4101955Z       |       ^~~~
2025-12-04T12:35:04.4103152Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4103321Z  1972 |       0x80,
2025-12-04T12:35:04.4103413Z       |       ^~~~
2025-12-04T12:35:04.4104599Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4104695Z  1974 |       0x80,
2025-12-04T12:35:04.4104801Z       |       ^~~~
2025-12-04T12:35:04.4105978Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4106072Z  1976 |       0x80,
2025-12-04T12:35:04.4106177Z       |       ^~~~
2025-12-04T12:35:04.4107360Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4107479Z  1978 |       0x80,
2025-12-04T12:35:04.4107572Z       |       ^~~~
2025-12-04T12:35:04.4108746Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4108856Z  1980 |       0x80,
2025-12-04T12:35:04.4108948Z       |       ^~~~
2025-12-04T12:35:04.4110743Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4110855Z  1982 |       0x80,
2025-12-04T12:35:04.4110952Z       |       ^~~~
2025-12-04T12:35:04.4112208Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4112315Z  1984 |       0x80,
2025-12-04T12:35:04.4112408Z       |       ^~~~
2025-12-04T12:35:04.4113598Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4113693Z  1986 |       0x80,
2025-12-04T12:35:04.4113792Z       |       ^~~~
2025-12-04T12:35:04.4114979Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4115076Z  1988 |       0x80,
2025-12-04T12:35:04.4115185Z       |       ^~~~
2025-12-04T12:35:04.4116363Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4116469Z  1990 |       0x80,
2025-12-04T12:35:04.4116577Z       |       ^~~~
2025-12-04T12:35:04.4117756Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4117862Z  1992 |       0x80,
2025-12-04T12:35:04.4117993Z       |       ^~~~
2025-12-04T12:35:04.4119172Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow]
2025-12-04T12:35:04.4119341Z  2002 |   __m512i keep_1 = _mm512_set1_epi16(0xFF00);
2025-12-04T12:35:04.4119459Z       |                                      ^~~~~~
2025-12-04T12:35:04.4121969Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized<T> at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized<T>&, const at::vec::CPU_CAPABILITY::Vectorized<T>&) [with bool left_shift = false; T = signed char; typename std::enable_if<(is_same_v<T, signed char> || is_same_v<T, unsigned char>), int>::type <anonymous> = 0]’:
2025-12-04T12:35:04.4122553Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2109:28:   required from here
2025-12-04T12:35:04.4123912Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4124022Z  1866 |       0x80,
2025-12-04T12:35:04.4124116Z       |       ^~~~
2025-12-04T12:35:04.4125326Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4125425Z  1868 |       0x80,
2025-12-04T12:35:04.4125518Z       |       ^~~~
2025-12-04T12:35:04.4126716Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4126816Z  1870 |       0x80,
2025-12-04T12:35:04.4126921Z       |       ^~~~
2025-12-04T12:35:04.4128093Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4128187Z  1872 |       0x80,
2025-12-04T12:35:04.4128297Z       |       ^~~~
2025-12-04T12:35:04.4129538Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4129633Z  1874 |       0x80,
2025-12-04T12:35:04.4129742Z       |       ^~~~
2025-12-04T12:35:04.4130921Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4131034Z  1876 |       0x80,
2025-12-04T12:35:04.4131128Z       |       ^~~~
2025-12-04T12:35:04.4132300Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4132409Z  1878 |       0x80,
2025-12-04T12:35:04.4132503Z       |       ^~~~
2025-12-04T12:35:04.4133704Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4133799Z  1880 |       0x80,
2025-12-04T12:35:04.4133893Z       |       ^~~~
2025-12-04T12:35:04.4135092Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4135233Z  1882 |       0x80,
2025-12-04T12:35:04.4135327Z       |       ^~~~
2025-12-04T12:35:04.4136596Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4136690Z  1884 |       0x80,
2025-12-04T12:35:04.4145356Z       |       ^~~~
2025-12-04T12:35:04.4146998Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4147101Z  1886 |       0x80,
2025-12-04T12:35:04.4147211Z       |       ^~~~
2025-12-04T12:35:04.4148407Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4148511Z  1888 |       0x80,
2025-12-04T12:35:04.4148624Z       |       ^~~~
2025-12-04T12:35:04.4149817Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4149928Z  1890 |       0x80,
2025-12-04T12:35:04.4150021Z       |       ^~~~
2025-12-04T12:35:04.4151214Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4151325Z  1892 |       0x80,
2025-12-04T12:35:04.4151419Z       |       ^~~~
2025-12-04T12:35:04.4152614Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4152715Z  1894 |       0x80,
2025-12-04T12:35:04.4152811Z       |       ^~~~
2025-12-04T12:35:04.4154460Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4154559Z  1896 |       0x80,
2025-12-04T12:35:04.4154662Z       |       ^~~~
2025-12-04T12:35:04.4155937Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4156041Z  1898 |       0x80,
2025-12-04T12:35:04.4156149Z       |       ^~~~
2025-12-04T12:35:04.4157333Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4157434Z  1900 |       0x80,
2025-12-04T12:35:04.4157545Z       |       ^~~~
2025-12-04T12:35:04.4158722Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4158830Z  1902 |       0x80,
2025-12-04T12:35:04.4158931Z       |       ^~~~
2025-12-04T12:35:04.4160124Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4160229Z  1904 |       0x80,
2025-12-04T12:35:04.4160323Z       |       ^~~~
2025-12-04T12:35:04.4161495Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4161641Z  1906 |       0x80,
2025-12-04T12:35:04.4161732Z       |       ^~~~
2025-12-04T12:35:04.4162931Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4163028Z  1908 |       0x80,
2025-12-04T12:35:04.4163184Z       |       ^~~~
2025-12-04T12:35:04.4164419Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4164517Z  1910 |       0x80,
2025-12-04T12:35:04.4164611Z       |       ^~~~
2025-12-04T12:35:04.4165796Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4165898Z  1912 |       0x80,
2025-12-04T12:35:04.4166005Z       |       ^~~~
2025-12-04T12:35:04.4167176Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4167273Z  1914 |       0x80,
2025-12-04T12:35:04.4167393Z       |       ^~~~
2025-12-04T12:35:04.4168587Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4168700Z  1916 |       0x80,
2025-12-04T12:35:04.4168798Z       |       ^~~~
2025-12-04T12:35:04.4169968Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4170085Z  1918 |       0x80,
2025-12-04T12:35:04.4170177Z       |       ^~~~
2025-12-04T12:35:04.4171565Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4171674Z  1920 |       0x80,
2025-12-04T12:35:04.4171774Z       |       ^~~~
2025-12-04T12:35:04.4173061Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4173161Z  1922 |       0x80,
2025-12-04T12:35:04.4173252Z       |       ^~~~
2025-12-04T12:35:04.4174447Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4174547Z  1924 |       0x80,
2025-12-04T12:35:04.4174653Z       |       ^~~~
2025-12-04T12:35:04.4175830Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4175931Z  1926 |       0x80,
2025-12-04T12:35:04.4176038Z       |       ^~~~
2025-12-04T12:35:04.4177307Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4177410Z  1928 |       0x80);
2025-12-04T12:35:04.4177522Z       |       ^~~~
2025-12-04T12:35:04.4178702Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4178881Z  1930 |       0x80,
2025-12-04T12:35:04.4178975Z       |       ^~~~
2025-12-04T12:35:04.4180154Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4180307Z  1932 |       0x80,
2025-12-04T12:35:04.4180400Z       |       ^~~~
2025-12-04T12:35:04.4181657Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4181755Z  1934 |       0x80,
2025-12-04T12:35:04.4181853Z       |       ^~~~
2025-12-04T12:35:04.4183051Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4183153Z  1936 |       0x80,
2025-12-04T12:35:04.4183248Z       |       ^~~~
2025-12-04T12:35:04.4184443Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4184543Z  1938 |       0x80,
2025-12-04T12:35:04.4184651Z       |       ^~~~
2025-12-04T12:35:04.4185847Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4185942Z  1940 |       0x80,
2025-12-04T12:35:04.4186048Z       |       ^~~~
2025-12-04T12:35:04.4187226Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4187338Z  1942 |       0x80,
2025-12-04T12:35:04.4187430Z       |       ^~~~
2025-12-04T12:35:04.4188602Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4188717Z  1944 |       0x80,
2025-12-04T12:35:04.4188810Z       |       ^~~~
2025-12-04T12:35:04.4190028Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4190138Z  1946 |       0x80,
2025-12-04T12:35:04.4190232Z       |       ^~~~
2025-12-04T12:35:04.4191418Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4191520Z  1948 |       0x80,
2025-12-04T12:35:04.4191614Z       |       ^~~~
2025-12-04T12:35:04.4192807Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4192908Z  1950 |       0x80,
2025-12-04T12:35:04.4193001Z       |       ^~~~
2025-12-04T12:35:04.4194197Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4194292Z  1952 |       0x80,
2025-12-04T12:35:04.4194401Z       |       ^~~~
2025-12-04T12:35:04.4195576Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4195711Z  1954 |       0x80,
2025-12-04T12:35:04.4195817Z       |       ^~~~
2025-12-04T12:35:04.4196997Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4197139Z  1956 |       0x80,
2025-12-04T12:35:04.4197234Z       |       ^~~~
2025-12-04T12:35:04.4198452Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4198561Z  1958 |       0x80,
2025-12-04T12:35:04.4198654Z       |       ^~~~
2025-12-04T12:35:04.4199840Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4199940Z  1960 |       0x80,
2025-12-04T12:35:04.4200032Z       |       ^~~~
2025-12-04T12:35:04.4201213Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4201312Z  1962 |       0x80,
2025-12-04T12:35:04.4202639Z       |       ^~~~
2025-12-04T12:35:04.4203881Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4203978Z  1964 |       0x80,
2025-12-04T12:35:04.4204087Z       |       ^~~~
2025-12-04T12:35:04.4205268Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4205369Z  1966 |       0x80,
2025-12-04T12:35:04.4205476Z       |       ^~~~
2025-12-04T12:35:04.4206651Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4206768Z  1968 |       0x80,
2025-12-04T12:35:04.4206867Z       |       ^~~~
2025-12-04T12:35:04.4208047Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4208151Z  1970 |       0x80,
2025-12-04T12:35:04.4208246Z       |       ^~~~
2025-12-04T12:35:04.4209416Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4209573Z  1972 |       0x80,
2025-12-04T12:35:04.4209666Z       |       ^~~~
2025-12-04T12:35:04.4210857Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4210989Z  1974 |       0x80,
2025-12-04T12:35:04.4211089Z       |       ^~~~
2025-12-04T12:35:04.4212314Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4212409Z  1976 |       0x80,
2025-12-04T12:35:04.4212502Z       |       ^~~~
2025-12-04T12:35:04.4213690Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4213790Z  1978 |       0x80,
2025-12-04T12:35:04.4213895Z       |       ^~~~
2025-12-04T12:35:04.4215074Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4215175Z  1980 |       0x80,
2025-12-04T12:35:04.4215286Z       |       ^~~~
2025-12-04T12:35:04.4216558Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4216668Z  1982 |       0x80,
2025-12-04T12:35:04.4216760Z       |       ^~~~
2025-12-04T12:35:04.4217939Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4218054Z  1984 |       0x80,
2025-12-04T12:35:04.4218147Z       |       ^~~~
2025-12-04T12:35:04.4219319Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4219434Z  1986 |       0x80,
2025-12-04T12:35:04.4219595Z       |       ^~~~
2025-12-04T12:35:04.4220795Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4220890Z  1988 |       0x80,
2025-12-04T12:35:04.4220982Z       |       ^~~~
2025-12-04T12:35:04.4222173Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4222275Z  1990 |       0x80,
2025-12-04T12:35:04.4222381Z       |       ^~~~
2025-12-04T12:35:04.4223559Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4223667Z  1992 |       0x80,
2025-12-04T12:35:04.4223780Z       |       ^~~~
2025-12-04T12:35:04.4224960Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow]
2025-12-04T12:35:04.4225123Z  2002 |   __m512i keep_1 = _mm512_set1_epi16(0xFF00);
2025-12-04T12:35:04.4225259Z       |                                      ^~~~~~
2025-12-04T12:35:04.4228280Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized<T> at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized<T>&, const at::vec::CPU_CAPABILITY::Vectorized<T>&) [with bool left_shift = false; T = unsigned char; typename std::enable_if<(is_same_v<T, signed char> || is_same_v<T, unsigned char>), int>::type <anonymous> = 0]’:
2025-12-04T12:35:04.4228984Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2116:28:   required from here
2025-12-04T12:35:04.4230184Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4230301Z  1866 |       0x80,
2025-12-04T12:35:04.4230398Z       |       ^~~~
2025-12-04T12:35:04.4231592Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4231704Z  1868 |       0x80,
2025-12-04T12:35:04.4231800Z       |       ^~~~
2025-12-04T12:35:04.4232994Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4233102Z  1870 |       0x80,
2025-12-04T12:35:04.4233194Z       |       ^~~~
2025-12-04T12:35:04.4234378Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4234473Z  1872 |       0x80,
2025-12-04T12:35:04.4234565Z       |       ^~~~
2025-12-04T12:35:04.4235940Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4236037Z  1874 |       0x80,
2025-12-04T12:35:04.4236145Z       |       ^~~~
2025-12-04T12:35:04.4237376Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4237485Z  1876 |       0x80,
2025-12-04T12:35:04.4237592Z       |       ^~~~
2025-12-04T12:35:04.4238768Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4238875Z  1878 |       0x80,
2025-12-04T12:35:04.4238970Z       |       ^~~~
2025-12-04T12:35:04.4240156Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4240263Z  1880 |       0x80,
2025-12-04T12:35:04.4240356Z       |       ^~~~
2025-12-04T12:35:04.4241536Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4241666Z  1882 |       0x80,
2025-12-04T12:35:04.4241762Z       |       ^~~~
2025-12-04T12:35:04.4242956Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4243052Z  1884 |       0x80,
2025-12-04T12:35:04.4243191Z       |       ^~~~
2025-12-04T12:35:04.4244387Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4244481Z  1886 |       0x80,
2025-12-04T12:35:04.4244575Z       |       ^~~~
2025-12-04T12:35:04.4245776Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4245950Z  1888 |       0x80,
2025-12-04T12:35:04.4246057Z       |       ^~~~
2025-12-04T12:35:04.4247230Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4247323Z  1890 |       0x80,
2025-12-04T12:35:04.4247438Z       |       ^~~~
2025-12-04T12:35:04.4248609Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4248717Z  1892 |       0x80,
2025-12-04T12:35:04.4248811Z       |       ^~~~
2025-12-04T12:35:04.4249998Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4250112Z  1894 |       0x80,
2025-12-04T12:35:04.4250204Z       |       ^~~~
2025-12-04T12:35:04.4251377Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4251486Z  1896 |       0x80,
2025-12-04T12:35:04.4251586Z       |       ^~~~
2025-12-04T12:35:04.4252775Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4252870Z  1898 |       0x80,
2025-12-04T12:35:04.4252962Z       |       ^~~~
2025-12-04T12:35:04.4254201Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4254303Z  1900 |       0x80,
2025-12-04T12:35:04.4254410Z       |       ^~~~
2025-12-04T12:35:04.4255587Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4255684Z  1902 |       0x80,
2025-12-04T12:35:04.4255796Z       |       ^~~~
2025-12-04T12:35:04.4257056Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4257154Z  1904 |       0x80,
2025-12-04T12:35:04.4257265Z       |       ^~~~
2025-12-04T12:35:04.4258460Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4258575Z  1906 |       0x80,
2025-12-04T12:35:04.4258668Z       |       ^~~~
2025-12-04T12:35:04.4259839Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4260002Z  1908 |       0x80,
2025-12-04T12:35:04.4260095Z       |       ^~~~
2025-12-04T12:35:04.4261279Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4261372Z  1910 |       0x80,
2025-12-04T12:35:04.4261462Z       |       ^~~~
2025-12-04T12:35:04.4262722Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4262817Z  1912 |       0x80,
2025-12-04T12:35:04.4262909Z       |       ^~~~
2025-12-04T12:35:04.4264097Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4264199Z  1914 |       0x80,
2025-12-04T12:35:04.4264304Z       |       ^~~~
2025-12-04T12:35:04.4265473Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4265566Z  1916 |       0x80,
2025-12-04T12:35:04.4265670Z       |       ^~~~
2025-12-04T12:35:04.4266862Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4266970Z  1918 |       0x80,
2025-12-04T12:35:04.4267064Z       |       ^~~~
2025-12-04T12:35:04.4268235Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4268352Z  1920 |       0x80,
2025-12-04T12:35:04.4268446Z       |       ^~~~
2025-12-04T12:35:04.4269621Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4269733Z  1922 |       0x80,
2025-12-04T12:35:04.4269828Z       |       ^~~~
2025-12-04T12:35:04.4271313Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4271412Z  1924 |       0x80,
2025-12-04T12:35:04.4271508Z       |       ^~~~
2025-12-04T12:35:04.4272713Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4272816Z  1926 |       0x80,
2025-12-04T12:35:04.4272923Z       |       ^~~~
2025-12-04T12:35:04.4274098Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4274196Z  1928 |       0x80);
2025-12-04T12:35:04.4274305Z       |       ^~~~
2025-12-04T12:35:04.4275493Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4275590Z  1930 |       0x80,
2025-12-04T12:35:04.4275696Z       |       ^~~~
2025-12-04T12:35:04.4276865Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4277030Z  1932 |       0x80,
2025-12-04T12:35:04.4277121Z       |       ^~~~
2025-12-04T12:35:04.4278300Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4278407Z  1934 |       0x80,
2025-12-04T12:35:04.4278499Z       |       ^~~~
2025-12-04T12:35:04.4279805Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4279902Z  1936 |       0x80,
2025-12-04T12:35:04.4279995Z       |       ^~~~
2025-12-04T12:35:04.4281181Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4281283Z  1938 |       0x80,
2025-12-04T12:35:04.4281377Z       |       ^~~~
2025-12-04T12:35:04.4282562Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4282660Z  1940 |       0x80,
2025-12-04T12:35:04.4282781Z       |       ^~~~
2025-12-04T12:35:04.4283971Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4284068Z  1942 |       0x80,
2025-12-04T12:35:04.4284178Z       |       ^~~~
2025-12-04T12:35:04.4285351Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4285467Z  1944 |       0x80,
2025-12-04T12:35:04.4285562Z       |       ^~~~
2025-12-04T12:35:04.4286747Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4286858Z  1946 |       0x80,
2025-12-04T12:35:04.4286957Z       |       ^~~~
2025-12-04T12:35:04.4288207Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4288315Z  1948 |       0x80,
2025-12-04T12:35:04.4288408Z       |       ^~~~
2025-12-04T12:35:04.4289599Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4289701Z  1950 |       0x80,
2025-12-04T12:35:04.4289793Z       |       ^~~~
2025-12-04T12:35:04.4290981Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4291075Z  1952 |       0x80,
2025-12-04T12:35:04.4291206Z       |       ^~~~
2025-12-04T12:35:04.4292437Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4292532Z  1954 |       0x80,
2025-12-04T12:35:04.4292639Z       |       ^~~~
2025-12-04T12:35:04.4293814Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4293944Z  1956 |       0x80,
2025-12-04T12:35:04.4294051Z       |       ^~~~
2025-12-04T12:35:04.4295226Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4295334Z  1958 |       0x80,
2025-12-04T12:35:04.4295432Z       |       ^~~~
2025-12-04T12:35:04.4296712Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4296823Z  1960 |       0x80,
2025-12-04T12:35:04.4296917Z       |       ^~~~
2025-12-04T12:35:04.4298103Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4298218Z  1962 |       0x80,
2025-12-04T12:35:04.4298312Z       |       ^~~~
2025-12-04T12:35:04.4299505Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4299608Z  1964 |       0x80,
2025-12-04T12:35:04.4299701Z       |       ^~~~
2025-12-04T12:35:04.4300941Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4301037Z  1966 |       0x80,
2025-12-04T12:35:04.4301143Z       |       ^~~~
2025-12-04T12:35:04.4302314Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4302416Z  1968 |       0x80,
2025-12-04T12:35:04.4302520Z       |       ^~~~
2025-12-04T12:35:04.4303701Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4303801Z  1970 |       0x80,
2025-12-04T12:35:04.4303906Z       |       ^~~~
2025-12-04T12:35:04.4305089Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4305196Z  1972 |       0x80,
2025-12-04T12:35:04.4305291Z       |       ^~~~
2025-12-04T12:35:04.4306462Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4306611Z  1974 |       0x80,
2025-12-04T12:35:04.4306703Z       |       ^~~~
2025-12-04T12:35:04.4307897Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4308026Z  1976 |       0x80,
2025-12-04T12:35:04.4308120Z       |       ^~~~
2025-12-04T12:35:04.4309355Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4309451Z  1978 |       0x80,
2025-12-04T12:35:04.4309543Z       |       ^~~~
2025-12-04T12:35:04.4310732Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4310833Z  1980 |       0x80,
2025-12-04T12:35:04.4310939Z       |       ^~~~
2025-12-04T12:35:04.4312110Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4312210Z  1982 |       0x80,
2025-12-04T12:35:04.4312316Z       |       ^~~~
2025-12-04T12:35:04.4313504Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4313613Z  1984 |       0x80,
2025-12-04T12:35:04.4313707Z       |       ^~~~
2025-12-04T12:35:04.4314878Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4314996Z  1986 |       0x80,
2025-12-04T12:35:04.4315090Z       |       ^~~~
2025-12-04T12:35:04.4316266Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4316385Z  1988 |       0x80,
2025-12-04T12:35:04.4316478Z       |       ^~~~
2025-12-04T12:35:04.4317708Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4317806Z  1990 |       0x80,
2025-12-04T12:35:04.4317898Z       |       ^~~~
2025-12-04T12:35:04.4319090Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4319190Z  1992 |       0x80,
2025-12-04T12:35:04.4319296Z       |       ^~~~
2025-12-04T12:35:04.4320478Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow]
2025-12-04T12:35:04.4320645Z  2002 |   __m512i keep_1 = _mm512_set1_epi16(0xFF00);
2025-12-04T12:35:04.4320789Z       |                                      ^~~~~~
2025-12-04T12:35:04.4321299Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:16,
2025-12-04T12:35:04.4321667Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5,
2025-12-04T12:35:04.4322120Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7,
2025-12-04T12:35:04.4322595Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4,
2025-12-04T12:35:04.4323060Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45,
2025-12-04T12:35:04.4323702Z                  from /tmp/Qt7dz2/tmpq_68ffd4/data/aotinductor/model/cad56iehvyrgd23725fkkazyktxy2vkmfdx2f6hgjyqe5hsp2q7e.wrapper.cpp:656:
2025-12-04T12:35:04.4325255Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h: In instantiation of ‘void at::vec::CPU_CAPABILITY::QuantizeAvx512(const float*, T*, int, float, int64_t) [with T = signed char; int64_t = long int]’:
2025-12-04T12:35:04.4325836Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:696:31:   required from here
2025-12-04T12:35:04.4327042Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4327171Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4327282Z       |       ^~~~
2025-12-04T12:35:04.4328470Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4328600Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4328725Z       |             ^~~~
2025-12-04T12:35:04.4329916Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4330047Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4330153Z       |                   ^~~~
2025-12-04T12:35:04.4331344Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4331477Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4331583Z       |                         ^~~~
2025-12-04T12:35:04.4332818Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4332958Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4333054Z       |       ^~~~
2025-12-04T12:35:04.4334256Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4334378Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4334478Z       |             ^~~~
2025-12-04T12:35:04.4335680Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4335795Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4335919Z       |                   ^~~~
2025-12-04T12:35:04.4337181Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4337297Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4337416Z       |                         ^~~~
2025-12-04T12:35:04.4338605Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4338778Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4338874Z       |       ^~~~
2025-12-04T12:35:04.4340055Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4340221Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4340319Z       |             ^~~~
2025-12-04T12:35:04.4341567Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4341693Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4341792Z       |                   ^~~~
2025-12-04T12:35:04.4342994Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4343116Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4343219Z       |                         ^~~~
2025-12-04T12:35:04.4344416Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4344555Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4344660Z       |       ^~~~
2025-12-04T12:35:04.4345839Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4345953Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4346072Z       |             ^~~~
2025-12-04T12:35:04.4347251Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4347381Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4347481Z       |                   ^~~~
2025-12-04T12:35:04.4348712Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4348846Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4348947Z       |                         ^~~~
2025-12-04T12:35:04.4350124Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4350258Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4350350Z       |       ^~~~
2025-12-04T12:35:04.4351542Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4351654Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4351757Z       |             ^~~~
2025-12-04T12:35:04.4352967Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4353080Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4353191Z       |                   ^~~~
2025-12-04T12:35:04.4354373Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4354529Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4354642Z       |                         ^~~~
2025-12-04T12:35:04.4355820Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4355981Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4356082Z       |       ^~~~
2025-12-04T12:35:04.4357304Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4357432Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4357529Z       |             ^~~~
2025-12-04T12:35:04.4358718Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4358851Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4358951Z       |                   ^~~~
2025-12-04T12:35:04.4360149Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4360287Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4360390Z       |                         ^~~~
2025-12-04T12:35:04.4361585Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4361696Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4361807Z       |       ^~~~
2025-12-04T12:35:04.4362985Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4363097Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4363204Z       |             ^~~~
2025-12-04T12:35:04.4364425Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4364560Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4364658Z       |                   ^~~~
2025-12-04T12:35:04.4365840Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4365971Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4366073Z       |                         ^~~~
2025-12-04T12:35:04.4367249Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4367376Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4367478Z       |       ^~~~
2025-12-04T12:35:04.4368684Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4368799Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4368895Z       |             ^~~~
2025-12-04T12:35:04.4370094Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4370245Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4370359Z       |                   ^~~~
2025-12-04T12:35:04.4371742Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4371947Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4372073Z       |                         ^~~~
2025-12-04T12:35:04.4373322Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4373453Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4373547Z       |       ^~~~
2025-12-04T12:35:04.4374732Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4374865Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4374962Z       |             ^~~~
2025-12-04T12:35:04.4376138Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4376340Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4376450Z       |                   ^~~~
2025-12-04T12:35:04.4377653Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4377771Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4377880Z       |                         ^~~~
2025-12-04T12:35:04.4379070Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4379182Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4379291Z       |       ^~~~
2025-12-04T12:35:04.4380533Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4380654Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4380762Z       |             ^~~~
2025-12-04T12:35:04.4381943Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4382061Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4382175Z       |                   ^~~~
2025-12-04T12:35:04.4383354Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4383480Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4383590Z       |                         ^~~~
2025-12-04T12:35:04.4384780Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4384908Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4385002Z       |       ^~~~
2025-12-04T12:35:04.4386197Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4386361Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4386456Z       |             ^~~~
2025-12-04T12:35:04.4388152Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4388322Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4388448Z       |                   ^~~~
2025-12-04T12:35:04.4389682Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4389797Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4389917Z       |                         ^~~~
2025-12-04T12:35:04.4391101Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4391219Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4391324Z       |       ^~~~
2025-12-04T12:35:04.4392505Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4392680Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4392779Z       |             ^~~~
2025-12-04T12:35:04.4393962Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4394087Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4394192Z       |                   ^~~~
2025-12-04T12:35:04.4395384Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4395497Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4395599Z       |                         ^~~~
2025-12-04T12:35:04.4397098Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h: In instantiation of ‘void at::vec::CPU_CAPABILITY::QuantizeAvx512(const float*, T*, int, float, int64_t) [with T = unsigned char; int64_t = long int]’:
2025-12-04T12:35:04.4397679Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:933:31:   required from here
2025-12-04T12:35:04.4398886Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4399040Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4399136Z       |       ^~~~
2025-12-04T12:35:04.4400457Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4400674Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4400790Z       |             ^~~~
2025-12-04T12:35:04.4402033Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4402149Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4402272Z       |                   ^~~~
2025-12-04T12:35:04.4403461Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4403599Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4403705Z       |                         ^~~~
2025-12-04T12:35:04.4404882Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4405028Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4405123Z       |       ^~~~
2025-12-04T12:35:04.4406304Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4406432Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4406530Z       |             ^~~~
2025-12-04T12:35:04.4407733Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4407845Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4407947Z       |                   ^~~~
2025-12-04T12:35:04.4409202Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4409323Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4409438Z       |                         ^~~~
2025-12-04T12:35:04.4410619Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4410736Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4410843Z       |       ^~~~
2025-12-04T12:35:04.4412024Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4412148Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4412252Z       |             ^~~~
2025-12-04T12:35:04.4413446Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4413572Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4413674Z       |                   ^~~~
2025-12-04T12:35:04.4414857Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4415035Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4415138Z       |                         ^~~~
2025-12-04T12:35:04.4416398Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4416565Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4416660Z       |       ^~~~
2025-12-04T12:35:04.4417918Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4418033Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4418147Z       |             ^~~~
2025-12-04T12:35:04.4419340Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4419464Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4419581Z       |                   ^~~~
2025-12-04T12:35:04.4420772Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4420903Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4421029Z       |                         ^~~~
2025-12-04T12:35:04.4422208Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4422338Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4422440Z       |       ^~~~
2025-12-04T12:35:04.4423624Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4423755Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4423855Z       |             ^~~~
2025-12-04T12:35:04.4425112Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4425238Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4425345Z       |                   ^~~~
2025-12-04T12:35:04.4426547Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4426676Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4426795Z       |                         ^~~~
2025-12-04T12:35:04.4427983Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4428098Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4428217Z       |       ^~~~
2025-12-04T12:35:04.4429410Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4429524Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4429644Z       |             ^~~~
2025-12-04T12:35:04.4430827Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4431007Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4431106Z       |                   ^~~~
2025-12-04T12:35:04.4432645Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4432830Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4432939Z       |                         ^~~~
2025-12-04T12:35:04.4434187Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4434307Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4434399Z       |       ^~~~
2025-12-04T12:35:04.4435604Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4435726Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4435840Z       |             ^~~~
2025-12-04T12:35:04.4437026Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4437153Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4437277Z       |                   ^~~~
2025-12-04T12:35:04.4438464Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4438576Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4438703Z       |                         ^~~~
2025-12-04T12:35:04.4439886Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4440013Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4440106Z       |       ^~~~
2025-12-04T12:35:04.4441334Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4441467Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4441563Z       |             ^~~~
2025-12-04T12:35:04.4442758Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4442877Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4442978Z       |                   ^~~~
2025-12-04T12:35:04.4444176Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4444290Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4444398Z       |                         ^~~~
2025-12-04T12:35:04.4445608Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4445720Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4445831Z       |       ^~~~
2025-12-04T12:35:04.4447015Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4447166Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4447278Z       |             ^~~~
2025-12-04T12:35:04.4448462Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4448625Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4448732Z       |                   ^~~~
2025-12-04T12:35:04.4449953Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4450083Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4450184Z       |                         ^~~~
2025-12-04T12:35:04.4451387Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4451507Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4451598Z       |       ^~~~
2025-12-04T12:35:04.4452793Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4452924Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4453022Z       |             ^~~~
2025-12-04T12:35:04.4454215Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4454329Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4454451Z       |                   ^~~~
2025-12-04T12:35:04.4455634Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4455746Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4455860Z       |                         ^~~~
2025-12-04T12:35:04.4457174Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4457308Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4457404Z       |       ^~~~
2025-12-04T12:35:04.4458598Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4458736Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4458837Z       |             ^~~~
2025-12-04T12:35:04.4460038Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4460152Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4460260Z       |                   ^~~~
2025-12-04T12:35:04.4461476Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4461590Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4461692Z       |                         ^~~~
2025-12-04T12:35:04.4462888Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4463063Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4463171Z       |       ^~~~
2025-12-04T12:35:04.4464358Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4464506Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4464624Z       |             ^~~~
2025-12-04T12:35:04.4465848Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4465974Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4466077Z       |                   ^~~~
2025-12-04T12:35:04.4467257Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4467394Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4467497Z       |                         ^~~~
2025-12-04T12:35:04.4467604Z PASSED [8.9354s] [ 27%]
2025-12-04T12:35:04.4468335Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_compile_after_package SKIPPED [0.0003s] (Test is only supported on CUDA 12.6+) [ 28%]
2025-12-04T12:35:04.4469096Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_compile_after_package_multi_arch SKIPPED [0.0002s] (Test is only supported on CUDA 12.8+) [ 29%]
2025-12-04T12:35:04.4469831Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_compile_after_package_static SKIPPED [0.0003s] (Test is only supported on CUDA 12.6+) [ 30%]
2025-12-04T12:35:04.4471234Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_compile_standalone_cos W1204 12:28:03.440000 140836 site-packages/torch/_inductor/utils.py:3815] Overriding: aot_inductor.dynamic_linkage=False when aot_inductor_mode.compile_standalone is True.
2025-12-04T12:35:04.4471420Z -- The CXX compiler identification is GNU 11.4.0
2025-12-04T12:35:04.4471547Z -- Detecting CXX compiler ABI info
2025-12-04T12:35:04.4471689Z -- Detecting CXX compiler ABI info - done
2025-12-04T12:35:04.4471925Z -- Check for working CXX compiler: /opt/cache/bin/c++ - skipped
2025-12-04T12:35:04.4472155Z -- Detecting CXX compile features
2025-12-04T12:35:04.4472295Z -- Detecting CXX compile features - done
2025-12-04T12:35:04.4472481Z -- Found CUDA: /usr/local/cuda (found version "12.4")
2025-12-04T12:35:04.4472798Z -- The CUDA compiler identification is NVIDIA 12.4.131 with host compiler GNU 11.4.0
2025-12-04T12:35:04.4472938Z -- Detecting CUDA compiler ABI info
2025-12-04T12:35:04.4473077Z -- Detecting CUDA compiler ABI info - done
2025-12-04T12:35:04.4473322Z -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
2025-12-04T12:35:04.4473460Z -- Detecting CUDA compile features
2025-12-04T12:35:04.4473596Z -- Detecting CUDA compile features - done
2025-12-04T12:35:04.4473851Z -- Found CUDAToolkit: /usr/local/cuda/include (found version "12.4.131")
2025-12-04T12:35:04.4474013Z -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
2025-12-04T12:35:04.4474189Z -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
2025-12-04T12:35:04.4474299Z -- Found Threads: TRUE
2025-12-04T12:35:04.4474437Z -- PyTorch: CUDA detected: 12.4
2025-12-04T12:35:04.4474600Z -- PyTorch: CUDA nvcc is: /usr/local/cuda/bin/nvcc
2025-12-04T12:35:04.4474777Z -- PyTorch: CUDA toolkit directory: /usr/local/cuda
2025-12-04T12:35:04.4474899Z -- PyTorch: Header version is: 12.4
2025-12-04T12:35:04.4475333Z -- Found Python: /opt/conda/envs/py_3.10/bin/python3.10 (found version "3.10.14") found components: Interpreter
2025-12-04T12:35:04.4475901Z CMake Warning at /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:149 (message):
2025-12-04T12:35:04.4476100Z   Failed to compute shorthash for libnvrtc.so
2025-12-04T12:35:04.4476223Z Call Stack (most recent call first):
2025-12-04T12:35:04.4476709Z   /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:86 (include)
2025-12-04T12:35:04.4477206Z   /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
2025-12-04T12:35:04.4477378Z   CMakeLists.txt:11 (find_package)
2025-12-04T12:35:04.4477391Z 
2025-12-04T12:35:04.4477437Z 
2025-12-04T12:35:04.4477624Z -- USE_CUDNN is set to 0. Compiling without cuDNN support
2025-12-04T12:35:04.4477860Z -- USE_CUSPARSELT is set to 0. Compiling without cuSPARSELt support
2025-12-04T12:35:04.4478052Z -- USE_CUDSS is set to 0. Compiling without cuDSS support
2025-12-04T12:35:04.4478241Z -- USE_CUFILE is set to 0. Compiling without cuFile support
2025-12-04T12:35:04.4478813Z CMake Warning at /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:332 (message):
2025-12-04T12:35:04.4479082Z   pytorch is not compatible with `CMAKE_CUDA_ARCHITECTURES` and will ignore
2025-12-04T12:35:04.4479288Z   its value.  Please configure `TORCH_CUDA_ARCH_LIST` instead.
2025-12-04T12:35:04.4479426Z Call Stack (most recent call first):
2025-12-04T12:35:04.4479895Z   /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:86 (include)
2025-12-04T12:35:04.4480389Z   /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
2025-12-04T12:35:04.4480525Z   CMakeLists.txt:11 (find_package)
2025-12-04T12:35:04.4480531Z 
2025-12-04T12:35:04.4480536Z 
2025-12-04T12:35:04.4480758Z -- Added CUDA NVCC flags for: -gencode;arch=compute_75,code=sm_75
2025-12-04T12:35:04.4481308Z CMake Warning at /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
2025-12-04T12:35:04.4481484Z   static library kineto_LIBRARY-NOTFOUND not found.
2025-12-04T12:35:04.4481607Z Call Stack (most recent call first):
2025-12-04T12:35:04.4482189Z   /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:125 (append_torchlib_if_found)
2025-12-04T12:35:04.4482315Z   CMakeLists.txt:11 (find_package)
2025-12-04T12:35:04.4482320Z 
2025-12-04T12:35:04.4482327Z 
2025-12-04T12:35:04.4482679Z -- Found Torch: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib/libtorch.so
2025-12-04T12:35:04.4482833Z -- Configuring done (2.9s)
2025-12-04T12:35:04.4482950Z -- Generating done (0.0s)
2025-12-04T12:35:04.4483321Z -- Build files have been written to: /tmp/tmpwdu6dysi/cos.wrapper/data/aotinductor/model/build
2025-12-04T12:35:04.4483535Z [ 50%] Building CXX object CMakeFiles/cos.dir/cos.wrapper.cpp.o
2025-12-04T12:35:04.4483673Z [100%] Linking CXX static library libcos.a
2025-12-04T12:35:04.4483794Z [100%] Built target cos
2025-12-04T12:35:04.4483904Z PASSED [9.5524s] [ 31%]
2025-12-04T12:35:04.4484617Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_compile_with_exporter SKIPPED [0.0003s] (Test is only supported on CUDA 12.6+) [ 32%]
2025-12-04T12:35:04.4485353Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_compile_with_exporter_weights SKIPPED [0.0002s] (Test is only supported on CUDA 12.6+) [ 34%]
2025-12-04T12:35:04.4486577Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_deepcopy_compiled_model W1204 12:28:18.162000 140836 site-packages/torch/export/pt2_archive/_package.py:763] AOTICompiledModel deepcopy warning: AOTICompiledModel.loader is not deepcopied.
2025-12-04T12:35:04.4486699Z PASSED [5.1877s] [ 35%]
2025-12-04T12:35:04.4487193Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_duplicate_calls PASSED [21.6305s] [ 36%]
2025-12-04T12:35:04.4488163Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_linear In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_float.h:12,
2025-12-04T12:35:04.4488632Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:11,
2025-12-04T12:35:04.4489000Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5,
2025-12-04T12:35:04.4489490Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7,
2025-12-04T12:35:04.4489928Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4,
2025-12-04T12:35:04.4490404Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45,
2025-12-04T12:35:04.4491065Z                  from /tmp/Ld3r6p/tmpxth9d7tz/data/aotinductor/model/cuzep6c3r5og2e4er75yunzp3ebrphkoo5sxbhxof3er5uux4ih3.wrapper.cpp:750:
2025-12-04T12:35:04.4491668Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/sleef.h:192:10: warning: ISO C++ prohibits anonymous structs [-Wpedantic]
2025-12-04T12:35:04.4491791Z   192 |   struct {
2025-12-04T12:35:04.4491885Z       |          ^
2025-12-04T12:35:04.4492401Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:15,
2025-12-04T12:35:04.4492773Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5,
2025-12-04T12:35:04.4493216Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7,
2025-12-04T12:35:04.4493634Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4,
2025-12-04T12:35:04.4494099Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45,
2025-12-04T12:35:04.4494776Z                  from /tmp/Ld3r6p/tmpxth9d7tz/data/aotinductor/model/cuzep6c3r5og2e4er75yunzp3ebrphkoo5sxbhxof3er5uux4ih3.wrapper.cpp:750:
2025-12-04T12:35:04.4497195Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::blendv(const at::vec::CPU_CAPABILITY::Vectorized<short int>&, const at::vec::CPU_CAPABILITY::Vectorized<short int>&, const at::vec::CPU_CAPABILITY::Vectorized<short int>&)’:
2025-12-04T12:35:04.4498417Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:544:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.4498574Z   544 |     auto msb_one = _mm512_set1_epi16(0xFFFF);
2025-12-04T12:35:04.4498692Z       |                                      ^~~~~~
2025-12-04T12:35:04.4499220Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:15,
2025-12-04T12:35:04.4499589Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5,
2025-12-04T12:35:04.4500052Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7,
2025-12-04T12:35:04.4500467Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4,
2025-12-04T12:35:04.4500939Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45,
2025-12-04T12:35:04.4501617Z                  from /tmp/Ld3r6p/tmpxth9d7tz/data/aotinductor/model/cuzep6c3r5og2e4er75yunzp3ebrphkoo5sxbhxof3er5uux4ih3.wrapper.cpp:750:
2025-12-04T12:35:04.4503253Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator==(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.4504480Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:697:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.4504724Z   697 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.4504890Z       |                                                      ^~~~~~
2025-12-04T12:35:04.4506529Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator!=(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.4507703Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:701:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.4507934Z   701 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.4508064Z       |                                                      ^~~~~~
2025-12-04T12:35:04.4509706Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator<(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.4510880Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:705:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.4511092Z   705 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.4511241Z       |                                                      ^~~~~~
2025-12-04T12:35:04.4512855Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator<=(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.4514105Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:709:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.4514309Z   709 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.4514437Z       |                                                      ^~~~~~
2025-12-04T12:35:04.4516062Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator>(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.4517229Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:713:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.4517454Z   713 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.4517593Z       |                                                      ^~~~~~
2025-12-04T12:35:04.4519652Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator>=(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.4520830Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:717:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.4521094Z   717 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.4521234Z       |                                                      ^~~~~~
2025-12-04T12:35:04.4523552Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized<signed char> at::vec::CPU_CAPABILITY::Vectorized<signed char>::blendv(const at::vec::CPU_CAPABILITY::Vectorized<signed char>&, const at::vec::CPU_CAPABILITY::Vectorized<signed char>&, const at::vec::CPU_CAPABILITY::Vectorized<signed char>&)’:
2025-12-04T12:35:04.4524804Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1153:37: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4524963Z  1153 |     auto msb_one = _mm512_set1_epi8(0xFF);
2025-12-04T12:35:04.4525097Z       |                                     ^~~~
2025-12-04T12:35:04.4526752Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<signed char> at::vec::CPU_CAPABILITY::Vectorized<signed char>::operator==(const at::vec::CPU_CAPABILITY::Vectorized<signed char>&) const’:
2025-12-04T12:35:04.4527968Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1166:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4528186Z  1166 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.4528312Z       |                                                     ^~~~
2025-12-04T12:35:04.4529972Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<signed char> at::vec::CPU_CAPABILITY::Vectorized<signed char>::operator!=(const at::vec::CPU_CAPABILITY::Vectorized<signed char>&) const’:
2025-12-04T12:35:04.4531215Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1170:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4531480Z  1170 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.4531613Z       |                                                     ^~~~
2025-12-04T12:35:04.4533272Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<signed char> at::vec::CPU_CAPABILITY::Vectorized<signed char>::operator<(const at::vec::CPU_CAPABILITY::Vectorized<signed char>&) const’:
2025-12-04T12:35:04.4534523Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1174:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4534726Z  1174 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.4534863Z       |                                                     ^~~~
2025-12-04T12:35:04.4536713Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<signed char> at::vec::CPU_CAPABILITY::Vectorized<signed char>::operator<=(const at::vec::CPU_CAPABILITY::Vectorized<signed char>&) const’:
2025-12-04T12:35:04.4537937Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1178:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4538149Z  1178 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.4538273Z       |                                                     ^~~~
2025-12-04T12:35:04.4540637Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized<unsigned char> at::vec::CPU_CAPABILITY::Vectorized<unsigned char>::blendv(const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&, const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&, const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&)’:
2025-12-04T12:35:04.4541838Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1207:37: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4542000Z  1207 |     auto msb_one = _mm512_set1_epi8(0xFF);
2025-12-04T12:35:04.4542116Z       |                                     ^~~~
2025-12-04T12:35:04.4543806Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<unsigned char> at::vec::CPU_CAPABILITY::Vectorized<unsigned char>::operator==(const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&) const’:
2025-12-04T12:35:04.4545139Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1220:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4545403Z  1220 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.4545546Z       |                                                     ^~~~
2025-12-04T12:35:04.4547265Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<unsigned char> at::vec::CPU_CAPABILITY::Vectorized<unsigned char>::operator!=(const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&) const’:
2025-12-04T12:35:04.4548480Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1224:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4548685Z  1224 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.4548817Z       |                                                     ^~~~
2025-12-04T12:35:04.4550530Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<unsigned char> at::vec::CPU_CAPABILITY::Vectorized<unsigned char>::operator<(const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&) const’:
2025-12-04T12:35:04.4551721Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1228:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4551980Z  1228 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.4552102Z       |                                                     ^~~~
2025-12-04T12:35:04.4553815Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<unsigned char> at::vec::CPU_CAPABILITY::Vectorized<unsigned char>::operator<=(const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&) const’:
2025-12-04T12:35:04.4555083Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1232:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4555286Z  1232 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.4555424Z       |                                                     ^~~~
2025-12-04T12:35:04.4557810Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized<T> at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized<T>&, const at::vec::CPU_CAPABILITY::Vectorized<T>&) [with bool left_shift = true; T = signed char; typename std::enable_if<(is_same_v<T, signed char> || is_same_v<T, unsigned char>), int>::type <anonymous> = 0]’:
2025-12-04T12:35:04.4558421Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2074:27:   required from here
2025-12-04T12:35:04.4559615Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4559730Z  1866 |       0x80,
2025-12-04T12:35:04.4559826Z       |       ^~~~
2025-12-04T12:35:04.4561014Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4561121Z  1868 |       0x80,
2025-12-04T12:35:04.4561214Z       |       ^~~~
2025-12-04T12:35:04.4562391Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4562548Z  1870 |       0x80,
2025-12-04T12:35:04.4562650Z       |       ^~~~
2025-12-04T12:35:04.4563842Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4563938Z  1872 |       0x80,
2025-12-04T12:35:04.4564033Z       |       ^~~~
2025-12-04T12:35:04.4565237Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4565332Z  1874 |       0x80,
2025-12-04T12:35:04.4565425Z       |       ^~~~
2025-12-04T12:35:04.4566607Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4566721Z  1876 |       0x80,
2025-12-04T12:35:04.4566826Z       |       ^~~~
2025-12-04T12:35:04.4567999Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4568093Z  1878 |       0x80,
2025-12-04T12:35:04.4568200Z       |       ^~~~
2025-12-04T12:35:04.4569418Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4569528Z  1880 |       0x80,
2025-12-04T12:35:04.4569621Z       |       ^~~~
2025-12-04T12:35:04.4570801Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4571224Z  1882 |       0x80,
2025-12-04T12:35:04.4571321Z       |       ^~~~
2025-12-04T12:35:04.4572509Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4572620Z  1884 |       0x80,
2025-12-04T12:35:04.4572717Z       |       ^~~~
2025-12-04T12:35:04.4573913Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4574012Z  1886 |       0x80,
2025-12-04T12:35:04.4574104Z       |       ^~~~
2025-12-04T12:35:04.4575302Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4575409Z  1888 |       0x80,
2025-12-04T12:35:04.4575521Z       |       ^~~~
2025-12-04T12:35:04.4576776Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4576871Z  1890 |       0x80,
2025-12-04T12:35:04.4576986Z       |       ^~~~
2025-12-04T12:35:04.4578166Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4578275Z  1892 |       0x80,
2025-12-04T12:35:04.4578368Z       |       ^~~~
2025-12-04T12:35:04.4579598Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4579720Z  1894 |       0x80,
2025-12-04T12:35:04.4579812Z       |       ^~~~
2025-12-04T12:35:04.4580991Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4581096Z  1896 |       0x80,
2025-12-04T12:35:04.4581195Z       |       ^~~~
2025-12-04T12:35:04.4582376Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4582470Z  1898 |       0x80,
2025-12-04T12:35:04.4582562Z       |       ^~~~
2025-12-04T12:35:04.4583754Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4583859Z  1900 |       0x80,
2025-12-04T12:35:04.4583951Z       |       ^~~~
2025-12-04T12:35:04.4585138Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4585231Z  1902 |       0x80,
2025-12-04T12:35:04.4585387Z       |       ^~~~
2025-12-04T12:35:04.4586565Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4586660Z  1904 |       0x80,
2025-12-04T12:35:04.4586765Z       |       ^~~~
2025-12-04T12:35:04.4587984Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4588130Z  1906 |       0x80,
2025-12-04T12:35:04.4588223Z       |       ^~~~
2025-12-04T12:35:04.4589398Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4589509Z  1908 |       0x80,
2025-12-04T12:35:04.4589607Z       |       ^~~~
2025-12-04T12:35:04.4590774Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4590882Z  1910 |       0x80,
2025-12-04T12:35:04.4590974Z       |       ^~~~
2025-12-04T12:35:04.4592169Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4592270Z  1912 |       0x80,
2025-12-04T12:35:04.4592366Z       |       ^~~~
2025-12-04T12:35:04.4593558Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4593654Z  1914 |       0x80,
2025-12-04T12:35:04.4593767Z       |       ^~~~
2025-12-04T12:35:04.4594943Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4595039Z  1916 |       0x80,
2025-12-04T12:35:04.4595147Z       |       ^~~~
2025-12-04T12:35:04.4596374Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4596477Z  1918 |       0x80,
2025-12-04T12:35:04.4596584Z       |       ^~~~
2025-12-04T12:35:04.4597767Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4597883Z  1920 |       0x80,
2025-12-04T12:35:04.4597977Z       |       ^~~~
2025-12-04T12:35:04.4599157Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4599264Z  1922 |       0x80,
2025-12-04T12:35:04.4599356Z       |       ^~~~
2025-12-04T12:35:04.4600557Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4600652Z  1924 |       0x80,
2025-12-04T12:35:04.4600743Z       |       ^~~~
2025-12-04T12:35:04.4601929Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4602069Z  1926 |       0x80,
2025-12-04T12:35:04.4602161Z       |       ^~~~
2025-12-04T12:35:04.4603362Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4603458Z  1928 |       0x80);
2025-12-04T12:35:04.4603565Z       |       ^~~~
2025-12-04T12:35:04.4604824Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4604919Z  1930 |       0x80,
2025-12-04T12:35:04.4605026Z       |       ^~~~
2025-12-04T12:35:04.4606203Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4606317Z  1932 |       0x80,
2025-12-04T12:35:04.4606410Z       |       ^~~~
2025-12-04T12:35:04.4607590Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4607700Z  1934 |       0x80,
2025-12-04T12:35:04.4607794Z       |       ^~~~
2025-12-04T12:35:04.4608993Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4609103Z  1936 |       0x80,
2025-12-04T12:35:04.4609201Z       |       ^~~~
2025-12-04T12:35:04.4610393Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4610496Z  1938 |       0x80,
2025-12-04T12:35:04.4610588Z       |       ^~~~
2025-12-04T12:35:04.4611782Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4611879Z  1940 |       0x80,
2025-12-04T12:35:04.4611974Z       |       ^~~~
2025-12-04T12:35:04.4613229Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4613326Z  1942 |       0x80,
2025-12-04T12:35:04.4613436Z       |       ^~~~
2025-12-04T12:35:04.4614613Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4614715Z  1944 |       0x80,
2025-12-04T12:35:04.4614827Z       |       ^~~~
2025-12-04T12:35:04.4616002Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4616111Z  1946 |       0x80,
2025-12-04T12:35:04.4616213Z       |       ^~~~
2025-12-04T12:35:04.4617481Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4617594Z  1948 |       0x80,
2025-12-04T12:35:04.4617690Z       |       ^~~~
2025-12-04T12:35:04.4618866Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4619067Z  1950 |       0x80,
2025-12-04T12:35:04.4619157Z       |       ^~~~
2025-12-04T12:35:04.4620345Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4620443Z  1952 |       0x80,
2025-12-04T12:35:04.4620584Z       |       ^~~~
2025-12-04T12:35:04.4621827Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4621922Z  1954 |       0x80,
2025-12-04T12:35:04.4622029Z       |       ^~~~
2025-12-04T12:35:04.4623205Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4623311Z  1956 |       0x80,
2025-12-04T12:35:04.4623416Z       |       ^~~~
2025-12-04T12:35:04.4624592Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4624692Z  1958 |       0x80,
2025-12-04T12:35:04.4624806Z       |       ^~~~
2025-12-04T12:35:04.4625999Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4626111Z  1960 |       0x80,
2025-12-04T12:35:04.4626203Z       |       ^~~~
2025-12-04T12:35:04.4627374Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4627487Z  1962 |       0x80,
2025-12-04T12:35:04.4627578Z       |       ^~~~
2025-12-04T12:35:04.4628764Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4628858Z  1964 |       0x80,
2025-12-04T12:35:04.4628956Z       |       ^~~~
2025-12-04T12:35:04.4630185Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4630281Z  1966 |       0x80,
2025-12-04T12:35:04.4630373Z       |       ^~~~
2025-12-04T12:35:04.4631567Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4631670Z  1968 |       0x80,
2025-12-04T12:35:04.4631779Z       |       ^~~~
2025-12-04T12:35:04.4632953Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4633047Z  1970 |       0x80,
2025-12-04T12:35:04.4633192Z       |       ^~~~
2025-12-04T12:35:04.4634423Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4634534Z  1972 |       0x80,
2025-12-04T12:35:04.4634627Z       |       ^~~~
2025-12-04T12:35:04.4635802Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4635945Z  1974 |       0x80,
2025-12-04T12:35:04.4636036Z       |       ^~~~
2025-12-04T12:35:04.4637217Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4637333Z  1976 |       0x80,
2025-12-04T12:35:04.4637425Z       |       ^~~~
2025-12-04T12:35:04.4638626Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4638721Z  1978 |       0x80,
2025-12-04T12:35:04.4638814Z       |       ^~~~
2025-12-04T12:35:04.4640005Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4640108Z  1980 |       0x80,
2025-12-04T12:35:04.4640217Z       |       ^~~~
2025-12-04T12:35:04.4641386Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4641487Z  1982 |       0x80,
2025-12-04T12:35:04.4641591Z       |       ^~~~
2025-12-04T12:35:04.4642808Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4642905Z  1984 |       0x80,
2025-12-04T12:35:04.4643012Z       |       ^~~~
2025-12-04T12:35:04.4644179Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4644294Z  1986 |       0x80,
2025-12-04T12:35:04.4644387Z       |       ^~~~
2025-12-04T12:35:04.4645556Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4645672Z  1988 |       0x80,
2025-12-04T12:35:04.4645764Z       |       ^~~~
2025-12-04T12:35:04.4646959Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4647054Z  1990 |       0x80,
2025-12-04T12:35:04.4647146Z       |       ^~~~
2025-12-04T12:35:04.4648328Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4648458Z  1992 |       0x80,
2025-12-04T12:35:04.4648552Z       |       ^~~~
2025-12-04T12:35:04.4649752Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow]
2025-12-04T12:35:04.4649949Z  2002 |   __m512i keep_1 = _mm512_set1_epi16(0xFF00);
2025-12-04T12:35:04.4650088Z       |                                      ^~~~~~
2025-12-04T12:35:04.4652555Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized<T> at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized<T>&, const at::vec::CPU_CAPABILITY::Vectorized<T>&) [with bool left_shift = true; T = unsigned char; typename std::enable_if<(is_same_v<T, signed char> || is_same_v<T, unsigned char>), int>::type <anonymous> = 0]’:
2025-12-04T12:35:04.4653161Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2081:27:   required from here
2025-12-04T12:35:04.4654356Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4654460Z  1866 |       0x80,
2025-12-04T12:35:04.4654577Z       |       ^~~~
2025-12-04T12:35:04.4655764Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4655876Z  1868 |       0x80,
2025-12-04T12:35:04.4655971Z       |       ^~~~
2025-12-04T12:35:04.4657230Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4657350Z  1870 |       0x80,
2025-12-04T12:35:04.4657444Z       |       ^~~~
2025-12-04T12:35:04.4658626Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4658744Z  1872 |       0x80,
2025-12-04T12:35:04.4658884Z       |       ^~~~
2025-12-04T12:35:04.4660091Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4660185Z  1874 |       0x80,
2025-12-04T12:35:04.4660277Z       |       ^~~~
2025-12-04T12:35:04.4661469Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4661569Z  1876 |       0x80,
2025-12-04T12:35:04.4661662Z       |       ^~~~
2025-12-04T12:35:04.4662849Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4662949Z  1878 |       0x80,
2025-12-04T12:35:04.4663060Z       |       ^~~~
2025-12-04T12:35:04.4664239Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4664333Z  1880 |       0x80,
2025-12-04T12:35:04.4664442Z       |       ^~~~
2025-12-04T12:35:04.4665614Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4665761Z  1882 |       0x80,
2025-12-04T12:35:04.4665853Z       |       ^~~~
2025-12-04T12:35:04.4667032Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4667179Z  1884 |       0x80,
2025-12-04T12:35:04.4667279Z       |       ^~~~
2025-12-04T12:35:04.4668490Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4668598Z  1886 |       0x80,
2025-12-04T12:35:04.4668690Z       |       ^~~~
2025-12-04T12:35:04.4669879Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4669979Z  1888 |       0x80,
2025-12-04T12:35:04.4670072Z       |       ^~~~
2025-12-04T12:35:04.4671488Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4671591Z  1890 |       0x80,
2025-12-04T12:35:04.4671712Z       |       ^~~~
2025-12-04T12:35:04.4672889Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4672984Z  1892 |       0x80,
2025-12-04T12:35:04.4673094Z       |       ^~~~
2025-12-04T12:35:04.4674273Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4674373Z  1894 |       0x80,
2025-12-04T12:35:04.4674480Z       |       ^~~~
2025-12-04T12:35:04.4675647Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4675853Z  1896 |       0x80,
2025-12-04T12:35:04.4675954Z       |       ^~~~
2025-12-04T12:35:04.4677131Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4677239Z  1898 |       0x80,
2025-12-04T12:35:04.4677332Z       |       ^~~~
2025-12-04T12:35:04.4679083Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4679188Z  1900 |       0x80,
2025-12-04T12:35:04.4679281Z       |       ^~~~
2025-12-04T12:35:04.4680481Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4680589Z  1902 |       0x80,
2025-12-04T12:35:04.4680690Z       |       ^~~~
2025-12-04T12:35:04.4681877Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4681971Z  1904 |       0x80,
2025-12-04T12:35:04.4682080Z       |       ^~~~
2025-12-04T12:35:04.4683257Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4683431Z  1906 |       0x80,
2025-12-04T12:35:04.4683546Z       |       ^~~~
2025-12-04T12:35:04.4684728Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4684891Z  1908 |       0x80,
2025-12-04T12:35:04.4685032Z       |       ^~~~
2025-12-04T12:35:04.4686217Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4686325Z  1910 |       0x80,
2025-12-04T12:35:04.4686419Z       |       ^~~~
2025-12-04T12:35:04.4687597Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4687703Z  1912 |       0x80,
2025-12-04T12:35:04.4687796Z       |       ^~~~
2025-12-04T12:35:04.4688979Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4689085Z  1914 |       0x80,
2025-12-04T12:35:04.4689184Z       |       ^~~~
2025-12-04T12:35:04.4690374Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4690469Z  1916 |       0x80,
2025-12-04T12:35:04.4690561Z       |       ^~~~
2025-12-04T12:35:04.4691754Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4691849Z  1918 |       0x80,
2025-12-04T12:35:04.4691955Z       |       ^~~~
2025-12-04T12:35:04.4693128Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4693271Z  1920 |       0x80,
2025-12-04T12:35:04.4693379Z       |       ^~~~
2025-12-04T12:35:04.4694551Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4694678Z  1922 |       0x80,
2025-12-04T12:35:04.4694773Z       |       ^~~~
2025-12-04T12:35:04.4695953Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4696062Z  1924 |       0x80,
2025-12-04T12:35:04.4696159Z       |       ^~~~
2025-12-04T12:35:04.4697426Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4697535Z  1926 |       0x80,
2025-12-04T12:35:04.4697631Z       |       ^~~~
2025-12-04T12:35:04.4698829Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4698930Z  1928 |       0x80);
2025-12-04T12:35:04.4699024Z       |       ^~~~
2025-12-04T12:35:04.4700281Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4700377Z  1930 |       0x80,
2025-12-04T12:35:04.4700486Z       |       ^~~~
2025-12-04T12:35:04.4701667Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4701837Z  1932 |       0x80,
2025-12-04T12:35:04.4701946Z       |       ^~~~
2025-12-04T12:35:04.4703125Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4703235Z  1934 |       0x80,
2025-12-04T12:35:04.4703336Z       |       ^~~~
2025-12-04T12:35:04.4704519Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4704630Z  1936 |       0x80,
2025-12-04T12:35:04.4704730Z       |       ^~~~
2025-12-04T12:35:04.4705917Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4706040Z  1938 |       0x80,
2025-12-04T12:35:04.4706136Z       |       ^~~~
2025-12-04T12:35:04.4707324Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4707422Z  1940 |       0x80,
2025-12-04T12:35:04.4707523Z       |       ^~~~
2025-12-04T12:35:04.4708710Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4708810Z  1942 |       0x80,
2025-12-04T12:35:04.4708903Z       |       ^~~~
2025-12-04T12:35:04.4710140Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4710242Z  1944 |       0x80,
2025-12-04T12:35:04.4710353Z       |       ^~~~
2025-12-04T12:35:04.4711528Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4711625Z  1946 |       0x80,
2025-12-04T12:35:04.4711741Z       |       ^~~~
2025-12-04T12:35:04.4712916Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4713029Z  1948 |       0x80,
2025-12-04T12:35:04.4713125Z       |       ^~~~
2025-12-04T12:35:04.4714309Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4714429Z  1950 |       0x80,
2025-12-04T12:35:04.4714523Z       |       ^~~~
2025-12-04T12:35:04.4715698Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4715807Z  1952 |       0x80,
2025-12-04T12:35:04.4715936Z       |       ^~~~
2025-12-04T12:35:04.4717131Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4717225Z  1954 |       0x80,
2025-12-04T12:35:04.4717317Z       |       ^~~~
2025-12-04T12:35:04.4718549Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4718674Z  1956 |       0x80,
2025-12-04T12:35:04.4718779Z       |       ^~~~
2025-12-04T12:35:04.4719952Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4720052Z  1958 |       0x80,
2025-12-04T12:35:04.4720164Z       |       ^~~~
2025-12-04T12:35:04.4721332Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4721425Z  1960 |       0x80,
2025-12-04T12:35:04.4721531Z       |       ^~~~
2025-12-04T12:35:04.4722714Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4722826Z  1962 |       0x80,
2025-12-04T12:35:04.4722919Z       |       ^~~~
2025-12-04T12:35:04.4724086Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4724199Z  1964 |       0x80,
2025-12-04T12:35:04.4724289Z       |       ^~~~
2025-12-04T12:35:04.4725474Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4725570Z  1966 |       0x80,
2025-12-04T12:35:04.4725663Z       |       ^~~~
2025-12-04T12:35:04.4726921Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4727021Z  1968 |       0x80,
2025-12-04T12:35:04.4727116Z       |       ^~~~
2025-12-04T12:35:04.4728307Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4728436Z  1970 |       0x80,
2025-12-04T12:35:04.4728543Z       |       ^~~~
2025-12-04T12:35:04.4729717Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4729812Z  1972 |       0x80,
2025-12-04T12:35:04.4729920Z       |       ^~~~
2025-12-04T12:35:04.4731173Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4731281Z  1974 |       0x80,
2025-12-04T12:35:04.4731374Z       |       ^~~~
2025-12-04T12:35:04.4732551Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4732665Z  1976 |       0x80,
2025-12-04T12:35:04.4732758Z       |       ^~~~
2025-12-04T12:35:04.4733928Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4734036Z  1978 |       0x80,
2025-12-04T12:35:04.4734128Z       |       ^~~~
2025-12-04T12:35:04.4735329Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4735426Z  1980 |       0x80,
2025-12-04T12:35:04.4735518Z       |       ^~~~
2025-12-04T12:35:04.4736774Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4736878Z  1982 |       0x80,
2025-12-04T12:35:04.4736975Z       |       ^~~~
2025-12-04T12:35:04.4738169Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4738265Z  1984 |       0x80,
2025-12-04T12:35:04.4738375Z       |       ^~~~
2025-12-04T12:35:04.4739607Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4739705Z  1986 |       0x80,
2025-12-04T12:35:04.4739816Z       |       ^~~~
2025-12-04T12:35:04.4740995Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4741112Z  1988 |       0x80,
2025-12-04T12:35:04.4741205Z       |       ^~~~
2025-12-04T12:35:04.4742379Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4742488Z  1990 |       0x80,
2025-12-04T12:35:04.4742587Z       |       ^~~~
2025-12-04T12:35:04.4743768Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4743877Z  1992 |       0x80,
2025-12-04T12:35:04.4743970Z       |       ^~~~
2025-12-04T12:35:04.4745165Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow]
2025-12-04T12:35:04.4745368Z  2002 |   __m512i keep_1 = _mm512_set1_epi16(0xFF00);
2025-12-04T12:35:04.4745484Z       |                                      ^~~~~~
2025-12-04T12:35:04.4747956Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized<T> at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized<T>&, const at::vec::CPU_CAPABILITY::Vectorized<T>&) [with bool left_shift = false; T = signed char; typename std::enable_if<(is_same_v<T, signed char> || is_same_v<T, unsigned char>), int>::type <anonymous> = 0]’:
2025-12-04T12:35:04.4748569Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2109:28:   required from here
2025-12-04T12:35:04.4749767Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4749870Z  1866 |       0x80,
2025-12-04T12:35:04.4749982Z       |       ^~~~
2025-12-04T12:35:04.4751156Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4751251Z  1868 |       0x80,
2025-12-04T12:35:04.4751367Z       |       ^~~~
2025-12-04T12:35:04.4752552Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4752657Z  1870 |       0x80,
2025-12-04T12:35:04.4752750Z       |       ^~~~
2025-12-04T12:35:04.4753927Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4754042Z  1872 |       0x80,
2025-12-04T12:35:04.4754134Z       |       ^~~~
2025-12-04T12:35:04.4755306Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4755420Z  1874 |       0x80,
2025-12-04T12:35:04.4755511Z       |       ^~~~
2025-12-04T12:35:04.4756754Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4756849Z  1876 |       0x80,
2025-12-04T12:35:04.4756941Z       |       ^~~~
2025-12-04T12:35:04.4758126Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4758226Z  1878 |       0x80,
2025-12-04T12:35:04.4758317Z       |       ^~~~
2025-12-04T12:35:04.4759505Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4759603Z  1880 |       0x80,
2025-12-04T12:35:04.4759713Z       |       ^~~~
2025-12-04T12:35:04.4760896Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4760990Z  1882 |       0x80,
2025-12-04T12:35:04.4761099Z       |       ^~~~
2025-12-04T12:35:04.4762275Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4762420Z  1884 |       0x80,
2025-12-04T12:35:04.4762512Z       |       ^~~~
2025-12-04T12:35:04.4767113Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4767332Z  1886 |       0x80,
2025-12-04T12:35:04.4767442Z       |       ^~~~
2025-12-04T12:35:04.4770815Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4770930Z  1888 |       0x80,
2025-12-04T12:35:04.4771223Z       |       ^~~~
2025-12-04T12:35:04.4772467Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4772590Z  1890 |       0x80,
2025-12-04T12:35:04.4772686Z       |       ^~~~
2025-12-04T12:35:04.4773910Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4774012Z  1892 |       0x80,
2025-12-04T12:35:04.4774119Z       |       ^~~~
2025-12-04T12:35:04.4775320Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4775416Z  1894 |       0x80,
2025-12-04T12:35:04.4775523Z       |       ^~~~
2025-12-04T12:35:04.4776765Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4776867Z  1896 |       0x80,
2025-12-04T12:35:04.4776975Z       |       ^~~~
2025-12-04T12:35:04.4778160Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4778277Z  1898 |       0x80,
2025-12-04T12:35:04.4778370Z       |       ^~~~
2025-12-04T12:35:04.4779550Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4779660Z  1900 |       0x80,
2025-12-04T12:35:04.4779753Z       |       ^~~~
2025-12-04T12:35:04.4780930Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4781046Z  1902 |       0x80,
2025-12-04T12:35:04.4781139Z       |       ^~~~
2025-12-04T12:35:04.4782329Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4782429Z  1904 |       0x80,
2025-12-04T12:35:04.4782521Z       |       ^~~~
2025-12-04T12:35:04.4783710Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4783804Z  1906 |       0x80,
2025-12-04T12:35:04.4783908Z       |       ^~~~
2025-12-04T12:35:04.4785080Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4785278Z  1908 |       0x80,
2025-12-04T12:35:04.4785388Z       |       ^~~~
2025-12-04T12:35:04.4786664Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4786809Z  1910 |       0x80,
2025-12-04T12:35:04.4786918Z       |       ^~~~
2025-12-04T12:35:04.4788153Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4788262Z  1912 |       0x80,
2025-12-04T12:35:04.4788357Z       |       ^~~~
2025-12-04T12:35:04.4789534Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4789652Z  1914 |       0x80,
2025-12-04T12:35:04.4789746Z       |       ^~~~
2025-12-04T12:35:04.4790931Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4791030Z  1916 |       0x80,
2025-12-04T12:35:04.4791124Z       |       ^~~~
2025-12-04T12:35:04.4792312Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4792406Z  1918 |       0x80,
2025-12-04T12:35:04.4792499Z       |       ^~~~
2025-12-04T12:35:04.4793684Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4793785Z  1920 |       0x80,
2025-12-04T12:35:04.4793894Z       |       ^~~~
2025-12-04T12:35:04.4795077Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4795177Z  1922 |       0x80,
2025-12-04T12:35:04.4795286Z       |       ^~~~
2025-12-04T12:35:04.4796466Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4796573Z  1924 |       0x80,
2025-12-04T12:35:04.4796671Z       |       ^~~~
2025-12-04T12:35:04.4797841Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4797958Z  1926 |       0x80,
2025-12-04T12:35:04.4798051Z       |       ^~~~
2025-12-04T12:35:04.4799234Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4799351Z  1928 |       0x80);
2025-12-04T12:35:04.4799444Z       |       ^~~~
2025-12-04T12:35:04.4800635Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4800730Z  1930 |       0x80,
2025-12-04T12:35:04.4800826Z       |       ^~~~
2025-12-04T12:35:04.4802020Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4802159Z  1932 |       0x80,
2025-12-04T12:35:04.4802264Z       |       ^~~~
2025-12-04T12:35:04.4803492Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4803623Z  1934 |       0x80,
2025-12-04T12:35:04.4803729Z       |       ^~~~
2025-12-04T12:35:04.4804950Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4805047Z  1936 |       0x80,
2025-12-04T12:35:04.4805158Z       |       ^~~~
2025-12-04T12:35:04.4806340Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4806456Z  1938 |       0x80,
2025-12-04T12:35:04.4806550Z       |       ^~~~
2025-12-04T12:35:04.4807734Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4807852Z  1940 |       0x80,
2025-12-04T12:35:04.4807946Z       |       ^~~~
2025-12-04T12:35:04.4809141Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4809236Z  1942 |       0x80,
2025-12-04T12:35:04.4809331Z       |       ^~~~
2025-12-04T12:35:04.4810521Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4810624Z  1944 |       0x80,
2025-12-04T12:35:04.4810718Z       |       ^~~~
2025-12-04T12:35:04.4811914Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4812015Z  1946 |       0x80,
2025-12-04T12:35:04.4812119Z       |       ^~~~
2025-12-04T12:35:04.4813297Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4813388Z  1948 |       0x80,
2025-12-04T12:35:04.4813495Z       |       ^~~~
2025-12-04T12:35:04.4814673Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4814779Z  1950 |       0x80,
2025-12-04T12:35:04.4814873Z       |       ^~~~
2025-12-04T12:35:04.4816046Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4816162Z  1952 |       0x80,
2025-12-04T12:35:04.4816254Z       |       ^~~~
2025-12-04T12:35:04.4817518Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4817626Z  1954 |       0x80,
2025-12-04T12:35:04.4817719Z       |       ^~~~
2025-12-04T12:35:04.4818950Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4819044Z  1956 |       0x80,
2025-12-04T12:35:04.4819137Z       |       ^~~~
2025-12-04T12:35:04.4820385Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4820513Z  1958 |       0x80,
2025-12-04T12:35:04.4820605Z       |       ^~~~
2025-12-04T12:35:04.4821837Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4821933Z  1960 |       0x80,
2025-12-04T12:35:04.4822041Z       |       ^~~~
2025-12-04T12:35:04.4823223Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4823317Z  1962 |       0x80,
2025-12-04T12:35:04.4823426Z       |       ^~~~
2025-12-04T12:35:04.4824599Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4824713Z  1964 |       0x80,
2025-12-04T12:35:04.4824806Z       |       ^~~~
2025-12-04T12:35:04.4825978Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4826085Z  1966 |       0x80,
2025-12-04T12:35:04.4826179Z       |       ^~~~
2025-12-04T12:35:04.4827353Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4827459Z  1968 |       0x80,
2025-12-04T12:35:04.4827552Z       |       ^~~~
2025-12-04T12:35:04.4828755Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4828855Z  1970 |       0x80,
2025-12-04T12:35:04.4828948Z       |       ^~~~
2025-12-04T12:35:04.4830135Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4830232Z  1972 |       0x80,
2025-12-04T12:35:04.4830343Z       |       ^~~~
2025-12-04T12:35:04.4831558Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4831653Z  1974 |       0x80,
2025-12-04T12:35:04.4831756Z       |       ^~~~
2025-12-04T12:35:04.4832932Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4833068Z  1976 |       0x80,
2025-12-04T12:35:04.4833177Z       |       ^~~~
2025-12-04T12:35:04.4834357Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4834467Z  1978 |       0x80,
2025-12-04T12:35:04.4834567Z       |       ^~~~
2025-12-04T12:35:04.4835740Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4835852Z  1980 |       0x80,
2025-12-04T12:35:04.4835946Z       |       ^~~~
2025-12-04T12:35:04.4837173Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4837276Z  1982 |       0x80,
2025-12-04T12:35:04.4837369Z       |       ^~~~
2025-12-04T12:35:04.4838590Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4838687Z  1984 |       0x80,
2025-12-04T12:35:04.4838785Z       |       ^~~~
2025-12-04T12:35:04.4839977Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4840070Z  1986 |       0x80,
2025-12-04T12:35:04.4840175Z       |       ^~~~
2025-12-04T12:35:04.4841354Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4841455Z  1988 |       0x80,
2025-12-04T12:35:04.4841562Z       |       ^~~~
2025-12-04T12:35:04.4842740Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4842851Z  1990 |       0x80,
2025-12-04T12:35:04.4842949Z       |       ^~~~
2025-12-04T12:35:04.4844123Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4844229Z  1992 |       0x80,
2025-12-04T12:35:04.4844321Z       |       ^~~~
2025-12-04T12:35:04.4845514Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow]
2025-12-04T12:35:04.4845692Z  2002 |   __m512i keep_1 = _mm512_set1_epi16(0xFF00);
2025-12-04T12:35:04.4845819Z       |                                      ^~~~~~
2025-12-04T12:35:04.4848260Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized<T> at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized<T>&, const at::vec::CPU_CAPABILITY::Vectorized<T>&) [with bool left_shift = false; T = unsigned char; typename std::enable_if<(is_same_v<T, signed char> || is_same_v<T, unsigned char>), int>::type <anonymous> = 0]’:
2025-12-04T12:35:04.4848887Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2116:28:   required from here
2025-12-04T12:35:04.4850133Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4850232Z  1866 |       0x80,
2025-12-04T12:35:04.4850328Z       |       ^~~~
2025-12-04T12:35:04.4851530Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4851633Z  1868 |       0x80,
2025-12-04T12:35:04.4851727Z       |       ^~~~
2025-12-04T12:35:04.4852922Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4853016Z  1870 |       0x80,
2025-12-04T12:35:04.4853122Z       |       ^~~~
2025-12-04T12:35:04.4854348Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4854445Z  1872 |       0x80,
2025-12-04T12:35:04.4854553Z       |       ^~~~
2025-12-04T12:35:04.4855760Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4855881Z  1874 |       0x80,
2025-12-04T12:35:04.4855975Z       |       ^~~~
2025-12-04T12:35:04.4857224Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4857338Z  1876 |       0x80,
2025-12-04T12:35:04.4857439Z       |       ^~~~
2025-12-04T12:35:04.4858622Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4858733Z  1878 |       0x80,
2025-12-04T12:35:04.4858833Z       |       ^~~~
2025-12-04T12:35:04.4860021Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4860123Z  1880 |       0x80,
2025-12-04T12:35:04.4860217Z       |       ^~~~
2025-12-04T12:35:04.4861404Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4861499Z  1882 |       0x80,
2025-12-04T12:35:04.4861613Z       |       ^~~~
2025-12-04T12:35:04.4862791Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4862892Z  1884 |       0x80,
2025-12-04T12:35:04.4863003Z       |       ^~~~
2025-12-04T12:35:04.4864172Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4864331Z  1886 |       0x80,
2025-12-04T12:35:04.4864424Z       |       ^~~~
2025-12-04T12:35:04.4865601Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4865707Z  1888 |       0x80,
2025-12-04T12:35:04.4865872Z       |       ^~~~
2025-12-04T12:35:04.4867050Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4867157Z  1890 |       0x80,
2025-12-04T12:35:04.4867288Z       |       ^~~~
2025-12-04T12:35:04.4868472Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4868572Z  1892 |       0x80,
2025-12-04T12:35:04.4868664Z       |       ^~~~
2025-12-04T12:35:04.4869846Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4869941Z  1894 |       0x80,
2025-12-04T12:35:04.4870046Z       |       ^~~~
2025-12-04T12:35:04.4871418Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4871513Z  1896 |       0x80,
2025-12-04T12:35:04.4871625Z       |       ^~~~
2025-12-04T12:35:04.4872804Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4872905Z  1898 |       0x80,
2025-12-04T12:35:04.4873012Z       |       ^~~~
2025-12-04T12:35:04.4874177Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4874286Z  1900 |       0x80,
2025-12-04T12:35:04.4874391Z       |       ^~~~
2025-12-04T12:35:04.4875561Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4875668Z  1902 |       0x80,
2025-12-04T12:35:04.4875784Z       |       ^~~~
2025-12-04T12:35:04.4876972Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4877074Z  1904 |       0x80,
2025-12-04T12:35:04.4877184Z       |       ^~~~
2025-12-04T12:35:04.4878359Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4878461Z  1906 |       0x80,
2025-12-04T12:35:04.4878575Z       |       ^~~~
2025-12-04T12:35:04.4879753Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4879856Z  1908 |       0x80,
2025-12-04T12:35:04.4879964Z       |       ^~~~
2025-12-04T12:35:04.4881143Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4881368Z  1910 |       0x80,
2025-12-04T12:35:04.4881463Z       |       ^~~~
2025-12-04T12:35:04.4882643Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4882858Z  1912 |       0x80,
2025-12-04T12:35:04.4882954Z       |       ^~~~
2025-12-04T12:35:04.4884149Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4884296Z  1914 |       0x80,
2025-12-04T12:35:04.4884392Z       |       ^~~~
2025-12-04T12:35:04.4885591Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4885695Z  1916 |       0x80,
2025-12-04T12:35:04.4885789Z       |       ^~~~
2025-12-04T12:35:04.4886981Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4887090Z  1918 |       0x80,
2025-12-04T12:35:04.4887200Z       |       ^~~~
2025-12-04T12:35:04.4888384Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4888486Z  1920 |       0x80,
2025-12-04T12:35:04.4888597Z       |       ^~~~
2025-12-04T12:35:04.4889773Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4889892Z  1922 |       0x80,
2025-12-04T12:35:04.4889987Z       |       ^~~~
2025-12-04T12:35:04.4891164Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4891288Z  1924 |       0x80,
2025-12-04T12:35:04.4891380Z       |       ^~~~
2025-12-04T12:35:04.4892557Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4892674Z  1926 |       0x80,
2025-12-04T12:35:04.4892768Z       |       ^~~~
2025-12-04T12:35:04.4893958Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4894063Z  1928 |       0x80);
2025-12-04T12:35:04.4894159Z       |       ^~~~
2025-12-04T12:35:04.4895358Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4895462Z  1930 |       0x80,
2025-12-04T12:35:04.4895570Z       |       ^~~~
2025-12-04T12:35:04.4896845Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4896944Z  1932 |       0x80,
2025-12-04T12:35:04.4897051Z       |       ^~~~
2025-12-04T12:35:04.4898233Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4898391Z  1934 |       0x80,
2025-12-04T12:35:04.4898496Z       |       ^~~~
2025-12-04T12:35:04.4899710Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4899852Z  1936 |       0x80,
2025-12-04T12:35:04.4899943Z       |       ^~~~
2025-12-04T12:35:04.4901160Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4901273Z  1938 |       0x80,
2025-12-04T12:35:04.4901366Z       |       ^~~~
2025-12-04T12:35:04.4902550Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4902650Z  1940 |       0x80,
2025-12-04T12:35:04.4902745Z       |       ^~~~
2025-12-04T12:35:04.4903931Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4904032Z  1942 |       0x80,
2025-12-04T12:35:04.4904124Z       |       ^~~~
2025-12-04T12:35:04.4905312Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4905406Z  1944 |       0x80,
2025-12-04T12:35:04.4905514Z       |       ^~~~
2025-12-04T12:35:04.4906684Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4906785Z  1946 |       0x80,
2025-12-04T12:35:04.4906894Z       |       ^~~~
2025-12-04T12:35:04.4908067Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4908183Z  1948 |       0x80,
2025-12-04T12:35:04.4908277Z       |       ^~~~
2025-12-04T12:35:04.4909450Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4909560Z  1950 |       0x80,
2025-12-04T12:35:04.4909654Z       |       ^~~~
2025-12-04T12:35:04.4910831Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4910947Z  1952 |       0x80,
2025-12-04T12:35:04.4911041Z       |       ^~~~
2025-12-04T12:35:04.4912229Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4912330Z  1954 |       0x80,
2025-12-04T12:35:04.4912422Z       |       ^~~~
2025-12-04T12:35:04.4913610Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4913705Z  1956 |       0x80,
2025-12-04T12:35:04.4913797Z       |       ^~~~
2025-12-04T12:35:04.4914980Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4915118Z  1958 |       0x80,
2025-12-04T12:35:04.4915223Z       |       ^~~~
2025-12-04T12:35:04.4916436Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4916563Z  1960 |       0x80,
2025-12-04T12:35:04.4916668Z       |       ^~~~
2025-12-04T12:35:04.4917882Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4917991Z  1962 |       0x80,
2025-12-04T12:35:04.4918087Z       |       ^~~~
2025-12-04T12:35:04.4919263Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4919383Z  1964 |       0x80,
2025-12-04T12:35:04.4919479Z       |       ^~~~
2025-12-04T12:35:04.4920660Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4920775Z  1966 |       0x80,
2025-12-04T12:35:04.4920868Z       |       ^~~~
2025-12-04T12:35:04.4922059Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4922153Z  1968 |       0x80,
2025-12-04T12:35:04.4922247Z       |       ^~~~
2025-12-04T12:35:04.4923437Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4923537Z  1970 |       0x80,
2025-12-04T12:35:04.4923643Z       |       ^~~~
2025-12-04T12:35:04.4924821Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4924921Z  1972 |       0x80,
2025-12-04T12:35:04.4925028Z       |       ^~~~
2025-12-04T12:35:04.4926211Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4926304Z  1974 |       0x80,
2025-12-04T12:35:04.4926409Z       |       ^~~~
2025-12-04T12:35:04.4927578Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4927732Z  1976 |       0x80,
2025-12-04T12:35:04.4927825Z       |       ^~~~
2025-12-04T12:35:04.4929002Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4929145Z  1978 |       0x80,
2025-12-04T12:35:04.4929239Z       |       ^~~~
2025-12-04T12:35:04.4930435Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4930529Z  1980 |       0x80,
2025-12-04T12:35:04.4930620Z       |       ^~~~
2025-12-04T12:35:04.4931806Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4931906Z  1982 |       0x80,
2025-12-04T12:35:04.4931999Z       |       ^~~~
2025-12-04T12:35:04.4933241Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4933342Z  1984 |       0x80,
2025-12-04T12:35:04.4933448Z       |       ^~~~
2025-12-04T12:35:04.4934679Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4934775Z  1986 |       0x80,
2025-12-04T12:35:04.4934883Z       |       ^~~~
2025-12-04T12:35:04.4936050Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4936162Z  1988 |       0x80,
2025-12-04T12:35:04.4936256Z       |       ^~~~
2025-12-04T12:35:04.4937512Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4937628Z  1990 |       0x80,
2025-12-04T12:35:04.4937720Z       |       ^~~~
2025-12-04T12:35:04.4938897Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.4939006Z  1992 |       0x80,
2025-12-04T12:35:04.4939098Z       |       ^~~~
2025-12-04T12:35:04.4940300Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow]
2025-12-04T12:35:04.4940459Z  2002 |   __m512i keep_1 = _mm512_set1_epi16(0xFF00);
2025-12-04T12:35:04.4940577Z       |                                      ^~~~~~
2025-12-04T12:35:04.4941100Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:16,
2025-12-04T12:35:04.4941480Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5,
2025-12-04T12:35:04.4941938Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7,
2025-12-04T12:35:04.4942341Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4,
2025-12-04T12:35:04.4942805Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45,
2025-12-04T12:35:04.4943549Z                  from /tmp/Ld3r6p/tmpxth9d7tz/data/aotinductor/model/cuzep6c3r5og2e4er75yunzp3ebrphkoo5sxbhxof3er5uux4ih3.wrapper.cpp:750:
2025-12-04T12:35:04.4945023Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h: In instantiation of ‘void at::vec::CPU_CAPABILITY::QuantizeAvx512(const float*, T*, int, float, int64_t) [with T = signed char; int64_t = long int]’:
2025-12-04T12:35:04.4945656Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:696:31:   required from here
2025-12-04T12:35:04.4946850Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4946970Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4947088Z       |       ^~~~
2025-12-04T12:35:04.4948274Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4948405Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4948507Z       |             ^~~~
2025-12-04T12:35:04.4949752Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4949880Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4950021Z       |                   ^~~~
2025-12-04T12:35:04.4951219Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4951340Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4951448Z       |                         ^~~~
2025-12-04T12:35:04.4952654Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4952775Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4952877Z       |       ^~~~
2025-12-04T12:35:04.4954074Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4954194Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4954307Z       |             ^~~~
2025-12-04T12:35:04.4955479Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4955600Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4955716Z       |                   ^~~~
2025-12-04T12:35:04.4956906Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4957039Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4957141Z       |                         ^~~~
2025-12-04T12:35:04.4958326Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4958453Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4958640Z       |       ^~~~
2025-12-04T12:35:04.4959914Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4986401Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4986631Z       |             ^~~~
2025-12-04T12:35:04.4988209Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4988408Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4988513Z       |                   ^~~~
2025-12-04T12:35:04.4989782Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4989917Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4990034Z       |                         ^~~~
2025-12-04T12:35:04.4991238Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4991367Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4991463Z       |       ^~~~
2025-12-04T12:35:04.4992681Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4992797Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4992903Z       |             ^~~~
2025-12-04T12:35:04.4994107Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4994228Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4994342Z       |                   ^~~~
2025-12-04T12:35:04.4995521Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4995642Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4995767Z       |                         ^~~~
2025-12-04T12:35:04.4996951Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4997087Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4997182Z       |       ^~~~
2025-12-04T12:35:04.4998365Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4998501Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.4998601Z       |             ^~~~
2025-12-04T12:35:04.4999791Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.4999927Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5000029Z       |                   ^~~~
2025-12-04T12:35:04.5001229Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5001342Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5001445Z       |                         ^~~~
2025-12-04T12:35:04.5002719Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5002830Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5002940Z       |       ^~~~
2025-12-04T12:35:04.5004160Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5004309Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5004422Z       |             ^~~~
2025-12-04T12:35:04.5005641Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5005769Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5005880Z       |                   ^~~~
2025-12-04T12:35:04.5007075Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5007202Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5007314Z       |                         ^~~~
2025-12-04T12:35:04.5008501Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5008629Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5008728Z       |       ^~~~
2025-12-04T12:35:04.5009920Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5010038Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5010133Z       |             ^~~~
2025-12-04T12:35:04.5011325Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5011442Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5011561Z       |                   ^~~~
2025-12-04T12:35:04.5012741Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5012859Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5012970Z       |                         ^~~~
2025-12-04T12:35:04.5014148Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5014278Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5014370Z       |       ^~~~
2025-12-04T12:35:04.5015557Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5015685Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5015781Z       |             ^~~~
2025-12-04T12:35:04.5017045Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5017174Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5017274Z       |                   ^~~~
2025-12-04T12:35:04.5018524Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5018635Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5018737Z       |                         ^~~~
2025-12-04T12:35:04.5019973Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5020118Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5020225Z       |       ^~~~
2025-12-04T12:35:04.5021446Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5021561Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5021680Z       |             ^~~~
2025-12-04T12:35:04.5022858Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5022981Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5023080Z       |                   ^~~~
2025-12-04T12:35:04.5024268Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5024393Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5024498Z       |                         ^~~~
2025-12-04T12:35:04.5025674Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5025806Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5025900Z       |       ^~~~
2025-12-04T12:35:04.5027091Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5027210Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5027312Z       |             ^~~~
2025-12-04T12:35:04.5028510Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5028628Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5028740Z       |                   ^~~~
2025-12-04T12:35:04.5029923Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5030041Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5030156Z       |                         ^~~~
2025-12-04T12:35:04.5031336Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5031466Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5031558Z       |       ^~~~
2025-12-04T12:35:04.5032742Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5032868Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5032966Z       |             ^~~~
2025-12-04T12:35:04.5034182Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5034306Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5034406Z       |                   ^~~~
2025-12-04T12:35:04.5035632Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5035778Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5035880Z       |                         ^~~~
2025-12-04T12:35:04.5037107Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5037218Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5037330Z       |       ^~~~
2025-12-04T12:35:04.5038513Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5038623Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5038732Z       |             ^~~~
2025-12-04T12:35:04.5039925Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5040038Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5040159Z       |                   ^~~~
2025-12-04T12:35:04.5041337Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5041469Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5041568Z       |                         ^~~~
2025-12-04T12:35:04.5043062Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h: In instantiation of ‘void at::vec::CPU_CAPABILITY::QuantizeAvx512(const float*, T*, int, float, int64_t) [with T = unsigned char; int64_t = long int]’:
2025-12-04T12:35:04.5043661Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:933:31:   required from here
2025-12-04T12:35:04.5044848Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5044977Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5045071Z       |       ^~~~
2025-12-04T12:35:04.5046255Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5046433Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5046529Z       |             ^~~~
2025-12-04T12:35:04.5047734Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5047883Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5047984Z       |                   ^~~~
2025-12-04T12:35:04.5049185Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5049299Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5049422Z       |                         ^~~~
2025-12-04T12:35:04.5050609Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5050723Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5050830Z       |       ^~~~
2025-12-04T12:35:04.5052064Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5052189Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5052336Z       |             ^~~~
2025-12-04T12:35:04.5053528Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5053660Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5053759Z       |                   ^~~~
2025-12-04T12:35:04.5054943Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5055077Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5055180Z       |                         ^~~~
2025-12-04T12:35:04.5056444Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5056567Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5056658Z       |       ^~~~
2025-12-04T12:35:04.5057858Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5057976Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5058085Z       |             ^~~~
2025-12-04T12:35:04.5059274Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5059391Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5059504Z       |                   ^~~~
2025-12-04T12:35:04.5060695Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5060821Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5060924Z       |                         ^~~~
2025-12-04T12:35:04.5062169Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5062294Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5062390Z       |       ^~~~
2025-12-04T12:35:04.5063585Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5063753Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5063852Z       |             ^~~~
2025-12-04T12:35:04.5065059Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5065171Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5065277Z       |                   ^~~~
2025-12-04T12:35:04.5066467Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5066581Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5066695Z       |                         ^~~~
2025-12-04T12:35:04.5067943Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5068058Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5068214Z       |       ^~~~
2025-12-04T12:35:04.5069400Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5069519Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5069633Z       |             ^~~~
2025-12-04T12:35:04.5070814Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5071192Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5071357Z       |                   ^~~~
2025-12-04T12:35:04.5072587Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5072724Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5072826Z       |                         ^~~~
2025-12-04T12:35:04.5074022Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5074142Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5074236Z       |       ^~~~
2025-12-04T12:35:04.5075443Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5075564Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5075679Z       |             ^~~~
2025-12-04T12:35:04.5076868Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5076983Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5077098Z       |                   ^~~~
2025-12-04T12:35:04.5078274Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5078491Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5078609Z       |                         ^~~~
2025-12-04T12:35:04.5079795Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5079974Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5080072Z       |       ^~~~
2025-12-04T12:35:04.5081270Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5081402Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5081507Z       |             ^~~~
2025-12-04T12:35:04.5082703Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5082817Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5082919Z       |                   ^~~~
2025-12-04T12:35:04.5084181Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5084302Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5084465Z       |                         ^~~~
2025-12-04T12:35:04.5085647Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5085764Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5085876Z       |       ^~~~
2025-12-04T12:35:04.5087063Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5087182Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5087302Z       |             ^~~~
2025-12-04T12:35:04.5088485Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5088625Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5088728Z       |                   ^~~~
2025-12-04T12:35:04.5089910Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5090048Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5090152Z       |                         ^~~~
2025-12-04T12:35:04.5091352Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5091471Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5091565Z       |       ^~~~
2025-12-04T12:35:04.5092768Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5092882Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5092980Z       |             ^~~~
2025-12-04T12:35:04.5094177Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5094340Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5094458Z       |                   ^~~~
2025-12-04T12:35:04.5095685Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5095830Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5095947Z       |                         ^~~~
2025-12-04T12:35:04.5097236Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5097368Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5097466Z       |       ^~~~
2025-12-04T12:35:04.5098656Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5098782Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5098878Z       |             ^~~~
2025-12-04T12:35:04.5100088Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5100201Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5100308Z       |                   ^~~~
2025-12-04T12:35:04.5101510Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5101628Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5101730Z       |                         ^~~~
2025-12-04T12:35:04.5102923Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5103048Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5103158Z       |       ^~~~
2025-12-04T12:35:04.5104345Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5104462Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5104570Z       |             ^~~~
2025-12-04T12:35:04.5105747Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5105876Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5105976Z       |                   ^~~~
2025-12-04T12:35:04.5107159Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5107294Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5107393Z       |                         ^~~~
2025-12-04T12:35:04.5108587Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5108700Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5108792Z       |       ^~~~
2025-12-04T12:35:04.5110035Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5110146Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5110246Z       |             ^~~~
2025-12-04T12:35:04.5111487Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5111653Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5111768Z       |                   ^~~~
2025-12-04T12:35:04.5112986Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5113099Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5113220Z       |                         ^~~~
2025-12-04T12:35:04.5113327Z PASSED [9.3004s] [ 37%]
2025-12-04T12:35:04.5114428Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_loading_wrong_model W1204 12:28:54.274000 140836 site-packages/torch/_inductor/package/package.py:120] Loading outdated pt2 file. Please regenerate your package.
2025-12-04T12:35:04.5114538Z PASSED [5.1753s] [ 38%]
2025-12-04T12:35:04.5115516Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_metadata In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_float.h:12,
2025-12-04T12:35:04.5115971Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:11,
2025-12-04T12:35:04.5116340Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5,
2025-12-04T12:35:04.5116796Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7,
2025-12-04T12:35:04.5117201Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4,
2025-12-04T12:35:04.5117664Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45,
2025-12-04T12:35:04.5118347Z                  from /tmp/zNkm53/tmpp0pxkc5y/data/aotinductor/model/cuzep6c3r5og2e4er75yunzp3ebrphkoo5sxbhxof3er5uux4ih3.wrapper.cpp:750:
2025-12-04T12:35:04.5118947Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/sleef.h:192:10: warning: ISO C++ prohibits anonymous structs [-Wpedantic]
2025-12-04T12:35:04.5119060Z   192 |   struct {
2025-12-04T12:35:04.5119155Z       |          ^
2025-12-04T12:35:04.5119660Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:15,
2025-12-04T12:35:04.5120043Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5,
2025-12-04T12:35:04.5120484Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7,
2025-12-04T12:35:04.5120900Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4,
2025-12-04T12:35:04.5121365Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45,
2025-12-04T12:35:04.5122030Z                  from /tmp/zNkm53/tmpp0pxkc5y/data/aotinductor/model/cuzep6c3r5og2e4er75yunzp3ebrphkoo5sxbhxof3er5uux4ih3.wrapper.cpp:750:
2025-12-04T12:35:04.5124293Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::blendv(const at::vec::CPU_CAPABILITY::Vectorized<short int>&, const at::vec::CPU_CAPABILITY::Vectorized<short int>&, const at::vec::CPU_CAPABILITY::Vectorized<short int>&)’:
2025-12-04T12:35:04.5125548Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:544:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.5125752Z   544 |     auto msb_one = _mm512_set1_epi16(0xFFFF);
2025-12-04T12:35:04.5125901Z       |                                      ^~~~~~
2025-12-04T12:35:04.5126406Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:15,
2025-12-04T12:35:04.5126825Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5,
2025-12-04T12:35:04.5127268Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7,
2025-12-04T12:35:04.5127687Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4,
2025-12-04T12:35:04.5128156Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45,
2025-12-04T12:35:04.5128818Z                  from /tmp/zNkm53/tmpp0pxkc5y/data/aotinductor/model/cuzep6c3r5og2e4er75yunzp3ebrphkoo5sxbhxof3er5uux4ih3.wrapper.cpp:750:
2025-12-04T12:35:04.5130478Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator==(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.5131651Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:697:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.5131882Z   697 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.5132009Z       |                                                      ^~~~~~
2025-12-04T12:35:04.5133642Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator!=(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.5134807Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:701:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.5135019Z   701 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.5135156Z       |                                                      ^~~~~~
2025-12-04T12:35:04.5136850Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator<(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.5138048Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:705:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.5138259Z   705 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.5138404Z       |                                                      ^~~~~~
2025-12-04T12:35:04.5140030Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator<=(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.5141187Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:709:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.5141462Z   709 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.5141588Z       |                                                      ^~~~~~
2025-12-04T12:35:04.5143252Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator>(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.5144470Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:713:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.5144689Z   713 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.5144822Z       |                                                      ^~~~~~
2025-12-04T12:35:04.5146442Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator>=(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.5147620Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:717:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.5147825Z   717 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.5147961Z       |                                                      ^~~~~~
2025-12-04T12:35:04.5150212Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized<signed char> at::vec::CPU_CAPABILITY::Vectorized<signed char>::blendv(const at::vec::CPU_CAPABILITY::Vectorized<signed char>&, const at::vec::CPU_CAPABILITY::Vectorized<signed char>&, const at::vec::CPU_CAPABILITY::Vectorized<signed char>&)’:
2025-12-04T12:35:04.5151433Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1153:37: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5151585Z  1153 |     auto msb_one = _mm512_set1_epi8(0xFF);
2025-12-04T12:35:04.5151700Z       |                                     ^~~~
2025-12-04T12:35:04.5153380Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<signed char> at::vec::CPU_CAPABILITY::Vectorized<signed char>::operator==(const at::vec::CPU_CAPABILITY::Vectorized<signed char>&) const’:
2025-12-04T12:35:04.5154569Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1166:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5154788Z  1166 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.5154920Z       |                                                     ^~~~
2025-12-04T12:35:04.5156581Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<signed char> at::vec::CPU_CAPABILITY::Vectorized<signed char>::operator!=(const at::vec::CPU_CAPABILITY::Vectorized<signed char>&) const’:
2025-12-04T12:35:04.5157787Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1170:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5158039Z  1170 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.5158176Z       |                                                     ^~~~
2025-12-04T12:35:04.5159857Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<signed char> at::vec::CPU_CAPABILITY::Vectorized<signed char>::operator<(const at::vec::CPU_CAPABILITY::Vectorized<signed char>&) const’:
2025-12-04T12:35:04.5161091Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1174:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5161333Z  1174 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.5161460Z       |                                                     ^~~~
2025-12-04T12:35:04.5163134Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<signed char> at::vec::CPU_CAPABILITY::Vectorized<signed char>::operator<=(const at::vec::CPU_CAPABILITY::Vectorized<signed char>&) const’:
2025-12-04T12:35:04.5164334Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1178:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5164552Z  1178 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.5164674Z       |                                                     ^~~~
2025-12-04T12:35:04.5167008Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized<unsigned char> at::vec::CPU_CAPABILITY::Vectorized<unsigned char>::blendv(const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&, const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&, const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&)’:
2025-12-04T12:35:04.5168215Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1207:37: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5168368Z  1207 |     auto msb_one = _mm512_set1_epi8(0xFF);
2025-12-04T12:35:04.5168504Z       |                                     ^~~~
2025-12-04T12:35:04.5170198Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<unsigned char> at::vec::CPU_CAPABILITY::Vectorized<unsigned char>::operator==(const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&) const’:
2025-12-04T12:35:04.5172130Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1220:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5172441Z  1220 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.5172570Z       |                                                     ^~~~
2025-12-04T12:35:04.5174305Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<unsigned char> at::vec::CPU_CAPABILITY::Vectorized<unsigned char>::operator!=(const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&) const’:
2025-12-04T12:35:04.5175577Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1224:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5175793Z  1224 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.5175917Z       |                                                     ^~~~
2025-12-04T12:35:04.5177695Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<unsigned char> at::vec::CPU_CAPABILITY::Vectorized<unsigned char>::operator<(const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&) const’:
2025-12-04T12:35:04.5178956Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1228:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5179165Z  1228 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.5179353Z       |                                                     ^~~~
2025-12-04T12:35:04.5181049Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<unsigned char> at::vec::CPU_CAPABILITY::Vectorized<unsigned char>::operator<=(const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&) const’:
2025-12-04T12:35:04.5182255Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1232:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5182460Z  1232 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.5182588Z       |                                                     ^~~~
2025-12-04T12:35:04.5184967Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized<T> at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized<T>&, const at::vec::CPU_CAPABILITY::Vectorized<T>&) [with bool left_shift = true; T = signed char; typename std::enable_if<(is_same_v<T, signed char> || is_same_v<T, unsigned char>), int>::type <anonymous> = 0]’:
2025-12-04T12:35:04.5185549Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2074:27:   required from here
2025-12-04T12:35:04.5186748Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5186851Z  1866 |       0x80,
2025-12-04T12:35:04.5186966Z       |       ^~~~
2025-12-04T12:35:04.5188141Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5188242Z  1868 |       0x80,
2025-12-04T12:35:04.5188349Z       |       ^~~~
2025-12-04T12:35:04.5189527Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5189712Z  1870 |       0x80,
2025-12-04T12:35:04.5189806Z       |       ^~~~
2025-12-04T12:35:04.5190988Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5191142Z  1872 |       0x80,
2025-12-04T12:35:04.5191238Z       |       ^~~~
2025-12-04T12:35:04.5192415Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5192533Z  1874 |       0x80,
2025-12-04T12:35:04.5192628Z       |       ^~~~
2025-12-04T12:35:04.5193821Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5193924Z  1876 |       0x80,
2025-12-04T12:35:04.5194018Z       |       ^~~~
2025-12-04T12:35:04.5195201Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5195350Z  1878 |       0x80,
2025-12-04T12:35:04.5195459Z       |       ^~~~
2025-12-04T12:35:04.5196635Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5196768Z  1880 |       0x80,
2025-12-04T12:35:04.5196880Z       |       ^~~~
2025-12-04T12:35:04.5198058Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5198159Z  1882 |       0x80,
2025-12-04T12:35:04.5198268Z       |       ^~~~
2025-12-04T12:35:04.5199443Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5199571Z  1884 |       0x80,
2025-12-04T12:35:04.5199666Z       |       ^~~~
2025-12-04T12:35:04.5200838Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5200958Z  1886 |       0x80,
2025-12-04T12:35:04.5201053Z       |       ^~~~
2025-12-04T12:35:04.5202236Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5202340Z  1888 |       0x80,
2025-12-04T12:35:04.5202434Z       |       ^~~~
2025-12-04T12:35:04.5203628Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5203727Z  1890 |       0x80,
2025-12-04T12:35:04.5203822Z       |       ^~~~
2025-12-04T12:35:04.5205017Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5205109Z  1892 |       0x80,
2025-12-04T12:35:04.5205218Z       |       ^~~~
2025-12-04T12:35:04.5206391Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5206540Z  1894 |       0x80,
2025-12-04T12:35:04.5206652Z       |       ^~~~
2025-12-04T12:35:04.5207835Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5207985Z  1896 |       0x80,
2025-12-04T12:35:04.5208080Z       |       ^~~~
2025-12-04T12:35:04.5209267Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5209379Z  1898 |       0x80,
2025-12-04T12:35:04.5209471Z       |       ^~~~
2025-12-04T12:35:04.5210647Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5210765Z  1900 |       0x80,
2025-12-04T12:35:04.5210857Z       |       ^~~~
2025-12-04T12:35:04.5212085Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5212187Z  1902 |       0x80,
2025-12-04T12:35:04.5212281Z       |       ^~~~
2025-12-04T12:35:04.5213502Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5213598Z  1904 |       0x80,
2025-12-04T12:35:04.5213690Z       |       ^~~~
2025-12-04T12:35:04.5214871Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5214972Z  1906 |       0x80,
2025-12-04T12:35:04.5215077Z       |       ^~~~
2025-12-04T12:35:04.5216262Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5216422Z  1908 |       0x80,
2025-12-04T12:35:04.5216535Z       |       ^~~~
2025-12-04T12:35:04.5217726Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5217833Z  1910 |       0x80,
2025-12-04T12:35:04.5217926Z       |       ^~~~
2025-12-04T12:35:04.5219096Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5219211Z  1912 |       0x80,
2025-12-04T12:35:04.5219306Z       |       ^~~~
2025-12-04T12:35:04.5220500Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5220604Z  1914 |       0x80,
2025-12-04T12:35:04.5220695Z       |       ^~~~
2025-12-04T12:35:04.5221887Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5221982Z  1916 |       0x80,
2025-12-04T12:35:04.5222076Z       |       ^~~~
2025-12-04T12:35:04.5223261Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5223416Z  1918 |       0x80,
2025-12-04T12:35:04.5223524Z       |       ^~~~
2025-12-04T12:35:04.5224742Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5224867Z  1920 |       0x80,
2025-12-04T12:35:04.5224974Z       |       ^~~~
2025-12-04T12:35:04.5226192Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5226301Z  1922 |       0x80,
2025-12-04T12:35:04.5226393Z       |       ^~~~
2025-12-04T12:35:04.5227577Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5227689Z  1924 |       0x80,
2025-12-04T12:35:04.5227781Z       |       ^~~~
2025-12-04T12:35:04.5228962Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5229075Z  1926 |       0x80,
2025-12-04T12:35:04.5229167Z       |       ^~~~
2025-12-04T12:35:04.5230359Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5230455Z  1928 |       0x80);
2025-12-04T12:35:04.5230547Z       |       ^~~~
2025-12-04T12:35:04.5231735Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5231834Z  1930 |       0x80,
2025-12-04T12:35:04.5231926Z       |       ^~~~
2025-12-04T12:35:04.5233117Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5233217Z  1932 |       0x80,
2025-12-04T12:35:04.5233323Z       |       ^~~~
2025-12-04T12:35:04.5234512Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5234610Z  1934 |       0x80,
2025-12-04T12:35:04.5234717Z       |       ^~~~
2025-12-04T12:35:04.5235887Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5236001Z  1936 |       0x80,
2025-12-04T12:35:04.5236093Z       |       ^~~~
2025-12-04T12:35:04.5237278Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5237392Z  1938 |       0x80,
2025-12-04T12:35:04.5237483Z       |       ^~~~
2025-12-04T12:35:04.5238666Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5238772Z  1940 |       0x80,
2025-12-04T12:35:04.5238864Z       |       ^~~~
2025-12-04T12:35:04.5240045Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5240210Z  1942 |       0x80,
2025-12-04T12:35:04.5240303Z       |       ^~~~
2025-12-04T12:35:04.5241530Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5241658Z  1944 |       0x80,
2025-12-04T12:35:04.5241768Z       |       ^~~~
2025-12-04T12:35:04.5242981Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5243078Z  1946 |       0x80,
2025-12-04T12:35:04.5243189Z       |       ^~~~
2025-12-04T12:35:04.5244363Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5244462Z  1948 |       0x80,
2025-12-04T12:35:04.5244566Z       |       ^~~~
2025-12-04T12:35:04.5245745Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5245860Z  1950 |       0x80,
2025-12-04T12:35:04.5245952Z       |       ^~~~
2025-12-04T12:35:04.5247128Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5247235Z  1952 |       0x80,
2025-12-04T12:35:04.5247328Z       |       ^~~~
2025-12-04T12:35:04.5248513Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5248607Z  1954 |       0x80,
2025-12-04T12:35:04.5248698Z       |       ^~~~
2025-12-04T12:35:04.5249888Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5249990Z  1956 |       0x80,
2025-12-04T12:35:04.5250083Z       |       ^~~~
2025-12-04T12:35:04.5251271Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5251365Z  1958 |       0x80,
2025-12-04T12:35:04.5251472Z       |       ^~~~
2025-12-04T12:35:04.5252651Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5252743Z  1960 |       0x80,
2025-12-04T12:35:04.5252848Z       |       ^~~~
2025-12-04T12:35:04.5254025Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5254139Z  1962 |       0x80,
2025-12-04T12:35:04.5254230Z       |       ^~~~
2025-12-04T12:35:04.5255411Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5255517Z  1964 |       0x80,
2025-12-04T12:35:04.5255609Z       |       ^~~~
2025-12-04T12:35:04.5256898Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5257007Z  1966 |       0x80,
2025-12-04T12:35:04.5257100Z       |       ^~~~
2025-12-04T12:35:04.5258340Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5258470Z  1968 |       0x80,
2025-12-04T12:35:04.5258563Z       |       ^~~~
2025-12-04T12:35:04.5259792Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5259888Z  1970 |       0x80,
2025-12-04T12:35:04.5259981Z       |       ^~~~
2025-12-04T12:35:04.5261172Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5261268Z  1972 |       0x80,
2025-12-04T12:35:04.5261372Z       |       ^~~~
2025-12-04T12:35:04.5262552Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5262651Z  1974 |       0x80,
2025-12-04T12:35:04.5262758Z       |       ^~~~
2025-12-04T12:35:04.5263939Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5264050Z  1976 |       0x80,
2025-12-04T12:35:04.5264150Z       |       ^~~~
2025-12-04T12:35:04.5265326Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5265438Z  1978 |       0x80,
2025-12-04T12:35:04.5265530Z       |       ^~~~
2025-12-04T12:35:04.5266706Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5266824Z  1980 |       0x80,
2025-12-04T12:35:04.5266917Z       |       ^~~~
2025-12-04T12:35:04.5268110Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5268206Z  1982 |       0x80,
2025-12-04T12:35:04.5268339Z       |       ^~~~
2025-12-04T12:35:04.5269525Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5269619Z  1984 |       0x80,
2025-12-04T12:35:04.5269724Z       |       ^~~~
2025-12-04T12:35:04.5270898Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5271345Z  1986 |       0x80,
2025-12-04T12:35:04.5271452Z       |       ^~~~
2025-12-04T12:35:04.5272649Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5272742Z  1988 |       0x80,
2025-12-04T12:35:04.5272858Z       |       ^~~~
2025-12-04T12:35:04.5274031Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5274140Z  1990 |       0x80,
2025-12-04T12:35:04.5274232Z       |       ^~~~
2025-12-04T12:35:04.5275486Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5275603Z  1992 |       0x80,
2025-12-04T12:35:04.5275696Z       |       ^~~~
2025-12-04T12:35:04.5276948Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow]
2025-12-04T12:35:04.5277111Z  2002 |   __m512i keep_1 = _mm512_set1_epi16(0xFF00);
2025-12-04T12:35:04.5277236Z       |                                      ^~~~~~
2025-12-04T12:35:04.5279657Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized<T> at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized<T>&, const at::vec::CPU_CAPABILITY::Vectorized<T>&) [with bool left_shift = true; T = unsigned char; typename std::enable_if<(is_same_v<T, signed char> || is_same_v<T, unsigned char>), int>::type <anonymous> = 0]’:
2025-12-04T12:35:04.5280248Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2081:27:   required from here
2025-12-04T12:35:04.5281450Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5281551Z  1866 |       0x80,
2025-12-04T12:35:04.5281646Z       |       ^~~~
2025-12-04T12:35:04.5282843Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5282940Z  1868 |       0x80,
2025-12-04T12:35:04.5283049Z       |       ^~~~
2025-12-04T12:35:04.5284231Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5284327Z  1870 |       0x80,
2025-12-04T12:35:04.5284436Z       |       ^~~~
2025-12-04T12:35:04.5285617Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5285786Z  1872 |       0x80,
2025-12-04T12:35:04.5285883Z       |       ^~~~
2025-12-04T12:35:04.5287064Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5287174Z  1874 |       0x80,
2025-12-04T12:35:04.5287270Z       |       ^~~~
2025-12-04T12:35:04.5288494Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5288606Z  1876 |       0x80,
2025-12-04T12:35:04.5288700Z       |       ^~~~
2025-12-04T12:35:04.5289909Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5290013Z  1878 |       0x80,
2025-12-04T12:35:04.5290105Z       |       ^~~~
2025-12-04T12:35:04.5291291Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5291387Z  1880 |       0x80,
2025-12-04T12:35:04.5291537Z       |       ^~~~
2025-12-04T12:35:04.5292726Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5292822Z  1882 |       0x80,
2025-12-04T12:35:04.5292983Z       |       ^~~~
2025-12-04T12:35:04.5294157Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5294262Z  1884 |       0x80,
2025-12-04T12:35:04.5294366Z       |       ^~~~
2025-12-04T12:35:04.5295547Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5295652Z  1886 |       0x80,
2025-12-04T12:35:04.5295757Z       |       ^~~~
2025-12-04T12:35:04.5296996Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5297107Z  1888 |       0x80,
2025-12-04T12:35:04.5297206Z       |       ^~~~
2025-12-04T12:35:04.5298401Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5298501Z  1890 |       0x80,
2025-12-04T12:35:04.5298595Z       |       ^~~~
2025-12-04T12:35:04.5299784Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5299880Z  1892 |       0x80,
2025-12-04T12:35:04.5299983Z       |       ^~~~
2025-12-04T12:35:04.5301172Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5301266Z  1894 |       0x80,
2025-12-04T12:35:04.5301376Z       |       ^~~~
2025-12-04T12:35:04.5302547Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5302685Z  1896 |       0x80,
2025-12-04T12:35:04.5302793Z       |       ^~~~
2025-12-04T12:35:04.5303967Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5304078Z  1898 |       0x80,
2025-12-04T12:35:04.5304212Z       |       ^~~~
2025-12-04T12:35:04.5305386Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5305496Z  1900 |       0x80,
2025-12-04T12:35:04.5305599Z       |       ^~~~
2025-12-04T12:35:04.5306769Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5306882Z  1902 |       0x80,
2025-12-04T12:35:04.5306974Z       |       ^~~~
2025-12-04T12:35:04.5308165Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5308293Z  1904 |       0x80,
2025-12-04T12:35:04.5308393Z       |       ^~~~
2025-12-04T12:35:04.5309584Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5309711Z  1906 |       0x80,
2025-12-04T12:35:04.5309805Z       |       ^~~~
2025-12-04T12:35:04.5310993Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5311094Z  1908 |       0x80,
2025-12-04T12:35:04.5311203Z       |       ^~~~
2025-12-04T12:35:04.5312377Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5312478Z  1910 |       0x80,
2025-12-04T12:35:04.5312590Z       |       ^~~~
2025-12-04T12:35:04.5313761Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5313878Z  1912 |       0x80,
2025-12-04T12:35:04.5313970Z       |       ^~~~
2025-12-04T12:35:04.5315147Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5315261Z  1914 |       0x80,
2025-12-04T12:35:04.5315354Z       |       ^~~~
2025-12-04T12:35:04.5316533Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5316652Z  1916 |       0x80,
2025-12-04T12:35:04.5316743Z       |       ^~~~
2025-12-04T12:35:04.5317930Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5318030Z  1918 |       0x80,
2025-12-04T12:35:04.5318123Z       |       ^~~~
2025-12-04T12:35:04.5319307Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5319441Z  1920 |       0x80,
2025-12-04T12:35:04.5319550Z       |       ^~~~
2025-12-04T12:35:04.5320731Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5320903Z  1922 |       0x80,
2025-12-04T12:35:04.5321012Z       |       ^~~~
2025-12-04T12:35:04.5322189Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5322324Z  1924 |       0x80,
2025-12-04T12:35:04.5322432Z       |       ^~~~
2025-12-04T12:35:04.5323611Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5323725Z  1926 |       0x80,
2025-12-04T12:35:04.5323819Z       |       ^~~~
2025-12-04T12:35:04.5324995Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5325115Z  1928 |       0x80);
2025-12-04T12:35:04.5325207Z       |       ^~~~
2025-12-04T12:35:04.5326400Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5326496Z  1930 |       0x80,
2025-12-04T12:35:04.5326587Z       |       ^~~~
2025-12-04T12:35:04.5327775Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5327875Z  1932 |       0x80,
2025-12-04T12:35:04.5327966Z       |       ^~~~
2025-12-04T12:35:04.5329155Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5329254Z  1934 |       0x80,
2025-12-04T12:35:04.5329358Z       |       ^~~~
2025-12-04T12:35:04.5330536Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5330629Z  1936 |       0x80,
2025-12-04T12:35:04.5330734Z       |       ^~~~
2025-12-04T12:35:04.5331904Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5332018Z  1938 |       0x80,
2025-12-04T12:35:04.5332110Z       |       ^~~~
2025-12-04T12:35:04.5333288Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5333402Z  1940 |       0x80,
2025-12-04T12:35:04.5333492Z       |       ^~~~
2025-12-04T12:35:04.5334668Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5334774Z  1942 |       0x80,
2025-12-04T12:35:04.5334866Z       |       ^~~~
2025-12-04T12:35:04.5336053Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5336189Z  1944 |       0x80,
2025-12-04T12:35:04.5336281Z       |       ^~~~
2025-12-04T12:35:04.5337603Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5337733Z  1946 |       0x80,
2025-12-04T12:35:04.5337840Z       |       ^~~~
2025-12-04T12:35:04.5339055Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5339152Z  1948 |       0x80,
2025-12-04T12:35:04.5339259Z       |       ^~~~
2025-12-04T12:35:04.5340432Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5340536Z  1950 |       0x80,
2025-12-04T12:35:04.5340644Z       |       ^~~~
2025-12-04T12:35:04.5341821Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5341936Z  1952 |       0x80,
2025-12-04T12:35:04.5342029Z       |       ^~~~
2025-12-04T12:35:04.5343209Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5343318Z  1954 |       0x80,
2025-12-04T12:35:04.5343410Z       |       ^~~~
2025-12-04T12:35:04.5344596Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5344696Z  1956 |       0x80,
2025-12-04T12:35:04.5344787Z       |       ^~~~
2025-12-04T12:35:04.5345978Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5346079Z  1958 |       0x80,
2025-12-04T12:35:04.5346171Z       |       ^~~~
2025-12-04T12:35:04.5347362Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5347457Z  1960 |       0x80,
2025-12-04T12:35:04.5347563Z       |       ^~~~
2025-12-04T12:35:04.5348732Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5348832Z  1962 |       0x80,
2025-12-04T12:35:04.5348938Z       |       ^~~~
2025-12-04T12:35:04.5350115Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5350228Z  1964 |       0x80,
2025-12-04T12:35:04.5350320Z       |       ^~~~
2025-12-04T12:35:04.5351501Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5351607Z  1966 |       0x80,
2025-12-04T12:35:04.5351700Z       |       ^~~~
2025-12-04T12:35:04.5352869Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5353033Z  1968 |       0x80,
2025-12-04T12:35:04.5353127Z       |       ^~~~
2025-12-04T12:35:04.5354362Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5354505Z  1970 |       0x80,
2025-12-04T12:35:04.5354598Z       |       ^~~~
2025-12-04T12:35:04.5355832Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5355928Z  1972 |       0x80,
2025-12-04T12:35:04.5356020Z       |       ^~~~
2025-12-04T12:35:04.5357209Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5357311Z  1974 |       0x80,
2025-12-04T12:35:04.5357418Z       |       ^~~~
2025-12-04T12:35:04.5358596Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5358696Z  1976 |       0x80,
2025-12-04T12:35:04.5358801Z       |       ^~~~
2025-12-04T12:35:04.5359983Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5360091Z  1978 |       0x80,
2025-12-04T12:35:04.5360183Z       |       ^~~~
2025-12-04T12:35:04.5361352Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5361462Z  1980 |       0x80,
2025-12-04T12:35:04.5361555Z       |       ^~~~
2025-12-04T12:35:04.5362734Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5362847Z  1982 |       0x80,
2025-12-04T12:35:04.5362940Z       |       ^~~~
2025-12-04T12:35:04.5364131Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5364237Z  1984 |       0x80,
2025-12-04T12:35:04.5364331Z       |       ^~~~
2025-12-04T12:35:04.5365515Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5365649Z  1986 |       0x80,
2025-12-04T12:35:04.5365756Z       |       ^~~~
2025-12-04T12:35:04.5366937Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5367068Z  1988 |       0x80,
2025-12-04T12:35:04.5367176Z       |       ^~~~
2025-12-04T12:35:04.5368358Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5368454Z  1990 |       0x80,
2025-12-04T12:35:04.5368561Z       |       ^~~~
2025-12-04T12:35:04.5369744Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5369853Z  1992 |       0x80,
2025-12-04T12:35:04.5369948Z       |       ^~~~
2025-12-04T12:35:04.5371398Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow]
2025-12-04T12:35:04.5371580Z  2002 |   __m512i keep_1 = _mm512_set1_epi16(0xFF00);
2025-12-04T12:35:04.5371700Z       |                                      ^~~~~~
2025-12-04T12:35:04.5374179Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized<T> at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized<T>&, const at::vec::CPU_CAPABILITY::Vectorized<T>&) [with bool left_shift = false; T = signed char; typename std::enable_if<(is_same_v<T, signed char> || is_same_v<T, unsigned char>), int>::type <anonymous> = 0]’:
2025-12-04T12:35:04.5374775Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2109:28:   required from here
2025-12-04T12:35:04.5375987Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5376091Z  1866 |       0x80,
2025-12-04T12:35:04.5376186Z       |       ^~~~
2025-12-04T12:35:04.5377464Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5377563Z  1868 |       0x80,
2025-12-04T12:35:04.5377657Z       |       ^~~~
2025-12-04T12:35:04.5378855Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5378952Z  1870 |       0x80,
2025-12-04T12:35:04.5379061Z       |       ^~~~
2025-12-04T12:35:04.5380246Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5380348Z  1872 |       0x80,
2025-12-04T12:35:04.5380457Z       |       ^~~~
2025-12-04T12:35:04.5381638Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5381749Z  1874 |       0x80,
2025-12-04T12:35:04.5381909Z       |       ^~~~
2025-12-04T12:35:04.5383087Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5383194Z  1876 |       0x80,
2025-12-04T12:35:04.5383290Z       |       ^~~~
2025-12-04T12:35:04.5384468Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5384629Z  1878 |       0x80,
2025-12-04T12:35:04.5384722Z       |       ^~~~
2025-12-04T12:35:04.5385918Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5386015Z  1880 |       0x80,
2025-12-04T12:35:04.5386114Z       |       ^~~~
2025-12-04T12:35:04.5387300Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5387397Z  1882 |       0x80,
2025-12-04T12:35:04.5387502Z       |       ^~~~
2025-12-04T12:35:04.5388708Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5388808Z  1884 |       0x80,
2025-12-04T12:35:04.5388916Z       |       ^~~~
2025-12-04T12:35:04.5390125Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5390222Z  1886 |       0x80,
2025-12-04T12:35:04.5390335Z       |       ^~~~
2025-12-04T12:35:04.5391513Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5391619Z  1888 |       0x80,
2025-12-04T12:35:04.5391710Z       |       ^~~~
2025-12-04T12:35:04.5392892Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5393003Z  1890 |       0x80,
2025-12-04T12:35:04.5393097Z       |       ^~~~
2025-12-04T12:35:04.5394280Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5394374Z  1892 |       0x80,
2025-12-04T12:35:04.5394475Z       |       ^~~~
2025-12-04T12:35:04.5395664Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5395759Z  1894 |       0x80,
2025-12-04T12:35:04.5395851Z       |       ^~~~
2025-12-04T12:35:04.5397045Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5397146Z  1896 |       0x80,
2025-12-04T12:35:04.5397256Z       |       ^~~~
2025-12-04T12:35:04.5398433Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5398526Z  1898 |       0x80,
2025-12-04T12:35:04.5398679Z       |       ^~~~
2025-12-04T12:35:04.5399847Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5399954Z  1900 |       0x80,
2025-12-04T12:35:04.5400046Z       |       ^~~~
2025-12-04T12:35:04.5401254Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5401396Z  1902 |       0x80,
2025-12-04T12:35:04.5401488Z       |       ^~~~
2025-12-04T12:35:04.5402694Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5402816Z  1904 |       0x80,
2025-12-04T12:35:04.5402910Z       |       ^~~~
2025-12-04T12:35:04.5404100Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5404195Z  1906 |       0x80,
2025-12-04T12:35:04.5404286Z       |       ^~~~
2025-12-04T12:35:04.5405480Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5405574Z  1908 |       0x80,
2025-12-04T12:35:04.5405667Z       |       ^~~~
2025-12-04T12:35:04.5406856Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5406956Z  1910 |       0x80,
2025-12-04T12:35:04.5407062Z       |       ^~~~
2025-12-04T12:35:04.5408236Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5408331Z  1912 |       0x80,
2025-12-04T12:35:04.5408435Z       |       ^~~~
2025-12-04T12:35:04.5409622Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5409731Z  1914 |       0x80,
2025-12-04T12:35:04.5409824Z       |       ^~~~
2025-12-04T12:35:04.5411009Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5411122Z  1916 |       0x80,
2025-12-04T12:35:04.5411214Z       |       ^~~~
2025-12-04T12:35:04.5412389Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5412498Z  1918 |       0x80,
2025-12-04T12:35:04.5412596Z       |       ^~~~
2025-12-04T12:35:04.5413796Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5413891Z  1920 |       0x80,
2025-12-04T12:35:04.5413988Z       |       ^~~~
2025-12-04T12:35:04.5415178Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5415320Z  1922 |       0x80,
2025-12-04T12:35:04.5415426Z       |       ^~~~
2025-12-04T12:35:04.5416667Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5416765Z  1924 |       0x80,
2025-12-04T12:35:04.5416939Z       |       ^~~~
2025-12-04T12:35:04.5418153Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5418248Z  1926 |       0x80,
2025-12-04T12:35:04.5418389Z       |       ^~~~
2025-12-04T12:35:04.5419563Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5419680Z  1928 |       0x80);
2025-12-04T12:35:04.5419775Z       |       ^~~~
2025-12-04T12:35:04.5420949Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5421060Z  1930 |       0x80,
2025-12-04T12:35:04.5421164Z       |       ^~~~
2025-12-04T12:35:04.5422345Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5422438Z  1932 |       0x80,
2025-12-04T12:35:04.5422537Z       |       ^~~~
2025-12-04T12:35:04.5423725Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5423826Z  1934 |       0x80,
2025-12-04T12:35:04.5423917Z       |       ^~~~
2025-12-04T12:35:04.5425100Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5425194Z  1936 |       0x80,
2025-12-04T12:35:04.5425313Z       |       ^~~~
2025-12-04T12:35:04.5426481Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5426573Z  1938 |       0x80,
2025-12-04T12:35:04.5426683Z       |       ^~~~
2025-12-04T12:35:04.5427859Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5427972Z  1940 |       0x80,
2025-12-04T12:35:04.5428064Z       |       ^~~~
2025-12-04T12:35:04.5429231Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5429340Z  1942 |       0x80,
2025-12-04T12:35:04.5429442Z       |       ^~~~
2025-12-04T12:35:04.5430609Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5430716Z  1944 |       0x80,
2025-12-04T12:35:04.5430813Z       |       ^~~~
2025-12-04T12:35:04.5432000Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5432141Z  1946 |       0x80,
2025-12-04T12:35:04.5432232Z       |       ^~~~
2025-12-04T12:35:04.5433426Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5433557Z  1948 |       0x80,
2025-12-04T12:35:04.5433682Z       |       ^~~~
2025-12-04T12:35:04.5434871Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5435000Z  1950 |       0x80,
2025-12-04T12:35:04.5435111Z       |       ^~~~
2025-12-04T12:35:04.5436295Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5436394Z  1952 |       0x80,
2025-12-04T12:35:04.5436501Z       |       ^~~~
2025-12-04T12:35:04.5437675Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5437793Z  1954 |       0x80,
2025-12-04T12:35:04.5437885Z       |       ^~~~
2025-12-04T12:35:04.5439058Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5439171Z  1956 |       0x80,
2025-12-04T12:35:04.5439264Z       |       ^~~~
2025-12-04T12:35:04.5440435Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5440552Z  1958 |       0x80,
2025-12-04T12:35:04.5440644Z       |       ^~~~
2025-12-04T12:35:04.5441836Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5441944Z  1960 |       0x80,
2025-12-04T12:35:04.5442038Z       |       ^~~~
2025-12-04T12:35:04.5443222Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5443324Z  1962 |       0x80,
2025-12-04T12:35:04.5443434Z       |       ^~~~
2025-12-04T12:35:04.5444605Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5444705Z  1964 |       0x80,
2025-12-04T12:35:04.5444811Z       |       ^~~~
2025-12-04T12:35:04.5445981Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5446099Z  1966 |       0x80,
2025-12-04T12:35:04.5446192Z       |       ^~~~
2025-12-04T12:35:04.5447362Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5447476Z  1968 |       0x80,
2025-12-04T12:35:04.5447567Z       |       ^~~~
2025-12-04T12:35:04.5448739Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5448900Z  1970 |       0x80,
2025-12-04T12:35:04.5448993Z       |       ^~~~
2025-12-04T12:35:04.5450218Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5450347Z  1972 |       0x80,
2025-12-04T12:35:04.5450440Z       |       ^~~~
2025-12-04T12:35:04.5451666Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5451764Z  1974 |       0x80,
2025-12-04T12:35:04.5451857Z       |       ^~~~
2025-12-04T12:35:04.5453050Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5453152Z  1976 |       0x80,
2025-12-04T12:35:04.5453256Z       |       ^~~~
2025-12-04T12:35:04.5454438Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5454541Z  1978 |       0x80,
2025-12-04T12:35:04.5454650Z       |       ^~~~
2025-12-04T12:35:04.5455828Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5455938Z  1980 |       0x80,
2025-12-04T12:35:04.5456037Z       |       ^~~~
2025-12-04T12:35:04.5457276Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5457397Z  1982 |       0x80,
2025-12-04T12:35:04.5457491Z       |       ^~~~
2025-12-04T12:35:04.5458679Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5458797Z  1984 |       0x80,
2025-12-04T12:35:04.5458891Z       |       ^~~~
2025-12-04T12:35:04.5460084Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5460181Z  1986 |       0x80,
2025-12-04T12:35:04.5460276Z       |       ^~~~
2025-12-04T12:35:04.5461466Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5461712Z  1988 |       0x80,
2025-12-04T12:35:04.5461819Z       |       ^~~~
2025-12-04T12:35:04.5463000Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5463133Z  1990 |       0x80,
2025-12-04T12:35:04.5463243Z       |       ^~~~
2025-12-04T12:35:04.5464424Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5464520Z  1992 |       0x80,
2025-12-04T12:35:04.5464632Z       |       ^~~~
2025-12-04T12:35:04.5465810Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow]
2025-12-04T12:35:04.5465994Z  2002 |   __m512i keep_1 = _mm512_set1_epi16(0xFF00);
2025-12-04T12:35:04.5466113Z       |                                      ^~~~~~
2025-12-04T12:35:04.5468586Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized<T> at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized<T>&, const at::vec::CPU_CAPABILITY::Vectorized<T>&) [with bool left_shift = false; T = unsigned char; typename std::enable_if<(is_same_v<T, signed char> || is_same_v<T, unsigned char>), int>::type <anonymous> = 0]’:
2025-12-04T12:35:04.5469189Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2116:28:   required from here
2025-12-04T12:35:04.5470376Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5470497Z  1866 |       0x80,
2025-12-04T12:35:04.5470592Z       |       ^~~~
2025-12-04T12:35:04.5471993Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5472094Z  1868 |       0x80,
2025-12-04T12:35:04.5472186Z       |       ^~~~
2025-12-04T12:35:04.5473388Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5473485Z  1870 |       0x80,
2025-12-04T12:35:04.5473578Z       |       ^~~~
2025-12-04T12:35:04.5474782Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5474885Z  1872 |       0x80,
2025-12-04T12:35:04.5474991Z       |       ^~~~
2025-12-04T12:35:04.5476170Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5476272Z  1874 |       0x80,
2025-12-04T12:35:04.5476384Z       |       ^~~~
2025-12-04T12:35:04.5477559Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5477664Z  1876 |       0x80,
2025-12-04T12:35:04.5477758Z       |       ^~~~
2025-12-04T12:35:04.5478928Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5479146Z  1878 |       0x80,
2025-12-04T12:35:04.5479239Z       |       ^~~~
2025-12-04T12:35:04.5480420Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5480578Z  1880 |       0x80,
2025-12-04T12:35:04.5480669Z       |       ^~~~
2025-12-04T12:35:04.5481867Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5481963Z  1882 |       0x80,
2025-12-04T12:35:04.5482055Z       |       ^~~~
2025-12-04T12:35:04.5483241Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5483340Z  1884 |       0x80,
2025-12-04T12:35:04.5483435Z       |       ^~~~
2025-12-04T12:35:04.5484677Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5484779Z  1886 |       0x80,
2025-12-04T12:35:04.5484890Z       |       ^~~~
2025-12-04T12:35:04.5486115Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5486210Z  1888 |       0x80,
2025-12-04T12:35:04.5486317Z       |       ^~~~
2025-12-04T12:35:04.5487502Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5487611Z  1890 |       0x80,
2025-12-04T12:35:04.5487705Z       |       ^~~~
2025-12-04T12:35:04.5488882Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5488996Z  1892 |       0x80,
2025-12-04T12:35:04.5489089Z       |       ^~~~
2025-12-04T12:35:04.5490263Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5490370Z  1894 |       0x80,
2025-12-04T12:35:04.5490462Z       |       ^~~~
2025-12-04T12:35:04.5491655Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5491750Z  1896 |       0x80,
2025-12-04T12:35:04.5491841Z       |       ^~~~
2025-12-04T12:35:04.5493028Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5493127Z  1898 |       0x80,
2025-12-04T12:35:04.5493229Z       |       ^~~~
2025-12-04T12:35:04.5494407Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5494499Z  1900 |       0x80,
2025-12-04T12:35:04.5494605Z       |       ^~~~
2025-12-04T12:35:04.5495825Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5495918Z  1902 |       0x80,
2025-12-04T12:35:04.5496023Z       |       ^~~~
2025-12-04T12:35:04.5497325Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5497477Z  1904 |       0x80,
2025-12-04T12:35:04.5497570Z       |       ^~~~
2025-12-04T12:35:04.5498789Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5498903Z  1906 |       0x80,
2025-12-04T12:35:04.5498997Z       |       ^~~~
2025-12-04T12:35:04.5500195Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5500289Z  1908 |       0x80,
2025-12-04T12:35:04.5500380Z       |       ^~~~
2025-12-04T12:35:04.5501576Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5501677Z  1910 |       0x80,
2025-12-04T12:35:04.5501768Z       |       ^~~~
2025-12-04T12:35:04.5502961Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5503056Z  1912 |       0x80,
2025-12-04T12:35:04.5503160Z       |       ^~~~
2025-12-04T12:35:04.5504336Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5504430Z  1914 |       0x80,
2025-12-04T12:35:04.5504538Z       |       ^~~~
2025-12-04T12:35:04.5505727Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5505839Z  1916 |       0x80,
2025-12-04T12:35:04.5505931Z       |       ^~~~
2025-12-04T12:35:04.5507110Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5507217Z  1918 |       0x80,
2025-12-04T12:35:04.5507313Z       |       ^~~~
2025-12-04T12:35:04.5508485Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5508594Z  1920 |       0x80,
2025-12-04T12:35:04.5508686Z       |       ^~~~
2025-12-04T12:35:04.5509872Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5509972Z  1922 |       0x80,
2025-12-04T12:35:04.5510063Z       |       ^~~~
2025-12-04T12:35:04.5511256Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5511350Z  1924 |       0x80,
2025-12-04T12:35:04.5511512Z       |       ^~~~
2025-12-04T12:35:04.5512692Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5512786Z  1926 |       0x80,
2025-12-04T12:35:04.5512892Z       |       ^~~~
2025-12-04T12:35:04.5514098Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5514226Z  1928 |       0x80);
2025-12-04T12:35:04.5514333Z       |       ^~~~
2025-12-04T12:35:04.5515544Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5515654Z  1930 |       0x80,
2025-12-04T12:35:04.5515754Z       |       ^~~~
2025-12-04T12:35:04.5516936Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5517045Z  1932 |       0x80,
2025-12-04T12:35:04.5517135Z       |       ^~~~
2025-12-04T12:35:04.5518324Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5518425Z  1934 |       0x80,
2025-12-04T12:35:04.5518516Z       |       ^~~~
2025-12-04T12:35:04.5519714Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5519810Z  1936 |       0x80,
2025-12-04T12:35:04.5519907Z       |       ^~~~
2025-12-04T12:35:04.5521107Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5521201Z  1938 |       0x80,
2025-12-04T12:35:04.5521305Z       |       ^~~~
2025-12-04T12:35:04.5522487Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5522585Z  1940 |       0x80,
2025-12-04T12:35:04.5522690Z       |       ^~~~
2025-12-04T12:35:04.5523867Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5523981Z  1942 |       0x80,
2025-12-04T12:35:04.5524073Z       |       ^~~~
2025-12-04T12:35:04.5525242Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5525348Z  1944 |       0x80,
2025-12-04T12:35:04.5525440Z       |       ^~~~
2025-12-04T12:35:04.5526621Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5526728Z  1946 |       0x80,
2025-12-04T12:35:04.5526820Z       |       ^~~~
2025-12-04T12:35:04.5528019Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5528165Z  1948 |       0x80,
2025-12-04T12:35:04.5528257Z       |       ^~~~
2025-12-04T12:35:04.5529445Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5529540Z  1950 |       0x80,
2025-12-04T12:35:04.5529635Z       |       ^~~~
2025-12-04T12:35:04.5530908Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5531005Z  1952 |       0x80,
2025-12-04T12:35:04.5531116Z       |       ^~~~
2025-12-04T12:35:04.5532341Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5532444Z  1954 |       0x80,
2025-12-04T12:35:04.5532555Z       |       ^~~~
2025-12-04T12:35:04.5533724Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5533833Z  1956 |       0x80,
2025-12-04T12:35:04.5533927Z       |       ^~~~
2025-12-04T12:35:04.5535112Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5535219Z  1958 |       0x80,
2025-12-04T12:35:04.5535310Z       |       ^~~~
2025-12-04T12:35:04.5536557Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5536674Z  1960 |       0x80,
2025-12-04T12:35:04.5536768Z       |       ^~~~
2025-12-04T12:35:04.5537965Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5538062Z  1962 |       0x80,
2025-12-04T12:35:04.5538159Z       |       ^~~~
2025-12-04T12:35:04.5539354Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5539449Z  1964 |       0x80,
2025-12-04T12:35:04.5539561Z       |       ^~~~
2025-12-04T12:35:04.5540730Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5540830Z  1966 |       0x80,
2025-12-04T12:35:04.5540947Z       |       ^~~~
2025-12-04T12:35:04.5542117Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5542211Z  1968 |       0x80,
2025-12-04T12:35:04.5542324Z       |       ^~~~
2025-12-04T12:35:04.5543502Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5543611Z  1970 |       0x80,
2025-12-04T12:35:04.5543711Z       |       ^~~~
2025-12-04T12:35:04.5544886Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5545061Z  1972 |       0x80,
2025-12-04T12:35:04.5545156Z       |       ^~~~
2025-12-04T12:35:04.5546348Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5546444Z  1974 |       0x80,
2025-12-04T12:35:04.5546606Z       |       ^~~~
2025-12-04T12:35:04.5547796Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5547892Z  1976 |       0x80,
2025-12-04T12:35:04.5548020Z       |       ^~~~
2025-12-04T12:35:04.5549211Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5549316Z  1978 |       0x80,
2025-12-04T12:35:04.5549423Z       |       ^~~~
2025-12-04T12:35:04.5550594Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5550691Z  1980 |       0x80,
2025-12-04T12:35:04.5550813Z       |       ^~~~
2025-12-04T12:35:04.5551987Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5552096Z  1982 |       0x80,
2025-12-04T12:35:04.5552196Z       |       ^~~~
2025-12-04T12:35:04.5553368Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5553488Z  1984 |       0x80,
2025-12-04T12:35:04.5553582Z       |       ^~~~
2025-12-04T12:35:04.5554755Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5554873Z  1986 |       0x80,
2025-12-04T12:35:04.5554974Z       |       ^~~~
2025-12-04T12:35:04.5556158Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5556260Z  1988 |       0x80,
2025-12-04T12:35:04.5556355Z       |       ^~~~
2025-12-04T12:35:04.5557549Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5557691Z  1990 |       0x80,
2025-12-04T12:35:04.5557801Z       |       ^~~~
2025-12-04T12:35:04.5558981Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5559086Z  1992 |       0x80,
2025-12-04T12:35:04.5559232Z       |       ^~~~
2025-12-04T12:35:04.5560416Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow]
2025-12-04T12:35:04.5560584Z  2002 |   __m512i keep_1 = _mm512_set1_epi16(0xFF00);
2025-12-04T12:35:04.5560719Z       |                                      ^~~~~~
2025-12-04T12:35:04.5561225Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:16,
2025-12-04T12:35:04.5561617Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5,
2025-12-04T12:35:04.5562061Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7,
2025-12-04T12:35:04.5562506Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4,
2025-12-04T12:35:04.5562987Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45,
2025-12-04T12:35:04.5563688Z                  from /tmp/zNkm53/tmpp0pxkc5y/data/aotinductor/model/cuzep6c3r5og2e4er75yunzp3ebrphkoo5sxbhxof3er5uux4ih3.wrapper.cpp:750:
2025-12-04T12:35:04.5565184Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h: In instantiation of ‘void at::vec::CPU_CAPABILITY::QuantizeAvx512(const float*, T*, int, float, int64_t) [with T = signed char; int64_t = long int]’:
2025-12-04T12:35:04.5565765Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:696:31:   required from here
2025-12-04T12:35:04.5566972Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5567094Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5567185Z       |       ^~~~
2025-12-04T12:35:04.5568386Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5568500Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5568600Z       |             ^~~~
2025-12-04T12:35:04.5569804Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5569924Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5570037Z       |                   ^~~~
2025-12-04T12:35:04.5571395Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5571517Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5571633Z       |                         ^~~~
2025-12-04T12:35:04.5572824Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5572953Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5573136Z       |       ^~~~
2025-12-04T12:35:04.5574323Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5574448Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5574548Z       |             ^~~~
2025-12-04T12:35:04.5575750Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5575912Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5576014Z       |                   ^~~~
2025-12-04T12:35:04.5577297Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5577418Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5577519Z       |                         ^~~~
2025-12-04T12:35:04.5578715Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5578900Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5579014Z       |       ^~~~
2025-12-04T12:35:04.5580198Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5580360Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5580472Z       |             ^~~~
2025-12-04T12:35:04.5581665Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5581795Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5581897Z       |                   ^~~~
2025-12-04T12:35:04.5583081Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5583212Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5583313Z       |                         ^~~~
2025-12-04T12:35:04.5584492Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5584619Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5584714Z       |       ^~~~
2025-12-04T12:35:04.5585906Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5586025Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5586123Z       |             ^~~~
2025-12-04T12:35:04.5587322Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5587441Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5587554Z       |                   ^~~~
2025-12-04T12:35:04.5588743Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5588857Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5589018Z       |                         ^~~~
2025-12-04T12:35:04.5590202Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5590329Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5590423Z       |       ^~~~
2025-12-04T12:35:04.5591636Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5591813Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5591910Z       |             ^~~~
2025-12-04T12:35:04.5593125Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5593258Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5593358Z       |                   ^~~~
2025-12-04T12:35:04.5594546Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5594658Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5594770Z       |                         ^~~~
2025-12-04T12:35:04.5595960Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5596079Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5596185Z       |       ^~~~
2025-12-04T12:35:04.5597365Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5597481Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5597591Z       |             ^~~~
2025-12-04T12:35:04.5598776Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5598906Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5599007Z       |                   ^~~~
2025-12-04T12:35:04.5600198Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5600323Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5600425Z       |                         ^~~~
2025-12-04T12:35:04.5601601Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5601733Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5601824Z       |       ^~~~
2025-12-04T12:35:04.5603020Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5603137Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5603231Z       |             ^~~~
2025-12-04T12:35:04.5604429Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5604540Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5604696Z       |                   ^~~~
2025-12-04T12:35:04.5605875Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5605986Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5606102Z       |                         ^~~~
2025-12-04T12:35:04.5607355Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5607483Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5607579Z       |       ^~~~
2025-12-04T12:35:04.5608801Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5608931Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5609028Z       |             ^~~~
2025-12-04T12:35:04.5610216Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5610340Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5610451Z       |                   ^~~~
2025-12-04T12:35:04.5611647Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5611764Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5611865Z       |                         ^~~~
2025-12-04T12:35:04.5613056Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5613174Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5613281Z       |       ^~~~
2025-12-04T12:35:04.5614462Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5614580Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5614691Z       |             ^~~~
2025-12-04T12:35:04.5615877Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5616004Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5616105Z       |                   ^~~~
2025-12-04T12:35:04.5617359Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5617494Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5617599Z       |                         ^~~~
2025-12-04T12:35:04.5618781Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5618918Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5619011Z       |       ^~~~
2025-12-04T12:35:04.5620217Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5620330Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5620469Z       |             ^~~~
2025-12-04T12:35:04.5621667Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5621781Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5621895Z       |                   ^~~~
2025-12-04T12:35:04.5623119Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5623264Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5623411Z       |                         ^~~~
2025-12-04T12:35:04.5624589Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5624720Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5624813Z       |       ^~~~
2025-12-04T12:35:04.5625995Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5626126Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5626232Z       |             ^~~~
2025-12-04T12:35:04.5627416Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5627548Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5627649Z       |                   ^~~~
2025-12-04T12:35:04.5628838Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5628958Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5629061Z       |                         ^~~~
2025-12-04T12:35:04.5630256Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5630373Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5630481Z       |       ^~~~
2025-12-04T12:35:04.5631668Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5631780Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5631893Z       |             ^~~~
2025-12-04T12:35:04.5633075Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5633209Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5633311Z       |                   ^~~~
2025-12-04T12:35:04.5634510Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5634642Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5634748Z       |                         ^~~~
2025-12-04T12:35:04.5636221Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h: In instantiation of ‘void at::vec::CPU_CAPABILITY::QuantizeAvx512(const float*, T*, int, float, int64_t) [with T = unsigned char; int64_t = long int]’:
2025-12-04T12:35:04.5636816Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:933:31:   required from here
2025-12-04T12:35:04.5638038Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5638168Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5638332Z       |       ^~~~
2025-12-04T12:35:04.5639517Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5639681Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5639781Z       |             ^~~~
2025-12-04T12:35:04.5640985Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5641106Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5641208Z       |                   ^~~~
2025-12-04T12:35:04.5642413Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5642534Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5642650Z       |                         ^~~~
2025-12-04T12:35:04.5643832Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5643946Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5644057Z       |       ^~~~
2025-12-04T12:35:04.5645247Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5645387Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5645485Z       |             ^~~~
2025-12-04T12:35:04.5646675Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5646809Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5646910Z       |                   ^~~~
2025-12-04T12:35:04.5648096Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5648224Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5648327Z       |                         ^~~~
2025-12-04T12:35:04.5649555Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5649667Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5649761Z       |       ^~~~
2025-12-04T12:35:04.5650967Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5651138Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5651253Z       |             ^~~~
2025-12-04T12:35:04.5652441Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5652557Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5652676Z       |                   ^~~~
2025-12-04T12:35:04.5653857Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5653982Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5654127Z       |                         ^~~~
2025-12-04T12:35:04.5655306Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5655474Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5655569Z       |       ^~~~
2025-12-04T12:35:04.5656825Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5656959Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5657107Z       |             ^~~~
2025-12-04T12:35:04.5658311Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5658430Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5658529Z       |                   ^~~~
2025-12-04T12:35:04.5659728Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5659839Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5659953Z       |                         ^~~~
2025-12-04T12:35:04.5661124Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5661242Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5661346Z       |       ^~~~
2025-12-04T12:35:04.5662527Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5662656Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5662756Z       |             ^~~~
2025-12-04T12:35:04.5663940Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5664066Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5664166Z       |                   ^~~~
2025-12-04T12:35:04.5665406Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5665531Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5665635Z       |                         ^~~~
2025-12-04T12:35:04.5666834Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5666987Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5667083Z       |       ^~~~
2025-12-04T12:35:04.5668290Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5668408Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5668517Z       |             ^~~~
2025-12-04T12:35:04.5669705Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5669816Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5669973Z       |                   ^~~~
2025-12-04T12:35:04.5671359Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5671554Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5671672Z       |                         ^~~~
2025-12-04T12:35:04.5673341Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5673480Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5673574Z       |       ^~~~
2025-12-04T12:35:04.5674773Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5674905Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5675004Z       |             ^~~~
2025-12-04T12:35:04.5676204Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5676316Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5676418Z       |                   ^~~~
2025-12-04T12:35:04.5677614Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5677738Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5677851Z       |                         ^~~~
2025-12-04T12:35:04.5679033Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5679153Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5679261Z       |       ^~~~
2025-12-04T12:35:04.5680451Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5680563Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5680756Z       |             ^~~~
2025-12-04T12:35:04.5681950Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5682077Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5682178Z       |                   ^~~~
2025-12-04T12:35:04.5683404Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5683577Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5683678Z       |                         ^~~~
2025-12-04T12:35:04.5684905Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5685024Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5685119Z       |       ^~~~
2025-12-04T12:35:04.5686319Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5686433Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5686558Z       |             ^~~~
2025-12-04T12:35:04.5687739Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5687856Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5687972Z       |                   ^~~~
2025-12-04T12:35:04.5689154Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5689272Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5689386Z       |                         ^~~~
2025-12-04T12:35:04.5690573Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5690709Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5690801Z       |       ^~~~
2025-12-04T12:35:04.5691982Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5692107Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5692204Z       |             ^~~~
2025-12-04T12:35:04.5693403Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5693521Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5693621Z       |                   ^~~~
2025-12-04T12:35:04.5694821Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5694938Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5695038Z       |                         ^~~~
2025-12-04T12:35:04.5696230Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5696421Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5696580Z       |       ^~~~
2025-12-04T12:35:04.5697776Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5697887Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5697999Z       |             ^~~~
2025-12-04T12:35:04.5699217Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5699378Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5699479Z       |                   ^~~~
2025-12-04T12:35:04.5700697Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5700828Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5700930Z       |                         ^~~~
2025-12-04T12:35:04.5702126Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5702244Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5702344Z       |       ^~~~
2025-12-04T12:35:04.5703537Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5703654Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5703750Z       |             ^~~~
2025-12-04T12:35:04.5704947Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5705065Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5705181Z       |                   ^~~~
2025-12-04T12:35:04.5706374Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5706492Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.5706609Z       |                         ^~~~
2025-12-04T12:35:04.5706717Z PASSED [9.3053s] [ 39%]
2025-12-04T12:35:04.5707758Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_multiple_methods In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_float.h:12,
2025-12-04T12:35:04.5708198Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:11,
2025-12-04T12:35:04.5708573Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5,
2025-12-04T12:35:04.5709035Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7,
2025-12-04T12:35:04.5709442Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4,
2025-12-04T12:35:04.5709921Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45,
2025-12-04T12:35:04.5710568Z                  from /tmp/hdMAUq/tmp2gno_q_y/data/aotinductor/model1/cji6fcfpjxr5ad3oypbruxr5r26niflgwwkmd5rthzuhxclq6uis.wrapper.cpp:751:
2025-12-04T12:35:04.5711168Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/sleef.h:192:10: warning: ISO C++ prohibits anonymous structs [-Wpedantic]
2025-12-04T12:35:04.5711279Z   192 |   struct {
2025-12-04T12:35:04.5711434Z       |          ^
2025-12-04T12:35:04.5711950Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:15,
2025-12-04T12:35:04.5712317Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5,
2025-12-04T12:35:04.5712786Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7,
2025-12-04T12:35:04.5713231Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4,
2025-12-04T12:35:04.5713692Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45,
2025-12-04T12:35:04.5714369Z                  from /tmp/hdMAUq/tmp2gno_q_y/data/aotinductor/model1/cji6fcfpjxr5ad3oypbruxr5r26niflgwwkmd5rthzuhxclq6uis.wrapper.cpp:751:
2025-12-04T12:35:04.5716605Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::blendv(const at::vec::CPU_CAPABILITY::Vectorized<short int>&, const at::vec::CPU_CAPABILITY::Vectorized<short int>&, const at::vec::CPU_CAPABILITY::Vectorized<short int>&)’:
2025-12-04T12:35:04.5717797Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:544:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.5717968Z   544 |     auto msb_one = _mm512_set1_epi16(0xFFFF);
2025-12-04T12:35:04.5718086Z       |                                      ^~~~~~
2025-12-04T12:35:04.5718608Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:15,
2025-12-04T12:35:04.5718978Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5,
2025-12-04T12:35:04.5719424Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7,
2025-12-04T12:35:04.5719844Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4,
2025-12-04T12:35:04.5720312Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45,
2025-12-04T12:35:04.5720980Z                  from /tmp/hdMAUq/tmp2gno_q_y/data/aotinductor/model1/cji6fcfpjxr5ad3oypbruxr5r26niflgwwkmd5rthzuhxclq6uis.wrapper.cpp:751:
2025-12-04T12:35:04.5722610Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator==(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.5723793Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:697:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.5724009Z   697 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.5724139Z       |                                                      ^~~~~~
2025-12-04T12:35:04.5725774Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator!=(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.5726950Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:701:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.5727172Z   701 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.5727341Z       |                                                      ^~~~~~
2025-12-04T12:35:04.5728982Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator<(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.5730187Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:705:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.5730426Z   705 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.5730568Z       |                                                      ^~~~~~
2025-12-04T12:35:04.5732187Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator<=(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.5733368Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:709:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.5733587Z   709 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.5733712Z       |                                                      ^~~~~~
2025-12-04T12:35:04.5735342Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator>(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.5736584Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:713:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.5736818Z   713 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.5736945Z       |                                                      ^~~~~~
2025-12-04T12:35:04.5738588Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator>=(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.5740321Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:717:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.5740529Z   717 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.5740677Z       |                                                      ^~~~~~
2025-12-04T12:35:04.5742969Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized<signed char> at::vec::CPU_CAPABILITY::Vectorized<signed char>::blendv(const at::vec::CPU_CAPABILITY::Vectorized<signed char>&, const at::vec::CPU_CAPABILITY::Vectorized<signed char>&, const at::vec::CPU_CAPABILITY::Vectorized<signed char>&)’:
2025-12-04T12:35:04.5744194Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1153:37: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5744351Z  1153 |     auto msb_one = _mm512_set1_epi8(0xFF);
2025-12-04T12:35:04.5744481Z       |                                     ^~~~
2025-12-04T12:35:04.5746486Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<signed char> at::vec::CPU_CAPABILITY::Vectorized<signed char>::operator==(const at::vec::CPU_CAPABILITY::Vectorized<signed char>&) const’:
2025-12-04T12:35:04.5748695Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1166:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5748960Z  1166 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.5749088Z       |                                                     ^~~~
2025-12-04T12:35:04.5750825Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<signed char> at::vec::CPU_CAPABILITY::Vectorized<signed char>::operator!=(const at::vec::CPU_CAPABILITY::Vectorized<signed char>&) const’:
2025-12-04T12:35:04.5752822Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1170:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5753814Z  1170 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.5753962Z       |                                                     ^~~~
2025-12-04T12:35:04.5755657Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<signed char> at::vec::CPU_CAPABILITY::Vectorized<signed char>::operator<(const at::vec::CPU_CAPABILITY::Vectorized<signed char>&) const’:
2025-12-04T12:35:04.5757143Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1174:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5757349Z  1174 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.5757889Z       |                                                     ^~~~
2025-12-04T12:35:04.5760826Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<signed char> at::vec::CPU_CAPABILITY::Vectorized<signed char>::operator<=(const at::vec::CPU_CAPABILITY::Vectorized<signed char>&) const’:
2025-12-04T12:35:04.5762216Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1178:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5762475Z  1178 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.5762605Z       |                                                     ^~~~
2025-12-04T12:35:04.5766033Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized<unsigned char> at::vec::CPU_CAPABILITY::Vectorized<unsigned char>::blendv(const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&, const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&, const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&)’:
2025-12-04T12:35:04.5767259Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1207:37: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5767428Z  1207 |     auto msb_one = _mm512_set1_epi8(0xFF);
2025-12-04T12:35:04.5767543Z       |                                     ^~~~
2025-12-04T12:35:04.5771323Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<unsigned char> at::vec::CPU_CAPABILITY::Vectorized<unsigned char>::operator==(const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&) const’:
2025-12-04T12:35:04.5773850Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1220:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5774058Z  1220 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.5774289Z       |                                                     ^~~~
2025-12-04T12:35:04.5779134Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<unsigned char> at::vec::CPU_CAPABILITY::Vectorized<unsigned char>::operator!=(const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&) const’:
2025-12-04T12:35:04.5782764Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1224:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5784003Z  1224 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.5784130Z       |                                                     ^~~~
2025-12-04T12:35:04.5791057Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<unsigned char> at::vec::CPU_CAPABILITY::Vectorized<unsigned char>::operator<(const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&) const’:
2025-12-04T12:35:04.5793260Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1228:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5793483Z  1228 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.5793609Z       |                                                     ^~~~
2025-12-04T12:35:04.5797214Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<unsigned char> at::vec::CPU_CAPABILITY::Vectorized<unsigned char>::operator<=(const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&) const’:
2025-12-04T12:35:04.5798858Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1232:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.5799071Z  1232 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.5799209Z       |                                                     ^~~~
2025-12-04T12:35:04.5803164Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized<T> at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized<T>&, const at::vec::CPU_CAPABILITY::Vectorized<T>&) [with bool left_shift = true; T = signed char; typename std::enable_if<(is_same_v<T, signed char> || is_same_v<T, unsigned char>), int>::type <anonymous> = 0]’:
2025-12-04T12:35:04.5804696Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2074:27:   required from here
2025-12-04T12:35:04.5806698Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5807441Z  1866 |       0x80,
2025-12-04T12:35:04.5807556Z       |       ^~~~
2025-12-04T12:35:04.5808902Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5809014Z  1868 |       0x80,
2025-12-04T12:35:04.5809114Z       |       ^~~~
2025-12-04T12:35:04.5810405Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5810518Z  1870 |       0x80,
2025-12-04T12:35:04.5811265Z       |       ^~~~
2025-12-04T12:35:04.5812543Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5812647Z  1872 |       0x80,
2025-12-04T12:35:04.5812739Z       |       ^~~~
2025-12-04T12:35:04.5815302Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5815406Z  1874 |       0x80,
2025-12-04T12:35:04.5815630Z       |       ^~~~
2025-12-04T12:35:04.5817639Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5817739Z  1876 |       0x80,
2025-12-04T12:35:04.5817846Z       |       ^~~~
2025-12-04T12:35:04.5819477Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5819699Z  1878 |       0x80,
2025-12-04T12:35:04.5819809Z       |       ^~~~
2025-12-04T12:35:04.5821683Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5821802Z  1880 |       0x80,
2025-12-04T12:35:04.5821897Z       |       ^~~~
2025-12-04T12:35:04.5823746Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5823862Z  1882 |       0x80,
2025-12-04T12:35:04.5824037Z       |       ^~~~
2025-12-04T12:35:04.5825234Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5825353Z  1884 |       0x80,
2025-12-04T12:35:04.5825447Z       |       ^~~~
2025-12-04T12:35:04.5826780Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5826951Z  1886 |       0x80,
2025-12-04T12:35:04.5827046Z       |       ^~~~
2025-12-04T12:35:04.5829475Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5829578Z  1888 |       0x80,
2025-12-04T12:35:04.5829674Z       |       ^~~~
2025-12-04T12:35:04.5832419Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5832519Z  1890 |       0x80,
2025-12-04T12:35:04.5832629Z       |       ^~~~
2025-12-04T12:35:04.5834022Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5834127Z  1892 |       0x80,
2025-12-04T12:35:04.5834463Z       |       ^~~~
2025-12-04T12:35:04.5835834Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5835947Z  1894 |       0x80,
2025-12-04T12:35:04.5836042Z       |       ^~~~
2025-12-04T12:35:04.5837294Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5837404Z  1896 |       0x80,
2025-12-04T12:35:04.5839181Z       |       ^~~~
2025-12-04T12:35:04.5840437Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5840556Z  1898 |       0x80,
2025-12-04T12:35:04.5840650Z       |       ^~~~
2025-12-04T12:35:04.5842762Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5842963Z  1900 |       0x80,
2025-12-04T12:35:04.5843067Z       |       ^~~~
2025-12-04T12:35:04.5844809Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5844907Z  1902 |       0x80,
2025-12-04T12:35:04.5845023Z       |       ^~~~
2025-12-04T12:35:04.5847700Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5847809Z  1904 |       0x80,
2025-12-04T12:35:04.5848545Z       |       ^~~~
2025-12-04T12:35:04.5850519Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5850633Z  1906 |       0x80,
2025-12-04T12:35:04.5850733Z       |       ^~~~
2025-12-04T12:35:04.5851939Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5852050Z  1908 |       0x80,
2025-12-04T12:35:04.5852151Z       |       ^~~~
2025-12-04T12:35:04.5853322Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5853502Z  1910 |       0x80,
2025-12-04T12:35:04.5853593Z       |       ^~~~
2025-12-04T12:35:04.5854788Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5854883Z  1912 |       0x80,
2025-12-04T12:35:04.5855048Z       |       ^~~~
2025-12-04T12:35:04.5856233Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5856402Z  1914 |       0x80,
2025-12-04T12:35:04.5856538Z       |       ^~~~
2025-12-04T12:35:04.5857740Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5857843Z  1916 |       0x80,
2025-12-04T12:35:04.5857949Z       |       ^~~~
2025-12-04T12:35:04.5859121Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5859215Z  1918 |       0x80,
2025-12-04T12:35:04.5859335Z       |       ^~~~
2025-12-04T12:35:04.5860504Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5860614Z  1920 |       0x80,
2025-12-04T12:35:04.5860712Z       |       ^~~~
2025-12-04T12:35:04.5861883Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5861998Z  1922 |       0x80,
2025-12-04T12:35:04.5862091Z       |       ^~~~
2025-12-04T12:35:04.5863275Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5863382Z  1924 |       0x80,
2025-12-04T12:35:04.5863486Z       |       ^~~~
2025-12-04T12:35:04.5864676Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5864774Z  1926 |       0x80,
2025-12-04T12:35:04.5864871Z       |       ^~~~
2025-12-04T12:35:04.5866060Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5866166Z  1928 |       0x80);
2025-12-04T12:35:04.5866272Z       |       ^~~~
2025-12-04T12:35:04.5867443Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5867544Z  1930 |       0x80,
2025-12-04T12:35:04.5867658Z       |       ^~~~
2025-12-04T12:35:04.5868830Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5868930Z  1932 |       0x80,
2025-12-04T12:35:04.5869039Z       |       ^~~~
2025-12-04T12:35:04.5870206Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5870353Z  1934 |       0x80,
2025-12-04T12:35:04.5881158Z       |       ^~~~
2025-12-04T12:35:04.5882586Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5882989Z  1936 |       0x80,
2025-12-04T12:35:04.5883088Z       |       ^~~~
2025-12-04T12:35:04.5884497Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5884694Z  1938 |       0x80,
2025-12-04T12:35:04.5885061Z       |       ^~~~
2025-12-04T12:35:04.5887073Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5887190Z  1940 |       0x80,
2025-12-04T12:35:04.5887285Z       |       ^~~~
2025-12-04T12:35:04.5888588Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5888702Z  1942 |       0x80,
2025-12-04T12:35:04.5888796Z       |       ^~~~
2025-12-04T12:35:04.5890740Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5890943Z  1944 |       0x80,
2025-12-04T12:35:04.5891055Z       |       ^~~~
2025-12-04T12:35:04.5894292Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5894489Z  1946 |       0x80,
2025-12-04T12:35:04.5894601Z       |       ^~~~
2025-12-04T12:35:04.5898562Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5899445Z  1948 |       0x80,
2025-12-04T12:35:04.5899544Z       |       ^~~~
2025-12-04T12:35:04.5901673Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5901887Z  1950 |       0x80,
2025-12-04T12:35:04.5901983Z       |       ^~~~
2025-12-04T12:35:04.5903172Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5903294Z  1952 |       0x80,
2025-12-04T12:35:04.5903388Z       |       ^~~~
2025-12-04T12:35:04.5904589Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5904692Z  1954 |       0x80,
2025-12-04T12:35:04.5904786Z       |       ^~~~
2025-12-04T12:35:04.5905983Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5906080Z  1956 |       0x80,
2025-12-04T12:35:04.5906188Z       |       ^~~~
2025-12-04T12:35:04.5907360Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5907636Z  1958 |       0x80,
2025-12-04T12:35:04.5907741Z       |       ^~~~
2025-12-04T12:35:04.5908967Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5909096Z  1960 |       0x80,
2025-12-04T12:35:04.5909207Z       |       ^~~~
2025-12-04T12:35:04.5910424Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5910536Z  1962 |       0x80,
2025-12-04T12:35:04.5910630Z       |       ^~~~
2025-12-04T12:35:04.5911808Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5911924Z  1964 |       0x80,
2025-12-04T12:35:04.5912016Z       |       ^~~~
2025-12-04T12:35:04.5913208Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5913308Z  1966 |       0x80,
2025-12-04T12:35:04.5913397Z       |       ^~~~
2025-12-04T12:35:04.5914586Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5914678Z  1968 |       0x80,
2025-12-04T12:35:04.5914769Z       |       ^~~~
2025-12-04T12:35:04.5915952Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5916052Z  1970 |       0x80,
2025-12-04T12:35:04.5916154Z       |       ^~~~
2025-12-04T12:35:04.5917334Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5917432Z  1972 |       0x80,
2025-12-04T12:35:04.5917537Z       |       ^~~~
2025-12-04T12:35:04.5918718Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5918823Z  1974 |       0x80,
2025-12-04T12:35:04.5921384Z       |       ^~~~
2025-12-04T12:35:04.5922600Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5922719Z  1976 |       0x80,
2025-12-04T12:35:04.5922810Z       |       ^~~~
2025-12-04T12:35:04.5923983Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5924095Z  1978 |       0x80,
2025-12-04T12:35:04.5924185Z       |       ^~~~
2025-12-04T12:35:04.5925374Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5925469Z  1980 |       0x80,
2025-12-04T12:35:04.5925696Z       |       ^~~~
2025-12-04T12:35:04.5926897Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5927059Z  1982 |       0x80,
2025-12-04T12:35:04.5927169Z       |       ^~~~
2025-12-04T12:35:04.5928384Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5928518Z  1984 |       0x80,
2025-12-04T12:35:04.5928624Z       |       ^~~~
2025-12-04T12:35:04.5929837Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5929934Z  1986 |       0x80,
2025-12-04T12:35:04.5930043Z       |       ^~~~
2025-12-04T12:35:04.5931219Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5931333Z  1988 |       0x80,
2025-12-04T12:35:04.5931427Z       |       ^~~~
2025-12-04T12:35:04.5932606Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5932719Z  1990 |       0x80,
2025-12-04T12:35:04.5932811Z       |       ^~~~
2025-12-04T12:35:04.5934004Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5934098Z  1992 |       0x80,
2025-12-04T12:35:04.5934191Z       |       ^~~~
2025-12-04T12:35:04.5935394Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow]
2025-12-04T12:35:04.5935560Z  2002 |   __m512i keep_1 = _mm512_set1_epi16(0xFF00);
2025-12-04T12:35:04.5935678Z       |                                      ^~~~~~
2025-12-04T12:35:04.5938214Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized<T> at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized<T>&, const at::vec::CPU_CAPABILITY::Vectorized<T>&) [with bool left_shift = true; T = unsigned char; typename std::enable_if<(is_same_v<T, signed char> || is_same_v<T, unsigned char>), int>::type <anonymous> = 0]’:
2025-12-04T12:35:04.5938805Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2081:27:   required from here
2025-12-04T12:35:04.5940014Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5940156Z  1866 |       0x80,
2025-12-04T12:35:04.5940265Z       |       ^~~~
2025-12-04T12:35:04.5941451Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5941583Z  1868 |       0x80,
2025-12-04T12:35:04.5941688Z       |       ^~~~
2025-12-04T12:35:04.5942869Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5942980Z  1870 |       0x80,
2025-12-04T12:35:04.5943072Z       |       ^~~~
2025-12-04T12:35:04.5944255Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5944365Z  1872 |       0x80,
2025-12-04T12:35:04.5944458Z       |       ^~~~
2025-12-04T12:35:04.5945665Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5945779Z  1874 |       0x80,
2025-12-04T12:35:04.5945872Z       |       ^~~~
2025-12-04T12:35:04.5947093Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5947188Z  1876 |       0x80,
2025-12-04T12:35:04.5947280Z       |       ^~~~
2025-12-04T12:35:04.5948470Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5948563Z  1878 |       0x80,
2025-12-04T12:35:04.5948654Z       |       ^~~~
2025-12-04T12:35:04.5949847Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5949946Z  1880 |       0x80,
2025-12-04T12:35:04.5950048Z       |       ^~~~
2025-12-04T12:35:04.5953812Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5953912Z  1882 |       0x80,
2025-12-04T12:35:04.5954018Z       |       ^~~~
2025-12-04T12:35:04.5955216Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5955325Z  1884 |       0x80,
2025-12-04T12:35:04.5955417Z       |       ^~~~
2025-12-04T12:35:04.5957370Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5957489Z  1886 |       0x80,
2025-12-04T12:35:04.5957586Z       |       ^~~~
2025-12-04T12:35:04.5959422Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5959537Z  1888 |       0x80,
2025-12-04T12:35:04.5959633Z       |       ^~~~
2025-12-04T12:35:04.5961541Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5961647Z  1890 |       0x80,
2025-12-04T12:35:04.5961739Z       |       ^~~~
2025-12-04T12:35:04.5964368Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5964553Z  1892 |       0x80,
2025-12-04T12:35:04.5964659Z       |       ^~~~
2025-12-04T12:35:04.5966438Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5966537Z  1894 |       0x80,
2025-12-04T12:35:04.5966643Z       |       ^~~~
2025-12-04T12:35:04.5968023Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5968123Z  1896 |       0x80,
2025-12-04T12:35:04.5968227Z       |       ^~~~
2025-12-04T12:35:04.5969467Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5969582Z  1898 |       0x80,
2025-12-04T12:35:04.5969675Z       |       ^~~~
2025-12-04T12:35:04.5971241Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5971359Z  1900 |       0x80,
2025-12-04T12:35:04.5971459Z       |       ^~~~
2025-12-04T12:35:04.5972796Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5972893Z  1902 |       0x80,
2025-12-04T12:35:04.5972983Z       |       ^~~~
2025-12-04T12:35:04.5974181Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5974283Z  1904 |       0x80,
2025-12-04T12:35:04.5974375Z       |       ^~~~
2025-12-04T12:35:04.5975566Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5975661Z  1906 |       0x80,
2025-12-04T12:35:04.5975772Z       |       ^~~~
2025-12-04T12:35:04.5977016Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5977113Z  1908 |       0x80,
2025-12-04T12:35:04.5977221Z       |       ^~~~
2025-12-04T12:35:04.5978412Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5978527Z  1910 |       0x80,
2025-12-04T12:35:04.5978621Z       |       ^~~~
2025-12-04T12:35:04.5979800Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5979906Z  1912 |       0x80,
2025-12-04T12:35:04.5980074Z       |       ^~~~
2025-12-04T12:35:04.5981256Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5981361Z  1914 |       0x80,
2025-12-04T12:35:04.5981453Z       |       ^~~~
2025-12-04T12:35:04.5982705Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5982848Z  1916 |       0x80,
2025-12-04T12:35:04.5982939Z       |       ^~~~
2025-12-04T12:35:04.5984164Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5984263Z  1918 |       0x80,
2025-12-04T12:35:04.5984366Z       |       ^~~~
2025-12-04T12:35:04.5985554Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5985649Z  1920 |       0x80,
2025-12-04T12:35:04.5985754Z       |       ^~~~
2025-12-04T12:35:04.5986931Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5987032Z  1922 |       0x80,
2025-12-04T12:35:04.5987140Z       |       ^~~~
2025-12-04T12:35:04.5988318Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5988429Z  1924 |       0x80,
2025-12-04T12:35:04.5988522Z       |       ^~~~
2025-12-04T12:35:04.5989694Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5989800Z  1926 |       0x80,
2025-12-04T12:35:04.5989893Z       |       ^~~~
2025-12-04T12:35:04.5991093Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5991193Z  1928 |       0x80);
2025-12-04T12:35:04.5991291Z       |       ^~~~
2025-12-04T12:35:04.5992500Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5992601Z  1930 |       0x80,
2025-12-04T12:35:04.5992695Z       |       ^~~~
2025-12-04T12:35:04.5993887Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5993982Z  1932 |       0x80,
2025-12-04T12:35:04.5994089Z       |       ^~~~
2025-12-04T12:35:04.5995277Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5995375Z  1934 |       0x80,
2025-12-04T12:35:04.5995480Z       |       ^~~~
2025-12-04T12:35:04.5996661Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5996810Z  1936 |       0x80,
2025-12-04T12:35:04.5996907Z       |       ^~~~
2025-12-04T12:35:04.5998083Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5998194Z  1938 |       0x80,
2025-12-04T12:35:04.5998287Z       |       ^~~~
2025-12-04T12:35:04.5999536Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.5999649Z  1940 |       0x80,
2025-12-04T12:35:04.5999743Z       |       ^~~~
2025-12-04T12:35:04.6000974Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6001077Z  1942 |       0x80,
2025-12-04T12:35:04.6001171Z       |       ^~~~
2025-12-04T12:35:04.6002372Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6002468Z  1944 |       0x80,
2025-12-04T12:35:04.6002566Z       |       ^~~~
2025-12-04T12:35:04.6003757Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6003853Z  1946 |       0x80,
2025-12-04T12:35:04.6003969Z       |       ^~~~
2025-12-04T12:35:04.6005137Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6005239Z  1948 |       0x80,
2025-12-04T12:35:04.6005347Z       |       ^~~~
2025-12-04T12:35:04.6006529Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6006639Z  1950 |       0x80,
2025-12-04T12:35:04.6006739Z       |       ^~~~
2025-12-04T12:35:04.6007921Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6008031Z  1952 |       0x80,
2025-12-04T12:35:04.6008131Z       |       ^~~~
2025-12-04T12:35:04.6009303Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6009421Z  1954 |       0x80,
2025-12-04T12:35:04.6009512Z       |       ^~~~
2025-12-04T12:35:04.6010700Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6010793Z  1956 |       0x80,
2025-12-04T12:35:04.6010900Z       |       ^~~~
2025-12-04T12:35:04.6012094Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6012190Z  1958 |       0x80,
2025-12-04T12:35:04.6012302Z       |       ^~~~
2025-12-04T12:35:04.6013473Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6013604Z  1960 |       0x80,
2025-12-04T12:35:04.6013710Z       |       ^~~~
2025-12-04T12:35:04.6014885Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6014977Z  1962 |       0x80,
2025-12-04T12:35:04.6015153Z       |       ^~~~
2025-12-04T12:35:04.6016391Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6016502Z  1964 |       0x80,
2025-12-04T12:35:04.6016635Z       |       ^~~~
2025-12-04T12:35:04.6017819Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6017933Z  1966 |       0x80,
2025-12-04T12:35:04.6018024Z       |       ^~~~
2025-12-04T12:35:04.6019225Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6019325Z  1968 |       0x80,
2025-12-04T12:35:04.6019422Z       |       ^~~~
2025-12-04T12:35:04.6020607Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6020706Z  1970 |       0x80,
2025-12-04T12:35:04.6020801Z       |       ^~~~
2025-12-04T12:35:04.6021988Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6022089Z  1972 |       0x80,
2025-12-04T12:35:04.6022194Z       |       ^~~~
2025-12-04T12:35:04.6023370Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6023468Z  1974 |       0x80,
2025-12-04T12:35:04.6023580Z       |       ^~~~
2025-12-04T12:35:04.6024753Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6024866Z  1976 |       0x80,
2025-12-04T12:35:04.6024959Z       |       ^~~~
2025-12-04T12:35:04.6026129Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6026245Z  1978 |       0x80,
2025-12-04T12:35:04.6026338Z       |       ^~~~
2025-12-04T12:35:04.6027521Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6027642Z  1980 |       0x80,
2025-12-04T12:35:04.6027735Z       |       ^~~~
2025-12-04T12:35:04.6028915Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6029014Z  1982 |       0x80,
2025-12-04T12:35:04.6029106Z       |       ^~~~
2025-12-04T12:35:04.6030289Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6030446Z  1984 |       0x80,
2025-12-04T12:35:04.6030537Z       |       ^~~~
2025-12-04T12:35:04.6031719Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6031882Z  1986 |       0x80,
2025-12-04T12:35:04.6031987Z       |       ^~~~
2025-12-04T12:35:04.6033164Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6033299Z  1988 |       0x80,
2025-12-04T12:35:04.6033411Z       |       ^~~~
2025-12-04T12:35:04.6034590Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6034705Z  1990 |       0x80,
2025-12-04T12:35:04.6034797Z       |       ^~~~
2025-12-04T12:35:04.6035969Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6036090Z  1992 |       0x80,
2025-12-04T12:35:04.6036182Z       |       ^~~~
2025-12-04T12:35:04.6037361Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow]
2025-12-04T12:35:04.6037542Z  2002 |   __m512i keep_1 = _mm512_set1_epi16(0xFF00);
2025-12-04T12:35:04.6037660Z       |                                      ^~~~~~
2025-12-04T12:35:04.6040078Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized<T> at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized<T>&, const at::vec::CPU_CAPABILITY::Vectorized<T>&) [with bool left_shift = false; T = signed char; typename std::enable_if<(is_same_v<T, signed char> || is_same_v<T, unsigned char>), int>::type <anonymous> = 0]’:
2025-12-04T12:35:04.6040673Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2109:28:   required from here
2025-12-04T12:35:04.6041885Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6041982Z  1866 |       0x80,
2025-12-04T12:35:04.6042076Z       |       ^~~~
2025-12-04T12:35:04.6043268Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6043407Z  1868 |       0x80,
2025-12-04T12:35:04.6043512Z       |       ^~~~
2025-12-04T12:35:04.6044693Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6044825Z  1870 |       0x80,
2025-12-04T12:35:04.6044929Z       |       ^~~~
2025-12-04T12:35:04.6046107Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6046200Z  1872 |       0x80,
2025-12-04T12:35:04.6046308Z       |       ^~~~
2025-12-04T12:35:04.6047479Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6047593Z  1874 |       0x80,
2025-12-04T12:35:04.6047684Z       |       ^~~~
2025-12-04T12:35:04.6048891Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6049004Z  1876 |       0x80,
2025-12-04T12:35:04.6049098Z       |       ^~~~
2025-12-04T12:35:04.6050321Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6050416Z  1878 |       0x80,
2025-12-04T12:35:04.6050507Z       |       ^~~~
2025-12-04T12:35:04.6051699Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6051798Z  1880 |       0x80,
2025-12-04T12:35:04.6051889Z       |       ^~~~
2025-12-04T12:35:04.6053079Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6053178Z  1882 |       0x80,
2025-12-04T12:35:04.6053283Z       |       ^~~~
2025-12-04T12:35:04.6054468Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6054562Z  1884 |       0x80,
2025-12-04T12:35:04.6054667Z       |       ^~~~
2025-12-04T12:35:04.6055834Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6055947Z  1886 |       0x80,
2025-12-04T12:35:04.6056038Z       |       ^~~~
2025-12-04T12:35:04.6057290Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6057408Z  1888 |       0x80,
2025-12-04T12:35:04.6057498Z       |       ^~~~
2025-12-04T12:35:04.6058692Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6058800Z  1890 |       0x80,
2025-12-04T12:35:04.6058890Z       |       ^~~~
2025-12-04T12:35:04.6060073Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6060215Z  1892 |       0x80,
2025-12-04T12:35:04.6060306Z       |       ^~~~
2025-12-04T12:35:04.6061505Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6061637Z  1894 |       0x80,
2025-12-04T12:35:04.6061741Z       |       ^~~~
2025-12-04T12:35:04.6062928Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6063020Z  1896 |       0x80,
2025-12-04T12:35:04.6063125Z       |       ^~~~
2025-12-04T12:35:04.6064301Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6064402Z  1898 |       0x80,
2025-12-04T12:35:04.6064507Z       |       ^~~~
2025-12-04T12:35:04.6065711Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6065830Z  1900 |       0x80,
2025-12-04T12:35:04.6065923Z       |       ^~~~
2025-12-04T12:35:04.6067132Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6067244Z  1902 |       0x80,
2025-12-04T12:35:04.6067339Z       |       ^~~~
2025-12-04T12:35:04.6068530Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6068631Z  1904 |       0x80,
2025-12-04T12:35:04.6068722Z       |       ^~~~
2025-12-04T12:35:04.6069914Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6070014Z  1906 |       0x80,
2025-12-04T12:35:04.6070106Z       |       ^~~~
2025-12-04T12:35:04.6071502Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6071598Z  1908 |       0x80,
2025-12-04T12:35:04.6071703Z       |       ^~~~
2025-12-04T12:35:04.6072877Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6072978Z  1910 |       0x80,
2025-12-04T12:35:04.6073081Z       |       ^~~~
2025-12-04T12:35:04.6074261Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6074373Z  1912 |       0x80,
2025-12-04T12:35:04.6074466Z       |       ^~~~
2025-12-04T12:35:04.6075648Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6075757Z  1914 |       0x80,
2025-12-04T12:35:04.6075851Z       |       ^~~~
2025-12-04T12:35:04.6077110Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6077220Z  1916 |       0x80,
2025-12-04T12:35:04.6077314Z       |       ^~~~
2025-12-04T12:35:04.6078569Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6078711Z  1918 |       0x80,
2025-12-04T12:35:04.6078805Z       |       ^~~~
2025-12-04T12:35:04.6080064Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6080162Z  1920 |       0x80,
2025-12-04T12:35:04.6080255Z       |       ^~~~
2025-12-04T12:35:04.6081452Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6081549Z  1922 |       0x80,
2025-12-04T12:35:04.6081657Z       |       ^~~~
2025-12-04T12:35:04.6082841Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6082940Z  1924 |       0x80,
2025-12-04T12:35:04.6083050Z       |       ^~~~
2025-12-04T12:35:04.6084234Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6084344Z  1926 |       0x80,
2025-12-04T12:35:04.6084438Z       |       ^~~~
2025-12-04T12:35:04.6085625Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6085740Z  1928 |       0x80);
2025-12-04T12:35:04.6085834Z       |       ^~~~
2025-12-04T12:35:04.6087016Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6087134Z  1930 |       0x80,
2025-12-04T12:35:04.6087230Z       |       ^~~~
2025-12-04T12:35:04.6088427Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6088523Z  1932 |       0x80,
2025-12-04T12:35:04.6088616Z       |       ^~~~
2025-12-04T12:35:04.6089813Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6089910Z  1934 |       0x80,
2025-12-04T12:35:04.6090019Z       |       ^~~~
2025-12-04T12:35:04.6091204Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6091306Z  1936 |       0x80,
2025-12-04T12:35:04.6091417Z       |       ^~~~
2025-12-04T12:35:04.6092597Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6092691Z  1938 |       0x80,
2025-12-04T12:35:04.6092845Z       |       ^~~~
2025-12-04T12:35:04.6094017Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6094128Z  1940 |       0x80,
2025-12-04T12:35:04.6094219Z       |       ^~~~
2025-12-04T12:35:04.6095434Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6095578Z  1942 |       0x80,
2025-12-04T12:35:04.6095671Z       |       ^~~~
2025-12-04T12:35:04.6097021Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6097123Z  1944 |       0x80,
2025-12-04T12:35:04.6097222Z       |       ^~~~
2025-12-04T12:35:04.6098417Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6098513Z  1946 |       0x80,
2025-12-04T12:35:04.6098606Z       |       ^~~~
2025-12-04T12:35:04.6099798Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6099897Z  1948 |       0x80,
2025-12-04T12:35:04.6100003Z       |       ^~~~
2025-12-04T12:35:04.6101181Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6101275Z  1950 |       0x80,
2025-12-04T12:35:04.6101386Z       |       ^~~~
2025-12-04T12:35:04.6102558Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6102666Z  1952 |       0x80,
2025-12-04T12:35:04.6102758Z       |       ^~~~
2025-12-04T12:35:04.6103932Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6104046Z  1954 |       0x80,
2025-12-04T12:35:04.6104138Z       |       ^~~~
2025-12-04T12:35:04.6105315Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6105422Z  1956 |       0x80,
2025-12-04T12:35:04.6105519Z       |       ^~~~
2025-12-04T12:35:04.6106706Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6106800Z  1958 |       0x80,
2025-12-04T12:35:04.6106896Z       |       ^~~~
2025-12-04T12:35:04.6108093Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6108192Z  1960 |       0x80,
2025-12-04T12:35:04.6108302Z       |       ^~~~
2025-12-04T12:35:04.6109488Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6109582Z  1962 |       0x80,
2025-12-04T12:35:04.6109730Z       |       ^~~~
2025-12-04T12:35:04.6110905Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6111002Z  1964 |       0x80,
2025-12-04T12:35:04.6111112Z       |       ^~~~
2025-12-04T12:35:04.6112332Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6112476Z  1966 |       0x80,
2025-12-04T12:35:04.6112569Z       |       ^~~~
2025-12-04T12:35:04.6113780Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6113895Z  1968 |       0x80,
2025-12-04T12:35:04.6113988Z       |       ^~~~
2025-12-04T12:35:04.6115175Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6115269Z  1970 |       0x80,
2025-12-04T12:35:04.6115360Z       |       ^~~~
2025-12-04T12:35:04.6116559Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6116654Z  1972 |       0x80,
2025-12-04T12:35:04.6116746Z       |       ^~~~
2025-12-04T12:35:04.6117940Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6118040Z  1974 |       0x80,
2025-12-04T12:35:04.6118144Z       |       ^~~~
2025-12-04T12:35:04.6119315Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6119408Z  1976 |       0x80,
2025-12-04T12:35:04.6119515Z       |       ^~~~
2025-12-04T12:35:04.6120700Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6120808Z  1978 |       0x80,
2025-12-04T12:35:04.6120902Z       |       ^~~~
2025-12-04T12:35:04.6122081Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6122194Z  1980 |       0x80,
2025-12-04T12:35:04.6122287Z       |       ^~~~
2025-12-04T12:35:04.6123462Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6123570Z  1982 |       0x80,
2025-12-04T12:35:04.6123663Z       |       ^~~~
2025-12-04T12:35:04.6124865Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6124961Z  1984 |       0x80,
2025-12-04T12:35:04.6125054Z       |       ^~~~
2025-12-04T12:35:04.6126245Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6126402Z  1986 |       0x80,
2025-12-04T12:35:04.6126495Z       |       ^~~~
2025-12-04T12:35:04.6127683Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6127779Z  1988 |       0x80,
2025-12-04T12:35:04.6127924Z       |       ^~~~
2025-12-04T12:35:04.6129130Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6129225Z  1990 |       0x80,
2025-12-04T12:35:04.6129370Z       |       ^~~~
2025-12-04T12:35:04.6130552Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6130667Z  1992 |       0x80,
2025-12-04T12:35:04.6130758Z       |       ^~~~
2025-12-04T12:35:04.6131940Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow]
2025-12-04T12:35:04.6132116Z  2002 |   __m512i keep_1 = _mm512_set1_epi16(0xFF00);
2025-12-04T12:35:04.6132239Z       |                                      ^~~~~~
2025-12-04T12:35:04.6134678Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized<T> at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized<T>&, const at::vec::CPU_CAPABILITY::Vectorized<T>&) [with bool left_shift = false; T = unsigned char; typename std::enable_if<(is_same_v<T, signed char> || is_same_v<T, unsigned char>), int>::type <anonymous> = 0]’:
2025-12-04T12:35:04.6135263Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2116:28:   required from here
2025-12-04T12:35:04.6136524Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6136641Z  1866 |       0x80,
2025-12-04T12:35:04.6136739Z       |       ^~~~
2025-12-04T12:35:04.6137934Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6138034Z  1868 |       0x80,
2025-12-04T12:35:04.6138127Z       |       ^~~~
2025-12-04T12:35:04.6139314Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6139465Z  1870 |       0x80,
2025-12-04T12:35:04.6139573Z       |       ^~~~
2025-12-04T12:35:04.6140749Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6140848Z  1872 |       0x80,
2025-12-04T12:35:04.6140991Z       |       ^~~~
2025-12-04T12:35:04.6142169Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6142269Z  1874 |       0x80,
2025-12-04T12:35:04.6142374Z       |       ^~~~
2025-12-04T12:35:04.6143545Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6143658Z  1876 |       0x80,
2025-12-04T12:35:04.6143749Z       |       ^~~~
2025-12-04T12:35:04.6144927Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6145096Z  1878 |       0x80,
2025-12-04T12:35:04.6145189Z       |       ^~~~
2025-12-04T12:35:04.6146373Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6146503Z  1880 |       0x80,
2025-12-04T12:35:04.6146596Z       |       ^~~~
2025-12-04T12:35:04.6147781Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6147881Z  1882 |       0x80,
2025-12-04T12:35:04.6147972Z       |       ^~~~
2025-12-04T12:35:04.6149158Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6149263Z  1884 |       0x80,
2025-12-04T12:35:04.6149372Z       |       ^~~~
2025-12-04T12:35:04.6150545Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6150644Z  1886 |       0x80,
2025-12-04T12:35:04.6150752Z       |       ^~~~
2025-12-04T12:35:04.6151922Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6152037Z  1888 |       0x80,
2025-12-04T12:35:04.6152130Z       |       ^~~~
2025-12-04T12:35:04.6153297Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6153418Z  1890 |       0x80,
2025-12-04T12:35:04.6153514Z       |       ^~~~
2025-12-04T12:35:04.6154693Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6154807Z  1892 |       0x80,
2025-12-04T12:35:04.6154901Z       |       ^~~~
2025-12-04T12:35:04.6156091Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6156226Z  1894 |       0x80,
2025-12-04T12:35:04.6156319Z       |       ^~~~
2025-12-04T12:35:04.6157519Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6157650Z  1896 |       0x80,
2025-12-04T12:35:04.6157741Z       |       ^~~~
2025-12-04T12:35:04.6158931Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6159026Z  1898 |       0x80,
2025-12-04T12:35:04.6159134Z       |       ^~~~
2025-12-04T12:35:04.6160309Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6160409Z  1900 |       0x80,
2025-12-04T12:35:04.6160513Z       |       ^~~~
2025-12-04T12:35:04.6161737Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6161851Z  1902 |       0x80,
2025-12-04T12:35:04.6161942Z       |       ^~~~
2025-12-04T12:35:04.6163237Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6163347Z  1904 |       0x80,
2025-12-04T12:35:04.6163439Z       |       ^~~~
2025-12-04T12:35:04.6164627Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6164729Z  1906 |       0x80,
2025-12-04T12:35:04.6164822Z       |       ^~~~
2025-12-04T12:35:04.6166022Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6166124Z  1908 |       0x80,
2025-12-04T12:35:04.6166217Z       |       ^~~~
2025-12-04T12:35:04.6167423Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6167519Z  1910 |       0x80,
2025-12-04T12:35:04.6167629Z       |       ^~~~
2025-12-04T12:35:04.6168801Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6168902Z  1912 |       0x80,
2025-12-04T12:35:04.6169011Z       |       ^~~~
2025-12-04T12:35:04.6170193Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6170313Z  1914 |       0x80,
2025-12-04T12:35:04.6170407Z       |       ^~~~
2025-12-04T12:35:04.6171770Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6171883Z  1916 |       0x80,
2025-12-04T12:35:04.6171976Z       |       ^~~~
2025-12-04T12:35:04.6173169Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6173365Z  1918 |       0x80,
2025-12-04T12:35:04.6173460Z       |       ^~~~
2025-12-04T12:35:04.6174704Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6174845Z  1920 |       0x80,
2025-12-04T12:35:04.6174937Z       |       ^~~~
2025-12-04T12:35:04.6176175Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6176273Z  1922 |       0x80,
2025-12-04T12:35:04.6176435Z       |       ^~~~
2025-12-04T12:35:04.6177640Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6177743Z  1924 |       0x80,
2025-12-04T12:35:04.6177852Z       |       ^~~~
2025-12-04T12:35:04.6179030Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6179131Z  1926 |       0x80,
2025-12-04T12:35:04.6179243Z       |       ^~~~
2025-12-04T12:35:04.6180426Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6180541Z  1928 |       0x80);
2025-12-04T12:35:04.6180636Z       |       ^~~~
2025-12-04T12:35:04.6181819Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6181938Z  1930 |       0x80,
2025-12-04T12:35:04.6182031Z       |       ^~~~
2025-12-04T12:35:04.6183204Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6183319Z  1932 |       0x80,
2025-12-04T12:35:04.6183410Z       |       ^~~~
2025-12-04T12:35:04.6184594Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6184693Z  1934 |       0x80,
2025-12-04T12:35:04.6184787Z       |       ^~~~
2025-12-04T12:35:04.6185979Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6186082Z  1936 |       0x80,
2025-12-04T12:35:04.6186190Z       |       ^~~~
2025-12-04T12:35:04.6187367Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6187471Z  1938 |       0x80,
2025-12-04T12:35:04.6187578Z       |       ^~~~
2025-12-04T12:35:04.6188757Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6188851Z  1940 |       0x80,
2025-12-04T12:35:04.6188959Z       |       ^~~~
2025-12-04T12:35:04.6190130Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6190287Z  1942 |       0x80,
2025-12-04T12:35:04.6190379Z       |       ^~~~
2025-12-04T12:35:04.6191584Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6191727Z  1944 |       0x80,
2025-12-04T12:35:04.6191820Z       |       ^~~~
2025-12-04T12:35:04.6193040Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6193137Z  1946 |       0x80,
2025-12-04T12:35:04.6193231Z       |       ^~~~
2025-12-04T12:35:04.6194420Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6194522Z  1948 |       0x80,
2025-12-04T12:35:04.6194616Z       |       ^~~~
2025-12-04T12:35:04.6195806Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6195906Z  1950 |       0x80,
2025-12-04T12:35:04.6196014Z       |       ^~~~
2025-12-04T12:35:04.6197186Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6197280Z  1952 |       0x80,
2025-12-04T12:35:04.6197389Z       |       ^~~~
2025-12-04T12:35:04.6198576Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6198684Z  1954 |       0x80,
2025-12-04T12:35:04.6198777Z       |       ^~~~
2025-12-04T12:35:04.6199952Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6200070Z  1956 |       0x80,
2025-12-04T12:35:04.6200164Z       |       ^~~~
2025-12-04T12:35:04.6201344Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6201450Z  1958 |       0x80,
2025-12-04T12:35:04.6201543Z       |       ^~~~
2025-12-04T12:35:04.6202730Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6202825Z  1960 |       0x80,
2025-12-04T12:35:04.6202917Z       |       ^~~~
2025-12-04T12:35:04.6204110Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6204208Z  1962 |       0x80,
2025-12-04T12:35:04.6204301Z       |       ^~~~
2025-12-04T12:35:04.6205491Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6205585Z  1964 |       0x80,
2025-12-04T12:35:04.6205691Z       |       ^~~~
2025-12-04T12:35:04.6206909Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6207003Z  1966 |       0x80,
2025-12-04T12:35:04.6207110Z       |       ^~~~
2025-12-04T12:35:04.6208342Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6208482Z  1968 |       0x80,
2025-12-04T12:35:04.6208575Z       |       ^~~~
2025-12-04T12:35:04.6209784Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6209892Z  1970 |       0x80,
2025-12-04T12:35:04.6209985Z       |       ^~~~
2025-12-04T12:35:04.6211169Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6211279Z  1972 |       0x80,
2025-12-04T12:35:04.6211372Z       |       ^~~~
2025-12-04T12:35:04.6212564Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6212666Z  1974 |       0x80,
2025-12-04T12:35:04.6212760Z       |       ^~~~
2025-12-04T12:35:04.6213956Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6214051Z  1976 |       0x80,
2025-12-04T12:35:04.6214156Z       |       ^~~~
2025-12-04T12:35:04.6215336Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6215431Z  1978 |       0x80,
2025-12-04T12:35:04.6215539Z       |       ^~~~
2025-12-04T12:35:04.6216787Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6216891Z  1980 |       0x80,
2025-12-04T12:35:04.6217061Z       |       ^~~~
2025-12-04T12:35:04.6218247Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6218357Z  1982 |       0x80,
2025-12-04T12:35:04.6218457Z       |       ^~~~
2025-12-04T12:35:04.6219631Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6219745Z  1984 |       0x80,
2025-12-04T12:35:04.6219837Z       |       ^~~~
2025-12-04T12:35:04.6221031Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6221173Z  1986 |       0x80,
2025-12-04T12:35:04.6221264Z       |       ^~~~
2025-12-04T12:35:04.6222454Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6222548Z  1988 |       0x80,
2025-12-04T12:35:04.6222678Z       |       ^~~~
2025-12-04T12:35:04.6223865Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6223959Z  1990 |       0x80,
2025-12-04T12:35:04.6224065Z       |       ^~~~
2025-12-04T12:35:04.6225268Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6225367Z  1992 |       0x80,
2025-12-04T12:35:04.6225473Z       |       ^~~~
2025-12-04T12:35:04.6226695Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow]
2025-12-04T12:35:04.6226868Z  2002 |   __m512i keep_1 = _mm512_set1_epi16(0xFF00);
2025-12-04T12:35:04.6226992Z       |                                      ^~~~~~
2025-12-04T12:35:04.6227498Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:16,
2025-12-04T12:35:04.6227881Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5,
2025-12-04T12:35:04.6228331Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7,
2025-12-04T12:35:04.6228754Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4,
2025-12-04T12:35:04.6229223Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45,
2025-12-04T12:35:04.6229864Z                  from /tmp/hdMAUq/tmp2gno_q_y/data/aotinductor/model1/cji6fcfpjxr5ad3oypbruxr5r26niflgwwkmd5rthzuhxclq6uis.wrapper.cpp:751:
2025-12-04T12:35:04.6231347Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h: In instantiation of ‘void at::vec::CPU_CAPABILITY::QuantizeAvx512(const float*, T*, int, float, int64_t) [with T = signed char; int64_t = long int]’:
2025-12-04T12:35:04.6231924Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:696:31:   required from here
2025-12-04T12:35:04.6233126Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6233245Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6233343Z       |       ^~~~
2025-12-04T12:35:04.6234535Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6234700Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6234812Z       |             ^~~~
2025-12-04T12:35:04.6235996Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6236116Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6236266Z       |                   ^~~~
2025-12-04T12:35:04.6237452Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6237587Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6237688Z       |                         ^~~~
2025-12-04T12:35:04.6238867Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6239002Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6239095Z       |       ^~~~
2025-12-04T12:35:04.6240321Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6240456Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6240555Z       |             ^~~~
2025-12-04T12:35:04.6241781Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6241895Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6241996Z       |                   ^~~~
2025-12-04T12:35:04.6243200Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6243313Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6243431Z       |                         ^~~~
2025-12-04T12:35:04.6244616Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6244734Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6244839Z       |       ^~~~
2025-12-04T12:35:04.6246024Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6246149Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6246254Z       |             ^~~~
2025-12-04T12:35:04.6247435Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6247560Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6247660Z       |                   ^~~~
2025-12-04T12:35:04.6248852Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6248977Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6249089Z       |                         ^~~~
2025-12-04T12:35:04.6250276Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6250427Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6250521Z       |       ^~~~
2025-12-04T12:35:04.6251719Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6251837Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6251985Z       |             ^~~~
2025-12-04T12:35:04.6253167Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6253286Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6253397Z       |                   ^~~~
2025-12-04T12:35:04.6254574Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6254692Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6254819Z       |                         ^~~~
2025-12-04T12:35:04.6256033Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6256168Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6256263Z       |       ^~~~
2025-12-04T12:35:04.6257572Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6257704Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6257805Z       |             ^~~~
2025-12-04T12:35:04.6259006Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6259120Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6259221Z       |                   ^~~~
2025-12-04T12:35:04.6260425Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6260548Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6260667Z       |                         ^~~~
2025-12-04T12:35:04.6261854Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6261969Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6262087Z       |       ^~~~
2025-12-04T12:35:04.6263269Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6263381Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6263495Z       |             ^~~~
2025-12-04T12:35:04.6264692Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6264820Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6264928Z       |                   ^~~~
2025-12-04T12:35:04.6266111Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6266301Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6266403Z       |                         ^~~~
2025-12-04T12:35:04.6267602Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6267786Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6267881Z       |       ^~~~
2025-12-04T12:35:04.6269081Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6269233Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6269346Z       |             ^~~~
2025-12-04T12:35:04.6270525Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6270646Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6270759Z       |                   ^~~~
2025-12-04T12:35:04.6272130Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6272248Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6272367Z       |                         ^~~~
2025-12-04T12:35:04.6273554Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6273683Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6273778Z       |       ^~~~
2025-12-04T12:35:04.6274971Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6275104Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6275201Z       |             ^~~~
2025-12-04T12:35:04.6276405Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6276522Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6276628Z       |                   ^~~~
2025-12-04T12:35:04.6277831Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6277943Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6278050Z       |                         ^~~~
2025-12-04T12:35:04.6279236Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6279347Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6279461Z       |       ^~~~
2025-12-04T12:35:04.6280645Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6280756Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6280872Z       |             ^~~~
2025-12-04T12:35:04.6282053Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6282255Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6282356Z       |                   ^~~~
2025-12-04T12:35:04.6283543Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6283759Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6283862Z       |                         ^~~~
2025-12-04T12:35:04.6285112Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6285231Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6285324Z       |       ^~~~
2025-12-04T12:35:04.6286528Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6286646Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6286741Z       |             ^~~~
2025-12-04T12:35:04.6287942Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6288063Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6288179Z       |                   ^~~~
2025-12-04T12:35:04.6289361Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6289473Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6289590Z       |                         ^~~~
2025-12-04T12:35:04.6290774Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6290900Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6290994Z       |       ^~~~
2025-12-04T12:35:04.6292178Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6292310Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6292406Z       |             ^~~~
2025-12-04T12:35:04.6293604Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6293716Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6293823Z       |                   ^~~~
2025-12-04T12:35:04.6295013Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6295127Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6295236Z       |                         ^~~~
2025-12-04T12:35:04.6296494Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6296609Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6296723Z       |       ^~~~
2025-12-04T12:35:04.6297910Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6298071Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6298182Z       |             ^~~~
2025-12-04T12:35:04.6299364Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6299565Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6299668Z       |                   ^~~~
2025-12-04T12:35:04.6300892Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6301021Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6301123Z       |                         ^~~~
2025-12-04T12:35:04.6302604Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h: In instantiation of ‘void at::vec::CPU_CAPABILITY::QuantizeAvx512(const float*, T*, int, float, int64_t) [with T = unsigned char; int64_t = long int]’:
2025-12-04T12:35:04.6303188Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:933:31:   required from here
2025-12-04T12:35:04.6304379Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6304510Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6304605Z       |       ^~~~
2025-12-04T12:35:04.6305811Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6305926Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6306029Z       |             ^~~~
2025-12-04T12:35:04.6307226Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6307338Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6307437Z       |                   ^~~~
2025-12-04T12:35:04.6308641Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6308754Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6308876Z       |                         ^~~~
2025-12-04T12:35:04.6310052Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6310169Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6310274Z       |       ^~~~
2025-12-04T12:35:04.6311451Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6311585Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6311686Z       |             ^~~~
2025-12-04T12:35:04.6312869Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6313001Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6313102Z       |                   ^~~~
2025-12-04T12:35:04.6314291Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6314447Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6314547Z       |                         ^~~~
2025-12-04T12:35:04.6315775Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6315922Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6316015Z       |       ^~~~
2025-12-04T12:35:04.6317259Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6317371Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6317482Z       |             ^~~~
2025-12-04T12:35:04.6318676Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6318794Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6318908Z       |                   ^~~~
2025-12-04T12:35:04.6320100Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6320231Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6320332Z       |                         ^~~~
2025-12-04T12:35:04.6321519Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6321643Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6321743Z       |       ^~~~
2025-12-04T12:35:04.6322933Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6323044Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6323140Z       |             ^~~~
2025-12-04T12:35:04.6324340Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6324452Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6324558Z       |                   ^~~~
2025-12-04T12:35:04.6325756Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6325913Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6326025Z       |                         ^~~~
2025-12-04T12:35:04.6327208Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6327325Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6327469Z       |       ^~~~
2025-12-04T12:35:04.6328650Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6328782Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6328879Z       |             ^~~~
2025-12-04T12:35:04.6330056Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6330188Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6330290Z       |                   ^~~~
2025-12-04T12:35:04.6331520Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6331641Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6331742Z       |                         ^~~~
2025-12-04T12:35:04.6332975Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6333092Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6333188Z       |       ^~~~
2025-12-04T12:35:04.6334397Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6334511Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6334622Z       |             ^~~~
2025-12-04T12:35:04.6335817Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6335936Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6336049Z       |                   ^~~~
2025-12-04T12:35:04.6337326Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6337454Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6337561Z       |                         ^~~~
2025-12-04T12:35:04.6338739Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6338865Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6338960Z       |       ^~~~
2025-12-04T12:35:04.6340165Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6340278Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6340382Z       |             ^~~~
2025-12-04T12:35:04.6341575Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6341734Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6341835Z       |                   ^~~~
2025-12-04T12:35:04.6343033Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6343152Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6343302Z       |                         ^~~~
2025-12-04T12:35:04.6344490Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6344602Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6344711Z       |       ^~~~
2025-12-04T12:35:04.6345885Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6346019Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6346117Z       |             ^~~~
2025-12-04T12:35:04.6347338Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6347471Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6347573Z       |                   ^~~~
2025-12-04T12:35:04.6348803Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6348918Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6349021Z       |                         ^~~~
2025-12-04T12:35:04.6350221Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6350336Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6350431Z       |       ^~~~
2025-12-04T12:35:04.6351640Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6351761Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6351874Z       |             ^~~~
2025-12-04T12:35:04.6353062Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6353177Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6353302Z       |                   ^~~~
2025-12-04T12:35:04.6354479Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6354608Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6354716Z       |                         ^~~~
2025-12-04T12:35:04.6355902Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6356035Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6356136Z       |       ^~~~
2025-12-04T12:35:04.6357335Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6357486Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6357584Z       |             ^~~~
2025-12-04T12:35:04.6358776Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6358927Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6359061Z       |                   ^~~~
2025-12-04T12:35:04.6360262Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6360422Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6360538Z       |                         ^~~~
2025-12-04T12:35:04.6361719Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6361837Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6361944Z       |       ^~~~
2025-12-04T12:35:04.6363130Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6363260Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6363355Z       |             ^~~~
2025-12-04T12:35:04.6364538Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6364669Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6364772Z       |                   ^~~~
2025-12-04T12:35:04.6365956Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6366082Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6366183Z       |                         ^~~~
2025-12-04T12:35:04.6367374Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6367491Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6367584Z       |       ^~~~
2025-12-04T12:35:04.6368780Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6368890Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6369008Z       |             ^~~~
2025-12-04T12:35:04.6370188Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6370300Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6370419Z       |                   ^~~~
2025-12-04T12:35:04.6371784Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6371914Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6372023Z       |                         ^~~~
2025-12-04T12:35:04.6372562Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_float.h:12,
2025-12-04T12:35:04.6373011Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:11,
2025-12-04T12:35:04.6373455Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5,
2025-12-04T12:35:04.6373912Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7,
2025-12-04T12:35:04.6374357Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4,
2025-12-04T12:35:04.6374883Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45,
2025-12-04T12:35:04.6375583Z                  from /tmp/GPm4bX/tmp2gno_q_y/data/aotinductor/model2/cyss5jazqjsvp5s2t3ihlofugodyzirark5aiimqjwirn4hylxbp.wrapper.cpp:656:
2025-12-04T12:35:04.6376184Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/sleef.h:192:10: warning: ISO C++ prohibits anonymous structs [-Wpedantic]
2025-12-04T12:35:04.6376354Z   192 |   struct {
2025-12-04T12:35:04.6376452Z       |          ^
2025-12-04T12:35:04.6376950Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:15,
2025-12-04T12:35:04.6377329Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5,
2025-12-04T12:35:04.6377770Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7,
2025-12-04T12:35:04.6378177Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4,
2025-12-04T12:35:04.6378656Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45,
2025-12-04T12:35:04.6379294Z                  from /tmp/GPm4bX/tmp2gno_q_y/data/aotinductor/model2/cyss5jazqjsvp5s2t3ihlofugodyzirark5aiimqjwirn4hylxbp.wrapper.cpp:656:
2025-12-04T12:35:04.6381548Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::blendv(const at::vec::CPU_CAPABILITY::Vectorized<short int>&, const at::vec::CPU_CAPABILITY::Vectorized<short int>&, const at::vec::CPU_CAPABILITY::Vectorized<short int>&)’:
2025-12-04T12:35:04.6382742Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:544:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.6382908Z   544 |     auto msb_one = _mm512_set1_epi16(0xFFFF);
2025-12-04T12:35:04.6383032Z       |                                      ^~~~~~
2025-12-04T12:35:04.6383533Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:15,
2025-12-04T12:35:04.6383920Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5,
2025-12-04T12:35:04.6384357Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7,
2025-12-04T12:35:04.6384770Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4,
2025-12-04T12:35:04.6385236Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45,
2025-12-04T12:35:04.6385882Z                  from /tmp/GPm4bX/tmp2gno_q_y/data/aotinductor/model2/cyss5jazqjsvp5s2t3ihlofugodyzirark5aiimqjwirn4hylxbp.wrapper.cpp:656:
2025-12-04T12:35:04.6387529Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator==(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.6388738Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:697:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.6388958Z   697 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.6389120Z       |                                                      ^~~~~~
2025-12-04T12:35:04.6390812Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator!=(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.6391972Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:701:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.6392185Z   701 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.6392326Z       |                                                      ^~~~~~
2025-12-04T12:35:04.6393945Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator<(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.6395287Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:705:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.6395500Z   705 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.6395627Z       |                                                      ^~~~~~
2025-12-04T12:35:04.6397264Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator<=(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.6398435Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:709:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.6398660Z   709 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.6398790Z       |                                                      ^~~~~~
2025-12-04T12:35:04.6400416Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator>(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.6401583Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:713:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.6401788Z   713 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.6401928Z       |                                                      ^~~~~~
2025-12-04T12:35:04.6403554Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator>=(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.6404731Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:717:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.6404933Z   717 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.6405099Z       |                                                      ^~~~~~
2025-12-04T12:35:04.6407429Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized<signed char> at::vec::CPU_CAPABILITY::Vectorized<signed char>::blendv(const at::vec::CPU_CAPABILITY::Vectorized<signed char>&, const at::vec::CPU_CAPABILITY::Vectorized<signed char>&, const at::vec::CPU_CAPABILITY::Vectorized<signed char>&)’:
2025-12-04T12:35:04.6408772Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1153:37: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6408935Z  1153 |     auto msb_one = _mm512_set1_epi8(0xFF);
2025-12-04T12:35:04.6409051Z       |                                     ^~~~
2025-12-04T12:35:04.6410737Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<signed char> at::vec::CPU_CAPABILITY::Vectorized<signed char>::operator==(const at::vec::CPU_CAPABILITY::Vectorized<signed char>&) const’:
2025-12-04T12:35:04.6411937Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1166:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6412162Z  1166 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.6412289Z       |                                                     ^~~~
2025-12-04T12:35:04.6413946Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<signed char> at::vec::CPU_CAPABILITY::Vectorized<signed char>::operator!=(const at::vec::CPU_CAPABILITY::Vectorized<signed char>&) const’:
2025-12-04T12:35:04.6415163Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1170:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6415367Z  1170 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.6415514Z       |                                                     ^~~~
2025-12-04T12:35:04.6417233Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<signed char> at::vec::CPU_CAPABILITY::Vectorized<signed char>::operator<(const at::vec::CPU_CAPABILITY::Vectorized<signed char>&) const’:
2025-12-04T12:35:04.6418429Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1174:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6418653Z  1174 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.6418775Z       |                                                     ^~~~
2025-12-04T12:35:04.6420451Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<signed char> at::vec::CPU_CAPABILITY::Vectorized<signed char>::operator<=(const at::vec::CPU_CAPABILITY::Vectorized<signed char>&) const’:
2025-12-04T12:35:04.6421648Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1178:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6421865Z  1178 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.6421988Z       |                                                     ^~~~
2025-12-04T12:35:04.6424357Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized<unsigned char> at::vec::CPU_CAPABILITY::Vectorized<unsigned char>::blendv(const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&, const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&, const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&)’:
2025-12-04T12:35:04.6425633Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1207:37: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6425780Z  1207 |     auto msb_one = _mm512_set1_epi8(0xFF);
2025-12-04T12:35:04.6425939Z       |                                     ^~~~
2025-12-04T12:35:04.6427633Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<unsigned char> at::vec::CPU_CAPABILITY::Vectorized<unsigned char>::operator==(const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&) const’:
2025-12-04T12:35:04.6428843Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1220:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6429051Z  1220 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.6429180Z       |                                                     ^~~~
2025-12-04T12:35:04.6430887Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<unsigned char> at::vec::CPU_CAPABILITY::Vectorized<unsigned char>::operator!=(const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&) const’:
2025-12-04T12:35:04.6432074Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1224:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6432292Z  1224 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.6432415Z       |                                                     ^~~~
2025-12-04T12:35:04.6434106Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<unsigned char> at::vec::CPU_CAPABILITY::Vectorized<unsigned char>::operator<(const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&) const’:
2025-12-04T12:35:04.6435315Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1228:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6435516Z  1228 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.6435658Z       |                                                     ^~~~
2025-12-04T12:35:04.6437349Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<unsigned char> at::vec::CPU_CAPABILITY::Vectorized<unsigned char>::operator<=(const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&) const’:
2025-12-04T12:35:04.6438555Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1232:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6438760Z  1232 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.6438888Z       |                                                     ^~~~
2025-12-04T12:35:04.6441252Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized<T> at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized<T>&, const at::vec::CPU_CAPABILITY::Vectorized<T>&) [with bool left_shift = true; T = signed char; typename std::enable_if<(is_same_v<T, signed char> || is_same_v<T, unsigned char>), int>::type <anonymous> = 0]’:
2025-12-04T12:35:04.6441923Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2074:27:   required from here
2025-12-04T12:35:04.6443158Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6443258Z  1866 |       0x80,
2025-12-04T12:35:04.6443402Z       |       ^~~~
2025-12-04T12:35:04.6444583Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6444686Z  1868 |       0x80,
2025-12-04T12:35:04.6444795Z       |       ^~~~
2025-12-04T12:35:04.6445971Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6446068Z  1870 |       0x80,
2025-12-04T12:35:04.6446192Z       |       ^~~~
2025-12-04T12:35:04.6447365Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6447474Z  1872 |       0x80,
2025-12-04T12:35:04.6447572Z       |       ^~~~
2025-12-04T12:35:04.6448746Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6448860Z  1874 |       0x80,
2025-12-04T12:35:04.6448954Z       |       ^~~~
2025-12-04T12:35:04.6450136Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6450232Z  1876 |       0x80,
2025-12-04T12:35:04.6450335Z       |       ^~~~
2025-12-04T12:35:04.6451521Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6451615Z  1878 |       0x80,
2025-12-04T12:35:04.6451710Z       |       ^~~~
2025-12-04T12:35:04.6452899Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6453028Z  1880 |       0x80,
2025-12-04T12:35:04.6453135Z       |       ^~~~
2025-12-04T12:35:04.6454307Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6454401Z  1882 |       0x80,
2025-12-04T12:35:04.6454545Z       |       ^~~~
2025-12-04T12:35:04.6455717Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6455822Z  1884 |       0x80,
2025-12-04T12:35:04.6455920Z       |       ^~~~
2025-12-04T12:35:04.6457172Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6457292Z  1886 |       0x80,
2025-12-04T12:35:04.6457385Z       |       ^~~~
2025-12-04T12:35:04.6458565Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6458721Z  1888 |       0x80,
2025-12-04T12:35:04.6458822Z       |       ^~~~
2025-12-04T12:35:04.6460010Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6460138Z  1890 |       0x80,
2025-12-04T12:35:04.6460234Z       |       ^~~~
2025-12-04T12:35:04.6461423Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6461524Z  1892 |       0x80,
2025-12-04T12:35:04.6461631Z       |       ^~~~
2025-12-04T12:35:04.6462801Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6462908Z  1894 |       0x80,
2025-12-04T12:35:04.6463017Z       |       ^~~~
2025-12-04T12:35:04.6464190Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6464292Z  1896 |       0x80,
2025-12-04T12:35:04.6464407Z       |       ^~~~
2025-12-04T12:35:04.6465580Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6465697Z  1898 |       0x80,
2025-12-04T12:35:04.6465790Z       |       ^~~~
2025-12-04T12:35:04.6466969Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6467096Z  1900 |       0x80,
2025-12-04T12:35:04.6467191Z       |       ^~~~
2025-12-04T12:35:04.6468372Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6468473Z  1902 |       0x80,
2025-12-04T12:35:04.6468568Z       |       ^~~~
2025-12-04T12:35:04.6469759Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6469892Z  1904 |       0x80,
2025-12-04T12:35:04.6469984Z       |       ^~~~
2025-12-04T12:35:04.6471390Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6471577Z  1906 |       0x80,
2025-12-04T12:35:04.6471685Z       |       ^~~~
2025-12-04T12:35:04.6472867Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6472971Z  1908 |       0x80,
2025-12-04T12:35:04.6473080Z       |       ^~~~
2025-12-04T12:35:04.6474262Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6474378Z  1910 |       0x80,
2025-12-04T12:35:04.6474473Z       |       ^~~~
2025-12-04T12:35:04.6475696Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6475817Z  1912 |       0x80,
2025-12-04T12:35:04.6475913Z       |       ^~~~
2025-12-04T12:35:04.6477137Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6477251Z  1914 |       0x80,
2025-12-04T12:35:04.6477345Z       |       ^~~~
2025-12-04T12:35:04.6478537Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6478640Z  1916 |       0x80,
2025-12-04T12:35:04.6478731Z       |       ^~~~
2025-12-04T12:35:04.6479926Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6480027Z  1918 |       0x80,
2025-12-04T12:35:04.6480117Z       |       ^~~~
2025-12-04T12:35:04.6481303Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6481398Z  1920 |       0x80,
2025-12-04T12:35:04.6481503Z       |       ^~~~
2025-12-04T12:35:04.6482672Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6482772Z  1922 |       0x80,
2025-12-04T12:35:04.6482879Z       |       ^~~~
2025-12-04T12:35:04.6484066Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6484179Z  1924 |       0x80,
2025-12-04T12:35:04.6484271Z       |       ^~~~
2025-12-04T12:35:04.6485447Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6485553Z  1926 |       0x80,
2025-12-04T12:35:04.6485642Z       |       ^~~~
2025-12-04T12:35:04.6486816Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6486980Z  1928 |       0x80);
2025-12-04T12:35:04.6487073Z       |       ^~~~
2025-12-04T12:35:04.6488309Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6488483Z  1930 |       0x80,
2025-12-04T12:35:04.6488581Z       |       ^~~~
2025-12-04T12:35:04.6489815Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6489942Z  1932 |       0x80,
2025-12-04T12:35:04.6490067Z       |       ^~~~
2025-12-04T12:35:04.6491253Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6491398Z  1934 |       0x80,
2025-12-04T12:35:04.6491501Z       |       ^~~~
2025-12-04T12:35:04.6492710Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6492826Z  1936 |       0x80,
2025-12-04T12:35:04.6492917Z       |       ^~~~
2025-12-04T12:35:04.6494117Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6494211Z  1938 |       0x80,
2025-12-04T12:35:04.6494302Z       |       ^~~~
2025-12-04T12:35:04.6495489Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6495589Z  1940 |       0x80,
2025-12-04T12:35:04.6495693Z       |       ^~~~
2025-12-04T12:35:04.6496940Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6497043Z  1942 |       0x80,
2025-12-04T12:35:04.6497151Z       |       ^~~~
2025-12-04T12:35:04.6498340Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6498436Z  1944 |       0x80,
2025-12-04T12:35:04.6498548Z       |       ^~~~
2025-12-04T12:35:04.6499722Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6499835Z  1946 |       0x80,
2025-12-04T12:35:04.6499928Z       |       ^~~~
2025-12-04T12:35:04.6501109Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6501221Z  1948 |       0x80,
2025-12-04T12:35:04.6501313Z       |       ^~~~
2025-12-04T12:35:04.6502508Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6502602Z  1950 |       0x80,
2025-12-04T12:35:04.6502693Z       |       ^~~~
2025-12-04T12:35:04.6503879Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6504045Z  1952 |       0x80,
2025-12-04T12:35:04.6504136Z       |       ^~~~
2025-12-04T12:35:04.6505365Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6505493Z  1954 |       0x80,
2025-12-04T12:35:04.6505600Z       |       ^~~~
2025-12-04T12:35:04.6506814Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6506908Z  1956 |       0x80,
2025-12-04T12:35:04.6507016Z       |       ^~~~
2025-12-04T12:35:04.6508193Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6508308Z  1958 |       0x80,
2025-12-04T12:35:04.6508400Z       |       ^~~~
2025-12-04T12:35:04.6509574Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6509686Z  1960 |       0x80,
2025-12-04T12:35:04.6509778Z       |       ^~~~
2025-12-04T12:35:04.6510952Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6511059Z  1962 |       0x80,
2025-12-04T12:35:04.6511153Z       |       ^~~~
2025-12-04T12:35:04.6512350Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6512454Z  1964 |       0x80,
2025-12-04T12:35:04.6512544Z       |       ^~~~
2025-12-04T12:35:04.6513740Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6513839Z  1966 |       0x80,
2025-12-04T12:35:04.6513932Z       |       ^~~~
2025-12-04T12:35:04.6515125Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6515220Z  1968 |       0x80,
2025-12-04T12:35:04.6515325Z       |       ^~~~
2025-12-04T12:35:04.6516503Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6516607Z  1970 |       0x80,
2025-12-04T12:35:04.6516713Z       |       ^~~~
2025-12-04T12:35:04.6517905Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6518015Z  1972 |       0x80,
2025-12-04T12:35:04.6518106Z       |       ^~~~
2025-12-04T12:35:04.6519293Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6519402Z  1974 |       0x80,
2025-12-04T12:35:04.6519493Z       |       ^~~~
2025-12-04T12:35:04.6520720Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6520826Z  1976 |       0x80,
2025-12-04T12:35:04.6520919Z       |       ^~~~
2025-12-04T12:35:04.6522144Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6522273Z  1978 |       0x80,
2025-12-04T12:35:04.6522365Z       |       ^~~~
2025-12-04T12:35:04.6523594Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6523692Z  1980 |       0x80,
2025-12-04T12:35:04.6523796Z       |       ^~~~
2025-12-04T12:35:04.6524984Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6525079Z  1982 |       0x80,
2025-12-04T12:35:04.6525187Z       |       ^~~~
2025-12-04T12:35:04.6526382Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6526485Z  1984 |       0x80,
2025-12-04T12:35:04.6526589Z       |       ^~~~
2025-12-04T12:35:04.6527789Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6527900Z  1986 |       0x80,
2025-12-04T12:35:04.6527996Z       |       ^~~~
2025-12-04T12:35:04.6529190Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6529300Z  1988 |       0x80,
2025-12-04T12:35:04.6529394Z       |       ^~~~
2025-12-04T12:35:04.6530595Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6530697Z  1990 |       0x80,
2025-12-04T12:35:04.6530789Z       |       ^~~~
2025-12-04T12:35:04.6531982Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6532075Z  1992 |       0x80,
2025-12-04T12:35:04.6532169Z       |       ^~~~
2025-12-04T12:35:04.6533388Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow]
2025-12-04T12:35:04.6533550Z  2002 |   __m512i keep_1 = _mm512_set1_epi16(0xFF00);
2025-12-04T12:35:04.6533680Z       |                                      ^~~~~~
2025-12-04T12:35:04.6536105Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized<T> at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized<T>&, const at::vec::CPU_CAPABILITY::Vectorized<T>&) [with bool left_shift = true; T = unsigned char; typename std::enable_if<(is_same_v<T, signed char> || is_same_v<T, unsigned char>), int>::type <anonymous> = 0]’:
2025-12-04T12:35:04.6536773Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2081:27:   required from here
2025-12-04T12:35:04.6538048Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6538143Z  1866 |       0x80,
2025-12-04T12:35:04.6538251Z       |       ^~~~
2025-12-04T12:35:04.6539477Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6539619Z  1868 |       0x80,
2025-12-04T12:35:04.6539711Z       |       ^~~~
2025-12-04T12:35:04.6540935Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6541046Z  1870 |       0x80,
2025-12-04T12:35:04.6541146Z       |       ^~~~
2025-12-04T12:35:04.6542328Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6542438Z  1872 |       0x80,
2025-12-04T12:35:04.6542530Z       |       ^~~~
2025-12-04T12:35:04.6543730Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6543832Z  1874 |       0x80,
2025-12-04T12:35:04.6543923Z       |       ^~~~
2025-12-04T12:35:04.6545127Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6545222Z  1876 |       0x80,
2025-12-04T12:35:04.6545333Z       |       ^~~~
2025-12-04T12:35:04.6546522Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6546618Z  1878 |       0x80,
2025-12-04T12:35:04.6546726Z       |       ^~~~
2025-12-04T12:35:04.6547912Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6548013Z  1880 |       0x80,
2025-12-04T12:35:04.6548119Z       |       ^~~~
2025-12-04T12:35:04.6549305Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6549458Z  1882 |       0x80,
2025-12-04T12:35:04.6549552Z       |       ^~~~
2025-12-04T12:35:04.6550727Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6550836Z  1884 |       0x80,
2025-12-04T12:35:04.6550930Z       |       ^~~~
2025-12-04T12:35:04.6552165Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6552261Z  1886 |       0x80,
2025-12-04T12:35:04.6552355Z       |       ^~~~
2025-12-04T12:35:04.6553550Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6553647Z  1888 |       0x80,
2025-12-04T12:35:04.6553741Z       |       ^~~~
2025-12-04T12:35:04.6554929Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6555022Z  1890 |       0x80,
2025-12-04T12:35:04.6555130Z       |       ^~~~
2025-12-04T12:35:04.6556352Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6556448Z  1892 |       0x80,
2025-12-04T12:35:04.6556556Z       |       ^~~~
2025-12-04T12:35:04.6557798Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6557915Z  1894 |       0x80,
2025-12-04T12:35:04.6558006Z       |       ^~~~
2025-12-04T12:35:04.6559184Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6559290Z  1896 |       0x80,
2025-12-04T12:35:04.6559382Z       |       ^~~~
2025-12-04T12:35:04.6560567Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6560673Z  1898 |       0x80,
2025-12-04T12:35:04.6560764Z       |       ^~~~
2025-12-04T12:35:04.6561958Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6562066Z  1900 |       0x80,
2025-12-04T12:35:04.6562157Z       |       ^~~~
2025-12-04T12:35:04.6563343Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6563439Z  1902 |       0x80,
2025-12-04T12:35:04.6563535Z       |       ^~~~
2025-12-04T12:35:04.6564730Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6564824Z  1904 |       0x80,
2025-12-04T12:35:04.6564934Z       |       ^~~~
2025-12-04T12:35:04.6566100Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6566237Z  1906 |       0x80,
2025-12-04T12:35:04.6566340Z       |       ^~~~
2025-12-04T12:35:04.6567509Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6567614Z  1908 |       0x80,
2025-12-04T12:35:04.6567712Z       |       ^~~~
2025-12-04T12:35:04.6568926Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6569032Z  1910 |       0x80,
2025-12-04T12:35:04.6569129Z       |       ^~~~
2025-12-04T12:35:04.6570304Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6570419Z  1912 |       0x80,
2025-12-04T12:35:04.6570510Z       |       ^~~~
2025-12-04T12:35:04.6571892Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6571985Z  1914 |       0x80,
2025-12-04T12:35:04.6572179Z       |       ^~~~
2025-12-04T12:35:04.6573382Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6573478Z  1916 |       0x80,
2025-12-04T12:35:04.6573632Z       |       ^~~~
2025-12-04T12:35:04.6574816Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6574918Z  1918 |       0x80,
2025-12-04T12:35:04.6575025Z       |       ^~~~
2025-12-04T12:35:04.6576204Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6576352Z  1920 |       0x80,
2025-12-04T12:35:04.6576479Z       |       ^~~~
2025-12-04T12:35:04.6577667Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6577773Z  1922 |       0x80,
2025-12-04T12:35:04.6577870Z       |       ^~~~
2025-12-04T12:35:04.6579042Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6579158Z  1924 |       0x80,
2025-12-04T12:35:04.6579249Z       |       ^~~~
2025-12-04T12:35:04.6580437Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6580539Z  1926 |       0x80,
2025-12-04T12:35:04.6580636Z       |       ^~~~
2025-12-04T12:35:04.6581826Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6581929Z  1928 |       0x80);
2025-12-04T12:35:04.6582022Z       |       ^~~~
2025-12-04T12:35:04.6583208Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6583362Z  1930 |       0x80,
2025-12-04T12:35:04.6583467Z       |       ^~~~
2025-12-04T12:35:04.6584645Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6584785Z  1932 |       0x80,
2025-12-04T12:35:04.6584943Z       |       ^~~~
2025-12-04T12:35:04.6586126Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6586271Z  1934 |       0x80,
2025-12-04T12:35:04.6586366Z       |       ^~~~
2025-12-04T12:35:04.6587543Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6587665Z  1936 |       0x80,
2025-12-04T12:35:04.6587758Z       |       ^~~~
2025-12-04T12:35:04.6588937Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6589057Z  1938 |       0x80,
2025-12-04T12:35:04.6589149Z       |       ^~~~
2025-12-04T12:35:04.6590342Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6590441Z  1940 |       0x80,
2025-12-04T12:35:04.6590532Z       |       ^~~~
2025-12-04T12:35:04.6591724Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6591822Z  1942 |       0x80,
2025-12-04T12:35:04.6591916Z       |       ^~~~
2025-12-04T12:35:04.6593103Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6593209Z  1944 |       0x80,
2025-12-04T12:35:04.6593314Z       |       ^~~~
2025-12-04T12:35:04.6594490Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6594589Z  1946 |       0x80,
2025-12-04T12:35:04.6594695Z       |       ^~~~
2025-12-04T12:35:04.6595867Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6595979Z  1948 |       0x80,
2025-12-04T12:35:04.6596071Z       |       ^~~~
2025-12-04T12:35:04.6597243Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6597362Z  1950 |       0x80,
2025-12-04T12:35:04.6597453Z       |       ^~~~
2025-12-04T12:35:04.6598632Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6598732Z  1952 |       0x80,
2025-12-04T12:35:04.6598823Z       |       ^~~~
2025-12-04T12:35:04.6600008Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6600150Z  1954 |       0x80,
2025-12-04T12:35:04.6600240Z       |       ^~~~
2025-12-04T12:35:04.6601469Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6601596Z  1956 |       0x80,
2025-12-04T12:35:04.6601700Z       |       ^~~~
2025-12-04T12:35:04.6602911Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6603005Z  1958 |       0x80,
2025-12-04T12:35:04.6603108Z       |       ^~~~
2025-12-04T12:35:04.6604279Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6604391Z  1960 |       0x80,
2025-12-04T12:35:04.6604483Z       |       ^~~~
2025-12-04T12:35:04.6605662Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6605773Z  1962 |       0x80,
2025-12-04T12:35:04.6605866Z       |       ^~~~
2025-12-04T12:35:04.6607044Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6607152Z  1964 |       0x80,
2025-12-04T12:35:04.6607243Z       |       ^~~~
2025-12-04T12:35:04.6608425Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6608524Z  1966 |       0x80,
2025-12-04T12:35:04.6608615Z       |       ^~~~
2025-12-04T12:35:04.6609802Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6609902Z  1968 |       0x80,
2025-12-04T12:35:04.6609992Z       |       ^~~~
2025-12-04T12:35:04.6611184Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6611278Z  1970 |       0x80,
2025-12-04T12:35:04.6611382Z       |       ^~~~
2025-12-04T12:35:04.6612558Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6612658Z  1972 |       0x80,
2025-12-04T12:35:04.6612766Z       |       ^~~~
2025-12-04T12:35:04.6613946Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6614057Z  1974 |       0x80,
2025-12-04T12:35:04.6614149Z       |       ^~~~
2025-12-04T12:35:04.6615329Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6615439Z  1976 |       0x80,
2025-12-04T12:35:04.6615533Z       |       ^~~~
2025-12-04T12:35:04.6616773Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6616942Z  1978 |       0x80,
2025-12-04T12:35:04.6617037Z       |       ^~~~
2025-12-04T12:35:04.6618266Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6618412Z  1980 |       0x80,
2025-12-04T12:35:04.6618510Z       |       ^~~~
2025-12-04T12:35:04.6619738Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6619834Z  1982 |       0x80,
2025-12-04T12:35:04.6619939Z       |       ^~~~
2025-12-04T12:35:04.6621114Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6621217Z  1984 |       0x80,
2025-12-04T12:35:04.6621323Z       |       ^~~~
2025-12-04T12:35:04.6622504Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6622603Z  1986 |       0x80,
2025-12-04T12:35:04.6622710Z       |       ^~~~
2025-12-04T12:35:04.6623891Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6623997Z  1988 |       0x80,
2025-12-04T12:35:04.6624090Z       |       ^~~~
2025-12-04T12:35:04.6625270Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6625384Z  1990 |       0x80,
2025-12-04T12:35:04.6625474Z       |       ^~~~
2025-12-04T12:35:04.6626664Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6626763Z  1992 |       0x80,
2025-12-04T12:35:04.6626856Z       |       ^~~~
2025-12-04T12:35:04.6628073Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow]
2025-12-04T12:35:04.6628234Z  2002 |   __m512i keep_1 = _mm512_set1_epi16(0xFF00);
2025-12-04T12:35:04.6628351Z       |                                      ^~~~~~
2025-12-04T12:35:04.6630764Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized<T> at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized<T>&, const at::vec::CPU_CAPABILITY::Vectorized<T>&) [with bool left_shift = false; T = signed char; typename std::enable_if<(is_same_v<T, signed char> || is_same_v<T, unsigned char>), int>::type <anonymous> = 0]’:
2025-12-04T12:35:04.6631352Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2109:28:   required from here
2025-12-04T12:35:04.6632556Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6632654Z  1866 |       0x80,
2025-12-04T12:35:04.6632765Z       |       ^~~~
2025-12-04T12:35:04.6633942Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6634082Z  1868 |       0x80,
2025-12-04T12:35:04.6634191Z       |       ^~~~
2025-12-04T12:35:04.6635418Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6635562Z  1870 |       0x80,
2025-12-04T12:35:04.6635659Z       |       ^~~~
2025-12-04T12:35:04.6636868Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6636979Z  1872 |       0x80,
2025-12-04T12:35:04.6637073Z       |       ^~~~
2025-12-04T12:35:04.6638254Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6638364Z  1874 |       0x80,
2025-12-04T12:35:04.6638455Z       |       ^~~~
2025-12-04T12:35:04.6639646Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6639748Z  1876 |       0x80,
2025-12-04T12:35:04.6639838Z       |       ^~~~
2025-12-04T12:35:04.6641037Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6641131Z  1878 |       0x80,
2025-12-04T12:35:04.6641241Z       |       ^~~~
2025-12-04T12:35:04.6642420Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6642512Z  1880 |       0x80,
2025-12-04T12:35:04.6642619Z       |       ^~~~
2025-12-04T12:35:04.6643796Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6643899Z  1882 |       0x80,
2025-12-04T12:35:04.6644007Z       |       ^~~~
2025-12-04T12:35:04.6645196Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6645307Z  1884 |       0x80,
2025-12-04T12:35:04.6645397Z       |       ^~~~
2025-12-04T12:35:04.6646615Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6646720Z  1886 |       0x80,
2025-12-04T12:35:04.6646810Z       |       ^~~~
2025-12-04T12:35:04.6648004Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6648132Z  1888 |       0x80,
2025-12-04T12:35:04.6648223Z       |       ^~~~
2025-12-04T12:35:04.6649429Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6649524Z  1890 |       0x80,
2025-12-04T12:35:04.6649614Z       |       ^~~~
2025-12-04T12:35:04.6650813Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6650908Z  1892 |       0x80,
2025-12-04T12:35:04.6651010Z       |       ^~~~
2025-12-04T12:35:04.6652228Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6652333Z  1894 |       0x80,
2025-12-04T12:35:04.6652438Z       |       ^~~~
2025-12-04T12:35:04.6653663Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6653770Z  1896 |       0x80,
2025-12-04T12:35:04.6653865Z       |       ^~~~
2025-12-04T12:35:04.6655040Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6655146Z  1898 |       0x80,
2025-12-04T12:35:04.6655238Z       |       ^~~~
2025-12-04T12:35:04.6656494Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6656609Z  1900 |       0x80,
2025-12-04T12:35:04.6656699Z       |       ^~~~
2025-12-04T12:35:04.6657899Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6657994Z  1902 |       0x80,
2025-12-04T12:35:04.6658092Z       |       ^~~~
2025-12-04T12:35:04.6659286Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6659382Z  1904 |       0x80,
2025-12-04T12:35:04.6659487Z       |       ^~~~
2025-12-04T12:35:04.6660662Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6660763Z  1906 |       0x80,
2025-12-04T12:35:04.6660875Z       |       ^~~~
2025-12-04T12:35:04.6662052Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6662146Z  1908 |       0x80,
2025-12-04T12:35:04.6662300Z       |       ^~~~
2025-12-04T12:35:04.6663489Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6663597Z  1910 |       0x80,
2025-12-04T12:35:04.6663688Z       |       ^~~~
2025-12-04T12:35:04.6664870Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6665095Z  1912 |       0x80,
2025-12-04T12:35:04.6665188Z       |       ^~~~
2025-12-04T12:35:04.6666392Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6666491Z  1914 |       0x80,
2025-12-04T12:35:04.6666591Z       |       ^~~~
2025-12-04T12:35:04.6667778Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6667874Z  1916 |       0x80,
2025-12-04T12:35:04.6667969Z       |       ^~~~
2025-12-04T12:35:04.6669243Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6669346Z  1918 |       0x80,
2025-12-04T12:35:04.6669451Z       |       ^~~~
2025-12-04T12:35:04.6670663Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6670762Z  1920 |       0x80,
2025-12-04T12:35:04.6670877Z       |       ^~~~
2025-12-04T12:35:04.6672200Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6672312Z  1922 |       0x80,
2025-12-04T12:35:04.6672405Z       |       ^~~~
2025-12-04T12:35:04.6673586Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6673703Z  1924 |       0x80,
2025-12-04T12:35:04.6673795Z       |       ^~~~
2025-12-04T12:35:04.6674981Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6675100Z  1926 |       0x80,
2025-12-04T12:35:04.6675192Z       |       ^~~~
2025-12-04T12:35:04.6676375Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6676471Z  1928 |       0x80);
2025-12-04T12:35:04.6676563Z       |       ^~~~
2025-12-04T12:35:04.6677760Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6677855Z  1930 |       0x80,
2025-12-04T12:35:04.6677968Z       |       ^~~~
2025-12-04T12:35:04.6679147Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6679328Z  1932 |       0x80,
2025-12-04T12:35:04.6679432Z       |       ^~~~
2025-12-04T12:35:04.6680620Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6680715Z  1934 |       0x80,
2025-12-04T12:35:04.6680822Z       |       ^~~~
2025-12-04T12:35:04.6682118Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6682228Z  1936 |       0x80,
2025-12-04T12:35:04.6682319Z       |       ^~~~
2025-12-04T12:35:04.6683542Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6683656Z  1938 |       0x80,
2025-12-04T12:35:04.6683748Z       |       ^~~~
2025-12-04T12:35:04.6684945Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6685038Z  1940 |       0x80,
2025-12-04T12:35:04.6685129Z       |       ^~~~
2025-12-04T12:35:04.6686324Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6686417Z  1942 |       0x80,
2025-12-04T12:35:04.6686508Z       |       ^~~~
2025-12-04T12:35:04.6687706Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6687807Z  1944 |       0x80,
2025-12-04T12:35:04.6687912Z       |       ^~~~
2025-12-04T12:35:04.6689087Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6689180Z  1946 |       0x80,
2025-12-04T12:35:04.6689291Z       |       ^~~~
2025-12-04T12:35:04.6690469Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6690579Z  1948 |       0x80,
2025-12-04T12:35:04.6690680Z       |       ^~~~
2025-12-04T12:35:04.6691859Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6691971Z  1950 |       0x80,
2025-12-04T12:35:04.6692063Z       |       ^~~~
2025-12-04T12:35:04.6693239Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6693347Z  1952 |       0x80,
2025-12-04T12:35:04.6693454Z       |       ^~~~
2025-12-04T12:35:04.6694642Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6694736Z  1954 |       0x80,
2025-12-04T12:35:04.6694834Z       |       ^~~~
2025-12-04T12:35:04.6696025Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6696158Z  1956 |       0x80,
2025-12-04T12:35:04.6696261Z       |       ^~~~
2025-12-04T12:35:04.6697541Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6697635Z  1958 |       0x80,
2025-12-04T12:35:04.6697822Z       |       ^~~~
2025-12-04T12:35:04.6699000Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6699093Z  1960 |       0x80,
2025-12-04T12:35:04.6699233Z       |       ^~~~
2025-12-04T12:35:04.6700411Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6708776Z  1962 |       0x80,
2025-12-04T12:35:04.6708946Z       |       ^~~~
2025-12-04T12:35:04.6710279Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6710391Z  1964 |       0x80,
2025-12-04T12:35:04.6710512Z       |       ^~~~
2025-12-04T12:35:04.6711702Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6711813Z  1966 |       0x80,
2025-12-04T12:35:04.6711914Z       |       ^~~~
2025-12-04T12:35:04.6713111Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6713213Z  1968 |       0x80,
2025-12-04T12:35:04.6713305Z       |       ^~~~
2025-12-04T12:35:04.6714496Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6714599Z  1970 |       0x80,
2025-12-04T12:35:04.6714712Z       |       ^~~~
2025-12-04T12:35:04.6715882Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6715983Z  1972 |       0x80,
2025-12-04T12:35:04.6716094Z       |       ^~~~
2025-12-04T12:35:04.6717271Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6717384Z  1974 |       0x80,
2025-12-04T12:35:04.6717478Z       |       ^~~~
2025-12-04T12:35:04.6718652Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6718767Z  1976 |       0x80,
2025-12-04T12:35:04.6718865Z       |       ^~~~
2025-12-04T12:35:04.6720036Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6720157Z  1978 |       0x80,
2025-12-04T12:35:04.6720248Z       |       ^~~~
2025-12-04T12:35:04.6721429Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6721623Z  1980 |       0x80,
2025-12-04T12:35:04.6721712Z       |       ^~~~
2025-12-04T12:35:04.6722902Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6723082Z  1982 |       0x80,
2025-12-04T12:35:04.6723173Z       |       ^~~~
2025-12-04T12:35:04.6724370Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6724502Z  1984 |       0x80,
2025-12-04T12:35:04.6724611Z       |       ^~~~
2025-12-04T12:35:04.6725787Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6725887Z  1986 |       0x80,
2025-12-04T12:35:04.6725994Z       |       ^~~~
2025-12-04T12:35:04.6727167Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6727284Z  1988 |       0x80,
2025-12-04T12:35:04.6727374Z       |       ^~~~
2025-12-04T12:35:04.6728542Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6728652Z  1990 |       0x80,
2025-12-04T12:35:04.6728741Z       |       ^~~~
2025-12-04T12:35:04.6729918Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6730027Z  1992 |       0x80,
2025-12-04T12:35:04.6730117Z       |       ^~~~
2025-12-04T12:35:04.6731306Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow]
2025-12-04T12:35:04.6731478Z  2002 |   __m512i keep_1 = _mm512_set1_epi16(0xFF00);
2025-12-04T12:35:04.6731597Z       |                                      ^~~~~~
2025-12-04T12:35:04.6734044Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized<T> at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized<T>&, const at::vec::CPU_CAPABILITY::Vectorized<T>&) [with bool left_shift = false; T = unsigned char; typename std::enable_if<(is_same_v<T, signed char> || is_same_v<T, unsigned char>), int>::type <anonymous> = 0]’:
2025-12-04T12:35:04.6734630Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2116:28:   required from here
2025-12-04T12:35:04.6735836Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6735933Z  1866 |       0x80,
2025-12-04T12:35:04.6736041Z       |       ^~~~
2025-12-04T12:35:04.6737313Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6737407Z  1868 |       0x80,
2025-12-04T12:35:04.6737510Z       |       ^~~~
2025-12-04T12:35:04.6738692Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6738849Z  1870 |       0x80,
2025-12-04T12:35:04.6738941Z       |       ^~~~
2025-12-04T12:35:04.6740155Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6740296Z  1872 |       0x80,
2025-12-04T12:35:04.6740386Z       |       ^~~~
2025-12-04T12:35:04.6741595Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6741704Z  1874 |       0x80,
2025-12-04T12:35:04.6741799Z       |       ^~~~
2025-12-04T12:35:04.6742978Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6743077Z  1876 |       0x80,
2025-12-04T12:35:04.6743171Z       |       ^~~~
2025-12-04T12:35:04.6744359Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6744459Z  1878 |       0x80,
2025-12-04T12:35:04.6744552Z       |       ^~~~
2025-12-04T12:35:04.6745743Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6745834Z  1880 |       0x80,
2025-12-04T12:35:04.6745939Z       |       ^~~~
2025-12-04T12:35:04.6747110Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6747211Z  1882 |       0x80,
2025-12-04T12:35:04.6747317Z       |       ^~~~
2025-12-04T12:35:04.6748495Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6748609Z  1884 |       0x80,
2025-12-04T12:35:04.6748704Z       |       ^~~~
2025-12-04T12:35:04.6749883Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6749986Z  1886 |       0x80,
2025-12-04T12:35:04.6750078Z       |       ^~~~
2025-12-04T12:35:04.6751248Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6751417Z  1888 |       0x80,
2025-12-04T12:35:04.6751506Z       |       ^~~~
2025-12-04T12:35:04.6752697Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6752832Z  1890 |       0x80,
2025-12-04T12:35:04.6752928Z       |       ^~~~
2025-12-04T12:35:04.6754123Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6754215Z  1892 |       0x80,
2025-12-04T12:35:04.6754323Z       |       ^~~~
2025-12-04T12:35:04.6755499Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6755598Z  1894 |       0x80,
2025-12-04T12:35:04.6755700Z       |       ^~~~
2025-12-04T12:35:04.6756912Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6757025Z  1896 |       0x80,
2025-12-04T12:35:04.6757117Z       |       ^~~~
2025-12-04T12:35:04.6758331Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6758436Z  1898 |       0x80,
2025-12-04T12:35:04.6758524Z       |       ^~~~
2025-12-04T12:35:04.6759703Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6759812Z  1900 |       0x80,
2025-12-04T12:35:04.6759900Z       |       ^~~~
2025-12-04T12:35:04.6761087Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6761191Z  1902 |       0x80,
2025-12-04T12:35:04.6761285Z       |       ^~~~
2025-12-04T12:35:04.6762470Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6762564Z  1904 |       0x80,
2025-12-04T12:35:04.6762655Z       |       ^~~~
2025-12-04T12:35:04.6763834Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6763931Z  1906 |       0x80,
2025-12-04T12:35:04.6764035Z       |       ^~~~
2025-12-04T12:35:04.6765212Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6765310Z  1908 |       0x80,
2025-12-04T12:35:04.6765414Z       |       ^~~~
2025-12-04T12:35:04.6766598Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6766704Z  1910 |       0x80,
2025-12-04T12:35:04.6766792Z       |       ^~~~
2025-12-04T12:35:04.6767960Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6768136Z  1912 |       0x80,
2025-12-04T12:35:04.6768232Z       |       ^~~~
2025-12-04T12:35:04.6769409Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6769566Z  1914 |       0x80,
2025-12-04T12:35:04.6769657Z       |       ^~~~
2025-12-04T12:35:04.6770856Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6771136Z  1916 |       0x80,
2025-12-04T12:35:04.6771228Z       |       ^~~~
2025-12-04T12:35:04.6772434Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6772526Z  1918 |       0x80,
2025-12-04T12:35:04.6772637Z       |       ^~~~
2025-12-04T12:35:04.6774431Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6774549Z  1920 |       0x80,
2025-12-04T12:35:04.6774655Z       |       ^~~~
2025-12-04T12:35:04.6775895Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6776001Z  1922 |       0x80,
2025-12-04T12:35:04.6776092Z       |       ^~~~
2025-12-04T12:35:04.6777348Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6777455Z  1924 |       0x80,
2025-12-04T12:35:04.6777546Z       |       ^~~~
2025-12-04T12:35:04.6778728Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6778843Z  1926 |       0x80,
2025-12-04T12:35:04.6778938Z       |       ^~~~
2025-12-04T12:35:04.6780126Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6780220Z  1928 |       0x80);
2025-12-04T12:35:04.6780311Z       |       ^~~~
2025-12-04T12:35:04.6781494Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6781588Z  1930 |       0x80,
2025-12-04T12:35:04.6781678Z       |       ^~~~
2025-12-04T12:35:04.6782872Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6782971Z  1932 |       0x80,
2025-12-04T12:35:04.6783077Z       |       ^~~~
2025-12-04T12:35:04.6784257Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6784348Z  1934 |       0x80,
2025-12-04T12:35:04.6784453Z       |       ^~~~
2025-12-04T12:35:04.6785692Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6785799Z  1936 |       0x80,
2025-12-04T12:35:04.6785889Z       |       ^~~~
2025-12-04T12:35:04.6787098Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6787257Z  1938 |       0x80,
2025-12-04T12:35:04.6787348Z       |       ^~~~
2025-12-04T12:35:04.6788566Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6788674Z  1940 |       0x80,
2025-12-04T12:35:04.6788774Z       |       ^~~~
2025-12-04T12:35:04.6789964Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6790059Z  1942 |       0x80,
2025-12-04T12:35:04.6790150Z       |       ^~~~
2025-12-04T12:35:04.6791341Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6791442Z  1944 |       0x80,
2025-12-04T12:35:04.6791543Z       |       ^~~~
2025-12-04T12:35:04.6792719Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6792813Z  1946 |       0x80,
2025-12-04T12:35:04.6792923Z       |       ^~~~
2025-12-04T12:35:04.6794098Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6794203Z  1948 |       0x80,
2025-12-04T12:35:04.6794290Z       |       ^~~~
2025-12-04T12:35:04.6795473Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6795589Z  1950 |       0x80,
2025-12-04T12:35:04.6795682Z       |       ^~~~
2025-12-04T12:35:04.6796859Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6796963Z  1952 |       0x80,
2025-12-04T12:35:04.6797059Z       |       ^~~~
2025-12-04T12:35:04.6798242Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6798337Z  1954 |       0x80,
2025-12-04T12:35:04.6798427Z       |       ^~~~
2025-12-04T12:35:04.6799617Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6799716Z  1956 |       0x80,
2025-12-04T12:35:04.6799808Z       |       ^~~~
2025-12-04T12:35:04.6800996Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6801091Z  1958 |       0x80,
2025-12-04T12:35:04.6801231Z       |       ^~~~
2025-12-04T12:35:04.6802407Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6802498Z  1960 |       0x80,
2025-12-04T12:35:04.6802605Z       |       ^~~~
2025-12-04T12:35:04.6803817Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6803956Z  1962 |       0x80,
2025-12-04T12:35:04.6804046Z       |       ^~~~
2025-12-04T12:35:04.6805272Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6805385Z  1964 |       0x80,
2025-12-04T12:35:04.6805476Z       |       ^~~~
2025-12-04T12:35:04.6806646Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6806751Z  1966 |       0x80,
2025-12-04T12:35:04.6806841Z       |       ^~~~
2025-12-04T12:35:04.6808040Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6808131Z  1968 |       0x80,
2025-12-04T12:35:04.6808220Z       |       ^~~~
2025-12-04T12:35:04.6809425Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6809527Z  1970 |       0x80,
2025-12-04T12:35:04.6809630Z       |       ^~~~
2025-12-04T12:35:04.6810808Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6810903Z  1972 |       0x80,
2025-12-04T12:35:04.6811004Z       |       ^~~~
2025-12-04T12:35:04.6812194Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6812296Z  1974 |       0x80,
2025-12-04T12:35:04.6812385Z       |       ^~~~
2025-12-04T12:35:04.6813563Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6813670Z  1976 |       0x80,
2025-12-04T12:35:04.6813759Z       |       ^~~~
2025-12-04T12:35:04.6814933Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6815039Z  1978 |       0x80,
2025-12-04T12:35:04.6815129Z       |       ^~~~
2025-12-04T12:35:04.6816393Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6816490Z  1980 |       0x80,
2025-12-04T12:35:04.6816584Z       |       ^~~~
2025-12-04T12:35:04.6817782Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6817928Z  1982 |       0x80,
2025-12-04T12:35:04.6818019Z       |       ^~~~
2025-12-04T12:35:04.6819209Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6819301Z  1984 |       0x80,
2025-12-04T12:35:04.6819447Z       |       ^~~~
2025-12-04T12:35:04.6820651Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6820748Z  1986 |       0x80,
2025-12-04T12:35:04.6820893Z       |       ^~~~
2025-12-04T12:35:04.6822070Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6822185Z  1988 |       0x80,
2025-12-04T12:35:04.6822277Z       |       ^~~~
2025-12-04T12:35:04.6823456Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6823565Z  1990 |       0x80,
2025-12-04T12:35:04.6823662Z       |       ^~~~
2025-12-04T12:35:04.6824831Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.6824936Z  1992 |       0x80,
2025-12-04T12:35:04.6825034Z       |       ^~~~
2025-12-04T12:35:04.6826227Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow]
2025-12-04T12:35:04.6826392Z  2002 |   __m512i keep_1 = _mm512_set1_epi16(0xFF00);
2025-12-04T12:35:04.6826510Z       |                                      ^~~~~~
2025-12-04T12:35:04.6827030Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:16,
2025-12-04T12:35:04.6827404Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5,
2025-12-04T12:35:04.6827862Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7,
2025-12-04T12:35:04.6828268Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4,
2025-12-04T12:35:04.6828742Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45,
2025-12-04T12:35:04.6829401Z                  from /tmp/GPm4bX/tmp2gno_q_y/data/aotinductor/model2/cyss5jazqjsvp5s2t3ihlofugodyzirark5aiimqjwirn4hylxbp.wrapper.cpp:656:
2025-12-04T12:35:04.6830914Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h: In instantiation of ‘void at::vec::CPU_CAPABILITY::QuantizeAvx512(const float*, T*, int, float, int64_t) [with T = signed char; int64_t = long int]’:
2025-12-04T12:35:04.6831511Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:696:31:   required from here
2025-12-04T12:35:04.6832746Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6832885Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6832979Z       |       ^~~~
2025-12-04T12:35:04.6834166Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6834297Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6834398Z       |             ^~~~
2025-12-04T12:35:04.6835621Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6835753Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6835848Z       |                   ^~~~
2025-12-04T12:35:04.6837082Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6837192Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6837290Z       |                         ^~~~
2025-12-04T12:35:04.6838500Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6838618Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6838720Z       |       ^~~~
2025-12-04T12:35:04.6839908Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6840029Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6840140Z       |             ^~~~
2025-12-04T12:35:04.6841331Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6841455Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6841563Z       |                   ^~~~
2025-12-04T12:35:04.6842746Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6842873Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6842975Z       |                         ^~~~
2025-12-04T12:35:04.6844160Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6844283Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6844376Z       |       ^~~~
2025-12-04T12:35:04.6845569Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6845721Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6845817Z       |             ^~~~
2025-12-04T12:35:04.6847017Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6847135Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6847283Z       |                   ^~~~
2025-12-04T12:35:04.6848470Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6848587Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6848694Z       |                         ^~~~
2025-12-04T12:35:04.6849863Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6849990Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6850082Z       |       ^~~~
2025-12-04T12:35:04.6851296Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6851430Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6851526Z       |             ^~~~
2025-12-04T12:35:04.6852748Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6852866Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6852968Z       |                   ^~~~
2025-12-04T12:35:04.6854160Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6854280Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6854380Z       |                         ^~~~
2025-12-04T12:35:04.6855582Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6855700Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6855803Z       |       ^~~~
2025-12-04T12:35:04.6857059Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6857174Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6857295Z       |             ^~~~
2025-12-04T12:35:04.6858481Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6858604Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6858705Z       |                   ^~~~
2025-12-04T12:35:04.6859898Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6860038Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6860146Z       |                         ^~~~
2025-12-04T12:35:04.6861340Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6861503Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6861596Z       |       ^~~~
2025-12-04T12:35:04.6862797Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6862916Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6863071Z       |             ^~~~
2025-12-04T12:35:04.6864261Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6864381Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6864498Z       |                   ^~~~
2025-12-04T12:35:04.6865682Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6865803Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6865921Z       |                         ^~~~
2025-12-04T12:35:04.6867162Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6867298Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6867393Z       |       ^~~~
2025-12-04T12:35:04.6868621Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6868754Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6868855Z       |             ^~~~
2025-12-04T12:35:04.6870055Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6870178Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6870280Z       |                   ^~~~
2025-12-04T12:35:04.6871669Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6871792Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6871909Z       |                         ^~~~
2025-12-04T12:35:04.6873101Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6873216Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6873331Z       |       ^~~~
2025-12-04T12:35:04.6874513Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6874628Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6874741Z       |             ^~~~
2025-12-04T12:35:04.6875940Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6876069Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6876179Z       |                   ^~~~
2025-12-04T12:35:04.6877360Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6877582Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6877684Z       |                         ^~~~
2025-12-04T12:35:04.6878882Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6879049Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6879186Z       |       ^~~~
2025-12-04T12:35:04.6880392Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6880553Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6880665Z       |             ^~~~
2025-12-04T12:35:04.6881848Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6881963Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6882077Z       |                   ^~~~
2025-12-04T12:35:04.6883262Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6883378Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6883493Z       |                         ^~~~
2025-12-04T12:35:04.6884673Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6884796Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6884888Z       |       ^~~~
2025-12-04T12:35:04.6886078Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6886212Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6886307Z       |             ^~~~
2025-12-04T12:35:04.6887502Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6887620Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6887718Z       |                   ^~~~
2025-12-04T12:35:04.6888924Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6889034Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6889142Z       |                         ^~~~
2025-12-04T12:35:04.6890331Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6890444Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6890557Z       |       ^~~~
2025-12-04T12:35:04.6891751Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6891864Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6891981Z       |             ^~~~
2025-12-04T12:35:04.6893166Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6893340Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6893441Z       |                   ^~~~
2025-12-04T12:35:04.6894623Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6894789Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6894924Z       |                         ^~~~
2025-12-04T12:35:04.6896115Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6896264Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6896422Z       |       ^~~~
2025-12-04T12:35:04.6897630Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6897749Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6897845Z       |             ^~~~
2025-12-04T12:35:04.6899047Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6899166Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6899281Z       |                   ^~~~
2025-12-04T12:35:04.6900470Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6900586Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6900703Z       |                         ^~~~
2025-12-04T12:35:04.6902176Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h: In instantiation of ‘void at::vec::CPU_CAPABILITY::QuantizeAvx512(const float*, T*, int, float, int64_t) [with T = unsigned char; int64_t = long int]’:
2025-12-04T12:35:04.6902770Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:933:31:   required from here
2025-12-04T12:35:04.6903965Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6904080Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6904189Z       |       ^~~~
2025-12-04T12:35:04.6905377Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6905513Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6905611Z       |             ^~~~
2025-12-04T12:35:04.6906799Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6906935Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6907043Z       |                   ^~~~
2025-12-04T12:35:04.6908237Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6908358Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6908461Z       |                         ^~~~
2025-12-04T12:35:04.6909652Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6909823Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6909929Z       |       ^~~~
2025-12-04T12:35:04.6911156Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6911306Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6911421Z       |             ^~~~
2025-12-04T12:35:04.6912640Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6912755Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6912867Z       |                   ^~~~
2025-12-04T12:35:04.6914057Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6914191Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6914291Z       |                         ^~~~
2025-12-04T12:35:04.6915476Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6915610Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6915707Z       |       ^~~~
2025-12-04T12:35:04.6916915Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6917027Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6917129Z       |             ^~~~
2025-12-04T12:35:04.6918324Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6918439Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6918559Z       |                   ^~~~
2025-12-04T12:35:04.6919760Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6919872Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6919991Z       |                         ^~~~
2025-12-04T12:35:04.6921170Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6921289Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6921396Z       |       ^~~~
2025-12-04T12:35:04.6922580Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6922717Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6922821Z       |             ^~~~
2025-12-04T12:35:04.6923999Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6924132Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6924232Z       |                   ^~~~
2025-12-04T12:35:04.6925422Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6925660Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6925762Z       |                         ^~~~
2025-12-04T12:35:04.6927036Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6927180Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6927272Z       |       ^~~~
2025-12-04T12:35:04.6928512Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6928627Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6928737Z       |             ^~~~
2025-12-04T12:35:04.6929919Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6930037Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6930152Z       |                   ^~~~
2025-12-04T12:35:04.6931339Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6931469Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6931570Z       |                         ^~~~
2025-12-04T12:35:04.6932755Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6932882Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6932982Z       |       ^~~~
2025-12-04T12:35:04.6934176Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6934288Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6934385Z       |             ^~~~
2025-12-04T12:35:04.6935596Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6935709Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6935815Z       |                   ^~~~
2025-12-04T12:35:04.6937100Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6937266Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6937382Z       |                         ^~~~
2025-12-04T12:35:04.6938576Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6938696Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6938844Z       |       ^~~~
2025-12-04T12:35:04.6940031Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6940165Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6940263Z       |             ^~~~
2025-12-04T12:35:04.6941445Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6941576Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6941676Z       |                   ^~~~
2025-12-04T12:35:04.6942910Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6943028Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6943129Z       |                         ^~~~
2025-12-04T12:35:04.6944360Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6944473Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6944567Z       |       ^~~~
2025-12-04T12:35:04.6945770Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6945891Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6946003Z       |             ^~~~
2025-12-04T12:35:04.6947191Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6947309Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6947422Z       |                   ^~~~
2025-12-04T12:35:04.6948620Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6948746Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6948852Z       |                         ^~~~
2025-12-04T12:35:04.6950029Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6950157Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6950248Z       |       ^~~~
2025-12-04T12:35:04.6951463Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6951578Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6951678Z       |             ^~~~
2025-12-04T12:35:04.6952873Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6953030Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6953129Z       |                   ^~~~
2025-12-04T12:35:04.6954329Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6954451Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6954603Z       |                         ^~~~
2025-12-04T12:35:04.6955789Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6955907Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6956010Z       |       ^~~~
2025-12-04T12:35:04.6957193Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6957324Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6957421Z       |             ^~~~
2025-12-04T12:35:04.6958643Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6958774Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6958873Z       |                   ^~~~
2025-12-04T12:35:04.6960105Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6960220Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6960322Z       |                         ^~~~
2025-12-04T12:35:04.6961517Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6961629Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6961723Z       |       ^~~~
2025-12-04T12:35:04.6962922Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6963039Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6963149Z       |             ^~~~
2025-12-04T12:35:04.6964339Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6964451Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6964573Z       |                   ^~~~
2025-12-04T12:35:04.6965753Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6965881Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6965984Z       |                         ^~~~
2025-12-04T12:35:04.6967168Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6967293Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6967392Z       |       ^~~~
2025-12-04T12:35:04.6968591Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6968742Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6968838Z       |             ^~~~
2025-12-04T12:35:04.6970033Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6970177Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6970310Z       |                   ^~~~
2025-12-04T12:35:04.6971674Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.6971868Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.6971982Z       |                         ^~~~
2025-12-04T12:35:04.6972529Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_float.h:12,
2025-12-04T12:35:04.6972972Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:11,
2025-12-04T12:35:04.6973360Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5,
2025-12-04T12:35:04.6973811Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7,
2025-12-04T12:35:04.6974234Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4,
2025-12-04T12:35:04.6974698Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45,
2025-12-04T12:35:04.6975342Z                  from /tmp/wzlxAD/tmp2gno_q_y/data/aotinductor/model1/cji6fcfpjxr5ad3oypbruxr5r26niflgwwkmd5rthzuhxclq6uis.wrapper.cpp:751:
2025-12-04T12:35:04.6975957Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/sleef.h:192:10: warning: ISO C++ prohibits anonymous structs [-Wpedantic]
2025-12-04T12:35:04.6976059Z   192 |   struct {
2025-12-04T12:35:04.6976166Z       |          ^
2025-12-04T12:35:04.6976723Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:15,
2025-12-04T12:35:04.6977096Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5,
2025-12-04T12:35:04.6977555Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7,
2025-12-04T12:35:04.6977961Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4,
2025-12-04T12:35:04.6978445Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45,
2025-12-04T12:35:04.6979086Z                  from /tmp/wzlxAD/tmp2gno_q_y/data/aotinductor/model1/cji6fcfpjxr5ad3oypbruxr5r26niflgwwkmd5rthzuhxclq6uis.wrapper.cpp:751:
2025-12-04T12:35:04.6981328Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::blendv(const at::vec::CPU_CAPABILITY::Vectorized<short int>&, const at::vec::CPU_CAPABILITY::Vectorized<short int>&, const at::vec::CPU_CAPABILITY::Vectorized<short int>&)’:
2025-12-04T12:35:04.6982524Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:544:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.6982685Z   544 |     auto msb_one = _mm512_set1_epi16(0xFFFF);
2025-12-04T12:35:04.6982821Z       |                                      ^~~~~~
2025-12-04T12:35:04.6983324Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:15,
2025-12-04T12:35:04.6983783Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5,
2025-12-04T12:35:04.6984243Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7,
2025-12-04T12:35:04.6984687Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4,
2025-12-04T12:35:04.6985207Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45,
2025-12-04T12:35:04.6985875Z                  from /tmp/wzlxAD/tmp2gno_q_y/data/aotinductor/model1/cji6fcfpjxr5ad3oypbruxr5r26niflgwwkmd5rthzuhxclq6uis.wrapper.cpp:751:
2025-12-04T12:35:04.6987531Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator==(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.6988706Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:697:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.6988922Z   697 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.6989075Z       |                                                      ^~~~~~
2025-12-04T12:35:04.6990705Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator!=(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.6991883Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:701:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.6992096Z   701 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.6992226Z       |                                                      ^~~~~~
2025-12-04T12:35:04.6993857Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator<(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.6995022Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:705:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.6995239Z   705 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.6995364Z       |                                                      ^~~~~~
2025-12-04T12:35:04.6996998Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator<=(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.6998162Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:709:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.6998369Z   709 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.6998506Z       |                                                      ^~~~~~
2025-12-04T12:35:04.7000124Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator>(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.7001341Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:713:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.7001543Z   713 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.7001702Z       |                                                      ^~~~~~
2025-12-04T12:35:04.7003390Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator>=(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.7004553Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:717:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.7004773Z   717 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.7004896Z       |                                                      ^~~~~~
2025-12-04T12:35:04.7007169Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized<signed char> at::vec::CPU_CAPABILITY::Vectorized<signed char>::blendv(const at::vec::CPU_CAPABILITY::Vectorized<signed char>&, const at::vec::CPU_CAPABILITY::Vectorized<signed char>&, const at::vec::CPU_CAPABILITY::Vectorized<signed char>&)’:
2025-12-04T12:35:04.7008373Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1153:37: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7008534Z  1153 |     auto msb_one = _mm512_set1_epi8(0xFF);
2025-12-04T12:35:04.7008658Z       |                                     ^~~~
2025-12-04T12:35:04.7010327Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<signed char> at::vec::CPU_CAPABILITY::Vectorized<signed char>::operator==(const at::vec::CPU_CAPABILITY::Vectorized<signed char>&) const’:
2025-12-04T12:35:04.7011540Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1166:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7011749Z  1166 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.7011892Z       |                                                     ^~~~
2025-12-04T12:35:04.7013542Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<signed char> at::vec::CPU_CAPABILITY::Vectorized<signed char>::operator!=(const at::vec::CPU_CAPABILITY::Vectorized<signed char>&) const’:
2025-12-04T12:35:04.7014739Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1170:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7014960Z  1170 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.7015091Z       |                                                     ^~~~
2025-12-04T12:35:04.7016835Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<signed char> at::vec::CPU_CAPABILITY::Vectorized<signed char>::operator<(const at::vec::CPU_CAPABILITY::Vectorized<signed char>&) const’:
2025-12-04T12:35:04.7018106Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1174:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7018383Z  1174 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.7018513Z       |                                                     ^~~~
2025-12-04T12:35:04.7020206Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<signed char> at::vec::CPU_CAPABILITY::Vectorized<signed char>::operator<=(const at::vec::CPU_CAPABILITY::Vectorized<signed char>&) const’:
2025-12-04T12:35:04.7021490Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1178:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7021696Z  1178 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.7021835Z       |                                                     ^~~~
2025-12-04T12:35:04.7024183Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized<unsigned char> at::vec::CPU_CAPABILITY::Vectorized<unsigned char>::blendv(const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&, const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&, const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&)’:
2025-12-04T12:35:04.7025396Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1207:37: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7025554Z  1207 |     auto msb_one = _mm512_set1_epi8(0xFF);
2025-12-04T12:35:04.7025672Z       |                                     ^~~~
2025-12-04T12:35:04.7027392Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<unsigned char> at::vec::CPU_CAPABILITY::Vectorized<unsigned char>::operator==(const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&) const’:
2025-12-04T12:35:04.7028842Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1220:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7029069Z  1220 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.7029193Z       |                                                     ^~~~
2025-12-04T12:35:04.7030925Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<unsigned char> at::vec::CPU_CAPABILITY::Vectorized<unsigned char>::operator!=(const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&) const’:
2025-12-04T12:35:04.7032122Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1224:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7032332Z  1224 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.7032470Z       |                                                     ^~~~
2025-12-04T12:35:04.7034185Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<unsigned char> at::vec::CPU_CAPABILITY::Vectorized<unsigned char>::operator<(const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&) const’:
2025-12-04T12:35:04.7035897Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1228:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7036111Z  1228 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.7036313Z       |                                                     ^~~~
2025-12-04T12:35:04.7038089Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<unsigned char> at::vec::CPU_CAPABILITY::Vectorized<unsigned char>::operator<=(const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&) const’:
2025-12-04T12:35:04.7039319Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1232:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7039591Z  1232 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.7039717Z       |                                                     ^~~~
2025-12-04T12:35:04.7042100Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized<T> at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized<T>&, const at::vec::CPU_CAPABILITY::Vectorized<T>&) [with bool left_shift = true; T = signed char; typename std::enable_if<(is_same_v<T, signed char> || is_same_v<T, unsigned char>), int>::type <anonymous> = 0]’:
2025-12-04T12:35:04.7042696Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2074:27:   required from here
2025-12-04T12:35:04.7043910Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7044023Z  1866 |       0x80,
2025-12-04T12:35:04.7044121Z       |       ^~~~
2025-12-04T12:35:04.7045329Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7045431Z  1868 |       0x80,
2025-12-04T12:35:04.7045524Z       |       ^~~~
2025-12-04T12:35:04.7046729Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7046833Z  1870 |       0x80,
2025-12-04T12:35:04.7046938Z       |       ^~~~
2025-12-04T12:35:04.7048132Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7048226Z  1872 |       0x80,
2025-12-04T12:35:04.7048331Z       |       ^~~~
2025-12-04T12:35:04.7049508Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7049624Z  1874 |       0x80,
2025-12-04T12:35:04.7049716Z       |       ^~~~
2025-12-04T12:35:04.7050926Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7051028Z  1876 |       0x80,
2025-12-04T12:35:04.7051135Z       |       ^~~~
2025-12-04T12:35:04.7052315Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7052410Z  1878 |       0x80,
2025-12-04T12:35:04.7052521Z       |       ^~~~
2025-12-04T12:35:04.7053704Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7053855Z  1880 |       0x80,
2025-12-04T12:35:04.7053947Z       |       ^~~~
2025-12-04T12:35:04.7055159Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7055305Z  1882 |       0x80,
2025-12-04T12:35:04.7055399Z       |       ^~~~
2025-12-04T12:35:04.7056708Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7056808Z  1884 |       0x80,
2025-12-04T12:35:04.7056901Z       |       ^~~~
2025-12-04T12:35:04.7058102Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7058207Z  1886 |       0x80,
2025-12-04T12:35:04.7058302Z       |       ^~~~
2025-12-04T12:35:04.7059503Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7059606Z  1888 |       0x80,
2025-12-04T12:35:04.7059712Z       |       ^~~~
2025-12-04T12:35:04.7060900Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7060995Z  1890 |       0x80,
2025-12-04T12:35:04.7061102Z       |       ^~~~
2025-12-04T12:35:04.7062277Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7062395Z  1892 |       0x80,
2025-12-04T12:35:04.7062489Z       |       ^~~~
2025-12-04T12:35:04.7063668Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7063788Z  1894 |       0x80,
2025-12-04T12:35:04.7063881Z       |       ^~~~
2025-12-04T12:35:04.7065063Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7065174Z  1896 |       0x80,
2025-12-04T12:35:04.7065267Z       |       ^~~~
2025-12-04T12:35:04.7066469Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7066604Z  1898 |       0x80,
2025-12-04T12:35:04.7066695Z       |       ^~~~
2025-12-04T12:35:04.7067896Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7068028Z  1900 |       0x80,
2025-12-04T12:35:04.7068120Z       |       ^~~~
2025-12-04T12:35:04.7069316Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7069413Z  1902 |       0x80,
2025-12-04T12:35:04.7069522Z       |       ^~~~
2025-12-04T12:35:04.7070703Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7070806Z  1904 |       0x80,
2025-12-04T12:35:04.7070909Z       |       ^~~~
2025-12-04T12:35:04.7072395Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7072514Z  1906 |       0x80,
2025-12-04T12:35:04.7072606Z       |       ^~~~
2025-12-04T12:35:04.7073832Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7073949Z  1908 |       0x80,
2025-12-04T12:35:04.7074041Z       |       ^~~~
2025-12-04T12:35:04.7075227Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7075339Z  1910 |       0x80,
2025-12-04T12:35:04.7075429Z       |       ^~~~
2025-12-04T12:35:04.7076619Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7076719Z  1912 |       0x80,
2025-12-04T12:35:04.7076811Z       |       ^~~~
2025-12-04T12:35:04.7078000Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7078096Z  1914 |       0x80,
2025-12-04T12:35:04.7078204Z       |       ^~~~
2025-12-04T12:35:04.7079383Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7079490Z  1916 |       0x80,
2025-12-04T12:35:04.7079597Z       |       ^~~~
2025-12-04T12:35:04.7080789Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7080892Z  1918 |       0x80,
2025-12-04T12:35:04.7080997Z       |       ^~~~
2025-12-04T12:35:04.7082197Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7082305Z  1920 |       0x80,
2025-12-04T12:35:04.7082399Z       |       ^~~~
2025-12-04T12:35:04.7083579Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7083741Z  1922 |       0x80,
2025-12-04T12:35:04.7083832Z       |       ^~~~
2025-12-04T12:35:04.7085036Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7085180Z  1924 |       0x80,
2025-12-04T12:35:04.7085274Z       |       ^~~~
2025-12-04T12:35:04.7086481Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7086576Z  1926 |       0x80,
2025-12-04T12:35:04.7086668Z       |       ^~~~
2025-12-04T12:35:04.7087866Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7087968Z  1928 |       0x80);
2025-12-04T12:35:04.7088074Z       |       ^~~~
2025-12-04T12:35:04.7089291Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7089397Z  1930 |       0x80,
2025-12-04T12:35:04.7089504Z       |       ^~~~
2025-12-04T12:35:04.7090716Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7090827Z  1932 |       0x80,
2025-12-04T12:35:04.7090919Z       |       ^~~~
2025-12-04T12:35:04.7092115Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7092223Z  1934 |       0x80,
2025-12-04T12:35:04.7092315Z       |       ^~~~
2025-12-04T12:35:04.7093493Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7093606Z  1936 |       0x80,
2025-12-04T12:35:04.7093698Z       |       ^~~~
2025-12-04T12:35:04.7094890Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7094987Z  1938 |       0x80,
2025-12-04T12:35:04.7095080Z       |       ^~~~
2025-12-04T12:35:04.7096273Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7096439Z  1940 |       0x80,
2025-12-04T12:35:04.7096535Z       |       ^~~~
2025-12-04T12:35:04.7097737Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7097838Z  1942 |       0x80,
2025-12-04T12:35:04.7097949Z       |       ^~~~
2025-12-04T12:35:04.7099133Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7099228Z  1944 |       0x80,
2025-12-04T12:35:04.7099336Z       |       ^~~~
2025-12-04T12:35:04.7100561Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7100669Z  1946 |       0x80,
2025-12-04T12:35:04.7100759Z       |       ^~~~
2025-12-04T12:35:04.7101965Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7102126Z  1948 |       0x80,
2025-12-04T12:35:04.7102216Z       |       ^~~~
2025-12-04T12:35:04.7103432Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7103542Z  1950 |       0x80,
2025-12-04T12:35:04.7103635Z       |       ^~~~
2025-12-04T12:35:04.7104835Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7104932Z  1952 |       0x80,
2025-12-04T12:35:04.7105023Z       |       ^~~~
2025-12-04T12:35:04.7106221Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7106321Z  1954 |       0x80,
2025-12-04T12:35:04.7106428Z       |       ^~~~
2025-12-04T12:35:04.7107603Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7107696Z  1956 |       0x80,
2025-12-04T12:35:04.7107807Z       |       ^~~~
2025-12-04T12:35:04.7108980Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7109088Z  1958 |       0x80,
2025-12-04T12:35:04.7109180Z       |       ^~~~
2025-12-04T12:35:04.7110365Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7110480Z  1960 |       0x80,
2025-12-04T12:35:04.7110570Z       |       ^~~~
2025-12-04T12:35:04.7111756Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7111862Z  1962 |       0x80,
2025-12-04T12:35:04.7111959Z       |       ^~~~
2025-12-04T12:35:04.7113148Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7113243Z  1964 |       0x80,
2025-12-04T12:35:04.7113334Z       |       ^~~~
2025-12-04T12:35:04.7114525Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7114624Z  1966 |       0x80,
2025-12-04T12:35:04.7114714Z       |       ^~~~
2025-12-04T12:35:04.7115917Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7116011Z  1968 |       0x80,
2025-12-04T12:35:04.7116155Z       |       ^~~~
2025-12-04T12:35:04.7117334Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7117428Z  1970 |       0x80,
2025-12-04T12:35:04.7117532Z       |       ^~~~
2025-12-04T12:35:04.7118745Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7118886Z  1972 |       0x80,
2025-12-04T12:35:04.7118976Z       |       ^~~~
2025-12-04T12:35:04.7120185Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7120295Z  1974 |       0x80,
2025-12-04T12:35:04.7120396Z       |       ^~~~
2025-12-04T12:35:04.7121569Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7121677Z  1976 |       0x80,
2025-12-04T12:35:04.7121769Z       |       ^~~~
2025-12-04T12:35:04.7122963Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7123062Z  1978 |       0x80,
2025-12-04T12:35:04.7123152Z       |       ^~~~
2025-12-04T12:35:04.7124360Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7124462Z  1980 |       0x80,
2025-12-04T12:35:04.7124576Z       |       ^~~~
2025-12-04T12:35:04.7125746Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7125844Z  1982 |       0x80,
2025-12-04T12:35:04.7125954Z       |       ^~~~
2025-12-04T12:35:04.7127129Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7127228Z  1984 |       0x80,
2025-12-04T12:35:04.7127338Z       |       ^~~~
2025-12-04T12:35:04.7128519Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7128633Z  1986 |       0x80,
2025-12-04T12:35:04.7128726Z       |       ^~~~
2025-12-04T12:35:04.7129900Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7130010Z  1988 |       0x80,
2025-12-04T12:35:04.7130101Z       |       ^~~~
2025-12-04T12:35:04.7131300Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7131396Z  1990 |       0x80,
2025-12-04T12:35:04.7131492Z       |       ^~~~
2025-12-04T12:35:04.7132693Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7132830Z  1992 |       0x80,
2025-12-04T12:35:04.7132921Z       |       ^~~~
2025-12-04T12:35:04.7134118Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow]
2025-12-04T12:35:04.7134276Z  2002 |   __m512i keep_1 = _mm512_set1_epi16(0xFF00);
2025-12-04T12:35:04.7134480Z       |                                      ^~~~~~
2025-12-04T12:35:04.7137020Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized<T> at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized<T>&, const at::vec::CPU_CAPABILITY::Vectorized<T>&) [with bool left_shift = true; T = unsigned char; typename std::enable_if<(is_same_v<T, signed char> || is_same_v<T, unsigned char>), int>::type <anonymous> = 0]’:
2025-12-04T12:35:04.7137622Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2081:27:   required from here
2025-12-04T12:35:04.7138823Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7138919Z  1866 |       0x80,
2025-12-04T12:35:04.7139035Z       |       ^~~~
2025-12-04T12:35:04.7140215Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7140328Z  1868 |       0x80,
2025-12-04T12:35:04.7140427Z       |       ^~~~
2025-12-04T12:35:04.7141598Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7141713Z  1870 |       0x80,
2025-12-04T12:35:04.7141807Z       |       ^~~~
2025-12-04T12:35:04.7142980Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7143096Z  1872 |       0x80,
2025-12-04T12:35:04.7143202Z       |       ^~~~
2025-12-04T12:35:04.7144392Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7144486Z  1874 |       0x80,
2025-12-04T12:35:04.7144585Z       |       ^~~~
2025-12-04T12:35:04.7145779Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7145881Z  1876 |       0x80,
2025-12-04T12:35:04.7145976Z       |       ^~~~
2025-12-04T12:35:04.7147167Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7147265Z  1878 |       0x80,
2025-12-04T12:35:04.7147384Z       |       ^~~~
2025-12-04T12:35:04.7148558Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7148656Z  1880 |       0x80,
2025-12-04T12:35:04.7148772Z       |       ^~~~
2025-12-04T12:35:04.7149947Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7150097Z  1882 |       0x80,
2025-12-04T12:35:04.7150193Z       |       ^~~~
2025-12-04T12:35:04.7151370Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7151517Z  1884 |       0x80,
2025-12-04T12:35:04.7151645Z       |       ^~~~
2025-12-04T12:35:04.7152835Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7152991Z  1886 |       0x80,
2025-12-04T12:35:04.7153086Z       |       ^~~~
2025-12-04T12:35:04.7154284Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7154387Z  1888 |       0x80,
2025-12-04T12:35:04.7154481Z       |       ^~~~
2025-12-04T12:35:04.7155674Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7155774Z  1890 |       0x80,
2025-12-04T12:35:04.7155886Z       |       ^~~~
2025-12-04T12:35:04.7157060Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7157166Z  1892 |       0x80,
2025-12-04T12:35:04.7157271Z       |       ^~~~
2025-12-04T12:35:04.7158454Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7158560Z  1894 |       0x80,
2025-12-04T12:35:04.7158666Z       |       ^~~~
2025-12-04T12:35:04.7159847Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7159970Z  1896 |       0x80,
2025-12-04T12:35:04.7160064Z       |       ^~~~
2025-12-04T12:35:04.7161241Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7161354Z  1898 |       0x80,
2025-12-04T12:35:04.7161444Z       |       ^~~~
2025-12-04T12:35:04.7162624Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7162759Z  1900 |       0x80,
2025-12-04T12:35:04.7162850Z       |       ^~~~
2025-12-04T12:35:04.7164039Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7164197Z  1902 |       0x80,
2025-12-04T12:35:04.7164287Z       |       ^~~~
2025-12-04T12:35:04.7165488Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7165589Z  1904 |       0x80,
2025-12-04T12:35:04.7165694Z       |       ^~~~
2025-12-04T12:35:04.7166863Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7166967Z  1906 |       0x80,
2025-12-04T12:35:04.7167078Z       |       ^~~~
2025-12-04T12:35:04.7168246Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7168398Z  1908 |       0x80,
2025-12-04T12:35:04.7168495Z       |       ^~~~
2025-12-04T12:35:04.7169673Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7169893Z  1910 |       0x80,
2025-12-04T12:35:04.7169987Z       |       ^~~~
2025-12-04T12:35:04.7171352Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7171469Z  1912 |       0x80,
2025-12-04T12:35:04.7171563Z       |       ^~~~
2025-12-04T12:35:04.7172773Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7172875Z  1914 |       0x80,
2025-12-04T12:35:04.7172969Z       |       ^~~~
2025-12-04T12:35:04.7174164Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7174260Z  1916 |       0x80,
2025-12-04T12:35:04.7174364Z       |       ^~~~
2025-12-04T12:35:04.7175534Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7175634Z  1918 |       0x80,
2025-12-04T12:35:04.7175742Z       |       ^~~~
2025-12-04T12:35:04.7176981Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7177083Z  1920 |       0x80,
2025-12-04T12:35:04.7177190Z       |       ^~~~
2025-12-04T12:35:04.7178374Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7178483Z  1922 |       0x80,
2025-12-04T12:35:04.7178575Z       |       ^~~~
2025-12-04T12:35:04.7179740Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7179940Z  1924 |       0x80,
2025-12-04T12:35:04.7180033Z       |       ^~~~
2025-12-04T12:35:04.7181233Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7181374Z  1926 |       0x80,
2025-12-04T12:35:04.7181466Z       |       ^~~~
2025-12-04T12:35:04.7182668Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7182767Z  1928 |       0x80);
2025-12-04T12:35:04.7182861Z       |       ^~~~
2025-12-04T12:35:04.7184059Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7184158Z  1930 |       0x80,
2025-12-04T12:35:04.7184268Z       |       ^~~~
2025-12-04T12:35:04.7185486Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7185588Z  1932 |       0x80,
2025-12-04T12:35:04.7185697Z       |       ^~~~
2025-12-04T12:35:04.7186921Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7187031Z  1934 |       0x80,
2025-12-04T12:35:04.7187125Z       |       ^~~~
2025-12-04T12:35:04.7188302Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7188420Z  1936 |       0x80,
2025-12-04T12:35:04.7188512Z       |       ^~~~
2025-12-04T12:35:04.7189692Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7189804Z  1938 |       0x80,
2025-12-04T12:35:04.7189899Z       |       ^~~~
2025-12-04T12:35:04.7191092Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7191189Z  1940 |       0x80,
2025-12-04T12:35:04.7191280Z       |       ^~~~
2025-12-04T12:35:04.7192471Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7192572Z  1942 |       0x80,
2025-12-04T12:35:04.7192665Z       |       ^~~~
2025-12-04T12:35:04.7193856Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7193956Z  1944 |       0x80,
2025-12-04T12:35:04.7194064Z       |       ^~~~
2025-12-04T12:35:04.7195375Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7195470Z  1946 |       0x80,
2025-12-04T12:35:04.7195577Z       |       ^~~~
2025-12-04T12:35:04.7196767Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7196922Z  1948 |       0x80,
2025-12-04T12:35:04.7197014Z       |       ^~~~
2025-12-04T12:35:04.7198229Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7198370Z  1950 |       0x80,
2025-12-04T12:35:04.7198463Z       |       ^~~~
2025-12-04T12:35:04.7199681Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7199791Z  1952 |       0x80,
2025-12-04T12:35:04.7199883Z       |       ^~~~
2025-12-04T12:35:04.7201068Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7201168Z  1954 |       0x80,
2025-12-04T12:35:04.7201258Z       |       ^~~~
2025-12-04T12:35:04.7202459Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7202558Z  1956 |       0x80,
2025-12-04T12:35:04.7202664Z       |       ^~~~
2025-12-04T12:35:04.7203846Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7203942Z  1958 |       0x80,
2025-12-04T12:35:04.7204047Z       |       ^~~~
2025-12-04T12:35:04.7205221Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7205322Z  1960 |       0x80,
2025-12-04T12:35:04.7205430Z       |       ^~~~
2025-12-04T12:35:04.7206609Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7206722Z  1962 |       0x80,
2025-12-04T12:35:04.7206814Z       |       ^~~~
2025-12-04T12:35:04.7207990Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7208097Z  1964 |       0x80,
2025-12-04T12:35:04.7208187Z       |       ^~~~
2025-12-04T12:35:04.7209370Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7209476Z  1966 |       0x80,
2025-12-04T12:35:04.7209567Z       |       ^~~~
2025-12-04T12:35:04.7210767Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7210867Z  1968 |       0x80,
2025-12-04T12:35:04.7210959Z       |       ^~~~
2025-12-04T12:35:04.7212159Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7212254Z  1970 |       0x80,
2025-12-04T12:35:04.7212358Z       |       ^~~~
2025-12-04T12:35:04.7213532Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7213666Z  1972 |       0x80,
2025-12-04T12:35:04.7213772Z       |       ^~~~
2025-12-04T12:35:04.7214979Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7215120Z  1974 |       0x80,
2025-12-04T12:35:04.7215211Z       |       ^~~~
2025-12-04T12:35:04.7216496Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7216609Z  1976 |       0x80,
2025-12-04T12:35:04.7216701Z       |       ^~~~
2025-12-04T12:35:04.7218227Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7218340Z  1978 |       0x80,
2025-12-04T12:35:04.7218434Z       |       ^~~~
2025-12-04T12:35:04.7219639Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7219740Z  1980 |       0x80,
2025-12-04T12:35:04.7219833Z       |       ^~~~
2025-12-04T12:35:04.7221035Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7221129Z  1982 |       0x80,
2025-12-04T12:35:04.7221220Z       |       ^~~~
2025-12-04T12:35:04.7222416Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7222511Z  1984 |       0x80,
2025-12-04T12:35:04.7222617Z       |       ^~~~
2025-12-04T12:35:04.7223791Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7223891Z  1986 |       0x80,
2025-12-04T12:35:04.7223994Z       |       ^~~~
2025-12-04T12:35:04.7225170Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7225277Z  1988 |       0x80,
2025-12-04T12:35:04.7225368Z       |       ^~~~
2025-12-04T12:35:04.7226565Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7226674Z  1990 |       0x80,
2025-12-04T12:35:04.7226765Z       |       ^~~~
2025-12-04T12:35:04.7227963Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7228066Z  1992 |       0x80,
2025-12-04T12:35:04.7228158Z       |       ^~~~
2025-12-04T12:35:04.7229364Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow]
2025-12-04T12:35:04.7230656Z  2002 |   __m512i keep_1 = _mm512_set1_epi16(0xFF00);
2025-12-04T12:35:04.7231163Z       |                                      ^~~~~~
2025-12-04T12:35:04.7233876Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized<T> at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized<T>&, const at::vec::CPU_CAPABILITY::Vectorized<T>&) [with bool left_shift = false; T = signed char; typename std::enable_if<(is_same_v<T, signed char> || is_same_v<T, unsigned char>), int>::type <anonymous> = 0]’:
2025-12-04T12:35:04.7236582Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2109:28:   required from here
2025-12-04T12:35:04.7238579Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7239785Z  1866 |       0x80,
2025-12-04T12:35:04.7240056Z       |       ^~~~
2025-12-04T12:35:04.7241410Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7242636Z  1868 |       0x80,
2025-12-04T12:35:04.7242885Z       |       ^~~~
2025-12-04T12:35:04.7244235Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7245460Z  1870 |       0x80,
2025-12-04T12:35:04.7245702Z       |       ^~~~
2025-12-04T12:35:04.7247049Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7248259Z  1872 |       0x80,
2025-12-04T12:35:04.7248522Z       |       ^~~~
2025-12-04T12:35:04.7249850Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7251062Z  1874 |       0x80,
2025-12-04T12:35:04.7251316Z       |       ^~~~
2025-12-04T12:35:04.7252662Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7253870Z  1876 |       0x80,
2025-12-04T12:35:04.7254128Z       |       ^~~~
2025-12-04T12:35:04.7255479Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7256755Z  1878 |       0x80,
2025-12-04T12:35:04.7257020Z       |       ^~~~
2025-12-04T12:35:04.7258373Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7259718Z  1880 |       0x80,
2025-12-04T12:35:04.7259964Z       |       ^~~~
2025-12-04T12:35:04.7261324Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7262601Z  1882 |       0x80,
2025-12-04T12:35:04.7262861Z       |       ^~~~
2025-12-04T12:35:04.7264216Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7265477Z  1884 |       0x80,
2025-12-04T12:35:04.7265732Z       |       ^~~~
2025-12-04T12:35:04.7267061Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7268277Z  1886 |       0x80,
2025-12-04T12:35:04.7268533Z       |       ^~~~
2025-12-04T12:35:04.7269941Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7271311Z  1888 |       0x80,
2025-12-04T12:35:04.7271573Z       |       ^~~~
2025-12-04T12:35:04.7273005Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7274243Z  1890 |       0x80,
2025-12-04T12:35:04.7274491Z       |       ^~~~
2025-12-04T12:35:04.7275849Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7277062Z  1892 |       0x80,
2025-12-04T12:35:04.7277306Z       |       ^~~~
2025-12-04T12:35:04.7278665Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7279888Z  1894 |       0x80,
2025-12-04T12:35:04.7280142Z       |       ^~~~
2025-12-04T12:35:04.7281475Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7282699Z  1896 |       0x80,
2025-12-04T12:35:04.7282956Z       |       ^~~~
2025-12-04T12:35:04.7284282Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7285495Z  1898 |       0x80,
2025-12-04T12:35:04.7285760Z       |       ^~~~
2025-12-04T12:35:04.7287113Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7288313Z  1900 |       0x80,
2025-12-04T12:35:04.7288574Z       |       ^~~~
2025-12-04T12:35:04.7289923Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7291224Z  1902 |       0x80,
2025-12-04T12:35:04.7291469Z       |       ^~~~
2025-12-04T12:35:04.7292825Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7294034Z  1904 |       0x80,
2025-12-04T12:35:04.7294285Z       |       ^~~~
2025-12-04T12:35:04.7295679Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7296997Z  1906 |       0x80,
2025-12-04T12:35:04.7297265Z       |       ^~~~
2025-12-04T12:35:04.7298600Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7299814Z  1908 |       0x80,
2025-12-04T12:35:04.7300072Z       |       ^~~~
2025-12-04T12:35:04.7301414Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7302629Z  1910 |       0x80,
2025-12-04T12:35:04.7302960Z       |       ^~~~
2025-12-04T12:35:04.7304313Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7305522Z  1912 |       0x80,
2025-12-04T12:35:04.7305811Z       |       ^~~~
2025-12-04T12:35:04.7307159Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7308377Z  1914 |       0x80,
2025-12-04T12:35:04.7308619Z       |       ^~~~
2025-12-04T12:35:04.7309961Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7311165Z  1916 |       0x80,
2025-12-04T12:35:04.7311438Z       |       ^~~~
2025-12-04T12:35:04.7312764Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7313971Z  1918 |       0x80,
2025-12-04T12:35:04.7314236Z       |       ^~~~
2025-12-04T12:35:04.7315557Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7316777Z  1920 |       0x80,
2025-12-04T12:35:04.7317040Z       |       ^~~~
2025-12-04T12:35:04.7318382Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7319585Z  1922 |       0x80,
2025-12-04T12:35:04.7319855Z       |       ^~~~
2025-12-04T12:35:04.7321598Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7322828Z  1924 |       0x80,
2025-12-04T12:35:04.7323078Z       |       ^~~~
2025-12-04T12:35:04.7324434Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7325719Z  1926 |       0x80,
2025-12-04T12:35:04.7325964Z       |       ^~~~
2025-12-04T12:35:04.7327305Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7328741Z  1928 |       0x80);
2025-12-04T12:35:04.7329241Z       |       ^~~~
2025-12-04T12:35:04.7330849Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7332596Z  1930 |       0x80,
2025-12-04T12:35:04.7333055Z       |       ^~~~
2025-12-04T12:35:04.7334956Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7336682Z  1932 |       0x80,
2025-12-04T12:35:04.7337119Z       |       ^~~~
2025-12-04T12:35:04.7340967Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7342369Z  1934 |       0x80,
2025-12-04T12:35:04.7342644Z       |       ^~~~
2025-12-04T12:35:04.7345551Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7346908Z  1936 |       0x80,
2025-12-04T12:35:04.7347163Z       |       ^~~~
2025-12-04T12:35:04.7348566Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7349812Z  1938 |       0x80,
2025-12-04T12:35:04.7350116Z       |       ^~~~
2025-12-04T12:35:04.7351570Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7352799Z  1940 |       0x80,
2025-12-04T12:35:04.7353054Z       |       ^~~~
2025-12-04T12:35:04.7354380Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7355599Z  1942 |       0x80,
2025-12-04T12:35:04.7355855Z       |       ^~~~
2025-12-04T12:35:04.7357361Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7358635Z  1944 |       0x80,
2025-12-04T12:35:04.7358893Z       |       ^~~~
2025-12-04T12:35:04.7360240Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7361982Z  1946 |       0x80,
2025-12-04T12:35:04.7362391Z       |       ^~~~
2025-12-04T12:35:04.7364291Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7365572Z  1948 |       0x80,
2025-12-04T12:35:04.7365820Z       |       ^~~~
2025-12-04T12:35:04.7367242Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7368602Z  1950 |       0x80,
2025-12-04T12:35:04.7368888Z       |       ^~~~
2025-12-04T12:35:04.7370327Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7372549Z  1952 |       0x80,
2025-12-04T12:35:04.7372972Z       |       ^~~~
2025-12-04T12:35:04.7374576Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7375799Z  1954 |       0x80,
2025-12-04T12:35:04.7376104Z       |       ^~~~
2025-12-04T12:35:04.7377679Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7379041Z  1956 |       0x80,
2025-12-04T12:35:04.7379357Z       |       ^~~~
2025-12-04T12:35:04.7380722Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7382568Z  1958 |       0x80,
2025-12-04T12:35:04.7382814Z       |       ^~~~
2025-12-04T12:35:04.7384239Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7386040Z  1960 |       0x80,
2025-12-04T12:35:04.7386284Z       |       ^~~~
2025-12-04T12:35:04.7387980Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7389624Z  1962 |       0x80,
2025-12-04T12:35:04.7389887Z       |       ^~~~
2025-12-04T12:35:04.7391527Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7393464Z  1964 |       0x80,
2025-12-04T12:35:04.7393722Z       |       ^~~~
2025-12-04T12:35:04.7395621Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7397192Z  1966 |       0x80,
2025-12-04T12:35:04.7397448Z       |       ^~~~
2025-12-04T12:35:04.7398865Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7400623Z  1968 |       0x80,
2025-12-04T12:35:04.7401054Z       |       ^~~~
2025-12-04T12:35:04.7403155Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7404376Z  1970 |       0x80,
2025-12-04T12:35:04.7404637Z       |       ^~~~
2025-12-04T12:35:04.7406254Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7407757Z  1972 |       0x80,
2025-12-04T12:35:04.7408006Z       |       ^~~~
2025-12-04T12:35:04.7410111Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7412157Z  1974 |       0x80,
2025-12-04T12:35:04.7412460Z       |       ^~~~
2025-12-04T12:35:04.7414334Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7415674Z  1976 |       0x80,
2025-12-04T12:35:04.7415938Z       |       ^~~~
2025-12-04T12:35:04.7417418Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7418633Z  1978 |       0x80,
2025-12-04T12:35:04.7418894Z       |       ^~~~
2025-12-04T12:35:04.7420229Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7421456Z  1980 |       0x80,
2025-12-04T12:35:04.7421715Z       |       ^~~~
2025-12-04T12:35:04.7423066Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7424262Z  1982 |       0x80,
2025-12-04T12:35:04.7424527Z       |       ^~~~
2025-12-04T12:35:04.7425875Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7427097Z  1984 |       0x80,
2025-12-04T12:35:04.7427337Z       |       ^~~~
2025-12-04T12:35:04.7428672Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7429888Z  1986 |       0x80,
2025-12-04T12:35:04.7430127Z       |       ^~~~
2025-12-04T12:35:04.7431472Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7432685Z  1988 |       0x80,
2025-12-04T12:35:04.7432941Z       |       ^~~~
2025-12-04T12:35:04.7434265Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7435489Z  1990 |       0x80,
2025-12-04T12:35:04.7435743Z       |       ^~~~
2025-12-04T12:35:04.7437079Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7438289Z  1992 |       0x80,
2025-12-04T12:35:04.7438544Z       |       ^~~~
2025-12-04T12:35:04.7439898Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow]
2025-12-04T12:35:04.7441157Z  2002 |   __m512i keep_1 = _mm512_set1_epi16(0xFF00);
2025-12-04T12:35:04.7441564Z       |                                      ^~~~~~
2025-12-04T12:35:04.7444270Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized<T> at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized<T>&, const at::vec::CPU_CAPABILITY::Vectorized<T>&) [with bool left_shift = false; T = unsigned char; typename std::enable_if<(is_same_v<T, signed char> || is_same_v<T, unsigned char>), int>::type <anonymous> = 0]’:
2025-12-04T12:35:04.7446957Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2116:28:   required from here
2025-12-04T12:35:04.7448908Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7450147Z  1866 |       0x80,
2025-12-04T12:35:04.7450407Z       |       ^~~~
2025-12-04T12:35:04.7451791Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7453014Z  1868 |       0x80,
2025-12-04T12:35:04.7453261Z       |       ^~~~
2025-12-04T12:35:04.7454609Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7455822Z  1870 |       0x80,
2025-12-04T12:35:04.7456063Z       |       ^~~~
2025-12-04T12:35:04.7457490Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7458707Z  1872 |       0x80,
2025-12-04T12:35:04.7458971Z       |       ^~~~
2025-12-04T12:35:04.7460307Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7461521Z  1874 |       0x80,
2025-12-04T12:35:04.7461785Z       |       ^~~~
2025-12-04T12:35:04.7463126Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7464726Z  1876 |       0x80,
2025-12-04T12:35:04.7465135Z       |       ^~~~
2025-12-04T12:35:04.7467492Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7468838Z  1878 |       0x80,
2025-12-04T12:35:04.7469082Z       |       ^~~~
2025-12-04T12:35:04.7470773Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7472358Z  1880 |       0x80,
2025-12-04T12:35:04.7472770Z       |       ^~~~
2025-12-04T12:35:04.7474526Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7475921Z  1882 |       0x80,
2025-12-04T12:35:04.7476251Z       |       ^~~~
2025-12-04T12:35:04.7477605Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7478930Z  1884 |       0x80,
2025-12-04T12:35:04.7479188Z       |       ^~~~
2025-12-04T12:35:04.7481116Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7483213Z  1886 |       0x80,
2025-12-04T12:35:04.7483628Z       |       ^~~~
2025-12-04T12:35:04.7485142Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7486485Z  1888 |       0x80,
2025-12-04T12:35:04.7486764Z       |       ^~~~
2025-12-04T12:35:04.7488674Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7490271Z  1890 |       0x80,
2025-12-04T12:35:04.7490521Z       |       ^~~~
2025-12-04T12:35:04.7492584Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7494281Z  1892 |       0x80,
2025-12-04T12:35:04.7494688Z       |       ^~~~
2025-12-04T12:35:04.7496482Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7509109Z  1894 |       0x80,
2025-12-04T12:35:04.7509413Z       |       ^~~~
2025-12-04T12:35:04.7510931Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7512628Z  1896 |       0x80,
2025-12-04T12:35:04.7513056Z       |       ^~~~
2025-12-04T12:35:04.7515120Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7516809Z  1898 |       0x80,
2025-12-04T12:35:04.7517084Z       |       ^~~~
2025-12-04T12:35:04.7518521Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7519926Z  1900 |       0x80,
2025-12-04T12:35:04.7520345Z       |       ^~~~
2025-12-04T12:35:04.7522378Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7524295Z  1902 |       0x80,
2025-12-04T12:35:04.7524630Z       |       ^~~~
2025-12-04T12:35:04.7526133Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7527615Z  1904 |       0x80,
2025-12-04T12:35:04.7527982Z       |       ^~~~
2025-12-04T12:35:04.7529865Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7531606Z  1906 |       0x80,
2025-12-04T12:35:04.7532028Z       |       ^~~~
2025-12-04T12:35:04.7533743Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7535355Z  1908 |       0x80,
2025-12-04T12:35:04.7535772Z       |       ^~~~
2025-12-04T12:35:04.7537496Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7538711Z  1910 |       0x80,
2025-12-04T12:35:04.7539044Z       |       ^~~~
2025-12-04T12:35:04.7541371Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7543417Z  1912 |       0x80,
2025-12-04T12:35:04.7543675Z       |       ^~~~
2025-12-04T12:35:04.7545250Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7546855Z  1914 |       0x80,
2025-12-04T12:35:04.7547098Z       |       ^~~~
2025-12-04T12:35:04.7548895Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7550126Z  1916 |       0x80,
2025-12-04T12:35:04.7550544Z       |       ^~~~
2025-12-04T12:35:04.7552859Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7554783Z  1918 |       0x80,
2025-12-04T12:35:04.7555061Z       |       ^~~~
2025-12-04T12:35:04.7556554Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7558170Z  1920 |       0x80,
2025-12-04T12:35:04.7558431Z       |       ^~~~
2025-12-04T12:35:04.7560133Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7561441Z  1922 |       0x80,
2025-12-04T12:35:04.7561696Z       |       ^~~~
2025-12-04T12:35:04.7563042Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7564941Z  1924 |       0x80,
2025-12-04T12:35:04.7565349Z       |       ^~~~
2025-12-04T12:35:04.7567606Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7568899Z  1926 |       0x80,
2025-12-04T12:35:04.7569146Z       |       ^~~~
2025-12-04T12:35:04.7570572Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7572407Z  1928 |       0x80);
2025-12-04T12:35:04.7572724Z       |       ^~~~
2025-12-04T12:35:04.7575093Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7576839Z  1930 |       0x80,
2025-12-04T12:35:04.7577115Z       |       ^~~~
2025-12-04T12:35:04.7578858Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7580275Z  1932 |       0x80,
2025-12-04T12:35:04.7580536Z       |       ^~~~
2025-12-04T12:35:04.7582189Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7583483Z  1934 |       0x80,
2025-12-04T12:35:04.7583745Z       |       ^~~~
2025-12-04T12:35:04.7585098Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7586307Z  1936 |       0x80,
2025-12-04T12:35:04.7586550Z       |       ^~~~
2025-12-04T12:35:04.7587896Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7589099Z  1938 |       0x80,
2025-12-04T12:35:04.7589342Z       |       ^~~~
2025-12-04T12:35:04.7590687Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7591897Z  1940 |       0x80,
2025-12-04T12:35:04.7592149Z       |       ^~~~
2025-12-04T12:35:04.7593474Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7594684Z  1942 |       0x80,
2025-12-04T12:35:04.7594943Z       |       ^~~~
2025-12-04T12:35:04.7596275Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7597484Z  1944 |       0x80,
2025-12-04T12:35:04.7597747Z       |       ^~~~
2025-12-04T12:35:04.7599091Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7600290Z  1946 |       0x80,
2025-12-04T12:35:04.7600545Z       |       ^~~~
2025-12-04T12:35:04.7601892Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7603098Z  1948 |       0x80,
2025-12-04T12:35:04.7603348Z       |       ^~~~
2025-12-04T12:35:04.7604698Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7605909Z  1950 |       0x80,
2025-12-04T12:35:04.7606160Z       |       ^~~~
2025-12-04T12:35:04.7607504Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7608796Z  1952 |       0x80,
2025-12-04T12:35:04.7609049Z       |       ^~~~
2025-12-04T12:35:04.7610386Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7611593Z  1954 |       0x80,
2025-12-04T12:35:04.7611930Z       |       ^~~~
2025-12-04T12:35:04.7613271Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7614472Z  1956 |       0x80,
2025-12-04T12:35:04.7614771Z       |       ^~~~
2025-12-04T12:35:04.7616117Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7617400Z  1958 |       0x80,
2025-12-04T12:35:04.7617661Z       |       ^~~~
2025-12-04T12:35:04.7619016Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7620238Z  1960 |       0x80,
2025-12-04T12:35:04.7620499Z       |       ^~~~
2025-12-04T12:35:04.7621851Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7623059Z  1962 |       0x80,
2025-12-04T12:35:04.7623311Z       |       ^~~~
2025-12-04T12:35:04.7624655Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7625880Z  1964 |       0x80,
2025-12-04T12:35:04.7626142Z       |       ^~~~
2025-12-04T12:35:04.7627472Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7628692Z  1966 |       0x80,
2025-12-04T12:35:04.7628956Z       |       ^~~~
2025-12-04T12:35:04.7630305Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7631509Z  1968 |       0x80,
2025-12-04T12:35:04.7631769Z       |       ^~~~
2025-12-04T12:35:04.7633119Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7634345Z  1970 |       0x80,
2025-12-04T12:35:04.7634591Z       |       ^~~~
2025-12-04T12:35:04.7635934Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7637154Z  1972 |       0x80,
2025-12-04T12:35:04.7637398Z       |       ^~~~
2025-12-04T12:35:04.7638737Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7639948Z  1974 |       0x80,
2025-12-04T12:35:04.7640200Z       |       ^~~~
2025-12-04T12:35:04.7641528Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7642828Z  1976 |       0x80,
2025-12-04T12:35:04.7643085Z       |       ^~~~
2025-12-04T12:35:04.7644403Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7645681Z  1978 |       0x80,
2025-12-04T12:35:04.7645944Z       |       ^~~~
2025-12-04T12:35:04.7647297Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7648529Z  1980 |       0x80,
2025-12-04T12:35:04.7648788Z       |       ^~~~
2025-12-04T12:35:04.7650134Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7651344Z  1982 |       0x80,
2025-12-04T12:35:04.7651583Z       |       ^~~~
2025-12-04T12:35:04.7652921Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7654139Z  1984 |       0x80,
2025-12-04T12:35:04.7654382Z       |       ^~~~
2025-12-04T12:35:04.7655720Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7657004Z  1986 |       0x80,
2025-12-04T12:35:04.7657265Z       |       ^~~~
2025-12-04T12:35:04.7658602Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7659826Z  1988 |       0x80,
2025-12-04T12:35:04.7660082Z       |       ^~~~
2025-12-04T12:35:04.7661427Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7662633Z  1990 |       0x80,
2025-12-04T12:35:04.7662891Z       |       ^~~~
2025-12-04T12:35:04.7664235Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7665433Z  1992 |       0x80,
2025-12-04T12:35:04.7665689Z       |       ^~~~
2025-12-04T12:35:04.7667030Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow]
2025-12-04T12:35:04.7668306Z  2002 |   __m512i keep_1 = _mm512_set1_epi16(0xFF00);
2025-12-04T12:35:04.7668701Z       |                                      ^~~~~~
2025-12-04T12:35:04.7669462Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:16,
2025-12-04T12:35:04.7670489Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5,
2025-12-04T12:35:04.7671657Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7,
2025-12-04T12:35:04.7672646Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4,
2025-12-04T12:35:04.7673664Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45,
2025-12-04T12:35:04.7675017Z                  from /tmp/wzlxAD/tmp2gno_q_y/data/aotinductor/model1/cji6fcfpjxr5ad3oypbruxr5r26niflgwwkmd5rthzuhxclq6uis.wrapper.cpp:751:
2025-12-04T12:35:04.7677313Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h: In instantiation of ‘void at::vec::CPU_CAPABILITY::QuantizeAvx512(const float*, T*, int, float, int64_t) [with T = signed char; int64_t = long int]’:
2025-12-04T12:35:04.7679278Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:696:31:   required from here
2025-12-04T12:35:04.7681251Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7682500Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7682839Z       |       ^~~~
2025-12-04T12:35:04.7684187Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7685425Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7685765Z       |             ^~~~
2025-12-04T12:35:04.7687144Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7688370Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7688712Z       |                   ^~~~
2025-12-04T12:35:04.7690113Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7691343Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7691672Z       |                         ^~~~
2025-12-04T12:35:04.7693085Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7694316Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7694637Z       |       ^~~~
2025-12-04T12:35:04.7695997Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7697305Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7697643Z       |             ^~~~
2025-12-04T12:35:04.7699014Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7700299Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7700636Z       |                   ^~~~
2025-12-04T12:35:04.7702026Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7703245Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7703637Z       |                         ^~~~
2025-12-04T12:35:04.7705045Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7706289Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7706621Z       |       ^~~~
2025-12-04T12:35:04.7707972Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7709222Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7709546Z       |             ^~~~
2025-12-04T12:35:04.7710969Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7712220Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7712563Z       |                   ^~~~
2025-12-04T12:35:04.7713985Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7715225Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7715574Z       |                         ^~~~
2025-12-04T12:35:04.7716990Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7718215Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7718552Z       |       ^~~~
2025-12-04T12:35:04.7719916Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7721153Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7721479Z       |             ^~~~
2025-12-04T12:35:04.7722869Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7724111Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7724442Z       |                   ^~~~
2025-12-04T12:35:04.7725830Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7727060Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7727403Z       |                         ^~~~
2025-12-04T12:35:04.7728816Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7730048Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7730374Z       |       ^~~~
2025-12-04T12:35:04.7731738Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7733008Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7733343Z       |             ^~~~
2025-12-04T12:35:04.7734726Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7735959Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7736402Z       |                   ^~~~
2025-12-04T12:35:04.7737817Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7739057Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7739383Z       |                         ^~~~
2025-12-04T12:35:04.7740795Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7742046Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7742381Z       |       ^~~~
2025-12-04T12:35:04.7743773Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7745011Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7745342Z       |             ^~~~
2025-12-04T12:35:04.7746778Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7748007Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7748347Z       |                   ^~~~
2025-12-04T12:35:04.7749736Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7750978Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7751306Z       |                         ^~~~
2025-12-04T12:35:04.7752730Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7753965Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7754281Z       |       ^~~~
2025-12-04T12:35:04.7755648Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7756880Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7757221Z       |             ^~~~
2025-12-04T12:35:04.7758579Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7759810Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7760152Z       |                   ^~~~
2025-12-04T12:35:04.7761550Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7762776Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7763117Z       |                         ^~~~
2025-12-04T12:35:04.7764522Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7765804Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7766122Z       |       ^~~~
2025-12-04T12:35:04.7767468Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7768741Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7769097Z       |             ^~~~
2025-12-04T12:35:04.7770476Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7772010Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7772355Z       |                   ^~~~
2025-12-04T12:35:04.7773741Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7774979Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7775319Z       |                         ^~~~
2025-12-04T12:35:04.7776785Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7776905Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7776998Z       |       ^~~~
2025-12-04T12:35:04.7778208Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7778321Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7778419Z       |             ^~~~
2025-12-04T12:35:04.7779618Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7779737Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7779853Z       |                   ^~~~
2025-12-04T12:35:04.7781033Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7781151Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7781266Z       |                         ^~~~
2025-12-04T12:35:04.7782450Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7782576Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7782677Z       |       ^~~~
2025-12-04T12:35:04.7783856Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7783981Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7784079Z       |             ^~~~
2025-12-04T12:35:04.7785281Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7785399Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7785510Z       |                   ^~~~
2025-12-04T12:35:04.7786707Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7786888Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7786990Z       |                         ^~~~
2025-12-04T12:35:04.7788194Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7788362Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7788519Z       |       ^~~~
2025-12-04T12:35:04.7789702Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7789852Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7789966Z       |             ^~~~
2025-12-04T12:35:04.7791151Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7791281Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7791384Z       |                   ^~~~
2025-12-04T12:35:04.7792565Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7792696Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7792798Z       |                         ^~~~
2025-12-04T12:35:04.7793998Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7794112Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7794205Z       |       ^~~~
2025-12-04T12:35:04.7795398Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7795516Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7795614Z       |             ^~~~
2025-12-04T12:35:04.7796815Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7796932Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7797046Z       |                   ^~~~
2025-12-04T12:35:04.7798235Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7798347Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7798469Z       |                         ^~~~
2025-12-04T12:35:04.7799957Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h: In instantiation of ‘void at::vec::CPU_CAPABILITY::QuantizeAvx512(const float*, T*, int, float, int64_t) [with T = unsigned char; int64_t = long int]’:
2025-12-04T12:35:04.7800553Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:933:31:   required from here
2025-12-04T12:35:04.7801743Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7801866Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7801977Z       |       ^~~~
2025-12-04T12:35:04.7803163Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7803341Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7803441Z       |             ^~~~
2025-12-04T12:35:04.7804661Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7804822Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7804924Z       |                   ^~~~
2025-12-04T12:35:04.7806156Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7806270Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7806373Z       |                         ^~~~
2025-12-04T12:35:04.7807566Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7807684Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7807779Z       |       ^~~~
2025-12-04T12:35:04.7808980Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7809097Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7809207Z       |             ^~~~
2025-12-04T12:35:04.7810394Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7810506Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7810623Z       |                   ^~~~
2025-12-04T12:35:04.7811817Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7811941Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7812040Z       |                         ^~~~
2025-12-04T12:35:04.7813231Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7813368Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7813461Z       |       ^~~~
2025-12-04T12:35:04.7814656Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7814776Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7814871Z       |             ^~~~
2025-12-04T12:35:04.7816059Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7816172Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7816343Z       |                   ^~~~
2025-12-04T12:35:04.7817552Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7817673Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7817790Z       |                         ^~~~
2025-12-04T12:35:04.7818962Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7819123Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7819231Z       |       ^~~~
2025-12-04T12:35:04.7820449Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7820628Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7820725Z       |             ^~~~
2025-12-04T12:35:04.7821942Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7822073Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7822174Z       |                   ^~~~
2025-12-04T12:35:04.7823371Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7823490Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7823590Z       |                         ^~~~
2025-12-04T12:35:04.7824786Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7824905Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7824998Z       |       ^~~~
2025-12-04T12:35:04.7826203Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7826316Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7826433Z       |             ^~~~
2025-12-04T12:35:04.7827609Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7827721Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7827836Z       |                   ^~~~
2025-12-04T12:35:04.7829023Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7829158Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7829258Z       |                         ^~~~
2025-12-04T12:35:04.7830442Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7830573Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7830666Z       |       ^~~~
2025-12-04T12:35:04.7831854Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7831966Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7832110Z       |             ^~~~
2025-12-04T12:35:04.7833316Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7833437Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7833538Z       |                   ^~~~
2025-12-04T12:35:04.7834742Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7834894Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7835009Z       |                         ^~~~
2025-12-04T12:35:04.7836219Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7836340Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7836447Z       |       ^~~~
2025-12-04T12:35:04.7837670Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7837795Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7837891Z       |             ^~~~
2025-12-04T12:35:04.7839073Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7839208Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7839309Z       |                   ^~~~
2025-12-04T12:35:04.7840490Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7840622Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7840727Z       |                         ^~~~
2025-12-04T12:35:04.7841925Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7842039Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7842139Z       |       ^~~~
2025-12-04T12:35:04.7843330Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7843443Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7843554Z       |             ^~~~
2025-12-04T12:35:04.7844738Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7844855Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7844969Z       |                   ^~~~
2025-12-04T12:35:04.7846155Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7846324Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7846426Z       |                         ^~~~
2025-12-04T12:35:04.7847605Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7847738Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7847866Z       |       ^~~~
2025-12-04T12:35:04.7849050Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7849185Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7849281Z       |             ^~~~
2025-12-04T12:35:04.7850473Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7850592Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7850691Z       |                   ^~~~
2025-12-04T12:35:04.7851925Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7852045Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7852159Z       |                         ^~~~
2025-12-04T12:35:04.7853457Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7853572Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7853682Z       |       ^~~~
2025-12-04T12:35:04.7854865Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7854999Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7855100Z       |             ^~~~
2025-12-04T12:35:04.7856349Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7856489Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7856588Z       |                   ^~~~
2025-12-04T12:35:04.7857784Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7857913Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7858020Z       |                         ^~~~
2025-12-04T12:35:04.7859216Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7859328Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7859421Z       |       ^~~~
2025-12-04T12:35:04.7860625Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7860744Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7860853Z       |             ^~~~
2025-12-04T12:35:04.7862036Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7862201Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7862314Z       |                   ^~~~
2025-12-04T12:35:04.7863512Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7863643Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7863780Z       |                         ^~~~
2025-12-04T12:35:04.7864953Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7865086Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7865179Z       |       ^~~~
2025-12-04T12:35:04.7866358Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7866491Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7866587Z       |             ^~~~
2025-12-04T12:35:04.7867824Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7867944Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7868044Z       |                   ^~~~
2025-12-04T12:35:04.7869273Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7869387Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.7869501Z       |                         ^~~~
2025-12-04T12:35:04.7870045Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_float.h:12,
2025-12-04T12:35:04.7870486Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:11,
2025-12-04T12:35:04.7870867Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5,
2025-12-04T12:35:04.7871554Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7,
2025-12-04T12:35:04.7871979Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4,
2025-12-04T12:35:04.7872448Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45,
2025-12-04T12:35:04.7873091Z                  from /tmp/U7W6v5/tmp2gno_q_y/data/aotinductor/model2/cyss5jazqjsvp5s2t3ihlofugodyzirark5aiimqjwirn4hylxbp.wrapper.cpp:656:
2025-12-04T12:35:04.7873714Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/sleef.h:192:10: warning: ISO C++ prohibits anonymous structs [-Wpedantic]
2025-12-04T12:35:04.7873812Z   192 |   struct {
2025-12-04T12:35:04.7873910Z       |          ^
2025-12-04T12:35:04.7874424Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:15,
2025-12-04T12:35:04.7874797Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5,
2025-12-04T12:35:04.7875255Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7,
2025-12-04T12:35:04.7875663Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4,
2025-12-04T12:35:04.7876124Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45,
2025-12-04T12:35:04.7876859Z                  from /tmp/U7W6v5/tmp2gno_q_y/data/aotinductor/model2/cyss5jazqjsvp5s2t3ihlofugodyzirark5aiimqjwirn4hylxbp.wrapper.cpp:656:
2025-12-04T12:35:04.7879194Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::blendv(const at::vec::CPU_CAPABILITY::Vectorized<short int>&, const at::vec::CPU_CAPABILITY::Vectorized<short int>&, const at::vec::CPU_CAPABILITY::Vectorized<short int>&)’:
2025-12-04T12:35:04.7880481Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:544:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.7880638Z   544 |     auto msb_one = _mm512_set1_epi16(0xFFFF);
2025-12-04T12:35:04.7880772Z       |                                      ^~~~~~
2025-12-04T12:35:04.7881281Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:15,
2025-12-04T12:35:04.7881649Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5,
2025-12-04T12:35:04.7882104Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7,
2025-12-04T12:35:04.7882518Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4,
2025-12-04T12:35:04.7882993Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45,
2025-12-04T12:35:04.7883639Z                  from /tmp/U7W6v5/tmp2gno_q_y/data/aotinductor/model2/cyss5jazqjsvp5s2t3ihlofugodyzirark5aiimqjwirn4hylxbp.wrapper.cpp:656:
2025-12-04T12:35:04.7885280Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator==(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.7886468Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:697:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.7886689Z   697 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.7886828Z       |                                                      ^~~~~~
2025-12-04T12:35:04.7888447Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator!=(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.7889621Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:701:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.7889831Z   701 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.7889970Z       |                                                      ^~~~~~
2025-12-04T12:35:04.7891604Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator<(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.7892775Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:705:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.7892994Z   705 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.7893163Z       |                                                      ^~~~~~
2025-12-04T12:35:04.7894793Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator<=(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.7895987Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:709:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.7896227Z   709 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.7896479Z       |                                                      ^~~~~~
2025-12-04T12:35:04.7898105Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator>(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.7899285Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:713:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.7899497Z   713 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.7899628Z       |                                                      ^~~~~~
2025-12-04T12:35:04.7901266Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator>=(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.7902431Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:717:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.7902660Z   717 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.7902787Z       |                                                      ^~~~~~
2025-12-04T12:35:04.7905056Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized<signed char> at::vec::CPU_CAPABILITY::Vectorized<signed char>::blendv(const at::vec::CPU_CAPABILITY::Vectorized<signed char>&, const at::vec::CPU_CAPABILITY::Vectorized<signed char>&, const at::vec::CPU_CAPABILITY::Vectorized<signed char>&)’:
2025-12-04T12:35:04.7906266Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1153:37: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7906422Z  1153 |     auto msb_one = _mm512_set1_epi8(0xFF);
2025-12-04T12:35:04.7906554Z       |                                     ^~~~
2025-12-04T12:35:04.7908221Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<signed char> at::vec::CPU_CAPABILITY::Vectorized<signed char>::operator==(const at::vec::CPU_CAPABILITY::Vectorized<signed char>&) const’:
2025-12-04T12:35:04.7909432Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1166:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7909648Z  1166 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.7909790Z       |                                                     ^~~~
2025-12-04T12:35:04.7911440Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<signed char> at::vec::CPU_CAPABILITY::Vectorized<signed char>::operator!=(const at::vec::CPU_CAPABILITY::Vectorized<signed char>&) const’:
2025-12-04T12:35:04.7912712Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1170:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7912964Z  1170 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.7913090Z       |                                                     ^~~~
2025-12-04T12:35:04.7914796Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<signed char> at::vec::CPU_CAPABILITY::Vectorized<signed char>::operator<(const at::vec::CPU_CAPABILITY::Vectorized<signed char>&) const’:
2025-12-04T12:35:04.7915990Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1174:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7916209Z  1174 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.7916329Z       |                                                     ^~~~
2025-12-04T12:35:04.7917988Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<signed char> at::vec::CPU_CAPABILITY::Vectorized<signed char>::operator<=(const at::vec::CPU_CAPABILITY::Vectorized<signed char>&) const’:
2025-12-04T12:35:04.7919206Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1178:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7919409Z  1178 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.7919552Z       |                                                     ^~~~
2025-12-04T12:35:04.7921888Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized<unsigned char> at::vec::CPU_CAPABILITY::Vectorized<unsigned char>::blendv(const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&, const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&, const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&)’:
2025-12-04T12:35:04.7923105Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1207:37: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7923257Z  1207 |     auto msb_one = _mm512_set1_epi8(0xFF);
2025-12-04T12:35:04.7923371Z       |                                     ^~~~
2025-12-04T12:35:04.7925087Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<unsigned char> at::vec::CPU_CAPABILITY::Vectorized<unsigned char>::operator==(const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&) const’:
2025-12-04T12:35:04.7926289Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1220:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7926512Z  1220 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.7926637Z       |                                                     ^~~~
2025-12-04T12:35:04.7928332Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<unsigned char> at::vec::CPU_CAPABILITY::Vectorized<unsigned char>::operator!=(const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&) const’:
2025-12-04T12:35:04.7929570Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1224:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7929774Z  1224 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.7929953Z       |                                                     ^~~~
2025-12-04T12:35:04.7931718Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<unsigned char> at::vec::CPU_CAPABILITY::Vectorized<unsigned char>::operator<(const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&) const’:
2025-12-04T12:35:04.7932925Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1228:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7933130Z  1228 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.7933250Z       |                                                     ^~~~
2025-12-04T12:35:04.7934952Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<unsigned char> at::vec::CPU_CAPABILITY::Vectorized<unsigned char>::operator<=(const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&) const’:
2025-12-04T12:35:04.7936150Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1232:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.7936433Z  1232 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.7936563Z       |                                                     ^~~~
2025-12-04T12:35:04.7938971Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized<T> at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized<T>&, const at::vec::CPU_CAPABILITY::Vectorized<T>&) [with bool left_shift = true; T = signed char; typename std::enable_if<(is_same_v<T, signed char> || is_same_v<T, unsigned char>), int>::type <anonymous> = 0]’:
2025-12-04T12:35:04.7939570Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2074:27:   required from here
2025-12-04T12:35:04.7940761Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7940873Z  1866 |       0x80,
2025-12-04T12:35:04.7940970Z       |       ^~~~
2025-12-04T12:35:04.7942160Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7942262Z  1868 |       0x80,
2025-12-04T12:35:04.7942355Z       |       ^~~~
2025-12-04T12:35:04.7943556Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7943657Z  1870 |       0x80,
2025-12-04T12:35:04.7943753Z       |       ^~~~
2025-12-04T12:35:04.7944953Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7945048Z  1872 |       0x80,
2025-12-04T12:35:04.7945157Z       |       ^~~~
2025-12-04T12:35:04.7946335Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7946499Z  1874 |       0x80,
2025-12-04T12:35:04.7946606Z       |       ^~~~
2025-12-04T12:35:04.7947825Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7947969Z  1876 |       0x80,
2025-12-04T12:35:04.7948062Z       |       ^~~~
2025-12-04T12:35:04.7949272Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7949387Z  1878 |       0x80,
2025-12-04T12:35:04.7949480Z       |       ^~~~
2025-12-04T12:35:04.7950650Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7950765Z  1880 |       0x80,
2025-12-04T12:35:04.7950857Z       |       ^~~~
2025-12-04T12:35:04.7952048Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7952149Z  1882 |       0x80,
2025-12-04T12:35:04.7952242Z       |       ^~~~
2025-12-04T12:35:04.7953431Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7953525Z  1884 |       0x80,
2025-12-04T12:35:04.7953629Z       |       ^~~~
2025-12-04T12:35:04.7954803Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7954903Z  1886 |       0x80,
2025-12-04T12:35:04.7955010Z       |       ^~~~
2025-12-04T12:35:04.7956187Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7956284Z  1888 |       0x80,
2025-12-04T12:35:04.7956394Z       |       ^~~~
2025-12-04T12:35:04.7957581Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7957690Z  1890 |       0x80,
2025-12-04T12:35:04.7957784Z       |       ^~~~
2025-12-04T12:35:04.7958963Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7959072Z  1892 |       0x80,
2025-12-04T12:35:04.7959163Z       |       ^~~~
2025-12-04T12:35:04.7960353Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7960451Z  1894 |       0x80,
2025-12-04T12:35:04.7960542Z       |       ^~~~
2025-12-04T12:35:04.7961738Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7961832Z  1896 |       0x80,
2025-12-04T12:35:04.7961922Z       |       ^~~~
2025-12-04T12:35:04.7963152Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7963246Z  1898 |       0x80,
2025-12-04T12:35:04.7963356Z       |       ^~~~
2025-12-04T12:35:04.7964567Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7964692Z  1900 |       0x80,
2025-12-04T12:35:04.7964795Z       |       ^~~~
2025-12-04T12:35:04.7966005Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7966113Z  1902 |       0x80,
2025-12-04T12:35:04.7966205Z       |       ^~~~
2025-12-04T12:35:04.7967385Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7967493Z  1904 |       0x80,
2025-12-04T12:35:04.7967584Z       |       ^~~~
2025-12-04T12:35:04.7968760Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7968871Z  1906 |       0x80,
2025-12-04T12:35:04.7968962Z       |       ^~~~
2025-12-04T12:35:04.7970152Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7970246Z  1908 |       0x80,
2025-12-04T12:35:04.7970337Z       |       ^~~~
2025-12-04T12:35:04.7971721Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7971816Z  1910 |       0x80,
2025-12-04T12:35:04.7971908Z       |       ^~~~
2025-12-04T12:35:04.7973107Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7973207Z  1912 |       0x80,
2025-12-04T12:35:04.7973313Z       |       ^~~~
2025-12-04T12:35:04.7974494Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7974591Z  1914 |       0x80,
2025-12-04T12:35:04.7974773Z       |       ^~~~
2025-12-04T12:35:04.7975953Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7976061Z  1916 |       0x80,
2025-12-04T12:35:04.7976153Z       |       ^~~~
2025-12-04T12:35:04.7977394Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7977561Z  1918 |       0x80,
2025-12-04T12:35:04.7977652Z       |       ^~~~
2025-12-04T12:35:04.7978855Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7978948Z  1920 |       0x80,
2025-12-04T12:35:04.7979046Z       |       ^~~~
2025-12-04T12:35:04.7980231Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7980326Z  1922 |       0x80,
2025-12-04T12:35:04.7980417Z       |       ^~~~
2025-12-04T12:35:04.7981652Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7981754Z  1924 |       0x80,
2025-12-04T12:35:04.7981859Z       |       ^~~~
2025-12-04T12:35:04.7983087Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7983182Z  1926 |       0x80,
2025-12-04T12:35:04.7983294Z       |       ^~~~
2025-12-04T12:35:04.7984472Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7984582Z  1928 |       0x80);
2025-12-04T12:35:04.7984677Z       |       ^~~~
2025-12-04T12:35:04.7985854Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7985970Z  1930 |       0x80,
2025-12-04T12:35:04.7986062Z       |       ^~~~
2025-12-04T12:35:04.7987241Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7987352Z  1932 |       0x80,
2025-12-04T12:35:04.7987450Z       |       ^~~~
2025-12-04T12:35:04.7988640Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7988734Z  1934 |       0x80,
2025-12-04T12:35:04.7988824Z       |       ^~~~
2025-12-04T12:35:04.7990018Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7990118Z  1936 |       0x80,
2025-12-04T12:35:04.7990213Z       |       ^~~~
2025-12-04T12:35:04.7991404Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7991497Z  1938 |       0x80,
2025-12-04T12:35:04.7991643Z       |       ^~~~
2025-12-04T12:35:04.7992822Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7992917Z  1940 |       0x80,
2025-12-04T12:35:04.7993021Z       |       ^~~~
2025-12-04T12:35:04.7994201Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7994350Z  1942 |       0x80,
2025-12-04T12:35:04.7994443Z       |       ^~~~
2025-12-04T12:35:04.7995623Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7995736Z  1944 |       0x80,
2025-12-04T12:35:04.7995829Z       |       ^~~~
2025-12-04T12:35:04.7997000Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7997107Z  1946 |       0x80,
2025-12-04T12:35:04.7997198Z       |       ^~~~
2025-12-04T12:35:04.7998430Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7998527Z  1948 |       0x80,
2025-12-04T12:35:04.7998618Z       |       ^~~~
2025-12-04T12:35:04.7999859Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.7999960Z  1950 |       0x80,
2025-12-04T12:35:04.8000066Z       |       ^~~~
2025-12-04T12:35:04.8001241Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8001335Z  1952 |       0x80,
2025-12-04T12:35:04.8001443Z       |       ^~~~
2025-12-04T12:35:04.8002628Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8002723Z  1954 |       0x80,
2025-12-04T12:35:04.8002830Z       |       ^~~~
2025-12-04T12:35:04.8004010Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8004126Z  1956 |       0x80,
2025-12-04T12:35:04.8004219Z       |       ^~~~
2025-12-04T12:35:04.8005392Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8005504Z  1958 |       0x80,
2025-12-04T12:35:04.8005599Z       |       ^~~~
2025-12-04T12:35:04.8006799Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8006897Z  1960 |       0x80,
2025-12-04T12:35:04.8006992Z       |       ^~~~
2025-12-04T12:35:04.8008183Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8008319Z  1962 |       0x80,
2025-12-04T12:35:04.8008411Z       |       ^~~~
2025-12-04T12:35:04.8009605Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8009705Z  1964 |       0x80,
2025-12-04T12:35:04.8009852Z       |       ^~~~
2025-12-04T12:35:04.8011063Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8011160Z  1966 |       0x80,
2025-12-04T12:35:04.8011306Z       |       ^~~~
2025-12-04T12:35:04.8012480Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8012594Z  1968 |       0x80,
2025-12-04T12:35:04.8012686Z       |       ^~~~
2025-12-04T12:35:04.8013863Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8013974Z  1970 |       0x80,
2025-12-04T12:35:04.8014081Z       |       ^~~~
2025-12-04T12:35:04.8015253Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8015360Z  1972 |       0x80,
2025-12-04T12:35:04.8015458Z       |       ^~~~
2025-12-04T12:35:04.8016709Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8016815Z  1974 |       0x80,
2025-12-04T12:35:04.8016908Z       |       ^~~~
2025-12-04T12:35:04.8018103Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8018198Z  1976 |       0x80,
2025-12-04T12:35:04.8018305Z       |       ^~~~
2025-12-04T12:35:04.8019493Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8019585Z  1978 |       0x80,
2025-12-04T12:35:04.8019697Z       |       ^~~~
2025-12-04T12:35:04.8020869Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8020968Z  1980 |       0x80,
2025-12-04T12:35:04.8021074Z       |       ^~~~
2025-12-04T12:35:04.8022252Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8022363Z  1982 |       0x80,
2025-12-04T12:35:04.8022464Z       |       ^~~~
2025-12-04T12:35:04.8023635Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8023742Z  1984 |       0x80,
2025-12-04T12:35:04.8023841Z       |       ^~~~
2025-12-04T12:35:04.8025016Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8025176Z  1986 |       0x80,
2025-12-04T12:35:04.8025267Z       |       ^~~~
2025-12-04T12:35:04.8026458Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8026588Z  1988 |       0x80,
2025-12-04T12:35:04.8026716Z       |       ^~~~
2025-12-04T12:35:04.8027906Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8028032Z  1990 |       0x80,
2025-12-04T12:35:04.8028139Z       |       ^~~~
2025-12-04T12:35:04.8029312Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8029411Z  1992 |       0x80,
2025-12-04T12:35:04.8029516Z       |       ^~~~
2025-12-04T12:35:04.8030702Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow]
2025-12-04T12:35:04.8030881Z  2002 |   __m512i keep_1 = _mm512_set1_epi16(0xFF00);
2025-12-04T12:35:04.8031006Z       |                                      ^~~~~~
2025-12-04T12:35:04.8033413Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized<T> at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized<T>&, const at::vec::CPU_CAPABILITY::Vectorized<T>&) [with bool left_shift = true; T = unsigned char; typename std::enable_if<(is_same_v<T, signed char> || is_same_v<T, unsigned char>), int>::type <anonymous> = 0]’:
2025-12-04T12:35:04.8034012Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2081:27:   required from here
2025-12-04T12:35:04.8035201Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8035321Z  1866 |       0x80,
2025-12-04T12:35:04.8035417Z       |       ^~~~
2025-12-04T12:35:04.8036593Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8036709Z  1868 |       0x80,
2025-12-04T12:35:04.8036803Z       |       ^~~~
2025-12-04T12:35:04.8037989Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8038088Z  1870 |       0x80,
2025-12-04T12:35:04.8038181Z       |       ^~~~
2025-12-04T12:35:04.8039371Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8039471Z  1872 |       0x80,
2025-12-04T12:35:04.8039561Z       |       ^~~~
2025-12-04T12:35:04.8040753Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8040847Z  1874 |       0x80,
2025-12-04T12:35:04.8040953Z       |       ^~~~
2025-12-04T12:35:04.8042126Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8042264Z  1876 |       0x80,
2025-12-04T12:35:04.8042368Z       |       ^~~~
2025-12-04T12:35:04.8043574Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8043714Z  1878 |       0x80,
2025-12-04T12:35:04.8043806Z       |       ^~~~
2025-12-04T12:35:04.8045028Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8045142Z  1880 |       0x80,
2025-12-04T12:35:04.8045235Z       |       ^~~~
2025-12-04T12:35:04.8046410Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8046524Z  1882 |       0x80,
2025-12-04T12:35:04.8046617Z       |       ^~~~
2025-12-04T12:35:04.8047810Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8047910Z  1884 |       0x80,
2025-12-04T12:35:04.8048002Z       |       ^~~~
2025-12-04T12:35:04.8049190Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8049285Z  1886 |       0x80,
2025-12-04T12:35:04.8049388Z       |       ^~~~
2025-12-04T12:35:04.8050559Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8050664Z  1888 |       0x80,
2025-12-04T12:35:04.8050772Z       |       ^~~~
2025-12-04T12:35:04.8051949Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8052050Z  1890 |       0x80,
2025-12-04T12:35:04.8052155Z       |       ^~~~
2025-12-04T12:35:04.8053331Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8053438Z  1892 |       0x80,
2025-12-04T12:35:04.8053530Z       |       ^~~~
2025-12-04T12:35:04.8054706Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8054858Z  1894 |       0x80,
2025-12-04T12:35:04.8054949Z       |       ^~~~
2025-12-04T12:35:04.8056140Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8056273Z  1896 |       0x80,
2025-12-04T12:35:04.8056436Z       |       ^~~~
2025-12-04T12:35:04.8057643Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8057737Z  1898 |       0x80,
2025-12-04T12:35:04.8057831Z       |       ^~~~
2025-12-04T12:35:04.8059019Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8059119Z  1900 |       0x80,
2025-12-04T12:35:04.8059226Z       |       ^~~~
2025-12-04T12:35:04.8060460Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8060562Z  1902 |       0x80,
2025-12-04T12:35:04.8060669Z       |       ^~~~
2025-12-04T12:35:04.8061878Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8061987Z  1904 |       0x80,
2025-12-04T12:35:04.8062079Z       |       ^~~~
2025-12-04T12:35:04.8063254Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8063367Z  1906 |       0x80,
2025-12-04T12:35:04.8063458Z       |       ^~~~
2025-12-04T12:35:04.8064634Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8064746Z  1908 |       0x80,
2025-12-04T12:35:04.8064836Z       |       ^~~~
2025-12-04T12:35:04.8066028Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8066122Z  1910 |       0x80,
2025-12-04T12:35:04.8066213Z       |       ^~~~
2025-12-04T12:35:04.8067400Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8067500Z  1912 |       0x80,
2025-12-04T12:35:04.8067592Z       |       ^~~~
2025-12-04T12:35:04.8068783Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8068882Z  1914 |       0x80,
2025-12-04T12:35:04.8068986Z       |       ^~~~
2025-12-04T12:35:04.8070162Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8070256Z  1916 |       0x80,
2025-12-04T12:35:04.8070362Z       |       ^~~~
2025-12-04T12:35:04.8071761Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8071953Z  1918 |       0x80,
2025-12-04T12:35:04.8072045Z       |       ^~~~
2025-12-04T12:35:04.8073227Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8073389Z  1920 |       0x80,
2025-12-04T12:35:04.8073480Z       |       ^~~~
2025-12-04T12:35:04.8074663Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8074776Z  1922 |       0x80,
2025-12-04T12:35:04.8074868Z       |       ^~~~
2025-12-04T12:35:04.8076062Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8076167Z  1924 |       0x80,
2025-12-04T12:35:04.8076258Z       |       ^~~~
2025-12-04T12:35:04.8077502Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8077607Z  1926 |       0x80,
2025-12-04T12:35:04.8077713Z       |       ^~~~
2025-12-04T12:35:04.8078941Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8079042Z  1928 |       0x80);
2025-12-04T12:35:04.8079153Z       |       ^~~~
2025-12-04T12:35:04.8080330Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8080429Z  1930 |       0x80,
2025-12-04T12:35:04.8080535Z       |       ^~~~
2025-12-04T12:35:04.8081715Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8081829Z  1932 |       0x80,
2025-12-04T12:35:04.8081921Z       |       ^~~~
2025-12-04T12:35:04.8083098Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8083203Z  1934 |       0x80,
2025-12-04T12:35:04.8083295Z       |       ^~~~
2025-12-04T12:35:04.8084488Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8084581Z  1936 |       0x80,
2025-12-04T12:35:04.8084673Z       |       ^~~~
2025-12-04T12:35:04.8085862Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8085961Z  1938 |       0x80,
2025-12-04T12:35:04.8086053Z       |       ^~~~
2025-12-04T12:35:04.8087265Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8087359Z  1940 |       0x80,
2025-12-04T12:35:04.8087465Z       |       ^~~~
2025-12-04T12:35:04.8088711Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8088807Z  1942 |       0x80,
2025-12-04T12:35:04.8088915Z       |       ^~~~
2025-12-04T12:35:04.8090133Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8090287Z  1944 |       0x80,
2025-12-04T12:35:04.8090381Z       |       ^~~~
2025-12-04T12:35:04.8091598Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8091711Z  1946 |       0x80,
2025-12-04T12:35:04.8091804Z       |       ^~~~
2025-12-04T12:35:04.8092981Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8093094Z  1948 |       0x80,
2025-12-04T12:35:04.8093187Z       |       ^~~~
2025-12-04T12:35:04.8094381Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8094483Z  1950 |       0x80,
2025-12-04T12:35:04.8094579Z       |       ^~~~
2025-12-04T12:35:04.8095775Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8095871Z  1952 |       0x80,
2025-12-04T12:35:04.8095979Z       |       ^~~~
2025-12-04T12:35:04.8097224Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8097322Z  1954 |       0x80,
2025-12-04T12:35:04.8097429Z       |       ^~~~
2025-12-04T12:35:04.8098618Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8098723Z  1956 |       0x80,
2025-12-04T12:35:04.8098832Z       |       ^~~~
2025-12-04T12:35:04.8100013Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8100124Z  1958 |       0x80,
2025-12-04T12:35:04.8100225Z       |       ^~~~
2025-12-04T12:35:04.8101398Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8101507Z  1960 |       0x80,
2025-12-04T12:35:04.8101602Z       |       ^~~~
2025-12-04T12:35:04.8102791Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8102894Z  1962 |       0x80,
2025-12-04T12:35:04.8102986Z       |       ^~~~
2025-12-04T12:35:04.8104177Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8104275Z  1964 |       0x80,
2025-12-04T12:35:04.8104413Z       |       ^~~~
2025-12-04T12:35:04.8105609Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8105706Z  1966 |       0x80,
2025-12-04T12:35:04.8105817Z       |       ^~~~
2025-12-04T12:35:04.8107034Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8107247Z  1968 |       0x80,
2025-12-04T12:35:04.8107357Z       |       ^~~~
2025-12-04T12:35:04.8108572Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8108680Z  1970 |       0x80,
2025-12-04T12:35:04.8108777Z       |       ^~~~
2025-12-04T12:35:04.8109952Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8110060Z  1972 |       0x80,
2025-12-04T12:35:04.8110150Z       |       ^~~~
2025-12-04T12:35:04.8111328Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8111441Z  1974 |       0x80,
2025-12-04T12:35:04.8111532Z       |       ^~~~
2025-12-04T12:35:04.8112721Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8112814Z  1976 |       0x80,
2025-12-04T12:35:04.8112910Z       |       ^~~~
2025-12-04T12:35:04.8114092Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8114185Z  1978 |       0x80,
2025-12-04T12:35:04.8114277Z       |       ^~~~
2025-12-04T12:35:04.8115468Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8115566Z  1980 |       0x80,
2025-12-04T12:35:04.8115669Z       |       ^~~~
2025-12-04T12:35:04.8116844Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8116942Z  1982 |       0x80,
2025-12-04T12:35:04.8117048Z       |       ^~~~
2025-12-04T12:35:04.8118219Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8118328Z  1984 |       0x80,
2025-12-04T12:35:04.8118420Z       |       ^~~~
2025-12-04T12:35:04.8119596Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8119707Z  1986 |       0x80,
2025-12-04T12:35:04.8119800Z       |       ^~~~
2025-12-04T12:35:04.8120985Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8121137Z  1988 |       0x80,
2025-12-04T12:35:04.8121228Z       |       ^~~~
2025-12-04T12:35:04.8122415Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8122509Z  1990 |       0x80,
2025-12-04T12:35:04.8122599Z       |       ^~~~
2025-12-04T12:35:04.8123881Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8123976Z  1992 |       0x80,
2025-12-04T12:35:04.8124081Z       |       ^~~~
2025-12-04T12:35:04.8125297Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow]
2025-12-04T12:35:04.8125464Z  2002 |   __m512i keep_1 = _mm512_set1_epi16(0xFF00);
2025-12-04T12:35:04.8125595Z       |                                      ^~~~~~
2025-12-04T12:35:04.8128000Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized<T> at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized<T>&, const at::vec::CPU_CAPABILITY::Vectorized<T>&) [with bool left_shift = false; T = signed char; typename std::enable_if<(is_same_v<T, signed char> || is_same_v<T, unsigned char>), int>::type <anonymous> = 0]’:
2025-12-04T12:35:04.8128607Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2109:28:   required from here
2025-12-04T12:35:04.8129789Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8129892Z  1866 |       0x80,
2025-12-04T12:35:04.8130001Z       |       ^~~~
2025-12-04T12:35:04.8131174Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8131285Z  1868 |       0x80,
2025-12-04T12:35:04.8131393Z       |       ^~~~
2025-12-04T12:35:04.8132566Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8132678Z  1870 |       0x80,
2025-12-04T12:35:04.8132779Z       |       ^~~~
2025-12-04T12:35:04.8133965Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8134066Z  1872 |       0x80,
2025-12-04T12:35:04.8134159Z       |       ^~~~
2025-12-04T12:35:04.8135343Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8135437Z  1874 |       0x80,
2025-12-04T12:35:04.8135540Z       |       ^~~~
2025-12-04T12:35:04.8136819Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8136917Z  1876 |       0x80,
2025-12-04T12:35:04.8137031Z       |       ^~~~
2025-12-04T12:35:04.8138213Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8138359Z  1878 |       0x80,
2025-12-04T12:35:04.8138472Z       |       ^~~~
2025-12-04T12:35:04.8139643Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8139753Z  1880 |       0x80,
2025-12-04T12:35:04.8139916Z       |       ^~~~
2025-12-04T12:35:04.8141093Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8141202Z  1882 |       0x80,
2025-12-04T12:35:04.8141330Z       |       ^~~~
2025-12-04T12:35:04.8142510Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8142625Z  1884 |       0x80,
2025-12-04T12:35:04.8142716Z       |       ^~~~
2025-12-04T12:35:04.8143896Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8143996Z  1886 |       0x80,
2025-12-04T12:35:04.8144092Z       |       ^~~~
2025-12-04T12:35:04.8145275Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8145375Z  1888 |       0x80,
2025-12-04T12:35:04.8145479Z       |       ^~~~
2025-12-04T12:35:04.8146655Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8146756Z  1890 |       0x80,
2025-12-04T12:35:04.8146859Z       |       ^~~~
2025-12-04T12:35:04.8148025Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8148124Z  1892 |       0x80,
2025-12-04T12:35:04.8148234Z       |       ^~~~
2025-12-04T12:35:04.8149403Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8149514Z  1894 |       0x80,
2025-12-04T12:35:04.8149607Z       |       ^~~~
2025-12-04T12:35:04.8150780Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8151529Z  1896 |       0x80,
2025-12-04T12:35:04.8151625Z       |       ^~~~
2025-12-04T12:35:04.8152823Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8152966Z  1898 |       0x80,
2025-12-04T12:35:04.8153058Z       |       ^~~~
2025-12-04T12:35:04.8154245Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8154344Z  1900 |       0x80,
2025-12-04T12:35:04.8154436Z       |       ^~~~
2025-12-04T12:35:04.8155622Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8155721Z  1902 |       0x80,
2025-12-04T12:35:04.8155828Z       |       ^~~~
2025-12-04T12:35:04.8157002Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8157138Z  1904 |       0x80,
2025-12-04T12:35:04.8157246Z       |       ^~~~
2025-12-04T12:35:04.8158421Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8158567Z  1906 |       0x80,
2025-12-04T12:35:04.8158661Z       |       ^~~~
2025-12-04T12:35:04.8159838Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8159950Z  1908 |       0x80,
2025-12-04T12:35:04.8160042Z       |       ^~~~
2025-12-04T12:35:04.8161216Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8161328Z  1910 |       0x80,
2025-12-04T12:35:04.8161420Z       |       ^~~~
2025-12-04T12:35:04.8162612Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8162706Z  1912 |       0x80,
2025-12-04T12:35:04.8162798Z       |       ^~~~
2025-12-04T12:35:04.8163989Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8164091Z  1914 |       0x80,
2025-12-04T12:35:04.8164184Z       |       ^~~~
2025-12-04T12:35:04.8165380Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8165481Z  1916 |       0x80,
2025-12-04T12:35:04.8165585Z       |       ^~~~
2025-12-04T12:35:04.8166765Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8166858Z  1918 |       0x80,
2025-12-04T12:35:04.8166965Z       |       ^~~~
2025-12-04T12:35:04.8168141Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8168287Z  1920 |       0x80,
2025-12-04T12:35:04.8168380Z       |       ^~~~
2025-12-04T12:35:04.8169563Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8169705Z  1922 |       0x80,
2025-12-04T12:35:04.8169797Z       |       ^~~~
2025-12-04T12:35:04.8171191Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8171315Z  1924 |       0x80,
2025-12-04T12:35:04.8171409Z       |       ^~~~
2025-12-04T12:35:04.8172601Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8172703Z  1926 |       0x80,
2025-12-04T12:35:04.8172796Z       |       ^~~~
2025-12-04T12:35:04.8174064Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8174169Z  1928 |       0x80);
2025-12-04T12:35:04.8174278Z       |       ^~~~
2025-12-04T12:35:04.8175508Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8175607Z  1930 |       0x80,
2025-12-04T12:35:04.8175719Z       |       ^~~~
2025-12-04T12:35:04.8176959Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8177083Z  1932 |       0x80,
2025-12-04T12:35:04.8177192Z       |       ^~~~
2025-12-04T12:35:04.8178389Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8178507Z  1934 |       0x80,
2025-12-04T12:35:04.8178601Z       |       ^~~~
2025-12-04T12:35:04.8179782Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8179897Z  1936 |       0x80,
2025-12-04T12:35:04.8179991Z       |       ^~~~
2025-12-04T12:35:04.8181184Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8181288Z  1938 |       0x80,
2025-12-04T12:35:04.8181382Z       |       ^~~~
2025-12-04T12:35:04.8182572Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8182674Z  1940 |       0x80,
2025-12-04T12:35:04.8182770Z       |       ^~~~
2025-12-04T12:35:04.8183966Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8184064Z  1942 |       0x80,
2025-12-04T12:35:04.8184173Z       |       ^~~~
2025-12-04T12:35:04.8185346Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8185518Z  1944 |       0x80,
2025-12-04T12:35:04.8185626Z       |       ^~~~
2025-12-04T12:35:04.8186845Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8187001Z  1946 |       0x80,
2025-12-04T12:35:04.8187096Z       |       ^~~~
2025-12-04T12:35:04.8188318Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8188431Z  1948 |       0x80,
2025-12-04T12:35:04.8188527Z       |       ^~~~
2025-12-04T12:35:04.8189715Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8189831Z  1950 |       0x80,
2025-12-04T12:35:04.8189926Z       |       ^~~~
2025-12-04T12:35:04.8191129Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8191232Z  1952 |       0x80,
2025-12-04T12:35:04.8191324Z       |       ^~~~
2025-12-04T12:35:04.8192519Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8192613Z  1954 |       0x80,
2025-12-04T12:35:04.8192707Z       |       ^~~~
2025-12-04T12:35:04.8193901Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8194004Z  1956 |       0x80,
2025-12-04T12:35:04.8194109Z       |       ^~~~
2025-12-04T12:35:04.8195287Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8195389Z  1958 |       0x80,
2025-12-04T12:35:04.8195501Z       |       ^~~~
2025-12-04T12:35:04.8196679Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8196788Z  1960 |       0x80,
2025-12-04T12:35:04.8196880Z       |       ^~~~
2025-12-04T12:35:04.8198066Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8198185Z  1962 |       0x80,
2025-12-04T12:35:04.8198275Z       |       ^~~~
2025-12-04T12:35:04.8199465Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8199564Z  1964 |       0x80,
2025-12-04T12:35:04.8199658Z       |       ^~~~
2025-12-04T12:35:04.8200866Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8200959Z  1966 |       0x80,
2025-12-04T12:35:04.8201053Z       |       ^~~~
2025-12-04T12:35:04.8202243Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8202387Z  1968 |       0x80,
2025-12-04T12:35:04.8202490Z       |       ^~~~
2025-12-04T12:35:04.8203703Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8203830Z  1970 |       0x80,
2025-12-04T12:35:04.8203942Z       |       ^~~~
2025-12-04T12:35:04.8205156Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8205267Z  1972 |       0x80,
2025-12-04T12:35:04.8205359Z       |       ^~~~
2025-12-04T12:35:04.8206540Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8206646Z  1974 |       0x80,
2025-12-04T12:35:04.8206740Z       |       ^~~~
2025-12-04T12:35:04.8207918Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8208033Z  1976 |       0x80,
2025-12-04T12:35:04.8208127Z       |       ^~~~
2025-12-04T12:35:04.8209324Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8209422Z  1978 |       0x80,
2025-12-04T12:35:04.8209516Z       |       ^~~~
2025-12-04T12:35:04.8210711Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8210805Z  1980 |       0x80,
2025-12-04T12:35:04.8210900Z       |       ^~~~
2025-12-04T12:35:04.8212095Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8212194Z  1982 |       0x80,
2025-12-04T12:35:04.8212299Z       |       ^~~~
2025-12-04T12:35:04.8213486Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8213582Z  1984 |       0x80,
2025-12-04T12:35:04.8213686Z       |       ^~~~
2025-12-04T12:35:04.8214875Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8214980Z  1986 |       0x80,
2025-12-04T12:35:04.8215071Z       |       ^~~~
2025-12-04T12:35:04.8216254Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8216440Z  1988 |       0x80,
2025-12-04T12:35:04.8216534Z       |       ^~~~
2025-12-04T12:35:04.8217734Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8217845Z  1990 |       0x80,
2025-12-04T12:35:04.8217939Z       |       ^~~~
2025-12-04T12:35:04.8219197Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8219294Z  1992 |       0x80,
2025-12-04T12:35:04.8219387Z       |       ^~~~
2025-12-04T12:35:04.8220777Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow]
2025-12-04T12:35:04.8220977Z  2002 |   __m512i keep_1 = _mm512_set1_epi16(0xFF00);
2025-12-04T12:35:04.8221114Z       |                                      ^~~~~~
2025-12-04T12:35:04.8223586Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized<T> at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized<T>&, const at::vec::CPU_CAPABILITY::Vectorized<T>&) [with bool left_shift = false; T = unsigned char; typename std::enable_if<(is_same_v<T, signed char> || is_same_v<T, unsigned char>), int>::type <anonymous> = 0]’:
2025-12-04T12:35:04.8224177Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2116:28:   required from here
2025-12-04T12:35:04.8225387Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8225490Z  1866 |       0x80,
2025-12-04T12:35:04.8225599Z       |       ^~~~
2025-12-04T12:35:04.8226785Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8226879Z  1868 |       0x80,
2025-12-04T12:35:04.8226988Z       |       ^~~~
2025-12-04T12:35:04.8228162Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8228272Z  1870 |       0x80,
2025-12-04T12:35:04.8228370Z       |       ^~~~
2025-12-04T12:35:04.8229557Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8229673Z  1872 |       0x80,
2025-12-04T12:35:04.8229767Z       |       ^~~~
2025-12-04T12:35:04.8230966Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8231062Z  1874 |       0x80,
2025-12-04T12:35:04.8231160Z       |       ^~~~
2025-12-04T12:35:04.8232348Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8232445Z  1876 |       0x80,
2025-12-04T12:35:04.8232541Z       |       ^~~~
2025-12-04T12:35:04.8233736Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8233834Z  1878 |       0x80,
2025-12-04T12:35:04.8233940Z       |       ^~~~
2025-12-04T12:35:04.8235117Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8235265Z  1880 |       0x80,
2025-12-04T12:35:04.8235370Z       |       ^~~~
2025-12-04T12:35:04.8236546Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8236654Z  1882 |       0x80,
2025-12-04T12:35:04.8236748Z       |       ^~~~
2025-12-04T12:35:04.8237959Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8238102Z  1884 |       0x80,
2025-12-04T12:35:04.8238195Z       |       ^~~~
2025-12-04T12:35:04.8239432Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8239551Z  1886 |       0x80,
2025-12-04T12:35:04.8239645Z       |       ^~~~
2025-12-04T12:35:04.8240838Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8240933Z  1888 |       0x80,
2025-12-04T12:35:04.8241026Z       |       ^~~~
2025-12-04T12:35:04.8242234Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8242330Z  1890 |       0x80,
2025-12-04T12:35:04.8242420Z       |       ^~~~
2025-12-04T12:35:04.8243614Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8243715Z  1892 |       0x80,
2025-12-04T12:35:04.8243821Z       |       ^~~~
2025-12-04T12:35:04.8244992Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8245085Z  1894 |       0x80,
2025-12-04T12:35:04.8245189Z       |       ^~~~
2025-12-04T12:35:04.8246375Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8246484Z  1896 |       0x80,
2025-12-04T12:35:04.8246578Z       |       ^~~~
2025-12-04T12:35:04.8247767Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8247921Z  1898 |       0x80,
2025-12-04T12:35:04.8248012Z       |       ^~~~
2025-12-04T12:35:04.8249188Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8249296Z  1900 |       0x80,
2025-12-04T12:35:04.8249394Z       |       ^~~~
2025-12-04T12:35:04.8250618Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8250713Z  1902 |       0x80,
2025-12-04T12:35:04.8250816Z       |       ^~~~
2025-12-04T12:35:04.8252022Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8252127Z  1904 |       0x80,
2025-12-04T12:35:04.8252235Z       |       ^~~~
2025-12-04T12:35:04.8253422Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8253517Z  1906 |       0x80,
2025-12-04T12:35:04.8253689Z       |       ^~~~
2025-12-04T12:35:04.8254883Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8254977Z  1908 |       0x80,
2025-12-04T12:35:04.8255126Z       |       ^~~~
2025-12-04T12:35:04.8256374Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8256493Z  1910 |       0x80,
2025-12-04T12:35:04.8256586Z       |       ^~~~
2025-12-04T12:35:04.8257768Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8257883Z  1912 |       0x80,
2025-12-04T12:35:04.8257991Z       |       ^~~~
2025-12-04T12:35:04.8259182Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8259275Z  1914 |       0x80,
2025-12-04T12:35:04.8259372Z       |       ^~~~
2025-12-04T12:35:04.8260566Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8260668Z  1916 |       0x80,
2025-12-04T12:35:04.8260759Z       |       ^~~~
2025-12-04T12:35:04.8261954Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8262048Z  1918 |       0x80,
2025-12-04T12:35:04.8262170Z       |       ^~~~
2025-12-04T12:35:04.8263355Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8263449Z  1920 |       0x80,
2025-12-04T12:35:04.8263565Z       |       ^~~~
2025-12-04T12:35:04.8264751Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8264916Z  1922 |       0x80,
2025-12-04T12:35:04.8265009Z       |       ^~~~
2025-12-04T12:35:04.8266188Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8266302Z  1924 |       0x80,
2025-12-04T12:35:04.8266441Z       |       ^~~~
2025-12-04T12:35:04.8267622Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8267734Z  1926 |       0x80,
2025-12-04T12:35:04.8267834Z       |       ^~~~
2025-12-04T12:35:04.8269019Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8269125Z  1928 |       0x80);
2025-12-04T12:35:04.8269218Z       |       ^~~~
2025-12-04T12:35:04.8270410Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8270542Z  1930 |       0x80,
2025-12-04T12:35:04.8270656Z       |       ^~~~
2025-12-04T12:35:04.8272036Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8272206Z  1932 |       0x80,
2025-12-04T12:35:04.8272315Z       |       ^~~~
2025-12-04T12:35:04.8273502Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8273603Z  1934 |       0x80,
2025-12-04T12:35:04.8273715Z       |       ^~~~
2025-12-04T12:35:04.8274894Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8275016Z  1936 |       0x80,
2025-12-04T12:35:04.8275110Z       |       ^~~~
2025-12-04T12:35:04.8276288Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8276406Z  1938 |       0x80,
2025-12-04T12:35:04.8276500Z       |       ^~~~
2025-12-04T12:35:04.8277692Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8277796Z  1940 |       0x80,
2025-12-04T12:35:04.8277892Z       |       ^~~~
2025-12-04T12:35:04.8279089Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8279201Z  1942 |       0x80,
2025-12-04T12:35:04.8279295Z       |       ^~~~
2025-12-04T12:35:04.8280480Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8280580Z  1944 |       0x80,
2025-12-04T12:35:04.8280689Z       |       ^~~~
2025-12-04T12:35:04.8281858Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8282013Z  1946 |       0x80,
2025-12-04T12:35:04.8282124Z       |       ^~~~
2025-12-04T12:35:04.8283438Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8283713Z  1948 |       0x80,
2025-12-04T12:35:04.8283810Z       |       ^~~~
2025-12-04T12:35:04.8284997Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8285141Z  1950 |       0x80,
2025-12-04T12:35:04.8285237Z       |       ^~~~
2025-12-04T12:35:04.8286412Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8286527Z  1952 |       0x80,
2025-12-04T12:35:04.8286620Z       |       ^~~~
2025-12-04T12:35:04.8287821Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8287921Z  1954 |       0x80,
2025-12-04T12:35:04.8288013Z       |       ^~~~
2025-12-04T12:35:04.8289227Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8289322Z  1956 |       0x80,
2025-12-04T12:35:04.8289414Z       |       ^~~~
2025-12-04T12:35:04.8290607Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8290707Z  1958 |       0x80,
2025-12-04T12:35:04.8290814Z       |       ^~~~
2025-12-04T12:35:04.8291995Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8292095Z  1960 |       0x80,
2025-12-04T12:35:04.8292203Z       |       ^~~~
2025-12-04T12:35:04.8293395Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8293502Z  1962 |       0x80,
2025-12-04T12:35:04.8293593Z       |       ^~~~
2025-12-04T12:35:04.8294772Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8294886Z  1964 |       0x80,
2025-12-04T12:35:04.8294977Z       |       ^~~~
2025-12-04T12:35:04.8296159Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8296272Z  1966 |       0x80,
2025-12-04T12:35:04.8296433Z       |       ^~~~
2025-12-04T12:35:04.8297638Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8297738Z  1968 |       0x80,
2025-12-04T12:35:04.8297829Z       |       ^~~~
2025-12-04T12:35:04.8299015Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8299169Z  1970 |       0x80,
2025-12-04T12:35:04.8299274Z       |       ^~~~
2025-12-04T12:35:04.8300493Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8300652Z  1972 |       0x80,
2025-12-04T12:35:04.8300757Z       |       ^~~~
2025-12-04T12:35:04.8301962Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8302058Z  1974 |       0x80,
2025-12-04T12:35:04.8302164Z       |       ^~~~
2025-12-04T12:35:04.8303341Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8303453Z  1976 |       0x80,
2025-12-04T12:35:04.8303545Z       |       ^~~~
2025-12-04T12:35:04.8304727Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8304840Z  1978 |       0x80,
2025-12-04T12:35:04.8304933Z       |       ^~~~
2025-12-04T12:35:04.8306126Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8306221Z  1980 |       0x80,
2025-12-04T12:35:04.8306312Z       |       ^~~~
2025-12-04T12:35:04.8307496Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8307597Z  1982 |       0x80,
2025-12-04T12:35:04.8307690Z       |       ^~~~
2025-12-04T12:35:04.8308889Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8308989Z  1984 |       0x80,
2025-12-04T12:35:04.8309096Z       |       ^~~~
2025-12-04T12:35:04.8310274Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8310368Z  1986 |       0x80,
2025-12-04T12:35:04.8310476Z       |       ^~~~
2025-12-04T12:35:04.8311644Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8311759Z  1988 |       0x80,
2025-12-04T12:35:04.8311852Z       |       ^~~~
2025-12-04T12:35:04.8313030Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8313143Z  1990 |       0x80,
2025-12-04T12:35:04.8313234Z       |       ^~~~
2025-12-04T12:35:04.8314418Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8314525Z  1992 |       0x80,
2025-12-04T12:35:04.8314618Z       |       ^~~~
2025-12-04T12:35:04.8315801Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow]
2025-12-04T12:35:04.8316006Z  2002 |   __m512i keep_1 = _mm512_set1_epi16(0xFF00);
2025-12-04T12:35:04.8316122Z       |                                      ^~~~~~
2025-12-04T12:35:04.8316676Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:16,
2025-12-04T12:35:04.8317081Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5,
2025-12-04T12:35:04.8317541Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7,
2025-12-04T12:35:04.8317974Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4,
2025-12-04T12:35:04.8318437Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45,
2025-12-04T12:35:04.8319099Z                  from /tmp/U7W6v5/tmp2gno_q_y/data/aotinductor/model2/cyss5jazqjsvp5s2t3ihlofugodyzirark5aiimqjwirn4hylxbp.wrapper.cpp:656:
2025-12-04T12:35:04.8320589Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h: In instantiation of ‘void at::vec::CPU_CAPABILITY::QuantizeAvx512(const float*, T*, int, float, int64_t) [with T = signed char; int64_t = long int]’:
2025-12-04T12:35:04.8321180Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:696:31:   required from here
2025-12-04T12:35:04.8322366Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8322485Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8322595Z       |       ^~~~
2025-12-04T12:35:04.8323790Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8323923Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8324027Z       |             ^~~~
2025-12-04T12:35:04.8325221Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8325356Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8325461Z       |                   ^~~~
2025-12-04T12:35:04.8326666Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8326786Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8326888Z       |                         ^~~~
2025-12-04T12:35:04.8328084Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8328198Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8328306Z       |       ^~~~
2025-12-04T12:35:04.8329510Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8329629Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8329738Z       |             ^~~~
2025-12-04T12:35:04.8330918Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8331081Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8331194Z       |                   ^~~~
2025-12-04T12:35:04.8332413Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8332575Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8332679Z       |                         ^~~~
2025-12-04T12:35:04.8333892Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8334020Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8334112Z       |       ^~~~
2025-12-04T12:35:04.8335308Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8335430Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8335527Z       |             ^~~~
2025-12-04T12:35:04.8336803Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8336924Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8337026Z       |                   ^~~~
2025-12-04T12:35:04.8338237Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8338351Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8338470Z       |                         ^~~~
2025-12-04T12:35:04.8339655Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8339767Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8339877Z       |       ^~~~
2025-12-04T12:35:04.8341069Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8341204Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8341301Z       |             ^~~~
2025-12-04T12:35:04.8342485Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8342668Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8342769Z       |                   ^~~~
2025-12-04T12:35:04.8343967Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8344082Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8344228Z       |                         ^~~~
2025-12-04T12:35:04.8345422Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8345540Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8345632Z       |       ^~~~
2025-12-04T12:35:04.8346828Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8346945Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8353568Z       |             ^~~~
2025-12-04T12:35:04.8355092Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8355227Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8355350Z       |                   ^~~~
2025-12-04T12:35:04.8356597Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8356718Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8356838Z       |                         ^~~~
2025-12-04T12:35:04.8358038Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8358178Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8358274Z       |       ^~~~
2025-12-04T12:35:04.8359467Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8359605Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8359704Z       |             ^~~~
2025-12-04T12:35:04.8360910Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8361024Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8361128Z       |                   ^~~~
2025-12-04T12:35:04.8362338Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8362455Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8362571Z       |                         ^~~~
2025-12-04T12:35:04.8363760Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8363881Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8363989Z       |       ^~~~
2025-12-04T12:35:04.8365176Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8365340Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8365455Z       |             ^~~~
2025-12-04T12:35:04.8366639Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8366770Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8367016Z       |                   ^~~~
2025-12-04T12:35:04.8368201Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8368336Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8368440Z       |                         ^~~~
2025-12-04T12:35:04.8369630Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8369749Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8369843Z       |       ^~~~
2025-12-04T12:35:04.8371336Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8371458Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8371572Z       |             ^~~~
2025-12-04T12:35:04.8372823Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8372938Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8373055Z       |                   ^~~~
2025-12-04T12:35:04.8374247Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8374367Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8374482Z       |                         ^~~~
2025-12-04T12:35:04.8375667Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8375802Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8375899Z       |       ^~~~
2025-12-04T12:35:04.8377211Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8377336Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8377433Z       |             ^~~~
2025-12-04T12:35:04.8378650Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8378761Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8378859Z       |                   ^~~~
2025-12-04T12:35:04.8380064Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8380183Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8380284Z       |                         ^~~~
2025-12-04T12:35:04.8381478Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8381655Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8381763Z       |       ^~~~
2025-12-04T12:35:04.8382945Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8383055Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8383258Z       |             ^~~~
2025-12-04T12:35:04.8384448Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8384611Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8384715Z       |                   ^~~~
2025-12-04T12:35:04.8385890Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8386020Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8386121Z       |                         ^~~~
2025-12-04T12:35:04.8387324Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8387440Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8387533Z       |       ^~~~
2025-12-04T12:35:04.8388736Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8388848Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8388945Z       |             ^~~~
2025-12-04T12:35:04.8390137Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8390255Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8390371Z       |                   ^~~~
2025-12-04T12:35:04.8391558Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8391676Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8391788Z       |                         ^~~~
2025-12-04T12:35:04.8392973Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8393095Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8393194Z       |       ^~~~
2025-12-04T12:35:04.8394370Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8394495Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8394593Z       |             ^~~~
2025-12-04T12:35:04.8395794Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8395910Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8396010Z       |                   ^~~~
2025-12-04T12:35:04.8397211Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8397360Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8397459Z       |                         ^~~~
2025-12-04T12:35:04.8398958Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h: In instantiation of ‘void at::vec::CPU_CAPABILITY::QuantizeAvx512(const float*, T*, int, float, int64_t) [with T = unsigned char; int64_t = long int]’:
2025-12-04T12:35:04.8399612Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:933:31:   required from here
2025-12-04T12:35:04.8400860Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8400980Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8401090Z       |       ^~~~
2025-12-04T12:35:04.8402282Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8402402Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8402513Z       |             ^~~~
2025-12-04T12:35:04.8403701Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8403820Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8403933Z       |                   ^~~~
2025-12-04T12:35:04.8405117Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8405243Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8405344Z       |                         ^~~~
2025-12-04T12:35:04.8406528Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8406653Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8406746Z       |       ^~~~
2025-12-04T12:35:04.8407948Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8408066Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8408161Z       |             ^~~~
2025-12-04T12:35:04.8409357Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8409475Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8409573Z       |                   ^~~~
2025-12-04T12:35:04.8410772Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8410884Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8411011Z       |                         ^~~~
2025-12-04T12:35:04.8412184Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8412302Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8412408Z       |       ^~~~
2025-12-04T12:35:04.8413589Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8413752Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8413848Z       |             ^~~~
2025-12-04T12:35:04.8415068Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8415226Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8415325Z       |                   ^~~~
2025-12-04T12:35:04.8416677Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8416793Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8416894Z       |                         ^~~~
2025-12-04T12:35:04.8418105Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8418224Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8418317Z       |       ^~~~
2025-12-04T12:35:04.8419515Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8419633Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8419745Z       |             ^~~~
2025-12-04T12:35:04.8420930Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8421044Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8421156Z       |                   ^~~~
2025-12-04T12:35:04.8422343Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8422466Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8422567Z       |                         ^~~~
2025-12-04T12:35:04.8423752Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8423883Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8423976Z       |       ^~~~
2025-12-04T12:35:04.8425169Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8425286Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8425383Z       |             ^~~~
2025-12-04T12:35:04.8426578Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8426687Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8426800Z       |                   ^~~~
2025-12-04T12:35:04.8427998Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8428120Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8428234Z       |                         ^~~~
2025-12-04T12:35:04.8429409Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8429578Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8429684Z       |       ^~~~
2025-12-04T12:35:04.8430905Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8431066Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8431165Z       |             ^~~~
2025-12-04T12:35:04.8432388Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8432516Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8432617Z       |                   ^~~~
2025-12-04T12:35:04.8433809Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8433933Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8434031Z       |                         ^~~~
2025-12-04T12:35:04.8435234Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8435352Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8435444Z       |       ^~~~
2025-12-04T12:35:04.8436649Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8436760Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8436869Z       |             ^~~~
2025-12-04T12:35:04.8438055Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8438167Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8438278Z       |                   ^~~~
2025-12-04T12:35:04.8439460Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8439592Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8439694Z       |                         ^~~~
2025-12-04T12:35:04.8440870Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8441032Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8441125Z       |       ^~~~
2025-12-04T12:35:04.8442317Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8442429Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8442569Z       |             ^~~~
2025-12-04T12:35:04.8443762Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8443883Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8443983Z       |                   ^~~~
2025-12-04T12:35:04.8445170Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8445291Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8445405Z       |                         ^~~~
2025-12-04T12:35:04.8446617Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8446737Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8446845Z       |       ^~~~
2025-12-04T12:35:04.8448068Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8448197Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8448296Z       |             ^~~~
2025-12-04T12:35:04.8449485Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8449621Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8449722Z       |                   ^~~~
2025-12-04T12:35:04.8450924Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8451047Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8451154Z       |                         ^~~~
2025-12-04T12:35:04.8452354Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8452469Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8452570Z       |       ^~~~
2025-12-04T12:35:04.8453769Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8453882Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8453995Z       |             ^~~~
2025-12-04T12:35:04.8455185Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8455305Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8455423Z       |                   ^~~~
2025-12-04T12:35:04.8456690Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8456866Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8456969Z       |                         ^~~~
2025-12-04T12:35:04.8458166Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8458292Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8458436Z       |       ^~~~
2025-12-04T12:35:04.8459633Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8459755Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8459854Z       |             ^~~~
2025-12-04T12:35:04.8461059Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8461175Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8461276Z       |                   ^~~~
2025-12-04T12:35:04.8462509Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8462631Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8462748Z       |                         ^~~~
2025-12-04T12:35:04.8463965Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8464081Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8464191Z       |       ^~~~
2025-12-04T12:35:04.8465553Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8465692Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8465791Z       |             ^~~~
2025-12-04T12:35:04.8466991Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8467124Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8467222Z       |                   ^~~~
2025-12-04T12:35:04.8468405Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8468529Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8468630Z       |                         ^~~~
2025-12-04T12:35:04.8468753Z PASSED [36.3331s] [ 40%]
2025-12-04T12:35:04.8469422Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_package_shared_weights SKIPPED [0.0036s] (No support for cpp only) [ 42%]
2025-12-04T12:35:04.8470099Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_package_user_managed_weight SKIPPED [0.0031s] (No support for cpp only) [ 43%]
2025-12-04T12:35:04.8470830Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_package_weights_on_disk_nested_module SKIPPED [0.0029s] (No support for cpp only) [ 44%]
2025-12-04T12:35:04.8471686Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_package_without_weight SKIPPED [0.0029s] (No support for cpp only) [ 45%]
2025-12-04T12:35:04.8472225Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_remove_intermediate_files PASSED [5.2513s] [ 46%]
2025-12-04T12:35:04.8472689Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_save_buffer PASSED [5.2709s] [ 47%]
2025-12-04T12:35:04.8473815Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_specified_output_dir In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_float.h:12,
2025-12-04T12:35:04.8474268Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:11,
2025-12-04T12:35:04.8474693Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5,
2025-12-04T12:35:04.8475148Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7,
2025-12-04T12:35:04.8475555Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4,
2025-12-04T12:35:04.8476033Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45,
2025-12-04T12:35:04.8476704Z                  from /tmp/uDOzLN/tmpn56tbzg5/data/aotinductor/model1/cwulnadwx3jyqkgl526d3bpo7ziav2n33dginvvv4zbkqn5jle4v.wrapper.cpp:729:
2025-12-04T12:35:04.8477306Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/sleef.h:192:10: warning: ISO C++ prohibits anonymous structs [-Wpedantic]
2025-12-04T12:35:04.8477477Z   192 |   struct {
2025-12-04T12:35:04.8477577Z       |          ^
2025-12-04T12:35:04.8478075Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:15,
2025-12-04T12:35:04.8478516Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5,
2025-12-04T12:35:04.8478958Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7,
2025-12-04T12:35:04.8479374Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4,
2025-12-04T12:35:04.8479839Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45,
2025-12-04T12:35:04.8480506Z                  from /tmp/uDOzLN/tmpn56tbzg5/data/aotinductor/model1/cwulnadwx3jyqkgl526d3bpo7ziav2n33dginvvv4zbkqn5jle4v.wrapper.cpp:729:
2025-12-04T12:35:04.8482768Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::blendv(const at::vec::CPU_CAPABILITY::Vectorized<short int>&, const at::vec::CPU_CAPABILITY::Vectorized<short int>&, const at::vec::CPU_CAPABILITY::Vectorized<short int>&)’:
2025-12-04T12:35:04.8483950Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:544:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.8484124Z   544 |     auto msb_one = _mm512_set1_epi16(0xFFFF);
2025-12-04T12:35:04.8484240Z       |                                      ^~~~~~
2025-12-04T12:35:04.8484755Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:15,
2025-12-04T12:35:04.8485128Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5,
2025-12-04T12:35:04.8485572Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7,
2025-12-04T12:35:04.8485993Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4,
2025-12-04T12:35:04.8486454Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45,
2025-12-04T12:35:04.8487134Z                  from /tmp/uDOzLN/tmpn56tbzg5/data/aotinductor/model1/cwulnadwx3jyqkgl526d3bpo7ziav2n33dginvvv4zbkqn5jle4v.wrapper.cpp:729:
2025-12-04T12:35:04.8488802Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator==(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.8490015Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:697:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.8490240Z   697 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.8490370Z       |                                                      ^~~~~~
2025-12-04T12:35:04.8492003Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator!=(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.8493176Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:701:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.8493434Z   701 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.8493566Z       |                                                      ^~~~~~
2025-12-04T12:35:04.8495223Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator<(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.8496477Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:705:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.8496694Z   705 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.8496831Z       |                                                      ^~~~~~
2025-12-04T12:35:04.8498465Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator<=(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.8499647Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:709:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.8499852Z   709 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.8499977Z       |                                                      ^~~~~~
2025-12-04T12:35:04.8501617Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator>(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.8502797Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:713:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.8503021Z   713 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.8503151Z       |                                                      ^~~~~~
2025-12-04T12:35:04.8504759Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<short int> at::vec::CPU_CAPABILITY::Vectorized<short int>::operator>=(const at::vec::CPU_CAPABILITY::Vectorized<short int>&) const’:
2025-12-04T12:35:04.8505981Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:717:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow]
2025-12-04T12:35:04.8506220Z   717 |     return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF);
2025-12-04T12:35:04.8506389Z       |                                                      ^~~~~~
2025-12-04T12:35:04.8508685Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized<signed char> at::vec::CPU_CAPABILITY::Vectorized<signed char>::blendv(const at::vec::CPU_CAPABILITY::Vectorized<signed char>&, const at::vec::CPU_CAPABILITY::Vectorized<signed char>&, const at::vec::CPU_CAPABILITY::Vectorized<signed char>&)’:
2025-12-04T12:35:04.8509894Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1153:37: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8510048Z  1153 |     auto msb_one = _mm512_set1_epi8(0xFF);
2025-12-04T12:35:04.8510165Z       |                                     ^~~~
2025-12-04T12:35:04.8511838Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<signed char> at::vec::CPU_CAPABILITY::Vectorized<signed char>::operator==(const at::vec::CPU_CAPABILITY::Vectorized<signed char>&) const’:
2025-12-04T12:35:04.8513036Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1166:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8513254Z  1166 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.8513386Z       |                                                     ^~~~
2025-12-04T12:35:04.8515047Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<signed char> at::vec::CPU_CAPABILITY::Vectorized<signed char>::operator!=(const at::vec::CPU_CAPABILITY::Vectorized<signed char>&) const’:
2025-12-04T12:35:04.8516245Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1170:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8516448Z  1170 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.8516589Z       |                                                     ^~~~
2025-12-04T12:35:04.8518232Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<signed char> at::vec::CPU_CAPABILITY::Vectorized<signed char>::operator<(const at::vec::CPU_CAPABILITY::Vectorized<signed char>&) const’:
2025-12-04T12:35:04.8519436Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1174:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8519650Z  1174 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.8519773Z       |                                                     ^~~~
2025-12-04T12:35:04.8521439Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<signed char> at::vec::CPU_CAPABILITY::Vectorized<signed char>::operator<=(const at::vec::CPU_CAPABILITY::Vectorized<signed char>&) const’:
2025-12-04T12:35:04.8522632Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1178:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8522889Z  1178 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.8523012Z       |                                                     ^~~~
2025-12-04T12:35:04.8525420Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized<unsigned char> at::vec::CPU_CAPABILITY::Vectorized<unsigned char>::blendv(const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&, const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&, const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&)’:
2025-12-04T12:35:04.8526648Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1207:37: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8526815Z  1207 |     auto msb_one = _mm512_set1_epi8(0xFF);
2025-12-04T12:35:04.8526931Z       |                                     ^~~~
2025-12-04T12:35:04.8528634Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<unsigned char> at::vec::CPU_CAPABILITY::Vectorized<unsigned char>::operator==(const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&) const’:
2025-12-04T12:35:04.8529839Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1220:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8530050Z  1220 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.8530186Z       |                                                     ^~~~
2025-12-04T12:35:04.8531869Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<unsigned char> at::vec::CPU_CAPABILITY::Vectorized<unsigned char>::operator!=(const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&) const’:
2025-12-04T12:35:04.8533067Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1224:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8533282Z  1224 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.8533404Z       |                                                     ^~~~
2025-12-04T12:35:04.8535101Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<unsigned char> at::vec::CPU_CAPABILITY::Vectorized<unsigned char>::operator<(const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&) const’:
2025-12-04T12:35:04.8536348Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1228:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8536567Z  1228 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.8536695Z       |                                                     ^~~~
2025-12-04T12:35:04.8538402Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized<unsigned char> at::vec::CPU_CAPABILITY::Vectorized<unsigned char>::operator<=(const at::vec::CPU_CAPABILITY::Vectorized<unsigned char>&) const’:
2025-12-04T12:35:04.8539607Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1232:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8539857Z  1232 |     return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF);
2025-12-04T12:35:04.8539992Z       |                                                     ^~~~
2025-12-04T12:35:04.8542392Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized<T> at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized<T>&, const at::vec::CPU_CAPABILITY::Vectorized<T>&) [with bool left_shift = true; T = signed char; typename std::enable_if<(is_same_v<T, signed char> || is_same_v<T, unsigned char>), int>::type <anonymous> = 0]’:
2025-12-04T12:35:04.8543069Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2074:27:   required from here
2025-12-04T12:35:04.8544257Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8544359Z  1866 |       0x80,
2025-12-04T12:35:04.8544466Z       |       ^~~~
2025-12-04T12:35:04.8545643Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8545762Z  1868 |       0x80,
2025-12-04T12:35:04.8545859Z       |       ^~~~
2025-12-04T12:35:04.8547033Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8547150Z  1870 |       0x80,
2025-12-04T12:35:04.8547244Z       |       ^~~~
2025-12-04T12:35:04.8548421Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8548546Z  1872 |       0x80,
2025-12-04T12:35:04.8548639Z       |       ^~~~
2025-12-04T12:35:04.8549825Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8549931Z  1874 |       0x80,
2025-12-04T12:35:04.8550022Z       |       ^~~~
2025-12-04T12:35:04.8551206Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8551306Z  1876 |       0x80,
2025-12-04T12:35:04.8551396Z       |       ^~~~
2025-12-04T12:35:04.8552589Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8552690Z  1878 |       0x80,
2025-12-04T12:35:04.8552792Z       |       ^~~~
2025-12-04T12:35:04.8553960Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8554066Z  1880 |       0x80,
2025-12-04T12:35:04.8554174Z       |       ^~~~
2025-12-04T12:35:04.8555349Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8555460Z  1882 |       0x80,
2025-12-04T12:35:04.8555552Z       |       ^~~~
2025-12-04T12:35:04.8556726Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8556875Z  1884 |       0x80,
2025-12-04T12:35:04.8556965Z       |       ^~~~
2025-12-04T12:35:04.8558168Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8558336Z  1886 |       0x80,
2025-12-04T12:35:04.8558432Z       |       ^~~~
2025-12-04T12:35:04.8559625Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8559761Z  1888 |       0x80,
2025-12-04T12:35:04.8559855Z       |       ^~~~
2025-12-04T12:35:04.8561051Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8561154Z  1890 |       0x80,
2025-12-04T12:35:04.8561262Z       |       ^~~~
2025-12-04T12:35:04.8562440Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8562543Z  1892 |       0x80,
2025-12-04T12:35:04.8562654Z       |       ^~~~
2025-12-04T12:35:04.8563833Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8563948Z  1894 |       0x80,
2025-12-04T12:35:04.8564047Z       |       ^~~~
2025-12-04T12:35:04.8565218Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8565340Z  1896 |       0x80,
2025-12-04T12:35:04.8565434Z       |       ^~~~
2025-12-04T12:35:04.8566613Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8566730Z  1898 |       0x80,
2025-12-04T12:35:04.8566823Z       |       ^~~~
2025-12-04T12:35:04.8568022Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8568118Z  1900 |       0x80,
2025-12-04T12:35:04.8568212Z       |       ^~~~
2025-12-04T12:35:04.8569402Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8569505Z  1902 |       0x80,
2025-12-04T12:35:04.8569599Z       |       ^~~~
2025-12-04T12:35:04.8570806Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8570912Z  1904 |       0x80,
2025-12-04T12:35:04.8571197Z       |       ^~~~
2025-12-04T12:35:04.8572386Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8572483Z  1906 |       0x80,
2025-12-04T12:35:04.8572594Z       |       ^~~~
2025-12-04T12:35:04.8573942Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8574165Z  1908 |       0x80,
2025-12-04T12:35:04.8574259Z       |       ^~~~
2025-12-04T12:35:04.8575502Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8575660Z  1910 |       0x80,
2025-12-04T12:35:04.8575756Z       |       ^~~~
2025-12-04T12:35:04.8577055Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8577168Z  1912 |       0x80,
2025-12-04T12:35:04.8577259Z       |       ^~~~
2025-12-04T12:35:04.8578465Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8578566Z  1914 |       0x80,
2025-12-04T12:35:04.8578659Z       |       ^~~~
2025-12-04T12:35:04.8579858Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8579956Z  1916 |       0x80,
2025-12-04T12:35:04.8580061Z       |       ^~~~
2025-12-04T12:35:04.8581430Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8581527Z  1918 |       0x80,
2025-12-04T12:35:04.8581637Z       |       ^~~~
2025-12-04T12:35:04.8582875Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8582977Z  1920 |       0x80,
2025-12-04T12:35:04.8583082Z       |       ^~~~
2025-12-04T12:35:04.8584348Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8584501Z  1922 |       0x80,
2025-12-04T12:35:04.8584615Z       |       ^~~~
2025-12-04T12:35:04.8585936Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8586050Z  1924 |       0x80,
2025-12-04T12:35:04.8586141Z       |       ^~~~
2025-12-04T12:35:04.8587331Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8587484Z  1926 |       0x80,
2025-12-04T12:35:04.8587588Z       |       ^~~~
2025-12-04T12:35:04.8588906Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8589053Z  1928 |       0x80);
2025-12-04T12:35:04.8589144Z       |       ^~~~
2025-12-04T12:35:04.8590415Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8590512Z  1930 |       0x80,
2025-12-04T12:35:04.8590615Z       |       ^~~~
2025-12-04T12:35:04.8591862Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8591990Z  1932 |       0x80,
2025-12-04T12:35:04.8592119Z       |       ^~~~
2025-12-04T12:35:04.8593348Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8593463Z  1934 |       0x80,
2025-12-04T12:35:04.8593557Z       |       ^~~~
2025-12-04T12:35:04.8595200Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8595362Z  1936 |       0x80,
2025-12-04T12:35:04.8595522Z       |       ^~~~
2025-12-04T12:35:04.8597290Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8597473Z  1938 |       0x80,
2025-12-04T12:35:04.8597629Z       |       ^~~~
2025-12-04T12:35:04.8599279Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8599444Z  1940 |       0x80,
2025-12-04T12:35:04.8599604Z       |       ^~~~
2025-12-04T12:35:04.8601069Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8601167Z  1942 |       0x80,
2025-12-04T12:35:04.8601260Z       |       ^~~~
2025-12-04T12:35:04.8602762Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8602930Z  1944 |       0x80,
2025-12-04T12:35:04.8603058Z       |       ^~~~
2025-12-04T12:35:04.8604612Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8604715Z  1946 |       0x80,
2025-12-04T12:35:04.8604823Z       |       ^~~~
2025-12-04T12:35:04.8606037Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8606208Z  1948 |       0x80,
2025-12-04T12:35:04.8606361Z       |       ^~~~
2025-12-04T12:35:04.8608458Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8608572Z  1950 |       0x80,
2025-12-04T12:35:04.8608670Z       |       ^~~~
2025-12-04T12:35:04.8609863Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8610057Z  1952 |       0x80,
2025-12-04T12:35:04.8610151Z       |       ^~~~
2025-12-04T12:35:04.8611806Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8611962Z  1954 |       0x80,
2025-12-04T12:35:04.8612121Z       |       ^~~~
2025-12-04T12:35:04.8614025Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8614189Z  1956 |       0x80,
2025-12-04T12:35:04.8614360Z       |       ^~~~
2025-12-04T12:35:04.8616125Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8616233Z  1958 |       0x80,
2025-12-04T12:35:04.8616420Z       |       ^~~~
2025-12-04T12:35:04.8617945Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8618104Z  1960 |       0x80,
2025-12-04T12:35:04.8618277Z       |       ^~~~
2025-12-04T12:35:04.8620061Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8620235Z  1962 |       0x80,
2025-12-04T12:35:04.8620386Z       |       ^~~~
2025-12-04T12:35:04.8621775Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8621893Z  1964 |       0x80,
2025-12-04T12:35:04.8621985Z       |       ^~~~
2025-12-04T12:35:04.8623963Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8624062Z  1966 |       0x80,
2025-12-04T12:35:04.8624156Z       |       ^~~~
2025-12-04T12:35:04.8625474Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8625633Z  1968 |       0x80,
2025-12-04T12:35:04.8625789Z       |       ^~~~
2025-12-04T12:35:04.8627491Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8627595Z  1970 |       0x80,
2025-12-04T12:35:04.8627700Z       |       ^~~~
2025-12-04T12:35:04.8629666Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8629814Z  1972 |       0x80,
2025-12-04T12:35:04.8629949Z       |       ^~~~
2025-12-04T12:35:04.8631334Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8631442Z  1974 |       0x80,
2025-12-04T12:35:04.8631535Z       |       ^~~~
2025-12-04T12:35:04.8633067Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8633289Z  1976 |       0x80,
2025-12-04T12:35:04.8633440Z       |       ^~~~
2025-12-04T12:35:04.8635006Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8635115Z  1978 |       0x80,
2025-12-04T12:35:04.8635214Z       |       ^~~~
2025-12-04T12:35:04.8636561Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8636658Z  1980 |       0x80,
2025-12-04T12:35:04.8636749Z       |       ^~~~
2025-12-04T12:35:04.8637950Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8638050Z  1982 |       0x80,
2025-12-04T12:35:04.8638155Z       |       ^~~~
2025-12-04T12:35:04.8639652Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8639821Z  1984 |       0x80,
2025-12-04T12:35:04.8639997Z       |       ^~~~
2025-12-04T12:35:04.8642098Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8642253Z  1986 |       0x80,
2025-12-04T12:35:04.8642422Z       |       ^~~~
2025-12-04T12:35:04.8643803Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8643920Z  1988 |       0x80,
2025-12-04T12:35:04.8644014Z       |       ^~~~
2025-12-04T12:35:04.8645505Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8645676Z  1990 |       0x80,
2025-12-04T12:35:04.8645826Z       |       ^~~~
2025-12-04T12:35:04.8647089Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8647184Z  1992 |       0x80,
2025-12-04T12:35:04.8647278Z       |       ^~~~
2025-12-04T12:35:04.8648498Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow]
2025-12-04T12:35:04.8648663Z  2002 |   __m512i keep_1 = _mm512_set1_epi16(0xFF00);
2025-12-04T12:35:04.8648791Z       |                                      ^~~~~~
2025-12-04T12:35:04.8651665Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized<T> at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized<T>&, const at::vec::CPU_CAPABILITY::Vectorized<T>&) [with bool left_shift = true; T = unsigned char; typename std::enable_if<(is_same_v<T, signed char> || is_same_v<T, unsigned char>), int>::type <anonymous> = 0]’:
2025-12-04T12:35:04.8652334Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2081:27:   required from here
2025-12-04T12:35:04.8653640Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8653737Z  1866 |       0x80,
2025-12-04T12:35:04.8653844Z       |       ^~~~
2025-12-04T12:35:04.8655873Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8656050Z  1868 |       0x80,
2025-12-04T12:35:04.8656220Z       |       ^~~~
2025-12-04T12:35:04.8658328Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8658443Z  1870 |       0x80,
2025-12-04T12:35:04.8658536Z       |       ^~~~
2025-12-04T12:35:04.8659746Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8659858Z  1872 |       0x80,
2025-12-04T12:35:04.8660008Z       |       ^~~~
2025-12-04T12:35:04.8661342Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8661460Z  1874 |       0x80,
2025-12-04T12:35:04.8661553Z       |       ^~~~
2025-12-04T12:35:04.8662976Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8663110Z  1876 |       0x80,
2025-12-04T12:35:04.8663212Z       |       ^~~~
2025-12-04T12:35:04.8664490Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8664587Z  1878 |       0x80,
2025-12-04T12:35:04.8664687Z       |       ^~~~
2025-12-04T12:35:04.8665885Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8665988Z  1880 |       0x80,
2025-12-04T12:35:04.8666096Z       |       ^~~~
2025-12-04T12:35:04.8667271Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8667366Z  1882 |       0x80,
2025-12-04T12:35:04.8667481Z       |       ^~~~
2025-12-04T12:35:04.8668670Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8668782Z  1884 |       0x80,
2025-12-04T12:35:04.8668883Z       |       ^~~~
2025-12-04T12:35:04.8670060Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8670253Z  1886 |       0x80,
2025-12-04T12:35:04.8670348Z       |       ^~~~
2025-12-04T12:35:04.8671750Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8671863Z  1888 |       0x80,
2025-12-04T12:35:04.8672109Z       |       ^~~~
2025-12-04T12:35:04.8673924Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8674088Z  1890 |       0x80,
2025-12-04T12:35:04.8674331Z       |       ^~~~
2025-12-04T12:35:04.8676427Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8676545Z  1892 |       0x80,
2025-12-04T12:35:04.8676657Z       |       ^~~~
2025-12-04T12:35:04.8677842Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8677938Z  1894 |       0x80,
2025-12-04T12:35:04.8678060Z       |       ^~~~
2025-12-04T12:35:04.8679310Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8679407Z  1896 |       0x80,
2025-12-04T12:35:04.8679526Z       |       ^~~~
2025-12-04T12:35:04.8680712Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8680823Z  1898 |       0x80,
2025-12-04T12:35:04.8680917Z       |       ^~~~
2025-12-04T12:35:04.8682091Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8682203Z  1900 |       0x80,
2025-12-04T12:35:04.8682310Z       |       ^~~~
2025-12-04T12:35:04.8684257Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8684418Z  1902 |       0x80,
2025-12-04T12:35:04.8684555Z       |       ^~~~
2025-12-04T12:35:04.8686501Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8686669Z  1904 |       0x80,
2025-12-04T12:35:04.8686828Z       |       ^~~~
2025-12-04T12:35:04.8688084Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8688233Z  1906 |       0x80,
2025-12-04T12:35:04.8688536Z       |       ^~~~
2025-12-04T12:35:04.8689879Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8689982Z  1908 |       0x80,
2025-12-04T12:35:04.8690090Z       |       ^~~~
2025-12-04T12:35:04.8691762Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8691964Z  1910 |       0x80,
2025-12-04T12:35:04.8692055Z       |       ^~~~
2025-12-04T12:35:04.8693355Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8693598Z  1912 |       0x80,
2025-12-04T12:35:04.8693765Z       |       ^~~~
2025-12-04T12:35:04.8695828Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8696063Z  1914 |       0x80,
2025-12-04T12:35:04.8696209Z       |       ^~~~
2025-12-04T12:35:04.8698119Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8698227Z  1916 |       0x80,
2025-12-04T12:35:04.8698318Z       |       ^~~~
2025-12-04T12:35:04.8699688Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8699823Z  1918 |       0x80,
2025-12-04T12:35:04.8699930Z       |       ^~~~
2025-12-04T12:35:04.8701517Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8701620Z  1920 |       0x80,
2025-12-04T12:35:04.8701727Z       |       ^~~~
2025-12-04T12:35:04.8703280Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8703391Z  1922 |       0x80,
2025-12-04T12:35:04.8703499Z       |       ^~~~
2025-12-04T12:35:04.8705029Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8705223Z  1924 |       0x80,
2025-12-04T12:35:04.8705371Z       |       ^~~~
2025-12-04T12:35:04.8707471Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8707607Z  1926 |       0x80,
2025-12-04T12:35:04.8707700Z       |       ^~~~
2025-12-04T12:35:04.8709254Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8709514Z  1928 |       0x80);
2025-12-04T12:35:04.8709612Z       |       ^~~~
2025-12-04T12:35:04.8710890Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8711036Z  1930 |       0x80,
2025-12-04T12:35:04.8711127Z       |       ^~~~
2025-12-04T12:35:04.8712685Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8712839Z  1932 |       0x80,
2025-12-04T12:35:04.8712948Z       |       ^~~~
2025-12-04T12:35:04.8714134Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8714234Z  1934 |       0x80,
2025-12-04T12:35:04.8714341Z       |       ^~~~
2025-12-04T12:35:04.8716189Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8716370Z  1936 |       0x80,
2025-12-04T12:35:04.8716521Z       |       ^~~~
2025-12-04T12:35:04.8718054Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8718170Z  1938 |       0x80,
2025-12-04T12:35:04.8718265Z       |       ^~~~
2025-12-04T12:35:04.8719945Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8720066Z  1940 |       0x80,
2025-12-04T12:35:04.8720160Z       |       ^~~~
2025-12-04T12:35:04.8721511Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8721673Z  1942 |       0x80,
2025-12-04T12:35:04.8721831Z       |       ^~~~
2025-12-04T12:35:04.8723289Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8723386Z  1944 |       0x80,
2025-12-04T12:35:04.8723478Z       |       ^~~~
2025-12-04T12:35:04.8724912Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8725085Z  1946 |       0x80,
2025-12-04T12:35:04.8725226Z       |       ^~~~
2025-12-04T12:35:04.8726412Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8726513Z  1948 |       0x80,
2025-12-04T12:35:04.8726618Z       |       ^~~~
2025-12-04T12:35:04.8728332Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8728509Z  1950 |       0x80,
2025-12-04T12:35:04.8728665Z       |       ^~~~
2025-12-04T12:35:04.8730623Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8730819Z  1952 |       0x80,
2025-12-04T12:35:04.8730914Z       |       ^~~~
2025-12-04T12:35:04.8732401Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8732619Z  1954 |       0x80,
2025-12-04T12:35:04.8732714Z       |       ^~~~
2025-12-04T12:35:04.8734393Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8734517Z  1956 |       0x80,
2025-12-04T12:35:04.8734608Z       |       ^~~~
2025-12-04T12:35:04.8735936Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8736099Z  1958 |       0x80,
2025-12-04T12:35:04.8736266Z       |       ^~~~
2025-12-04T12:35:04.8737696Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8737798Z  1960 |       0x80,
2025-12-04T12:35:04.8737905Z       |       ^~~~
2025-12-04T12:35:04.8739594Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8739756Z  1962 |       0x80,
2025-12-04T12:35:04.8739932Z       |       ^~~~
2025-12-04T12:35:04.8741906Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8742027Z  1964 |       0x80,
2025-12-04T12:35:04.8742120Z       |       ^~~~
2025-12-04T12:35:04.8743723Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8743843Z  1966 |       0x80,
2025-12-04T12:35:04.8743934Z       |       ^~~~
2025-12-04T12:35:04.8745424Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8745588Z  1968 |       0x80,
2025-12-04T12:35:04.8745722Z       |       ^~~~
2025-12-04T12:35:04.8747067Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8747234Z  1970 |       0x80,
2025-12-04T12:35:04.8747387Z       |       ^~~~
2025-12-04T12:35:04.8749060Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8749162Z  1972 |       0x80,
2025-12-04T12:35:04.8749270Z       |       ^~~~
2025-12-04T12:35:04.8750931Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8751094Z  1974 |       0x80,
2025-12-04T12:35:04.8751259Z       |       ^~~~
2025-12-04T12:35:04.8753002Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8753207Z  1976 |       0x80,
2025-12-04T12:35:04.8753301Z       |       ^~~~
2025-12-04T12:35:04.8755072Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8755224Z  1978 |       0x80,
2025-12-04T12:35:04.8755316Z       |       ^~~~
2025-12-04T12:35:04.8756979Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8757162Z  1980 |       0x80,
2025-12-04T12:35:04.8757305Z       |       ^~~~
2025-12-04T12:35:04.8758672Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8758779Z  1982 |       0x80,
2025-12-04T12:35:04.8758928Z       |       ^~~~
2025-12-04T12:35:04.8760812Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8760916Z  1984 |       0x80,
2025-12-04T12:35:04.8761063Z       |       ^~~~
2025-12-04T12:35:04.8762666Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8762763Z  1986 |       0x80,
2025-12-04T12:35:04.8762869Z       |       ^~~~
2025-12-04T12:35:04.8764562Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8764734Z  1988 |       0x80,
2025-12-04T12:35:04.8764899Z       |       ^~~~
2025-12-04T12:35:04.8766657Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8766776Z  1990 |       0x80,
2025-12-04T12:35:04.8766867Z       |       ^~~~
2025-12-04T12:35:04.8768449Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8768629Z  1992 |       0x80,
2025-12-04T12:35:04.8768785Z       |       ^~~~
2025-12-04T12:35:04.8770357Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow]
2025-12-04T12:35:04.8770630Z  2002 |   __m512i keep_1 = _mm512_set1_epi16(0xFF00);
2025-12-04T12:35:04.8770824Z       |                                      ^~~~~~
2025-12-04T12:35:04.8774355Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized<T> at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized<T>&, const at::vec::CPU_CAPABILITY::Vectorized<T>&) [with bool left_shift = false; T = signed char; typename std::enable_if<(is_same_v<T, signed char> || is_same_v<T, unsigned char>), int>::type <anonymous> = 0]’:
2025-12-04T12:35:04.8774949Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2109:28:   required from here
2025-12-04T12:35:04.8777008Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8777178Z  1866 |       0x80,
2025-12-04T12:35:04.8777306Z       |       ^~~~
2025-12-04T12:35:04.8779138Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8779301Z  1868 |       0x80,
2025-12-04T12:35:04.8779411Z       |       ^~~~
2025-12-04T12:35:04.8780654Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8780781Z  1870 |       0x80,
2025-12-04T12:35:04.8780949Z       |       ^~~~
2025-12-04T12:35:04.8783026Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8783205Z  1872 |       0x80,
2025-12-04T12:35:04.8783359Z       |       ^~~~
2025-12-04T12:35:04.8784891Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8785013Z  1874 |       0x80,
2025-12-04T12:35:04.8785108Z       |       ^~~~
2025-12-04T12:35:04.8786380Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8786494Z  1876 |       0x80,
2025-12-04T12:35:04.8786599Z       |       ^~~~
2025-12-04T12:35:04.8787858Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8787956Z  1878 |       0x80,
2025-12-04T12:35:04.8788052Z       |       ^~~~
2025-12-04T12:35:04.8789386Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8789490Z  1880 |       0x80,
2025-12-04T12:35:04.8789584Z       |       ^~~~
2025-12-04T12:35:04.8790911Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8791009Z  1882 |       0x80,
2025-12-04T12:35:04.8791129Z       |       ^~~~
2025-12-04T12:35:04.8792369Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8792467Z  1884 |       0x80,
2025-12-04T12:35:04.8792579Z       |       ^~~~
2025-12-04T12:35:04.8793760Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8793878Z  1886 |       0x80,
2025-12-04T12:35:04.8793973Z       |       ^~~~
2025-12-04T12:35:04.8795152Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8795264Z  1888 |       0x80,
2025-12-04T12:35:04.8795433Z       |       ^~~~
2025-12-04T12:35:04.8796609Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8796718Z  1890 |       0x80,
2025-12-04T12:35:04.8796811Z       |       ^~~~
2025-12-04T12:35:04.8798042Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8798169Z  1892 |       0x80,
2025-12-04T12:35:04.8798262Z       |       ^~~~
2025-12-04T12:35:04.8799492Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8799590Z  1894 |       0x80,
2025-12-04T12:35:04.8799704Z       |       ^~~~
2025-12-04T12:35:04.8800881Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8800978Z  1896 |       0x80,
2025-12-04T12:35:04.8801081Z       |       ^~~~
2025-12-04T12:35:04.8802261Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8802364Z  1898 |       0x80,
2025-12-04T12:35:04.8802468Z       |       ^~~~
2025-12-04T12:35:04.8803642Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8803755Z  1900 |       0x80,
2025-12-04T12:35:04.8803846Z       |       ^~~~
2025-12-04T12:35:04.8805014Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8805124Z  1902 |       0x80,
2025-12-04T12:35:04.8805219Z       |       ^~~~
2025-12-04T12:35:04.8806406Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8806505Z  1904 |       0x80,
2025-12-04T12:35:04.8806597Z       |       ^~~~
2025-12-04T12:35:04.8807782Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8807915Z  1906 |       0x80,
2025-12-04T12:35:04.8808006Z       |       ^~~~
2025-12-04T12:35:04.8809196Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8809293Z  1908 |       0x80,
2025-12-04T12:35:04.8809399Z       |       ^~~~
2025-12-04T12:35:04.8810614Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8810707Z  1910 |       0x80,
2025-12-04T12:35:04.8810811Z       |       ^~~~
2025-12-04T12:35:04.8811981Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8812097Z  1912 |       0x80,
2025-12-04T12:35:04.8812188Z       |       ^~~~
2025-12-04T12:35:04.8813441Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8813552Z  1914 |       0x80,
2025-12-04T12:35:04.8813645Z       |       ^~~~
2025-12-04T12:35:04.8814874Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8814984Z  1916 |       0x80,
2025-12-04T12:35:04.8815076Z       |       ^~~~
2025-12-04T12:35:04.8816373Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8816479Z  1918 |       0x80,
2025-12-04T12:35:04.8816571Z       |       ^~~~
2025-12-04T12:35:04.8817770Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8817864Z  1920 |       0x80,
2025-12-04T12:35:04.8817962Z       |       ^~~~
2025-12-04T12:35:04.8819155Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8819250Z  1922 |       0x80,
2025-12-04T12:35:04.8819361Z       |       ^~~~
2025-12-04T12:35:04.8820532Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8820631Z  1924 |       0x80,
2025-12-04T12:35:04.8820739Z       |       ^~~~
2025-12-04T12:35:04.8821904Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8822012Z  1926 |       0x80,
2025-12-04T12:35:04.8822111Z       |       ^~~~
2025-12-04T12:35:04.8823288Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8823398Z  1928 |       0x80);
2025-12-04T12:35:04.8823496Z       |       ^~~~
2025-12-04T12:35:04.8824674Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8824825Z  1930 |       0x80,
2025-12-04T12:35:04.8824919Z       |       ^~~~
2025-12-04T12:35:04.8826110Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8826204Z  1932 |       0x80,
2025-12-04T12:35:04.8826342Z       |       ^~~~
2025-12-04T12:35:04.8827535Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8827629Z  1934 |       0x80,
2025-12-04T12:35:04.8827741Z       |       ^~~~
2025-12-04T12:35:04.8828912Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8829011Z  1936 |       0x80,
2025-12-04T12:35:04.8829117Z       |       ^~~~
2025-12-04T12:35:04.8830284Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8830377Z  1938 |       0x80,
2025-12-04T12:35:04.8830550Z       |       ^~~~
2025-12-04T12:35:04.8831724Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8831830Z  1940 |       0x80,
2025-12-04T12:35:04.8831957Z       |       ^~~~
2025-12-04T12:35:04.8833131Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8833244Z  1942 |       0x80,
2025-12-04T12:35:04.8833337Z       |       ^~~~
2025-12-04T12:35:04.8834514Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8834614Z  1944 |       0x80,
2025-12-04T12:35:04.8834714Z       |       ^~~~
2025-12-04T12:35:04.8835903Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8836002Z  1946 |       0x80,
2025-12-04T12:35:04.8836094Z       |       ^~~~
2025-12-04T12:35:04.8837285Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8837384Z  1948 |       0x80,
2025-12-04T12:35:04.8837492Z       |       ^~~~
2025-12-04T12:35:04.8838661Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8838760Z  1950 |       0x80,
2025-12-04T12:35:04.8838870Z       |       ^~~~
2025-12-04T12:35:04.8840043Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8840156Z  1952 |       0x80,
2025-12-04T12:35:04.8840247Z       |       ^~~~
2025-12-04T12:35:04.8841421Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8841565Z  1954 |       0x80,
2025-12-04T12:35:04.8841657Z       |       ^~~~
2025-12-04T12:35:04.8842834Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8843013Z  1956 |       0x80,
2025-12-04T12:35:04.8843105Z       |       ^~~~
2025-12-04T12:35:04.8844289Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8844417Z  1958 |       0x80,
2025-12-04T12:35:04.8844509Z       |       ^~~~
2025-12-04T12:35:04.8845695Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8845797Z  1960 |       0x80,
2025-12-04T12:35:04.8845901Z       |       ^~~~
2025-12-04T12:35:04.8847071Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8847177Z  1962 |       0x80,
2025-12-04T12:35:04.8847287Z       |       ^~~~
2025-12-04T12:35:04.8848453Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8848552Z  1964 |       0x80,
2025-12-04T12:35:04.8848656Z       |       ^~~~
2025-12-04T12:35:04.8849823Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8849936Z  1966 |       0x80,
2025-12-04T12:35:04.8850028Z       |       ^~~~
2025-12-04T12:35:04.8851196Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8851314Z  1968 |       0x80,
2025-12-04T12:35:04.8851408Z       |       ^~~~
2025-12-04T12:35:04.8852591Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8852691Z  1970 |       0x80,
2025-12-04T12:35:04.8852788Z       |       ^~~~
2025-12-04T12:35:04.8853970Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8854071Z  1972 |       0x80,
2025-12-04T12:35:04.8854163Z       |       ^~~~
2025-12-04T12:35:04.8855350Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8855450Z  1974 |       0x80,
2025-12-04T12:35:04.8855555Z       |       ^~~~
2025-12-04T12:35:04.8856801Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8856899Z  1976 |       0x80,
2025-12-04T12:35:04.8857010Z       |       ^~~~
2025-12-04T12:35:04.8858192Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8858350Z  1978 |       0x80,
2025-12-04T12:35:04.8858443Z       |       ^~~~
2025-12-04T12:35:04.8859654Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8859797Z  1980 |       0x80,
2025-12-04T12:35:04.8859888Z       |       ^~~~
2025-12-04T12:35:04.8861098Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8861207Z  1982 |       0x80,
2025-12-04T12:35:04.8861298Z       |       ^~~~
2025-12-04T12:35:04.8862485Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8862585Z  1984 |       0x80,
2025-12-04T12:35:04.8862677Z       |       ^~~~
2025-12-04T12:35:04.8863872Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8863973Z  1986 |       0x80,
2025-12-04T12:35:04.8864068Z       |       ^~~~
2025-12-04T12:35:04.8865260Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8865354Z  1988 |       0x80,
2025-12-04T12:35:04.8865459Z       |       ^~~~
2025-12-04T12:35:04.8866622Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8866722Z  1990 |       0x80,
2025-12-04T12:35:04.8866830Z       |       ^~~~
2025-12-04T12:35:04.8868008Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8868126Z  1992 |       0x80,
2025-12-04T12:35:04.8868219Z       |       ^~~~
2025-12-04T12:35:04.8869396Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow]
2025-12-04T12:35:04.8869571Z  2002 |   __m512i keep_1 = _mm512_set1_epi16(0xFF00);
2025-12-04T12:35:04.8869691Z       |                                      ^~~~~~
2025-12-04T12:35:04.8872324Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized<T> at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized<T>&, const at::vec::CPU_CAPABILITY::Vectorized<T>&) [with bool left_shift = false; T = unsigned char; typename std::enable_if<(is_same_v<T, signed char> || is_same_v<T, unsigned char>), int>::type <anonymous> = 0]’:
2025-12-04T12:35:04.8872922Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2116:28:   required from here
2025-12-04T12:35:04.8874112Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8874225Z  1866 |       0x80,
2025-12-04T12:35:04.8874319Z       |       ^~~~
2025-12-04T12:35:04.8875514Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8875698Z  1868 |       0x80,
2025-12-04T12:35:04.8875792Z       |       ^~~~
2025-12-04T12:35:04.8877041Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8877179Z  1870 |       0x80,
2025-12-04T12:35:04.8877290Z       |       ^~~~
2025-12-04T12:35:04.8878515Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8878612Z  1872 |       0x80,
2025-12-04T12:35:04.8878721Z       |       ^~~~
2025-12-04T12:35:04.8879902Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8880003Z  1874 |       0x80,
2025-12-04T12:35:04.8880112Z       |       ^~~~
2025-12-04T12:35:04.8881287Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8881403Z  1876 |       0x80,
2025-12-04T12:35:04.8881498Z       |       ^~~~
2025-12-04T12:35:04.8882678Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8882788Z  1878 |       0x80,
2025-12-04T12:35:04.8882881Z       |       ^~~~
2025-12-04T12:35:04.8884072Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8884176Z  1880 |       0x80,
2025-12-04T12:35:04.8884271Z       |       ^~~~
2025-12-04T12:35:04.8885458Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8885562Z  1882 |       0x80,
2025-12-04T12:35:04.8885655Z       |       ^~~~
2025-12-04T12:35:04.8886846Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8886940Z  1884 |       0x80,
2025-12-04T12:35:04.8887048Z       |       ^~~~
2025-12-04T12:35:04.8888220Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8888320Z  1886 |       0x80,
2025-12-04T12:35:04.8888428Z       |       ^~~~
2025-12-04T12:35:04.8889607Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8889724Z  1888 |       0x80,
2025-12-04T12:35:04.8889819Z       |       ^~~~
2025-12-04T12:35:04.8890993Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8891103Z  1890 |       0x80,
2025-12-04T12:35:04.8891196Z       |       ^~~~
2025-12-04T12:35:04.8892407Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8892512Z  1892 |       0x80,
2025-12-04T12:35:04.8892602Z       |       ^~~~
2025-12-04T12:35:04.8893838Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8893972Z  1894 |       0x80,
2025-12-04T12:35:04.8894064Z       |       ^~~~
2025-12-04T12:35:04.8895286Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8895381Z  1896 |       0x80,
2025-12-04T12:35:04.8895486Z       |       ^~~~
2025-12-04T12:35:04.8896730Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8896824Z  1898 |       0x80,
2025-12-04T12:35:04.8896933Z       |       ^~~~
2025-12-04T12:35:04.8898116Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8898221Z  1900 |       0x80,
2025-12-04T12:35:04.8898332Z       |       ^~~~
2025-12-04T12:35:04.8899507Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8899616Z  1902 |       0x80,
2025-12-04T12:35:04.8899709Z       |       ^~~~
2025-12-04T12:35:04.8900883Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8900992Z  1904 |       0x80,
2025-12-04T12:35:04.8901088Z       |       ^~~~
2025-12-04T12:35:04.8902282Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8902383Z  1906 |       0x80,
2025-12-04T12:35:04.8902474Z       |       ^~~~
2025-12-04T12:35:04.8903664Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8903757Z  1908 |       0x80,
2025-12-04T12:35:04.8903849Z       |       ^~~~
2025-12-04T12:35:04.8905084Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8905179Z  1910 |       0x80,
2025-12-04T12:35:04.8905284Z       |       ^~~~
2025-12-04T12:35:04.8906464Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8906595Z  1912 |       0x80,
2025-12-04T12:35:04.8906700Z       |       ^~~~
2025-12-04T12:35:04.8907879Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8907985Z  1914 |       0x80,
2025-12-04T12:35:04.8908084Z       |       ^~~~
2025-12-04T12:35:04.8909256Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8909364Z  1916 |       0x80,
2025-12-04T12:35:04.8909456Z       |       ^~~~
2025-12-04T12:35:04.8910674Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8910790Z  1918 |       0x80,
2025-12-04T12:35:04.8910882Z       |       ^~~~
2025-12-04T12:35:04.8912183Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8912280Z  1920 |       0x80,
2025-12-04T12:35:04.8912380Z       |       ^~~~
2025-12-04T12:35:04.8913571Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8913666Z  1922 |       0x80,
2025-12-04T12:35:04.8913760Z       |       ^~~~
2025-12-04T12:35:04.8914952Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8915054Z  1924 |       0x80,
2025-12-04T12:35:04.8915163Z       |       ^~~~
2025-12-04T12:35:04.8916347Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8916443Z  1926 |       0x80,
2025-12-04T12:35:04.8916560Z       |       ^~~~
2025-12-04T12:35:04.8917737Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8917846Z  1928 |       0x80);
2025-12-04T12:35:04.8917938Z       |       ^~~~
2025-12-04T12:35:04.8919111Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8919224Z  1930 |       0x80,
2025-12-04T12:35:04.8919316Z       |       ^~~~
2025-12-04T12:35:04.8920495Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8920601Z  1932 |       0x80,
2025-12-04T12:35:04.8920739Z       |       ^~~~
2025-12-04T12:35:04.8921926Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8922020Z  1934 |       0x80,
2025-12-04T12:35:04.8922111Z       |       ^~~~
2025-12-04T12:35:04.8923302Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8923430Z  1936 |       0x80,
2025-12-04T12:35:04.8923533Z       |       ^~~~
2025-12-04T12:35:04.8924711Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8924805Z  1938 |       0x80,
2025-12-04T12:35:04.8924917Z       |       ^~~~
2025-12-04T12:35:04.8926086Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8926179Z  1940 |       0x80,
2025-12-04T12:35:04.8926283Z       |       ^~~~
2025-12-04T12:35:04.8927488Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8927603Z  1942 |       0x80,
2025-12-04T12:35:04.8927693Z       |       ^~~~
2025-12-04T12:35:04.8928896Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8929010Z  1944 |       0x80,
2025-12-04T12:35:04.8929102Z       |       ^~~~
2025-12-04T12:35:04.8930287Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8930380Z  1946 |       0x80,
2025-12-04T12:35:04.8930471Z       |       ^~~~
2025-12-04T12:35:04.8931669Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8931763Z  1948 |       0x80,
2025-12-04T12:35:04.8931854Z       |       ^~~~
2025-12-04T12:35:04.8933047Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8933145Z  1950 |       0x80,
2025-12-04T12:35:04.8933249Z       |       ^~~~
2025-12-04T12:35:04.8934419Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8934511Z  1952 |       0x80,
2025-12-04T12:35:04.8934616Z       |       ^~~~
2025-12-04T12:35:04.8935795Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8935901Z  1954 |       0x80,
2025-12-04T12:35:04.8935991Z       |       ^~~~
2025-12-04T12:35:04.8937240Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8937392Z  1956 |       0x80,
2025-12-04T12:35:04.8937484Z       |       ^~~~
2025-12-04T12:35:04.8938659Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8938769Z  1958 |       0x80,
2025-12-04T12:35:04.8938860Z       |       ^~~~
2025-12-04T12:35:04.8940117Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8940213Z  1960 |       0x80,
2025-12-04T12:35:04.8940305Z       |       ^~~~
2025-12-04T12:35:04.8941526Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8941628Z  1962 |       0x80,
2025-12-04T12:35:04.8941719Z       |       ^~~~
2025-12-04T12:35:04.8942910Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8943004Z  1964 |       0x80,
2025-12-04T12:35:04.8943118Z       |       ^~~~
2025-12-04T12:35:04.8944297Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8944394Z  1966 |       0x80,
2025-12-04T12:35:04.8944508Z       |       ^~~~
2025-12-04T12:35:04.8945681Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8945797Z  1968 |       0x80,
2025-12-04T12:35:04.8945889Z       |       ^~~~
2025-12-04T12:35:04.8947054Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8947162Z  1970 |       0x80,
2025-12-04T12:35:04.8947266Z       |       ^~~~
2025-12-04T12:35:04.8948438Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8948545Z  1972 |       0x80,
2025-12-04T12:35:04.8948644Z       |       ^~~~
2025-12-04T12:35:04.8949833Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8949932Z  1974 |       0x80,
2025-12-04T12:35:04.8950027Z       |       ^~~~
2025-12-04T12:35:04.8951215Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8951310Z  1976 |       0x80,
2025-12-04T12:35:04.8951427Z       |       ^~~~
2025-12-04T12:35:04.8952597Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8952692Z  1978 |       0x80,
2025-12-04T12:35:04.8952804Z       |       ^~~~
2025-12-04T12:35:04.8953973Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8954140Z  1980 |       0x80,
2025-12-04T12:35:04.8954232Z       |       ^~~~
2025-12-04T12:35:04.8955406Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8955528Z  1982 |       0x80,
2025-12-04T12:35:04.8955693Z       |       ^~~~
2025-12-04T12:35:04.8956865Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8956976Z  1984 |       0x80,
2025-12-04T12:35:04.8957104Z       |       ^~~~
2025-12-04T12:35:04.8958298Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8958399Z  1986 |       0x80,
2025-12-04T12:35:04.8958493Z       |       ^~~~
2025-12-04T12:35:04.8959682Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8959784Z  1988 |       0x80,
2025-12-04T12:35:04.8959883Z       |       ^~~~
2025-12-04T12:35:04.8961067Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8961168Z  1990 |       0x80,
2025-12-04T12:35:04.8961276Z       |       ^~~~
2025-12-04T12:35:04.8962450Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow]
2025-12-04T12:35:04.8962551Z  1992 |       0x80,
2025-12-04T12:35:04.8962662Z       |       ^~~~
2025-12-04T12:35:04.8963835Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow]
2025-12-04T12:35:04.8964015Z  2002 |   __m512i keep_1 = _mm512_set1_epi16(0xFF00);
2025-12-04T12:35:04.8964142Z       |                                      ^~~~~~
2025-12-04T12:35:04.8964645Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:16,
2025-12-04T12:35:04.8965035Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5,
2025-12-04T12:35:04.8965481Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7,
2025-12-04T12:35:04.8965903Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4,
2025-12-04T12:35:04.8966374Z                  from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45,
2025-12-04T12:35:04.8967054Z                  from /tmp/uDOzLN/tmpn56tbzg5/data/aotinductor/model1/cwulnadwx3jyqkgl526d3bpo7ziav2n33dginvvv4zbkqn5jle4v.wrapper.cpp:729:
2025-12-04T12:35:04.8968533Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h: In instantiation of ‘void at::vec::CPU_CAPABILITY::QuantizeAvx512(const float*, T*, int, float, int64_t) [with T = signed char; int64_t = long int]’:
2025-12-04T12:35:04.8969115Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:696:31:   required from here
2025-12-04T12:35:04.8970316Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8970477Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8970572Z       |       ^~~~
2025-12-04T12:35:04.8972082Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8972248Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8972364Z       |             ^~~~
2025-12-04T12:35:04.8973615Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8973732Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8973851Z       |                   ^~~~
2025-12-04T12:35:04.8975042Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8975179Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8975284Z       |                         ^~~~
2025-12-04T12:35:04.8976535Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8976672Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8976768Z       |       ^~~~
2025-12-04T12:35:04.8977964Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8978091Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8978194Z       |             ^~~~
2025-12-04T12:35:04.8979390Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8979502Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8979601Z       |                   ^~~~
2025-12-04T12:35:04.8980814Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8980928Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8981044Z       |                         ^~~~
2025-12-04T12:35:04.8982219Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8982335Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8982445Z       |       ^~~~
2025-12-04T12:35:04.8983618Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8983745Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8983848Z       |             ^~~~
2025-12-04T12:35:04.8985032Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8985166Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8985265Z       |                   ^~~~
2025-12-04T12:35:04.8986446Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8986624Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8986728Z       |                         ^~~~
2025-12-04T12:35:04.8987964Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8988108Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8988201Z       |       ^~~~
2025-12-04T12:35:04.8989426Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8989540Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8989653Z       |             ^~~~
2025-12-04T12:35:04.8990840Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8990960Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8991077Z       |                   ^~~~
2025-12-04T12:35:04.8992264Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8992395Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8992498Z       |                         ^~~~
2025-12-04T12:35:04.8993677Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8993801Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8993901Z       |       ^~~~
2025-12-04T12:35:04.8995078Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8995202Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8995299Z       |             ^~~~
2025-12-04T12:35:04.8996504Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8996616Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8996722Z       |                   ^~~~
2025-12-04T12:35:04.8997916Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8998068Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8998183Z       |                         ^~~~
2025-12-04T12:35:04.8999363Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.8999482Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.8999624Z       |       ^~~~
2025-12-04T12:35:04.9000800Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9000919Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9001031Z       |             ^~~~
2025-12-04T12:35:04.9002211Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9002342Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9002442Z       |                   ^~~~
2025-12-04T12:35:04.9003659Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9003795Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9003894Z       |                         ^~~~
2025-12-04T12:35:04.9005143Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9005259Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9005350Z       |       ^~~~
2025-12-04T12:35:04.9006544Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9006656Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9006766Z       |             ^~~~
2025-12-04T12:35:04.9007951Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9008069Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9008182Z       |                   ^~~~
2025-12-04T12:35:04.9009367Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9009480Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9009604Z       |                         ^~~~
2025-12-04T12:35:04.9010788Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9010916Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9011007Z       |       ^~~~
2025-12-04T12:35:04.9012209Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9012333Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9012435Z       |             ^~~~
2025-12-04T12:35:04.9013624Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9013776Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9013874Z       |                   ^~~~
2025-12-04T12:35:04.9015072Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9015189Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9015338Z       |                         ^~~~
2025-12-04T12:35:04.9016598Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9016711Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9016818Z       |       ^~~~
2025-12-04T12:35:04.9018000Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9018122Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9018238Z       |             ^~~~
2025-12-04T12:35:04.9019464Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9019598Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9019699Z       |                   ^~~~
2025-12-04T12:35:04.9020913Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9021041Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9021143Z       |                         ^~~~
2025-12-04T12:35:04.9022338Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9022448Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9022542Z       |       ^~~~
2025-12-04T12:35:04.9023743Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9023860Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9023957Z       |             ^~~~
2025-12-04T12:35:04.9025159Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9025270Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9025388Z       |                   ^~~~
2025-12-04T12:35:04.9026568Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9026679Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9026798Z       |                         ^~~~
2025-12-04T12:35:04.9027983Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9028108Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9028208Z       |       ^~~~
2025-12-04T12:35:04.9029391Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9029558Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9029653Z       |             ^~~~
2025-12-04T12:35:04.9030853Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9031007Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9031154Z       |                   ^~~~
2025-12-04T12:35:04.9032356Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9032507Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9032610Z       |                         ^~~~
2025-12-04T12:35:04.9033803Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9033926Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9034035Z       |       ^~~~
2025-12-04T12:35:04.9035220Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9035339Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9035456Z       |             ^~~~
2025-12-04T12:35:04.9036643Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9036771Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9036872Z       |                   ^~~~
2025-12-04T12:35:04.9038054Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9038179Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9038280Z       |                         ^~~~
2025-12-04T12:35:04.9039767Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h: In instantiation of ‘void at::vec::CPU_CAPABILITY::QuantizeAvx512(const float*, T*, int, float, int64_t) [with T = unsigned char; int64_t = long int]’:
2025-12-04T12:35:04.9040362Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:933:31:   required from here
2025-12-04T12:35:04.9041538Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9041671Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9041764Z       |       ^~~~
2025-12-04T12:35:04.9042959Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9043078Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9043182Z       |             ^~~~
2025-12-04T12:35:04.9044375Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9044494Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9044605Z       |                   ^~~~
2025-12-04T12:35:04.9045782Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9045934Z   201 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9046050Z       |                         ^~~~
2025-12-04T12:35:04.9047262Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9047407Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9047515Z       |       ^~~~
2025-12-04T12:35:04.9048737Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9048866Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9048965Z       |             ^~~~
2025-12-04T12:35:04.9050147Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9050286Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9050389Z       |                   ^~~~
2025-12-04T12:35:04.9051590Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9051710Z   202 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9051813Z       |                         ^~~~
2025-12-04T12:35:04.9053008Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9053123Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9053224Z       |       ^~~~
2025-12-04T12:35:04.9054419Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9054533Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9054647Z       |             ^~~~
2025-12-04T12:35:04.9055841Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9055956Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9056081Z       |                   ^~~~
2025-12-04T12:35:04.9057326Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9057463Z   203 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9057567Z       |                         ^~~~
2025-12-04T12:35:04.9058751Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9058890Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9058991Z       |       ^~~~
2025-12-04T12:35:04.9060186Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9060307Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9060405Z       |             ^~~~
2025-12-04T12:35:04.9061603Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9061763Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9061866Z       |                   ^~~~
2025-12-04T12:35:04.9063098Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9063261Z   205 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9063375Z       |                         ^~~~
2025-12-04T12:35:04.9064585Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9064701Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9064810Z       |       ^~~~
2025-12-04T12:35:04.9065992Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9066129Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9066229Z       |             ^~~~
2025-12-04T12:35:04.9067417Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9067546Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9067648Z       |                   ^~~~
2025-12-04T12:35:04.9068844Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9068956Z   206 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9069064Z       |                         ^~~~
2025-12-04T12:35:04.9070258Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9070371Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9070463Z       |       ^~~~
2025-12-04T12:35:04.9071836Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9071947Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9072064Z       |             ^~~~
2025-12-04T12:35:04.9073241Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9073359Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9073474Z       |                   ^~~~
2025-12-04T12:35:04.9074658Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9074792Z   207 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9074897Z       |                         ^~~~
2025-12-04T12:35:04.9076069Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9076199Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9076292Z       |       ^~~~
2025-12-04T12:35:04.9077491Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9077695Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9077791Z       |             ^~~~
2025-12-04T12:35:04.9079043Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9079199Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9079302Z       |                   ^~~~
2025-12-04T12:35:04.9080544Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9080662Z   209 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9080784Z       |                         ^~~~
2025-12-04T12:35:04.9081973Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9082087Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9082197Z       |       ^~~~
2025-12-04T12:35:04.9083384Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9083515Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9083610Z       |             ^~~~
2025-12-04T12:35:04.9084795Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9084921Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9085028Z       |                   ^~~~
2025-12-04T12:35:04.9086221Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9086336Z   210 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9086445Z       |                         ^~~~
2025-12-04T12:35:04.9087640Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9087754Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9087854Z       |       ^~~~
2025-12-04T12:35:04.9089043Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9089198Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9089307Z       |             ^~~~
2025-12-04T12:35:04.9090484Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9090603Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9090753Z       |                   ^~~~
2025-12-04T12:35:04.9091934Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9092072Z   211 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9092174Z       |                         ^~~~
2025-12-04T12:35:04.9093352Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9093485Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9093578Z       |       ^~~~
2025-12-04T12:35:04.9094810Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9094928Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9095026Z       |             ^~~~
2025-12-04T12:35:04.9096258Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9096447Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9096550Z       |                   ^~~~
2025-12-04T12:35:04.9097769Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9097883Z   213 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9098001Z       |                         ^~~~
2025-12-04T12:35:04.9099189Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9099310Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9099422Z       |       ^~~~
2025-12-04T12:35:04.9100607Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9100737Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9100842Z       |             ^~~~
2025-12-04T12:35:04.9102020Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9102147Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9102249Z       |                   ^~~~
2025-12-04T12:35:04.9103452Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9103563Z   214 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9103672Z       |                         ^~~~
2025-12-04T12:35:04.9104857Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9105020Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9105113Z       |       ^~~~
2025-12-04T12:35:04.9106311Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9106430Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9106579Z       |             ^~~~
2025-12-04T12:35:04.9107765Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9107883Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9107996Z       |                   ^~~~
2025-12-04T12:35:04.9109172Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow]
2025-12-04T12:35:04.9109304Z   215 |       0xff, 0xff, 0xff, 0xff,
2025-12-04T12:35:04.9109405Z       |                         ^~~~
2025-12-04T12:35:04.9109511Z PASSED [9.5383s] [ 48%]
2025-12-04T12:35:04.9110191Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_update_weights SKIPPED [0.0033s] (No support for cpp only) [ 50%]
2025-12-04T12:35:04.9110611Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_add PASSED [6.0938s] [ 51%]
2025-12-04T12:35:04.9111058Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_bool_input PASSED [6.0266s] [ 52%]
2025-12-04T12:35:04.9111785Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_compile_after_package SKIPPED [0.0003s] (Test is only supported on CUDA 12.6+) [ 53%]
2025-12-04T12:35:04.9112508Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_compile_after_package_multi_arch SKIPPED [0.0002s] (Test is only supported on CUDA 12.8+) [ 54%]
2025-12-04T12:35:04.9113234Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_compile_after_package_static SKIPPED [0.0002s] (Test is only supported on CUDA 12.6+) [ 55%]
2025-12-04T12:35:04.9113899Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_compile_standalone_cos SKIPPED [0.0034s] (Only meant to test cpp package) [ 56%]
2025-12-04T12:35:04.9114587Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_compile_with_exporter SKIPPED [0.0002s] (Test is only supported on CUDA 12.6+) [ 57%]
2025-12-04T12:35:04.9115307Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_compile_with_exporter_weights SKIPPED [0.0002s] (Test is only supported on CUDA 12.6+) [ 59%]
2025-12-04T12:35:04.9116506Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_deepcopy_compiled_model W1204 12:30:18.171000 140836 site-packages/torch/export/pt2_archive/_package.py:763] AOTICompiledModel deepcopy warning: AOTICompiledModel.loader is not deepcopied.
2025-12-04T12:35:04.9116630Z PASSED [6.0298s] [ 60%]
2025-12-04T12:35:04.9117104Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_duplicate_calls PASSED [17.3854s] [ 61%]
2025-12-04T12:35:04.9117976Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_linear W1204 12:30:36.578000 140836 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T12:35:04.9118084Z PASSED [7.0526s] [ 62%]
2025-12-04T12:35:04.9119128Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_loading_wrong_model W1204 12:30:48.678000 140836 site-packages/torch/_inductor/package/package.py:120] Loading outdated pt2 file. Please regenerate your package.
2025-12-04T12:35:04.9119247Z PASSED [6.0630s] [ 63%]
2025-12-04T12:35:04.9119681Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_metadata PASSED [6.1607s] [ 64%]
2025-12-04T12:35:04.9120209Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_multiple_methods PASSED [11.9880s] [ 65%]
2025-12-04T12:35:04.9120709Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_package_shared_weights PASSED [2.1197s] [ 67%]
2025-12-04T12:35:04.9121229Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_package_user_managed_weight PASSED [6.4443s] [ 68%]
2025-12-04T12:35:04.9121867Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_package_weights_on_disk_nested_module PASSED [5.4121s] [ 69%]
2025-12-04T12:35:04.9122368Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_package_without_weight PASSED [5.3768s] [ 70%]
2025-12-04T12:35:04.9122890Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_remove_intermediate_files PASSED [6.0808s] [ 71%]
2025-12-04T12:35:04.9123341Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_save_buffer PASSED [6.1294s] [ 72%]
2025-12-04T12:35:04.9123833Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_specified_output_dir PASSED [6.1094s] [ 73%]
2025-12-04T12:35:04.9124308Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_update_weights PASSED [5.7031s] [ 75%]
2025-12-04T12:35:04.9124771Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_add PASSED [9.4664s] [ 76%]
2025-12-04T12:35:04.9125252Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_bool_input PASSED [9.3557s] [ 77%]
2025-12-04T12:35:04.9125979Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_after_package SKIPPED [0.0003s] (Test is only supported on CUDA 12.6+) [ 78%]
2025-12-04T12:35:04.9126724Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_after_package_multi_arch SKIPPED [0.0002s] (Test is only supported on CUDA 12.8+) [ 79%]
2025-12-04T12:35:04.9127467Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_after_package_static SKIPPED [0.0002s] (Test is only supported on CUDA 12.6+) [ 80%]
2025-12-04T12:35:04.9128689Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_standalone_cos W1204 12:32:09.075000 140836 site-packages/torch/_inductor/utils.py:3815] Overriding: aot_inductor.dynamic_linkage=False when aot_inductor_mode.compile_standalone is True.
2025-12-04T12:35:04.9128838Z ('RERUN', {'yellow': True}) [0.6115s] [ 81%]
2025-12-04T12:35:04.9130065Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_standalone_cos W1204 12:32:09.687000 140836 site-packages/torch/_inductor/utils.py:3815] Overriding: aot_inductor.dynamic_linkage=False when aot_inductor_mode.compile_standalone is True.
2025-12-04T12:35:04.9130212Z ('RERUN', {'yellow': True}) [0.5757s] [ 81%]
2025-12-04T12:35:04.9131421Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_standalone_cos W1204 12:32:10.265000 140836 site-packages/torch/_inductor/utils.py:3815] Overriding: aot_inductor.dynamic_linkage=False when aot_inductor_mode.compile_standalone is True.
2025-12-04T12:35:04.9131527Z FAILED [0.5841s] [ 81%]
2025-12-04T12:35:04.9131537Z 
2025-12-04T12:35:04.9131697Z ==================================== RERUNS ====================================
2025-12-04T12:35:04.9132000Z __________ TestAOTInductorPackageCpp_cuda.test_compile_standalone_cos __________
2025-12-04T12:35:04.9132138Z Traceback (most recent call last):
2025-12-04T12:35:04.9132673Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_package.py", line 452, in test_compile_standalone_cos
2025-12-04T12:35:04.9132804Z     build_path, _ = self.cmake_compile(
2025-12-04T12:35:04.9133273Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_package.py", line 179, in cmake_compile
2025-12-04T12:35:04.9133478Z     package_path = torch._inductor.aoti_compile_and_package(
2025-12-04T12:35:04.9134049Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package
2025-12-04T12:35:04.9134198Z     return aot_inductor_minifier_wrapper(
2025-12-04T12:35:04.9134741Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper
2025-12-04T12:35:04.9134850Z     raise e
2025-12-04T12:35:04.9135439Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper
2025-12-04T12:35:04.9135539Z     return func(
2025-12-04T12:35:04.9136103Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner
2025-12-04T12:35:04.9136405Z     aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs)
2025-12-04T12:35:04.9136884Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile
2025-12-04T12:35:04.9137003Z     return compile_fx_aot(
2025-12-04T12:35:04.9137499Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot
2025-12-04T12:35:04.9137641Z     compiled_artifacts = compile_fx(
2025-12-04T12:35:04.9138153Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx
2025-12-04T12:35:04.9138264Z     return compile_fx(
2025-12-04T12:35:04.9138748Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx
2025-12-04T12:35:04.9138886Z     return _maybe_wrap_and_compile_fx_main(
2025-12-04T12:35:04.9139502Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main
2025-12-04T12:35:04.9139620Z     return _compile_fx_main(
2025-12-04T12:35:04.9140121Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main
2025-12-04T12:35:04.9140342Z     return inference_compiler(unlifted_gm, example_inputs_)
2025-12-04T12:35:04.9140861Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__
2025-12-04T12:35:04.9141015Z     return self.compiler_fn(gm, example_inputs)
2025-12-04T12:35:04.9141537Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base
2025-12-04T12:35:04.9141658Z     return compile_fx_forward(
2025-12-04T12:35:04.9142192Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward
2025-12-04T12:35:04.9142309Z     return inner_compile(
2025-12-04T12:35:04.9142592Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T12:35:04.9142723Z     return func(*args, **kwds)
2025-12-04T12:35:04.9143225Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner
2025-12-04T12:35:04.9143492Z     return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
2025-12-04T12:35:04.9144001Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper
2025-12-04T12:35:04.9144180Z     inner_compiled_fn = compiler_fn(gm, example_inputs)
2025-12-04T12:35:04.9144701Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T12:35:04.9144897Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:35:04.9145400Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T12:35:04.9145566Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:35:04.9146102Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:35:04.9146501Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:35:04.9147019Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1519, in codegen_and_compile
2025-12-04T12:35:04.9147165Z     compiled_fn = AotCodeCompiler.compile(
2025-12-04T12:35:04.9147671Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 2409, in compile
2025-12-04T12:35:04.9147779Z     subprocess.run(
2025-12-04T12:35:04.9148057Z   File "/opt/conda/envs/py_3.10/lib/python3.10/subprocess.py", line 526, in run
2025-12-04T12:35:04.9148245Z     raise CalledProcessError(retcode, process.args,
2025-12-04T12:35:04.9150320Z torch._inductor.exc.InductorError: CalledProcessError: Command '['nvcc', '-fatbin', '/tmp/tmpn4zamgxp/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx', '-o', '/tmp/tmpn4zamgxp/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin', '-gencode', 'arch=compute_75,code=compute_75', '-gencode', 'arch=compute_75,code=sm_75']' returned non-zero exit status 255.
2025-12-04T12:35:04.9150330Z 
2025-12-04T12:35:04.9150570Z To execute this test, run the following from the base repo dir:
2025-12-04T12:35:04.9151324Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor_package.py TestAOTInductorPackageCpp_cuda.test_compile_standalone_cos
2025-12-04T12:35:04.9151334Z 
2025-12-04T12:35:04.9151621Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:35:04.9151850Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:35:04.9152685Z inductor [('async_compile_cache_miss', 2), ('benchmarking.InductorBenchmarker.benchmark', 2), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('async_compile_cache_hit', 1)]
2025-12-04T12:35:04.9152807Z graph_break []
2025-12-04T12:35:04.9153027Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:35:04.9153856Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T12:35:04.9153973Z   return cls.__new__(cls, *args)
2025-12-04T12:35:04.9155576Z nvcc -fatbin /tmp/tmpn4zamgxp/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx -o /tmp/tmpn4zamgxp/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin -gencode arch=compute_75,code=compute_75 -gencode arch=compute_75,code=sm_75  failed with:
2025-12-04T12:35:04.9155687Z stdout:
2025-12-04T12:35:04.9155695Z 
2025-12-04T12:35:04.9155786Z stderr:
2025-12-04T12:35:04.9156644Z ptxas /tmp/tmpn4zamgxp/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx, line 5; fatal   : Unsupported .version 8.7; current version is '8.4'
2025-12-04T12:35:04.9156815Z ptxas fatal   : Ptx assembly aborted due to errors
2025-12-04T12:35:04.9156821Z 
2025-12-04T12:35:04.9157123Z __________ TestAOTInductorPackageCpp_cuda.test_compile_standalone_cos __________
2025-12-04T12:35:04.9157257Z Traceback (most recent call last):
2025-12-04T12:35:04.9157796Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_package.py", line 452, in test_compile_standalone_cos
2025-12-04T12:35:04.9157938Z     build_path, _ = self.cmake_compile(
2025-12-04T12:35:04.9158394Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_package.py", line 179, in cmake_compile
2025-12-04T12:35:04.9158599Z     package_path = torch._inductor.aoti_compile_and_package(
2025-12-04T12:35:04.9159142Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package
2025-12-04T12:35:04.9159273Z     return aot_inductor_minifier_wrapper(
2025-12-04T12:35:04.9159815Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper
2025-12-04T12:35:04.9159962Z     raise e
2025-12-04T12:35:04.9160497Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper
2025-12-04T12:35:04.9160609Z     return func(
2025-12-04T12:35:04.9161158Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner
2025-12-04T12:35:04.9161422Z     aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs)
2025-12-04T12:35:04.9161893Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile
2025-12-04T12:35:04.9162007Z     return compile_fx_aot(
2025-12-04T12:35:04.9162511Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot
2025-12-04T12:35:04.9162638Z     compiled_artifacts = compile_fx(
2025-12-04T12:35:04.9163110Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx
2025-12-04T12:35:04.9163231Z     return compile_fx(
2025-12-04T12:35:04.9163700Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx
2025-12-04T12:35:04.9163867Z     return _maybe_wrap_and_compile_fx_main(
2025-12-04T12:35:04.9164456Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main
2025-12-04T12:35:04.9164570Z     return _compile_fx_main(
2025-12-04T12:35:04.9165110Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main
2025-12-04T12:35:04.9165312Z     return inference_compiler(unlifted_gm, example_inputs_)
2025-12-04T12:35:04.9165832Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__
2025-12-04T12:35:04.9165996Z     return self.compiler_fn(gm, example_inputs)
2025-12-04T12:35:04.9166497Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base
2025-12-04T12:35:04.9166613Z     return compile_fx_forward(
2025-12-04T12:35:04.9167144Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward
2025-12-04T12:35:04.9167258Z     return inner_compile(
2025-12-04T12:35:04.9167550Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T12:35:04.9167663Z     return func(*args, **kwds)
2025-12-04T12:35:04.9168156Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner
2025-12-04T12:35:04.9168437Z     return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
2025-12-04T12:35:04.9168928Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper
2025-12-04T12:35:04.9169119Z     inner_compiled_fn = compiler_fn(gm, example_inputs)
2025-12-04T12:35:04.9169619Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T12:35:04.9169813Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:35:04.9170330Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T12:35:04.9170476Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:35:04.9171178Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:35:04.9171518Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:35:04.9172039Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1519, in codegen_and_compile
2025-12-04T12:35:04.9172307Z     compiled_fn = AotCodeCompiler.compile(
2025-12-04T12:35:04.9172761Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 2409, in compile
2025-12-04T12:35:04.9172871Z     subprocess.run(
2025-12-04T12:35:04.9173167Z   File "/opt/conda/envs/py_3.10/lib/python3.10/subprocess.py", line 526, in run
2025-12-04T12:35:04.9173385Z     raise CalledProcessError(retcode, process.args,
2025-12-04T12:35:04.9175468Z torch._inductor.exc.InductorError: CalledProcessError: Command '['nvcc', '-fatbin', '/tmp/tmpb0ttpte1/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx', '-o', '/tmp/tmpb0ttpte1/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin', '-gencode', 'arch=compute_75,code=compute_75', '-gencode', 'arch=compute_75,code=sm_75']' returned non-zero exit status 255.
2025-12-04T12:35:04.9175476Z 
2025-12-04T12:35:04.9175698Z To execute this test, run the following from the base repo dir:
2025-12-04T12:35:04.9176386Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor_package.py TestAOTInductorPackageCpp_cuda.test_compile_standalone_cos
2025-12-04T12:35:04.9176412Z 
2025-12-04T12:35:04.9176684Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:35:04.9176964Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:35:04.9177779Z inductor [('async_compile_cache_miss', 2), ('benchmarking.InductorBenchmarker.benchmark', 2), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('async_compile_cache_hit', 1)]
2025-12-04T12:35:04.9177884Z graph_break []
2025-12-04T12:35:04.9178153Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:35:04.9178986Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T12:35:04.9179106Z   return cls.__new__(cls, *args)
2025-12-04T12:35:04.9180727Z nvcc -fatbin /tmp/tmpn4zamgxp/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx -o /tmp/tmpn4zamgxp/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin -gencode arch=compute_75,code=compute_75 -gencode arch=compute_75,code=sm_75  failed with:
2025-12-04T12:35:04.9180826Z stdout:
2025-12-04T12:35:04.9180831Z 
2025-12-04T12:35:04.9180926Z stderr:
2025-12-04T12:35:04.9181792Z ptxas /tmp/tmpn4zamgxp/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx, line 5; fatal   : Unsupported .version 8.7; current version is '8.4'
2025-12-04T12:35:04.9181963Z ptxas fatal   : Ptx assembly aborted due to errors
2025-12-04T12:35:04.9185072Z 
2025-12-04T12:35:04.9185316Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:35:04.9186150Z inductor [('async_compile_cache_miss', 2), ('benchmarking.InductorBenchmarker.benchmark', 2), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('async_compile_cache_hit', 1)]
2025-12-04T12:35:04.9186256Z graph_break []
2025-12-04T12:35:04.9186494Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:35:04.9187311Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T12:35:04.9187430Z   return cls.__new__(cls, *args)
2025-12-04T12:35:04.9189043Z nvcc -fatbin /tmp/tmpb0ttpte1/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx -o /tmp/tmpb0ttpte1/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin -gencode arch=compute_75,code=compute_75 -gencode arch=compute_75,code=sm_75  failed with:
2025-12-04T12:35:04.9189174Z stdout:
2025-12-04T12:35:04.9189180Z 
2025-12-04T12:35:04.9189342Z stderr:
2025-12-04T12:35:04.9190183Z ptxas /tmp/tmpb0ttpte1/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx, line 5; fatal   : Unsupported .version 8.7; current version is '8.4'
2025-12-04T12:35:04.9190351Z ptxas fatal   : Ptx assembly aborted due to errors
2025-12-04T12:35:04.9190356Z 
2025-12-04T12:35:04.9190523Z =================================== FAILURES ===================================
2025-12-04T12:35:04.9190824Z __________ TestAOTInductorPackageCpp_cuda.test_compile_standalone_cos __________
2025-12-04T12:35:04.9190951Z Traceback (most recent call last):
2025-12-04T12:35:04.9191498Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_package.py", line 452, in test_compile_standalone_cos
2025-12-04T12:35:04.9191627Z     build_path, _ = self.cmake_compile(
2025-12-04T12:35:04.9192097Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_package.py", line 179, in cmake_compile
2025-12-04T12:35:04.9192303Z     package_path = torch._inductor.aoti_compile_and_package(
2025-12-04T12:35:04.9192840Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package
2025-12-04T12:35:04.9192987Z     return aot_inductor_minifier_wrapper(
2025-12-04T12:35:04.9193565Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper
2025-12-04T12:35:04.9193674Z     raise e
2025-12-04T12:35:04.9194210Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper
2025-12-04T12:35:04.9194310Z     return func(
2025-12-04T12:35:04.9194904Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner
2025-12-04T12:35:04.9195142Z     aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs)
2025-12-04T12:35:04.9195597Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile
2025-12-04T12:35:04.9195729Z     return compile_fx_aot(
2025-12-04T12:35:04.9196220Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot
2025-12-04T12:35:04.9196359Z     compiled_artifacts = compile_fx(
2025-12-04T12:35:04.9196831Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx
2025-12-04T12:35:04.9196938Z     return compile_fx(
2025-12-04T12:35:04.9197416Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx
2025-12-04T12:35:04.9197555Z     return _maybe_wrap_and_compile_fx_main(
2025-12-04T12:35:04.9198128Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main
2025-12-04T12:35:04.9198334Z     return _compile_fx_main(
2025-12-04T12:35:04.9198844Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main
2025-12-04T12:35:04.9199060Z     return inference_compiler(unlifted_gm, example_inputs_)
2025-12-04T12:35:04.9199579Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__
2025-12-04T12:35:04.9199734Z     return self.compiler_fn(gm, example_inputs)
2025-12-04T12:35:04.9200251Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base
2025-12-04T12:35:04.9200372Z     return compile_fx_forward(
2025-12-04T12:35:04.9200902Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward
2025-12-04T12:35:04.9201017Z     return inner_compile(
2025-12-04T12:35:04.9201301Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T12:35:04.9201434Z     return func(*args, **kwds)
2025-12-04T12:35:04.9201964Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner
2025-12-04T12:35:04.9202229Z     return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
2025-12-04T12:35:04.9202735Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper
2025-12-04T12:35:04.9202910Z     inner_compiled_fn = compiler_fn(gm, example_inputs)
2025-12-04T12:35:04.9203415Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T12:35:04.9208841Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:35:04.9209385Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T12:35:04.9209552Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:35:04.9210090Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:35:04.9210419Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:35:04.9210958Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1519, in codegen_and_compile
2025-12-04T12:35:04.9211191Z     compiled_fn = AotCodeCompiler.compile(
2025-12-04T12:35:04.9211663Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 2409, in compile
2025-12-04T12:35:04.9211772Z     subprocess.run(
2025-12-04T12:35:04.9212054Z   File "/opt/conda/envs/py_3.10/lib/python3.10/subprocess.py", line 526, in run
2025-12-04T12:35:04.9212270Z     raise CalledProcessError(retcode, process.args,
2025-12-04T12:35:04.9214359Z torch._inductor.exc.InductorError: CalledProcessError: Command '['nvcc', '-fatbin', '/tmp/tmp7iehhac5/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx', '-o', '/tmp/tmp7iehhac5/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin', '-gencode', 'arch=compute_75,code=compute_75', '-gencode', 'arch=compute_75,code=sm_75']' returned non-zero exit status 255.
2025-12-04T12:35:04.9214373Z 
2025-12-04T12:35:04.9214606Z To execute this test, run the following from the base repo dir:
2025-12-04T12:35:04.9215239Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor_package.py TestAOTInductorPackageCpp_cuda.test_compile_standalone_cos
2025-12-04T12:35:04.9215246Z 
2025-12-04T12:35:04.9215519Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:35:04.9215767Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:35:04.9216677Z inductor [('async_compile_cache_miss', 2), ('benchmarking.InductorBenchmarker.benchmark', 2), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('async_compile_cache_hit', 1)]
2025-12-04T12:35:04.9216871Z graph_break []
2025-12-04T12:35:04.9217096Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:35:04.9217919Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T12:35:04.9218054Z   return cls.__new__(cls, *args)
2025-12-04T12:35:04.9219659Z nvcc -fatbin /tmp/tmpn4zamgxp/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx -o /tmp/tmpn4zamgxp/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin -gencode arch=compute_75,code=compute_75 -gencode arch=compute_75,code=sm_75  failed with:
2025-12-04T12:35:04.9219772Z stdout:
2025-12-04T12:35:04.9219778Z 
2025-12-04T12:35:04.9219873Z stderr:
2025-12-04T12:35:04.9220720Z ptxas /tmp/tmpn4zamgxp/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx, line 5; fatal   : Unsupported .version 8.7; current version is '8.4'
2025-12-04T12:35:04.9220945Z ptxas fatal   : Ptx assembly aborted due to errors
2025-12-04T12:35:04.9220952Z 
2025-12-04T12:35:04.9221169Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:35:04.9221984Z inductor [('async_compile_cache_miss', 2), ('benchmarking.InductorBenchmarker.benchmark', 2), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('async_compile_cache_hit', 1)]
2025-12-04T12:35:04.9222086Z graph_break []
2025-12-04T12:35:04.9222307Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:35:04.9223140Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T12:35:04.9223262Z   return cls.__new__(cls, *args)
2025-12-04T12:35:04.9224865Z nvcc -fatbin /tmp/tmpb0ttpte1/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx -o /tmp/tmpb0ttpte1/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin -gencode arch=compute_75,code=compute_75 -gencode arch=compute_75,code=sm_75  failed with:
2025-12-04T12:35:04.9224962Z stdout:
2025-12-04T12:35:04.9224967Z 
2025-12-04T12:35:04.9225060Z stderr:
2025-12-04T12:35:04.9225943Z ptxas /tmp/tmpb0ttpte1/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx, line 5; fatal   : Unsupported .version 8.7; current version is '8.4'
2025-12-04T12:35:04.9226112Z ptxas fatal   : Ptx assembly aborted due to errors
2025-12-04T12:35:04.9226117Z 
2025-12-04T12:35:04.9226351Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:35:04.9227203Z inductor [('async_compile_cache_miss', 2), ('benchmarking.InductorBenchmarker.benchmark', 2), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('async_compile_cache_hit', 1)]
2025-12-04T12:35:04.9227310Z graph_break []
2025-12-04T12:35:04.9227548Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:35:04.9228361Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T12:35:04.9228490Z   return cls.__new__(cls, *args)
2025-12-04T12:35:04.9230086Z nvcc -fatbin /tmp/tmp7iehhac5/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx -o /tmp/tmp7iehhac5/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin -gencode arch=compute_75,code=compute_75 -gencode arch=compute_75,code=sm_75  failed with:
2025-12-04T12:35:04.9230181Z stdout:
2025-12-04T12:35:04.9230203Z 
2025-12-04T12:35:04.9230294Z stderr:
2025-12-04T12:35:04.9231133Z ptxas /tmp/tmp7iehhac5/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx, line 5; fatal   : Unsupported .version 8.7; current version is '8.4'
2025-12-04T12:35:04.9231362Z ptxas fatal   : Ptx assembly aborted due to errors
2025-12-04T12:35:04.9231367Z 
2025-12-04T12:35:04.9232200Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor_package/inductor.test_aot_inductor_package-b1ca468dab29d0d8.xml -
2025-12-04T12:35:04.9232377Z =========================== short test summary info ============================
2025-12-04T12:35:04.9235036Z FAILED [0.5841s] inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_standalone_cos - torch._inductor.exc.InductorError: CalledProcessError: Command '['nvcc', '-fatbin', '/tmp/tmp7iehhac5/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx', '-o', '/tmp/tmp7iehhac5/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin', '-gencode', 'arch=compute_75,code=compute_75', '-gencode', 'arch=compute_75,code=sm_75']' returned non-zero exit status 255.
2025-12-04T12:35:04.9235078Z 
2025-12-04T12:35:04.9235310Z To execute this test, run the following from the base repo dir:
2025-12-04T12:35:04.9235939Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor_package.py TestAOTInductorPackageCpp_cuda.test_compile_standalone_cos
2025-12-04T12:35:04.9235945Z 
2025-12-04T12:35:04.9236211Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:35:04.9236408Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:35:04.9236637Z ======== 1 failed, 46 passed, 25 skipped, 2 rerun in 370.49s (0:06:10) =========
2025-12-04T12:35:04.9236738Z Got exit code 1
2025-12-04T12:35:04.9236862Z Retrying single test...
2025-12-04T12:35:04.9237515Z Test results will be stored in test-reports/python-pytest/inductor.test_aot_inductor_package/inductor.test_aot_inductor_package-69f64b5320fd797d.xml
2025-12-04T12:35:04.9237700Z ============================= test session starts ==============================
2025-12-04T12:35:04.9238059Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:35:04.9238170Z cachedir: .pytest_cache
2025-12-04T12:35:04.9238709Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:35:04.9238839Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:35:04.9238979Z configfile: pytest.ini
2025-12-04T12:35:04.9239585Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:35:04.9239799Z collecting ... collected 88 items / 87 deselected / 1 selected
2025-12-04T12:35:04.9240557Z stepcurrent: skipping 71 already run items. Running only test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_standalone_cos
2025-12-04T12:35:04.9240678Z Running 1 items in this shard
2025-12-04T12:35:04.9240683Z 
2025-12-04T12:35:04.9241909Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_standalone_cos W1204 12:32:24.269000 147773 site-packages/torch/_inductor/utils.py:3815] Overriding: aot_inductor.dynamic_linkage=False when aot_inductor_mode.compile_standalone is True.
2025-12-04T12:35:04.9242061Z ('RERUN', {'yellow': True}) [5.5651s] [100%]
2025-12-04T12:35:04.9243271Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_standalone_cos W1204 12:32:28.137000 147773 site-packages/torch/_inductor/utils.py:3815] Overriding: aot_inductor.dynamic_linkage=False when aot_inductor_mode.compile_standalone is True.
2025-12-04T12:35:04.9243417Z ('RERUN', {'yellow': True}) [0.5711s] [100%]
2025-12-04T12:35:04.9244637Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_standalone_cos W1204 12:32:28.710000 147773 site-packages/torch/_inductor/utils.py:3815] Overriding: aot_inductor.dynamic_linkage=False when aot_inductor_mode.compile_standalone is True.
2025-12-04T12:35:04.9244795Z FAILED [0.5715s] [100%]
2025-12-04T12:35:04.9244801Z 
2025-12-04T12:35:04.9244947Z ==================================== RERUNS ====================================
2025-12-04T12:35:04.9245245Z __________ TestAOTInductorPackageCpp_cuda.test_compile_standalone_cos __________
2025-12-04T12:35:04.9245380Z Traceback (most recent call last):
2025-12-04T12:35:04.9245916Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_package.py", line 452, in test_compile_standalone_cos
2025-12-04T12:35:04.9246046Z     build_path, _ = self.cmake_compile(
2025-12-04T12:35:04.9246519Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_package.py", line 179, in cmake_compile
2025-12-04T12:35:04.9246722Z     package_path = torch._inductor.aoti_compile_and_package(
2025-12-04T12:35:04.9247259Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package
2025-12-04T12:35:04.9247393Z     return aot_inductor_minifier_wrapper(
2025-12-04T12:35:04.9247967Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper
2025-12-04T12:35:04.9248077Z     raise e
2025-12-04T12:35:04.9248611Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper
2025-12-04T12:35:04.9248723Z     return func(
2025-12-04T12:35:04.9249273Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner
2025-12-04T12:35:04.9249502Z     aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs)
2025-12-04T12:35:04.9249977Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile
2025-12-04T12:35:04.9250092Z     return compile_fx_aot(
2025-12-04T12:35:04.9250584Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot
2025-12-04T12:35:04.9250724Z     compiled_artifacts = compile_fx(
2025-12-04T12:35:04.9251192Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx
2025-12-04T12:35:04.9251313Z     return compile_fx(
2025-12-04T12:35:04.9251808Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx
2025-12-04T12:35:04.9251946Z     return _maybe_wrap_and_compile_fx_main(
2025-12-04T12:35:04.9252532Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main
2025-12-04T12:35:04.9252647Z     return _compile_fx_main(
2025-12-04T12:35:04.9253180Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main
2025-12-04T12:35:04.9253398Z     return inference_compiler(unlifted_gm, example_inputs_)
2025-12-04T12:35:04.9253918Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__
2025-12-04T12:35:04.9254079Z     return self.compiler_fn(gm, example_inputs)
2025-12-04T12:35:04.9254580Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base
2025-12-04T12:35:04.9254696Z     return compile_fx_forward(
2025-12-04T12:35:04.9255225Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward
2025-12-04T12:35:04.9255336Z     return inner_compile(
2025-12-04T12:35:04.9255628Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T12:35:04.9255744Z     return func(*args, **kwds)
2025-12-04T12:35:04.9256236Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner
2025-12-04T12:35:04.9256631Z     return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
2025-12-04T12:35:04.9257124Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper
2025-12-04T12:35:04.9257302Z     inner_compiled_fn = compiler_fn(gm, example_inputs)
2025-12-04T12:35:04.9257817Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T12:35:04.9258014Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:35:04.9258528Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T12:35:04.9258676Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:35:04.9259210Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:35:04.9259552Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:35:04.9260110Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1519, in codegen_and_compile
2025-12-04T12:35:04.9260266Z     compiled_fn = AotCodeCompiler.compile(
2025-12-04T12:35:04.9260717Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 2409, in compile
2025-12-04T12:35:04.9260821Z     subprocess.run(
2025-12-04T12:35:04.9261110Z   File "/opt/conda/envs/py_3.10/lib/python3.10/subprocess.py", line 526, in run
2025-12-04T12:35:04.9261279Z     raise CalledProcessError(retcode, process.args,
2025-12-04T12:35:04.9263353Z torch._inductor.exc.InductorError: CalledProcessError: Command '['nvcc', '-fatbin', '/tmp/tmpn0sihacm/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx', '-o', '/tmp/tmpn0sihacm/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin', '-gencode', 'arch=compute_75,code=compute_75', '-gencode', 'arch=compute_75,code=sm_75']' returned non-zero exit status 255.
2025-12-04T12:35:04.9263364Z 
2025-12-04T12:35:04.9263582Z To execute this test, run the following from the base repo dir:
2025-12-04T12:35:04.9264212Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor_package.py TestAOTInductorPackageCpp_cuda.test_compile_standalone_cos
2025-12-04T12:35:04.9264218Z 
2025-12-04T12:35:04.9264535Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:35:04.9264760Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:35:04.9265457Z inductor [('benchmarking.InductorBenchmarker.benchmark', 2), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('async_compile_cache_miss', 1)]
2025-12-04T12:35:04.9265585Z graph_break []
2025-12-04T12:35:04.9265804Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:35:04.9266639Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T12:35:04.9266757Z   return cls.__new__(cls, *args)
2025-12-04T12:35:04.9268377Z nvcc -fatbin /tmp/tmpn0sihacm/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx -o /tmp/tmpn0sihacm/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin -gencode arch=compute_75,code=compute_75 -gencode arch=compute_75,code=sm_75  failed with:
2025-12-04T12:35:04.9268472Z stdout:
2025-12-04T12:35:04.9268478Z 
2025-12-04T12:35:04.9268565Z stderr:
2025-12-04T12:35:04.9269418Z ptxas /tmp/tmpn0sihacm/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx, line 5; fatal   : Unsupported .version 8.7; current version is '8.4'
2025-12-04T12:35:04.9269584Z ptxas fatal   : Ptx assembly aborted due to errors
2025-12-04T12:35:04.9269628Z 
2025-12-04T12:35:04.9269943Z __________ TestAOTInductorPackageCpp_cuda.test_compile_standalone_cos __________
2025-12-04T12:35:04.9270072Z Traceback (most recent call last):
2025-12-04T12:35:04.9270603Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_package.py", line 452, in test_compile_standalone_cos
2025-12-04T12:35:04.9270743Z     build_path, _ = self.cmake_compile(
2025-12-04T12:35:04.9271487Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_package.py", line 179, in cmake_compile
2025-12-04T12:35:04.9271695Z     package_path = torch._inductor.aoti_compile_and_package(
2025-12-04T12:35:04.9272235Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package
2025-12-04T12:35:04.9272364Z     return aot_inductor_minifier_wrapper(
2025-12-04T12:35:04.9272924Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper
2025-12-04T12:35:04.9273022Z     raise e
2025-12-04T12:35:04.9273556Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper
2025-12-04T12:35:04.9273752Z     return func(
2025-12-04T12:35:04.9274301Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner
2025-12-04T12:35:04.9274549Z     aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs)
2025-12-04T12:35:04.9275008Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile
2025-12-04T12:35:04.9275122Z     return compile_fx_aot(
2025-12-04T12:35:04.9275626Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot
2025-12-04T12:35:04.9275753Z     compiled_artifacts = compile_fx(
2025-12-04T12:35:04.9276224Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx
2025-12-04T12:35:04.9276347Z     return compile_fx(
2025-12-04T12:35:04.9276813Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx
2025-12-04T12:35:04.9276964Z     return _maybe_wrap_and_compile_fx_main(
2025-12-04T12:35:04.9277532Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main
2025-12-04T12:35:04.9277647Z     return _compile_fx_main(
2025-12-04T12:35:04.9278231Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main
2025-12-04T12:35:04.9278434Z     return inference_compiler(unlifted_gm, example_inputs_)
2025-12-04T12:35:04.9279008Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__
2025-12-04T12:35:04.9279158Z     return self.compiler_fn(gm, example_inputs)
2025-12-04T12:35:04.9279661Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base
2025-12-04T12:35:04.9279792Z     return compile_fx_forward(
2025-12-04T12:35:04.9280304Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward
2025-12-04T12:35:04.9280413Z     return inner_compile(
2025-12-04T12:35:04.9280705Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T12:35:04.9280823Z     return func(*args, **kwds)
2025-12-04T12:35:04.9281329Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner
2025-12-04T12:35:04.9281593Z     return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
2025-12-04T12:35:04.9282087Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper
2025-12-04T12:35:04.9282343Z     inner_compiled_fn = compiler_fn(gm, example_inputs)
2025-12-04T12:35:04.9282842Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T12:35:04.9283038Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:35:04.9283548Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T12:35:04.9283695Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:35:04.9284242Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:35:04.9284561Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:35:04.9285082Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1519, in codegen_and_compile
2025-12-04T12:35:04.9285244Z     compiled_fn = AotCodeCompiler.compile(
2025-12-04T12:35:04.9285697Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 2409, in compile
2025-12-04T12:35:04.9285854Z     subprocess.run(
2025-12-04T12:35:04.9286133Z   File "/opt/conda/envs/py_3.10/lib/python3.10/subprocess.py", line 526, in run
2025-12-04T12:35:04.9286301Z     raise CalledProcessError(retcode, process.args,
2025-12-04T12:35:04.9288384Z torch._inductor.exc.InductorError: CalledProcessError: Command '['nvcc', '-fatbin', '/tmp/tmp690r3gye/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx', '-o', '/tmp/tmp690r3gye/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin', '-gencode', 'arch=compute_75,code=compute_75', '-gencode', 'arch=compute_75,code=sm_75']' returned non-zero exit status 255.
2025-12-04T12:35:04.9288392Z 
2025-12-04T12:35:04.9288614Z To execute this test, run the following from the base repo dir:
2025-12-04T12:35:04.9289265Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor_package.py TestAOTInductorPackageCpp_cuda.test_compile_standalone_cos
2025-12-04T12:35:04.9289274Z 
2025-12-04T12:35:04.9289542Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:35:04.9289766Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:35:04.9290510Z inductor [('benchmarking.InductorBenchmarker.benchmark', 2), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('async_compile_cache_miss', 1)]
2025-12-04T12:35:04.9290615Z graph_break []
2025-12-04T12:35:04.9290848Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:35:04.9291700Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T12:35:04.9291820Z   return cls.__new__(cls, *args)
2025-12-04T12:35:04.9293436Z nvcc -fatbin /tmp/tmpn0sihacm/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx -o /tmp/tmpn0sihacm/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin -gencode arch=compute_75,code=compute_75 -gencode arch=compute_75,code=sm_75  failed with:
2025-12-04T12:35:04.9293534Z stdout:
2025-12-04T12:35:04.9293540Z 
2025-12-04T12:35:04.9293647Z stderr:
2025-12-04T12:35:04.9294491Z ptxas /tmp/tmpn0sihacm/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx, line 5; fatal   : Unsupported .version 8.7; current version is '8.4'
2025-12-04T12:35:04.9294658Z ptxas fatal   : Ptx assembly aborted due to errors
2025-12-04T12:35:04.9294664Z 
2025-12-04T12:35:04.9294902Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:35:04.9295586Z inductor [('benchmarking.InductorBenchmarker.benchmark', 2), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('async_compile_cache_miss', 1)]
2025-12-04T12:35:04.9295738Z graph_break []
2025-12-04T12:35:04.9295956Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:35:04.9296843Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T12:35:04.9296977Z   return cls.__new__(cls, *args)
2025-12-04T12:35:04.9298564Z nvcc -fatbin /tmp/tmp690r3gye/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx -o /tmp/tmp690r3gye/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin -gencode arch=compute_75,code=compute_75 -gencode arch=compute_75,code=sm_75  failed with:
2025-12-04T12:35:04.9298673Z stdout:
2025-12-04T12:35:04.9298678Z 
2025-12-04T12:35:04.9298771Z stderr:
2025-12-04T12:35:04.9299612Z ptxas /tmp/tmp690r3gye/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx, line 5; fatal   : Unsupported .version 8.7; current version is '8.4'
2025-12-04T12:35:04.9299800Z ptxas fatal   : Ptx assembly aborted due to errors
2025-12-04T12:35:04.9299845Z 
2025-12-04T12:35:04.9299994Z =================================== FAILURES ===================================
2025-12-04T12:35:04.9300307Z __________ TestAOTInductorPackageCpp_cuda.test_compile_standalone_cos __________
2025-12-04T12:35:04.9300434Z Traceback (most recent call last):
2025-12-04T12:35:04.9300968Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_package.py", line 452, in test_compile_standalone_cos
2025-12-04T12:35:04.9301109Z     build_path, _ = self.cmake_compile(
2025-12-04T12:35:04.9301571Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_package.py", line 179, in cmake_compile
2025-12-04T12:35:04.9301772Z     package_path = torch._inductor.aoti_compile_and_package(
2025-12-04T12:35:04.9302319Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package
2025-12-04T12:35:04.9302453Z     return aot_inductor_minifier_wrapper(
2025-12-04T12:35:04.9303010Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper
2025-12-04T12:35:04.9303101Z     raise e
2025-12-04T12:35:04.9303638Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper
2025-12-04T12:35:04.9303751Z     return func(
2025-12-04T12:35:04.9304328Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner
2025-12-04T12:35:04.9304572Z     aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs)
2025-12-04T12:35:04.9305058Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile
2025-12-04T12:35:04.9305173Z     return compile_fx_aot(
2025-12-04T12:35:04.9305675Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot
2025-12-04T12:35:04.9305800Z     compiled_artifacts = compile_fx(
2025-12-04T12:35:04.9306264Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx
2025-12-04T12:35:04.9306382Z     return compile_fx(
2025-12-04T12:35:04.9306846Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx
2025-12-04T12:35:04.9306998Z     return _maybe_wrap_and_compile_fx_main(
2025-12-04T12:35:04.9307568Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main
2025-12-04T12:35:04.9307682Z     return _compile_fx_main(
2025-12-04T12:35:04.9308198Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main
2025-12-04T12:35:04.9308396Z     return inference_compiler(unlifted_gm, example_inputs_)
2025-12-04T12:35:04.9308949Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__
2025-12-04T12:35:04.9309111Z     return self.compiler_fn(gm, example_inputs)
2025-12-04T12:35:04.9309615Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base
2025-12-04T12:35:04.9309743Z     return compile_fx_forward(
2025-12-04T12:35:04.9310257Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward
2025-12-04T12:35:04.9310368Z     return inner_compile(
2025-12-04T12:35:04.9310662Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T12:35:04.9310772Z     return func(*args, **kwds)
2025-12-04T12:35:04.9311280Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner
2025-12-04T12:35:04.9311549Z     return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
2025-12-04T12:35:04.9312068Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper
2025-12-04T12:35:04.9312256Z     inner_compiled_fn = compiler_fn(gm, example_inputs)
2025-12-04T12:35:04.9312756Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T12:35:04.9312951Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:35:04.9313468Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T12:35:04.9313614Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:35:04.9314157Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:35:04.9314478Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:35:04.9314994Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1519, in codegen_and_compile
2025-12-04T12:35:04.9315151Z     compiled_fn = AotCodeCompiler.compile(
2025-12-04T12:35:04.9315600Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 2409, in compile
2025-12-04T12:35:04.9315717Z     subprocess.run(
2025-12-04T12:35:04.9316021Z   File "/opt/conda/envs/py_3.10/lib/python3.10/subprocess.py", line 526, in run
2025-12-04T12:35:04.9316191Z     raise CalledProcessError(retcode, process.args,
2025-12-04T12:35:04.9318284Z torch._inductor.exc.InductorError: CalledProcessError: Command '['nvcc', '-fatbin', '/tmp/tmpw2ggbbe6/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx', '-o', '/tmp/tmpw2ggbbe6/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin', '-gencode', 'arch=compute_75,code=compute_75', '-gencode', 'arch=compute_75,code=sm_75']' returned non-zero exit status 255.
2025-12-04T12:35:04.9318295Z 
2025-12-04T12:35:04.9318512Z To execute this test, run the following from the base repo dir:
2025-12-04T12:35:04.9319158Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor_package.py TestAOTInductorPackageCpp_cuda.test_compile_standalone_cos
2025-12-04T12:35:04.9319164Z 
2025-12-04T12:35:04.9319431Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:35:04.9319652Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:35:04.9320349Z inductor [('benchmarking.InductorBenchmarker.benchmark', 2), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('async_compile_cache_miss', 1)]
2025-12-04T12:35:04.9320449Z graph_break []
2025-12-04T12:35:04.9320683Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:35:04.9321502Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T12:35:04.9321653Z   return cls.__new__(cls, *args)
2025-12-04T12:35:04.9323271Z nvcc -fatbin /tmp/tmpn0sihacm/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx -o /tmp/tmpn0sihacm/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin -gencode arch=compute_75,code=compute_75 -gencode arch=compute_75,code=sm_75  failed with:
2025-12-04T12:35:04.9323367Z stdout:
2025-12-04T12:35:04.9323372Z 
2025-12-04T12:35:04.9323479Z stderr:
2025-12-04T12:35:04.9324315Z ptxas /tmp/tmpn0sihacm/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx, line 5; fatal   : Unsupported .version 8.7; current version is '8.4'
2025-12-04T12:35:04.9324481Z ptxas fatal   : Ptx assembly aborted due to errors
2025-12-04T12:35:04.9324487Z 
2025-12-04T12:35:04.9324722Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:35:04.9325396Z inductor [('benchmarking.InductorBenchmarker.benchmark', 2), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('async_compile_cache_miss', 1)]
2025-12-04T12:35:04.9325558Z graph_break []
2025-12-04T12:35:04.9325777Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:35:04.9326589Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T12:35:04.9326719Z   return cls.__new__(cls, *args)
2025-12-04T12:35:04.9328321Z nvcc -fatbin /tmp/tmp690r3gye/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx -o /tmp/tmp690r3gye/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin -gencode arch=compute_75,code=compute_75 -gencode arch=compute_75,code=sm_75  failed with:
2025-12-04T12:35:04.9328427Z stdout:
2025-12-04T12:35:04.9328432Z 
2025-12-04T12:35:04.9328527Z stderr:
2025-12-04T12:35:04.9329363Z ptxas /tmp/tmp690r3gye/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx, line 5; fatal   : Unsupported .version 8.7; current version is '8.4'
2025-12-04T12:35:04.9329538Z ptxas fatal   : Ptx assembly aborted due to errors
2025-12-04T12:35:04.9329543Z 
2025-12-04T12:35:04.9329758Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:35:04.9330478Z inductor [('benchmarking.InductorBenchmarker.benchmark', 2), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('async_compile_cache_miss', 1)]
2025-12-04T12:35:04.9330579Z graph_break []
2025-12-04T12:35:04.9330795Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:35:04.9331648Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T12:35:04.9331766Z   return cls.__new__(cls, *args)
2025-12-04T12:35:04.9333370Z nvcc -fatbin /tmp/tmpw2ggbbe6/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx -o /tmp/tmpw2ggbbe6/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin -gencode arch=compute_75,code=compute_75 -gencode arch=compute_75,code=sm_75  failed with:
2025-12-04T12:35:04.9333463Z stdout:
2025-12-04T12:35:04.9333468Z 
2025-12-04T12:35:04.9333558Z stderr:
2025-12-04T12:35:04.9334407Z ptxas /tmp/tmpw2ggbbe6/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx, line 5; fatal   : Unsupported .version 8.7; current version is '8.4'
2025-12-04T12:35:04.9334571Z ptxas fatal   : Ptx assembly aborted due to errors
2025-12-04T12:35:04.9334576Z 
2025-12-04T12:35:04.9335415Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor_package/inductor.test_aot_inductor_package-69f64b5320fd797d.xml -
2025-12-04T12:35:04.9335628Z =========================== short test summary info ============================
2025-12-04T12:35:04.9338389Z FAILED [0.5715s] inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_standalone_cos - torch._inductor.exc.InductorError: CalledProcessError: Command '['nvcc', '-fatbin', '/tmp/tmpw2ggbbe6/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx', '-o', '/tmp/tmpw2ggbbe6/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin', '-gencode', 'arch=compute_75,code=compute_75', '-gencode', 'arch=compute_75,code=sm_75']' returned non-zero exit status 255.
2025-12-04T12:35:04.9338397Z 
2025-12-04T12:35:04.9338617Z To execute this test, run the following from the base repo dir:
2025-12-04T12:35:04.9339253Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor_package.py TestAOTInductorPackageCpp_cuda.test_compile_standalone_cos
2025-12-04T12:35:04.9339272Z 
2025-12-04T12:35:04.9339540Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:35:04.9339760Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:35:04.9339974Z ================== 1 failed, 87 deselected, 2 rerun in 6.75s ===================
2025-12-04T12:35:04.9340077Z Got exit code 1
2025-12-04T12:35:04.9340187Z Retrying single test...
2025-12-04T12:35:04.9340850Z Test results will be stored in test-reports/python-pytest/inductor.test_aot_inductor_package/inductor.test_aot_inductor_package-e41e403fca9b1188.xml
2025-12-04T12:35:04.9341014Z ============================= test session starts ==============================
2025-12-04T12:35:04.9341377Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:35:04.9341491Z cachedir: .pytest_cache
2025-12-04T12:35:04.9342013Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:35:04.9342145Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:35:04.9342260Z configfile: pytest.ini
2025-12-04T12:35:04.9342850Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:35:04.9343073Z collecting ... collected 88 items / 87 deselected / 1 selected
2025-12-04T12:35:04.9343820Z stepcurrent: skipping 71 already run items. Running only test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_standalone_cos
2025-12-04T12:35:04.9343948Z Running 1 items in this shard
2025-12-04T12:35:04.9343953Z 
2025-12-04T12:35:04.9345214Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_standalone_cos W1204 12:32:44.069000 148188 site-packages/torch/_inductor/utils.py:3815] Overriding: aot_inductor.dynamic_linkage=False when aot_inductor_mode.compile_standalone is True.
2025-12-04T12:35:04.9345350Z ('RERUN', {'yellow': True}) [5.6710s] [100%]
2025-12-04T12:35:04.9346583Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_standalone_cos W1204 12:32:48.031000 148188 site-packages/torch/_inductor/utils.py:3815] Overriding: aot_inductor.dynamic_linkage=False when aot_inductor_mode.compile_standalone is True.
2025-12-04T12:35:04.9346714Z ('RERUN', {'yellow': True}) [0.5973s] [100%]
2025-12-04T12:35:04.9347936Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_standalone_cos W1204 12:32:48.630000 148188 site-packages/torch/_inductor/utils.py:3815] Overriding: aot_inductor.dynamic_linkage=False when aot_inductor_mode.compile_standalone is True.
2025-12-04T12:35:04.9348038Z FAILED [0.5838s] [100%]
2025-12-04T12:35:04.9348044Z 
2025-12-04T12:35:04.9348199Z ==================================== RERUNS ====================================
2025-12-04T12:35:04.9348497Z __________ TestAOTInductorPackageCpp_cuda.test_compile_standalone_cos __________
2025-12-04T12:35:04.9348657Z Traceback (most recent call last):
2025-12-04T12:35:04.9349204Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_package.py", line 452, in test_compile_standalone_cos
2025-12-04T12:35:04.9349334Z     build_path, _ = self.cmake_compile(
2025-12-04T12:35:04.9349786Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_package.py", line 179, in cmake_compile
2025-12-04T12:35:04.9349998Z     package_path = torch._inductor.aoti_compile_and_package(
2025-12-04T12:35:04.9350530Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package
2025-12-04T12:35:04.9350671Z     return aot_inductor_minifier_wrapper(
2025-12-04T12:35:04.9351218Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper
2025-12-04T12:35:04.9351310Z     raise e
2025-12-04T12:35:04.9351861Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper
2025-12-04T12:35:04.9351991Z     return func(
2025-12-04T12:35:04.9352538Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner
2025-12-04T12:35:04.9352784Z     aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs)
2025-12-04T12:35:04.9353239Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile
2025-12-04T12:35:04.9353366Z     return compile_fx_aot(
2025-12-04T12:35:04.9353857Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot
2025-12-04T12:35:04.9353979Z     compiled_artifacts = compile_fx(
2025-12-04T12:35:04.9354463Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx
2025-12-04T12:35:04.9354568Z     return compile_fx(
2025-12-04T12:35:04.9355045Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx
2025-12-04T12:35:04.9355182Z     return _maybe_wrap_and_compile_fx_main(
2025-12-04T12:35:04.9355751Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main
2025-12-04T12:35:04.9355876Z     return _compile_fx_main(
2025-12-04T12:35:04.9356405Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main
2025-12-04T12:35:04.9356607Z     return inference_compiler(unlifted_gm, example_inputs_)
2025-12-04T12:35:04.9357140Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__
2025-12-04T12:35:04.9357319Z     return self.compiler_fn(gm, example_inputs)
2025-12-04T12:35:04.9357837Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base
2025-12-04T12:35:04.9357954Z     return compile_fx_forward(
2025-12-04T12:35:04.9358471Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward
2025-12-04T12:35:04.9358595Z     return inner_compile(
2025-12-04T12:35:04.9358876Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T12:35:04.9359003Z     return func(*args, **kwds)
2025-12-04T12:35:04.9359514Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner
2025-12-04T12:35:04.9359781Z     return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
2025-12-04T12:35:04.9360291Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper
2025-12-04T12:35:04.9360467Z     inner_compiled_fn = compiler_fn(gm, example_inputs)
2025-12-04T12:35:04.9361007Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T12:35:04.9361217Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:35:04.9361720Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T12:35:04.9361882Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:35:04.9362418Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:35:04.9362742Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:35:04.9363279Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1519, in codegen_and_compile
2025-12-04T12:35:04.9363427Z     compiled_fn = AotCodeCompiler.compile(
2025-12-04T12:35:04.9363883Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 2409, in compile
2025-12-04T12:35:04.9364009Z     subprocess.run(
2025-12-04T12:35:04.9364413Z   File "/opt/conda/envs/py_3.10/lib/python3.10/subprocess.py", line 526, in run
2025-12-04T12:35:04.9364595Z     raise CalledProcessError(retcode, process.args,
2025-12-04T12:35:04.9366669Z torch._inductor.exc.InductorError: CalledProcessError: Command '['nvcc', '-fatbin', '/tmp/tmpszu7egnh/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx', '-o', '/tmp/tmpszu7egnh/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin', '-gencode', 'arch=compute_75,code=compute_75', '-gencode', 'arch=compute_75,code=sm_75']' returned non-zero exit status 255.
2025-12-04T12:35:04.9366676Z 
2025-12-04T12:35:04.9366914Z To execute this test, run the following from the base repo dir:
2025-12-04T12:35:04.9367551Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor_package.py TestAOTInductorPackageCpp_cuda.test_compile_standalone_cos
2025-12-04T12:35:04.9367559Z 
2025-12-04T12:35:04.9367831Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:35:04.9368075Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:35:04.9368759Z inductor [('benchmarking.InductorBenchmarker.benchmark', 2), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('async_compile_cache_miss', 1)]
2025-12-04T12:35:04.9368877Z graph_break []
2025-12-04T12:35:04.9369130Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:35:04.9369953Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T12:35:04.9370123Z   return cls.__new__(cls, *args)
2025-12-04T12:35:04.9371910Z nvcc -fatbin /tmp/tmpszu7egnh/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx -o /tmp/tmpszu7egnh/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin -gencode arch=compute_75,code=compute_75 -gencode arch=compute_75,code=sm_75  failed with:
2025-12-04T12:35:04.9372027Z stdout:
2025-12-04T12:35:04.9372032Z 
2025-12-04T12:35:04.9372126Z stderr:
2025-12-04T12:35:04.9372970Z ptxas /tmp/tmpszu7egnh/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx, line 5; fatal   : Unsupported .version 8.7; current version is '8.4'
2025-12-04T12:35:04.9373158Z ptxas fatal   : Ptx assembly aborted due to errors
2025-12-04T12:35:04.9373163Z 
2025-12-04T12:35:04.9373466Z __________ TestAOTInductorPackageCpp_cuda.test_compile_standalone_cos __________
2025-12-04T12:35:04.9373608Z Traceback (most recent call last):
2025-12-04T12:35:04.9374149Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_package.py", line 452, in test_compile_standalone_cos
2025-12-04T12:35:04.9374279Z     build_path, _ = self.cmake_compile(
2025-12-04T12:35:04.9374847Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_package.py", line 179, in cmake_compile
2025-12-04T12:35:04.9375055Z     package_path = torch._inductor.aoti_compile_and_package(
2025-12-04T12:35:04.9375585Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package
2025-12-04T12:35:04.9375732Z     return aot_inductor_minifier_wrapper(
2025-12-04T12:35:04.9376280Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper
2025-12-04T12:35:04.9376464Z     raise e
2025-12-04T12:35:04.9377004Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper
2025-12-04T12:35:04.9377102Z     return func(
2025-12-04T12:35:04.9377667Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner
2025-12-04T12:35:04.9377903Z     aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs)
2025-12-04T12:35:04.9378412Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile
2025-12-04T12:35:04.9378540Z     return compile_fx_aot(
2025-12-04T12:35:04.9379030Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot
2025-12-04T12:35:04.9379168Z     compiled_artifacts = compile_fx(
2025-12-04T12:35:04.9379639Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx
2025-12-04T12:35:04.9379746Z     return compile_fx(
2025-12-04T12:35:04.9380227Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx
2025-12-04T12:35:04.9380365Z     return _maybe_wrap_and_compile_fx_main(
2025-12-04T12:35:04.9380955Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main
2025-12-04T12:35:04.9381076Z     return _compile_fx_main(
2025-12-04T12:35:04.9381579Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main
2025-12-04T12:35:04.9381796Z     return inference_compiler(unlifted_gm, example_inputs_)
2025-12-04T12:35:04.9382322Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__
2025-12-04T12:35:04.9382673Z     return self.compiler_fn(gm, example_inputs)
2025-12-04T12:35:04.9383191Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base
2025-12-04T12:35:04.9383309Z     return compile_fx_forward(
2025-12-04T12:35:04.9383879Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward
2025-12-04T12:35:04.9383995Z     return inner_compile(
2025-12-04T12:35:04.9384277Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T12:35:04.9384411Z     return func(*args, **kwds)
2025-12-04T12:35:04.9384903Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner
2025-12-04T12:35:04.9385171Z     return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
2025-12-04T12:35:04.9385677Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper
2025-12-04T12:35:04.9385852Z     inner_compiled_fn = compiler_fn(gm, example_inputs)
2025-12-04T12:35:04.9386368Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T12:35:04.9386565Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:35:04.9387063Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T12:35:04.9387261Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:35:04.9387794Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:35:04.9388129Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:35:04.9388653Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1519, in codegen_and_compile
2025-12-04T12:35:04.9388797Z     compiled_fn = AotCodeCompiler.compile(
2025-12-04T12:35:04.9389263Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 2409, in compile
2025-12-04T12:35:04.9389369Z     subprocess.run(
2025-12-04T12:35:04.9389645Z   File "/opt/conda/envs/py_3.10/lib/python3.10/subprocess.py", line 526, in run
2025-12-04T12:35:04.9389826Z     raise CalledProcessError(retcode, process.args,
2025-12-04T12:35:04.9391892Z torch._inductor.exc.InductorError: CalledProcessError: Command '['nvcc', '-fatbin', '/tmp/tmp0mbifsnb/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx', '-o', '/tmp/tmp0mbifsnb/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin', '-gencode', 'arch=compute_75,code=compute_75', '-gencode', 'arch=compute_75,code=sm_75']' returned non-zero exit status 255.
2025-12-04T12:35:04.9391932Z 
2025-12-04T12:35:04.9392170Z To execute this test, run the following from the base repo dir:
2025-12-04T12:35:04.9392800Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor_package.py TestAOTInductorPackageCpp_cuda.test_compile_standalone_cos
2025-12-04T12:35:04.9392806Z 
2025-12-04T12:35:04.9393089Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:35:04.9393314Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:35:04.9393998Z inductor [('benchmarking.InductorBenchmarker.benchmark', 2), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('async_compile_cache_miss', 1)]
2025-12-04T12:35:04.9394117Z graph_break []
2025-12-04T12:35:04.9394337Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:35:04.9395163Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T12:35:04.9395313Z   return cls.__new__(cls, *args)
2025-12-04T12:35:04.9396944Z nvcc -fatbin /tmp/tmpszu7egnh/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx -o /tmp/tmpszu7egnh/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin -gencode arch=compute_75,code=compute_75 -gencode arch=compute_75,code=sm_75  failed with:
2025-12-04T12:35:04.9397056Z stdout:
2025-12-04T12:35:04.9397062Z 
2025-12-04T12:35:04.9397161Z stderr:
2025-12-04T12:35:04.9398016Z ptxas /tmp/tmpszu7egnh/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx, line 5; fatal   : Unsupported .version 8.7; current version is '8.4'
2025-12-04T12:35:04.9398185Z ptxas fatal   : Ptx assembly aborted due to errors
2025-12-04T12:35:04.9398190Z 
2025-12-04T12:35:04.9398410Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:35:04.9399107Z inductor [('benchmarking.InductorBenchmarker.benchmark', 2), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('async_compile_cache_miss', 1)]
2025-12-04T12:35:04.9399210Z graph_break []
2025-12-04T12:35:04.9399441Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:35:04.9400256Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T12:35:04.9400409Z   return cls.__new__(cls, *args)
2025-12-04T12:35:04.9402022Z nvcc -fatbin /tmp/tmp0mbifsnb/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx -o /tmp/tmp0mbifsnb/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin -gencode arch=compute_75,code=compute_75 -gencode arch=compute_75,code=sm_75  failed with:
2025-12-04T12:35:04.9402119Z stdout:
2025-12-04T12:35:04.9402124Z 
2025-12-04T12:35:04.9402231Z stderr:
2025-12-04T12:35:04.9403073Z ptxas /tmp/tmp0mbifsnb/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx, line 5; fatal   : Unsupported .version 8.7; current version is '8.4'
2025-12-04T12:35:04.9403239Z ptxas fatal   : Ptx assembly aborted due to errors
2025-12-04T12:35:04.9403244Z 
2025-12-04T12:35:04.9403406Z =================================== FAILURES ===================================
2025-12-04T12:35:04.9403706Z __________ TestAOTInductorPackageCpp_cuda.test_compile_standalone_cos __________
2025-12-04T12:35:04.9403846Z Traceback (most recent call last):
2025-12-04T12:35:04.9404384Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_package.py", line 452, in test_compile_standalone_cos
2025-12-04T12:35:04.9404546Z     build_path, _ = self.cmake_compile(
2025-12-04T12:35:04.9405017Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_package.py", line 179, in cmake_compile
2025-12-04T12:35:04.9405220Z     package_path = torch._inductor.aoti_compile_and_package(
2025-12-04T12:35:04.9405745Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package
2025-12-04T12:35:04.9405889Z     return aot_inductor_minifier_wrapper(
2025-12-04T12:35:04.9406429Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper
2025-12-04T12:35:04.9406538Z     raise e
2025-12-04T12:35:04.9407080Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper
2025-12-04T12:35:04.9407180Z     return func(
2025-12-04T12:35:04.9407744Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner
2025-12-04T12:35:04.9407976Z     aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs)
2025-12-04T12:35:04.9408436Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile
2025-12-04T12:35:04.9408593Z     return compile_fx_aot(
2025-12-04T12:35:04.9409088Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot
2025-12-04T12:35:04.9409225Z     compiled_artifacts = compile_fx(
2025-12-04T12:35:04.9409724Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx
2025-12-04T12:35:04.9409832Z     return compile_fx(
2025-12-04T12:35:04.9410314Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx
2025-12-04T12:35:04.9410453Z     return _maybe_wrap_and_compile_fx_main(
2025-12-04T12:35:04.9411036Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main
2025-12-04T12:35:04.9411151Z     return _compile_fx_main(
2025-12-04T12:35:04.9411653Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main
2025-12-04T12:35:04.9411865Z     return inference_compiler(unlifted_gm, example_inputs_)
2025-12-04T12:35:04.9412384Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__
2025-12-04T12:35:04.9412535Z     return self.compiler_fn(gm, example_inputs)
2025-12-04T12:35:04.9413052Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base
2025-12-04T12:35:04.9413224Z     return compile_fx_forward(
2025-12-04T12:35:04.9413748Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward
2025-12-04T12:35:04.9413860Z     return inner_compile(
2025-12-04T12:35:04.9414140Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T12:35:04.9414266Z     return func(*args, **kwds)
2025-12-04T12:35:04.9414762Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner
2025-12-04T12:35:04.9415024Z     return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
2025-12-04T12:35:04.9415525Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper
2025-12-04T12:35:04.9415701Z     inner_compiled_fn = compiler_fn(gm, example_inputs)
2025-12-04T12:35:04.9416218Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T12:35:04.9416505Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T12:35:04.9417049Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T12:35:04.9417209Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T12:35:04.9417740Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T12:35:04.9418078Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T12:35:04.9418598Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1519, in codegen_and_compile
2025-12-04T12:35:04.9418739Z     compiled_fn = AotCodeCompiler.compile(
2025-12-04T12:35:04.9419206Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 2409, in compile
2025-12-04T12:35:04.9419316Z     subprocess.run(
2025-12-04T12:35:04.9419590Z   File "/opt/conda/envs/py_3.10/lib/python3.10/subprocess.py", line 526, in run
2025-12-04T12:35:04.9419776Z     raise CalledProcessError(retcode, process.args,
2025-12-04T12:35:04.9421875Z torch._inductor.exc.InductorError: CalledProcessError: Command '['nvcc', '-fatbin', '/tmp/tmpgm40r8lx/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx', '-o', '/tmp/tmpgm40r8lx/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin', '-gencode', 'arch=compute_75,code=compute_75', '-gencode', 'arch=compute_75,code=sm_75']' returned non-zero exit status 255.
2025-12-04T12:35:04.9421883Z 
2025-12-04T12:35:04.9422124Z To execute this test, run the following from the base repo dir:
2025-12-04T12:35:04.9422810Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor_package.py TestAOTInductorPackageCpp_cuda.test_compile_standalone_cos
2025-12-04T12:35:04.9422819Z 
2025-12-04T12:35:04.9423104Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:35:04.9423329Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:35:04.9424012Z inductor [('benchmarking.InductorBenchmarker.benchmark', 2), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('async_compile_cache_miss', 1)]
2025-12-04T12:35:04.9424127Z graph_break []
2025-12-04T12:35:04.9424348Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:35:04.9425182Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T12:35:04.9425300Z   return cls.__new__(cls, *args)
2025-12-04T12:35:04.9426903Z nvcc -fatbin /tmp/tmpszu7egnh/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx -o /tmp/tmpszu7egnh/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin -gencode arch=compute_75,code=compute_75 -gencode arch=compute_75,code=sm_75  failed with:
2025-12-04T12:35:04.9427046Z stdout:
2025-12-04T12:35:04.9427051Z 
2025-12-04T12:35:04.9427146Z stderr:
2025-12-04T12:35:04.9427997Z ptxas /tmp/tmpszu7egnh/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx, line 5; fatal   : Unsupported .version 8.7; current version is '8.4'
2025-12-04T12:35:04.9428162Z ptxas fatal   : Ptx assembly aborted due to errors
2025-12-04T12:35:04.9428169Z 
2025-12-04T12:35:04.9428388Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:35:04.9429083Z inductor [('benchmarking.InductorBenchmarker.benchmark', 2), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('async_compile_cache_miss', 1)]
2025-12-04T12:35:04.9429186Z graph_break []
2025-12-04T12:35:04.9429416Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:35:04.9430233Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T12:35:04.9430384Z   return cls.__new__(cls, *args)
2025-12-04T12:35:04.9431990Z nvcc -fatbin /tmp/tmp0mbifsnb/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx -o /tmp/tmp0mbifsnb/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin -gencode arch=compute_75,code=compute_75 -gencode arch=compute_75,code=sm_75  failed with:
2025-12-04T12:35:04.9432084Z stdout:
2025-12-04T12:35:04.9432090Z 
2025-12-04T12:35:04.9432195Z stderr:
2025-12-04T12:35:04.9433034Z ptxas /tmp/tmp0mbifsnb/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx, line 5; fatal   : Unsupported .version 8.7; current version is '8.4'
2025-12-04T12:35:04.9433200Z ptxas fatal   : Ptx assembly aborted due to errors
2025-12-04T12:35:04.9433208Z 
2025-12-04T12:35:04.9433443Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:35:04.9434129Z inductor [('benchmarking.InductorBenchmarker.benchmark', 2), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('async_compile_cache_miss', 1)]
2025-12-04T12:35:04.9434246Z graph_break []
2025-12-04T12:35:04.9434464Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:35:04.9435309Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T12:35:04.9435443Z   return cls.__new__(cls, *args)
2025-12-04T12:35:04.9437065Z nvcc -fatbin /tmp/tmpgm40r8lx/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx -o /tmp/tmpgm40r8lx/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin -gencode arch=compute_75,code=compute_75 -gencode arch=compute_75,code=sm_75  failed with:
2025-12-04T12:35:04.9437175Z stdout:
2025-12-04T12:35:04.9437182Z 
2025-12-04T12:35:04.9437276Z stderr:
2025-12-04T12:35:04.9438115Z ptxas /tmp/tmpgm40r8lx/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx, line 5; fatal   : Unsupported .version 8.7; current version is '8.4'
2025-12-04T12:35:04.9438296Z ptxas fatal   : Ptx assembly aborted due to errors
2025-12-04T12:35:04.9438302Z 
2025-12-04T12:35:04.9439134Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor_package/inductor.test_aot_inductor_package-e41e403fca9b1188.xml -
2025-12-04T12:35:04.9439326Z =========================== short test summary info ============================
2025-12-04T12:35:04.9442005Z FAILED [0.5838s] inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_standalone_cos - torch._inductor.exc.InductorError: CalledProcessError: Command '['nvcc', '-fatbin', '/tmp/tmpgm40r8lx/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx', '-o', '/tmp/tmpgm40r8lx/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin', '-gencode', 'arch=compute_75,code=compute_75', '-gencode', 'arch=compute_75,code=sm_75']' returned non-zero exit status 255.
2025-12-04T12:35:04.9442048Z 
2025-12-04T12:35:04.9442290Z To execute this test, run the following from the base repo dir:
2025-12-04T12:35:04.9442928Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor_package.py TestAOTInductorPackageCpp_cuda.test_compile_standalone_cos
2025-12-04T12:35:04.9442934Z 
2025-12-04T12:35:04.9443205Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:35:04.9443405Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:35:04.9443610Z ================== 1 failed, 87 deselected, 2 rerun in 6.89s ===================
2025-12-04T12:35:04.9443732Z Got exit code 1
2025-12-04T12:35:04.9444294Z FAILED CONSISTENTLY: test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_standalone_cos
2025-12-04T12:35:04.9444736Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:35:04.9445399Z Test results will be stored in test-reports/python-pytest/inductor.test_aot_inductor_package/inductor.test_aot_inductor_package-07f86488d4cce1d3.xml
2025-12-04T12:35:04.9445570Z ============================= test session starts ==============================
2025-12-04T12:35:04.9445923Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:35:04.9446055Z cachedir: .pytest_cache
2025-12-04T12:35:04.9446579Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:35:04.9446724Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:35:04.9446837Z configfile: pytest.ini
2025-12-04T12:35:04.9447423Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:35:04.9447664Z collecting ... collected 88 items / 72 deselected / 16 selected
2025-12-04T12:35:04.9447811Z stepcurrent: skipping 72 already run items.
2025-12-04T12:35:04.9447930Z Running 16 items in this shard
2025-12-04T12:35:04.9447949Z 
2025-12-04T12:35:04.9448689Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_with_exporter SKIPPED [0.0004s] (Test is only supported on CUDA 12.6+) [  6%]
2025-12-04T12:35:04.9449430Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_with_exporter_weights SKIPPED [0.0003s] (Test is only supported on CUDA 12.6+) [ 12%]
2025-12-04T12:35:04.9450699Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_deepcopy_compiled_model W1204 12:33:12.749000 148603 site-packages/torch/export/pt2_archive/_package.py:763] AOTICompiledModel deepcopy warning: AOTICompiledModel.loader is not deepcopied.
2025-12-04T12:35:04.9450813Z PASSED [10.5333s] [ 18%]
2025-12-04T12:35:04.9451324Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_duplicate_calls PASSED [22.4138s] [ 25%]
2025-12-04T12:35:04.9452202Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_linear W1204 12:33:36.177000 148603 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T12:35:04.9452318Z PASSED [10.8322s] [ 31%]
2025-12-04T12:35:04.9453393Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_loading_wrong_model W1204 12:33:52.011000 148603 site-packages/torch/_inductor/package/package.py:120] Loading outdated pt2 file. Please regenerate your package.
2025-12-04T12:35:04.9453503Z PASSED [6.0079s] [ 37%]
2025-12-04T12:35:04.9453978Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_metadata PASSED [9.9375s] [ 43%]
2025-12-04T12:35:04.9454510Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_multiple_methods PASSED [37.7687s] [ 50%]
2025-12-04T12:35:04.9455163Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_package_shared_weights SKIPPED [0.0033s] (No support for cpp only) [ 56%]
2025-12-04T12:35:04.9455850Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_package_user_managed_weight SKIPPED [0.0030s] (No support for cpp only) [ 62%]
2025-12-04T12:35:04.9456630Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_package_weights_on_disk_nested_module SKIPPED [0.0030s] (No support for cpp only) [ 68%]
2025-12-04T12:35:04.9457309Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_package_without_weight SKIPPED [0.0029s] (No support for cpp only) [ 75%]
2025-12-04T12:35:04.9457843Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_remove_intermediate_files PASSED [6.0768s] [ 81%]
2025-12-04T12:35:04.9458313Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_save_buffer PASSED [6.1722s] [ 87%]
2025-12-04T12:35:04.9458878Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_specified_output_dir PASSED [9.9551s] [ 93%]
2025-12-04T12:35:04.9459491Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_update_weights SKIPPED [0.0052s] (No support for cpp only) [100%]
2025-12-04T12:35:04.9459497Z 
2025-12-04T12:35:04.9460344Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor_package/inductor.test_aot_inductor_package-07f86488d4cce1d3.xml -
2025-12-04T12:35:04.9460572Z =========== 9 passed, 7 skipped, 72 deselected in 119.78s (0:01:59) ============
2025-12-04T12:35:04.9461252Z The following tests failed consistently: ['test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_standalone_cos']
2025-12-04T12:35:04.9461260Z 
2025-12-04T12:35:04.9461881Z FINISHED PRINTING LOG FILE of inductor/test_aot_inductor_package 1/1 (test/test-reports/inductor.test_aot_inductor_package_1.1_5509f9f54e762912_.log)
2025-12-04T12:35:04.9461889Z 
2025-12-04T12:35:04.9462281Z Finished inductor/test_aot_inductor_package 1/1 ... [2025-12-04 12:35:04.270661][12132.653543468], took 9.22min
2025-12-04T12:35:04.9463243Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor_package/inductor.test_aot_inductor_package-b1ca468dab29d0d8.xml
2025-12-04T12:35:04.9464123Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor_package/inductor.test_aot_inductor_package-69f64b5320fd797d.xml
2025-12-04T12:35:04.9465041Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor_package/inductor.test_aot_inductor_package-e41e403fca9b1188.xml
2025-12-04T12:35:04.9465914Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor_package/inductor.test_aot_inductor_package-07f86488d4cce1d3.xml
2025-12-04T12:35:05.0701010Z Uploading logs for 57119749248 to S3
2025-12-04T12:35:05.2827341Z Uploading artifacts took 0.81 seconds
2025-12-04T12:35:05.2827852Z inductor/test_aot_inductor_package 1/1 failed!
2025-12-04T12:35:05.2832017Z Running inductor/test_padding 1/1 ... [2025-12-04 12:35:05.283004][12133.665898165]
2025-12-04T12:35:05.2832613Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:35:05.2837081Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_padding.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:35:05.283465]
2025-12-04T12:35:59.9338211Z 
2025-12-04T12:35:59.9339368Z inductor/test_padding 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_padding_1.1_4d224b6d5f4af5af_.log
2025-12-04T12:35:59.9368708Z Running 55 items in this shard: test/inductor/test_padding.py::PerfTestBetweenGoodAndBadShape::test_BertForMaskedLM, test/inductor/test_padding.py::PerfTestBetweenGoodAndBadShape::test_LinearAndSoftmax_both_shapes, test/inductor/test_padding.py::PerfTestBetweenGoodAndBadShape::test_nobias_LinearAndSoftmax_both_shapes, test/inductor/test_padding.py::PerfTestWithAndWithoutPadding::test_longformer, test/inductor/test_padding.py::PerfTestWithAndWithoutPadding::test_longformer_small_bs, test/inductor/test_padding.py::PerfTestWithAndWithoutPadding::test_nvidia_deeprecommender, test/inductor/test_padding.py::PaddingTest::test_LinearAndSoftmax_codegen, test/inductor/test_padding.py::PaddingTest::test_attention, test/inductor/test_padding.py::PaddingTest::test_cat, test/inductor/test_padding.py::PaddingTest::test_conv, test/inductor/test_padding.py::PaddingTest::test_dynamic_shape_padding_shape0_alignment_bytes_32_enable_pad_False, test/inductor/test_padding.py::PaddingTest::test_dynamic_shape_padding_shape1_alignment_bytes_32_enable_pad_True, test/inductor/test_padding.py::PaddingTest::test_dynamic_shape_padding_shape2_alignment_bytes_64_enable_pad_False, test/inductor/test_padding.py::PaddingTest::test_dynamic_shape_padding_shape3_alignment_bytes_64_enable_pad_True, test/inductor/test_padding.py::PaddingTest::test_dynamic_shape_padding_shape4_alignment_bytes_32_enable_pad_False, test/inductor/test_padding.py::PaddingTest::test_dynamic_shape_padding_shape5_alignment_bytes_32_enable_pad_True, test/inductor/test_padding.py::PaddingTest::test_dynamic_shape_padding_shape6_alignment_bytes_64_enable_pad_False, test/inductor/test_padding.py::PaddingTest::test_dynamic_shape_padding_shape7_alignment_bytes_64_enable_pad_True, test/inductor/test_padding.py::PaddingTest::test_matmul, test/inductor/test_padding.py::PaddingTest::test_mm_padding_perf, test/inductor/test_padding.py::PaddingTest::test_nobias_LinearAndSoftmax_codegen, 
test/inductor/test_padding.py::PaddingTest::test_noop_concat_output_padding_shape0_alignment_bytes_32_pad_output_False, test/inductor/test_padding.py::PaddingTest::test_noop_concat_output_padding_shape1_alignment_bytes_32_pad_output_True, test/inductor/test_padding.py::PaddingTest::test_noop_concat_output_padding_shape2_alignment_bytes_64_pad_output_False, test/inductor/test_padding.py::PaddingTest::test_noop_concat_output_padding_shape3_alignment_bytes_64_pad_output_True, test/inductor/test_padding.py::PaddingTest::test_outer_dynamic_shape_padding_shape0_alignment_bytes_32_enable_pad_False, test/inductor/test_padding.py::PaddingTest::test_outer_dynamic_shape_padding_shape1_alignment_bytes_32_enable_pad_True, test/inductor/test_padding.py::PaddingTest::test_outer_dynamic_shape_padding_shape2_alignment_bytes_64_enable_pad_False, test/inductor/test_padding.py::PaddingTest::test_outer_dynamic_shape_padding_shape3_alignment_bytes_64_enable_pad_True, test/inductor/test_padding.py::PaddingTest::test_outer_dynamic_shape_padding_shape4_alignment_bytes_32_enable_pad_False, test/inductor/test_padding.py::PaddingTest::test_outer_dynamic_shape_padding_shape5_alignment_bytes_32_enable_pad_True, test/inductor/test_padding.py::PaddingTest::test_outer_dynamic_shape_padding_shape6_alignment_bytes_64_enable_pad_False, test/inductor/test_padding.py::PaddingTest::test_outer_dynamic_shape_padding_shape7_alignment_bytes_64_enable_pad_True, test/inductor/test_padding.py::PaddingTest::test_pad_3d_tensor, test/inductor/test_padding.py::PaddingTest::test_pad_channels_last, test/inductor/test_padding.py::PaddingTest::test_pad_outputs_alignment_bytes_128_shape0_float16, test/inductor/test_padding.py::PaddingTest::test_pad_outputs_alignment_bytes_128_shape0_float32, test/inductor/test_padding.py::PaddingTest::test_pad_outputs_alignment_bytes_128_shape1_float16, test/inductor/test_padding.py::PaddingTest::test_pad_outputs_alignment_bytes_128_shape1_float32, 
test/inductor/test_padding.py::PaddingTest::test_pad_outputs_alignment_bytes_32_shape0_float16, test/inductor/test_padding.py::PaddingTest::test_pad_outputs_alignment_bytes_32_shape0_float32, test/inductor/test_padding.py::PaddingTest::test_pad_outputs_alignment_bytes_32_shape1_float16, test/inductor/test_padding.py::PaddingTest::test_pad_outputs_alignment_bytes_32_shape1_float32, test/inductor/test_padding.py::PaddingTest::test_pad_strides, test/inductor/test_padding.py::PaddingTest::test_pad_strides_skip, test/inductor/test_padding.py::PaddingTest::test_padmm, test/inductor/test_padding.py::PaddingTest::test_perm_outer_dynamic_shape_padding_shape0_perm0_alignment_bytes_32_enable_pad_False, test/inductor/test_padding.py::PaddingTest::test_perm_outer_dynamic_shape_padding_shape1_perm1_alignment_bytes_32_enable_pad_True, test/inductor/test_padding.py::PaddingTest::test_perm_outer_dynamic_shape_padding_shape2_perm2_alignment_bytes_64_enable_pad_True, test/inductor/test_padding.py::PaddingTest::test_perm_outer_dynamic_shape_padding_shape3_perm3_alignment_bytes_64_enable_pad_False, test/inductor/test_padding.py::PaddingTest::test_perm_outer_dynamic_shape_padding_shape4_perm4_alignment_bytes_32_enable_pad_False, test/inductor/test_padding.py::PaddingTest::test_perm_outer_dynamic_shape_padding_shape5_perm5_alignment_bytes_32_enable_pad_True, test/inductor/test_padding.py::PaddingTest::test_perm_outer_dynamic_shape_padding_shape6_perm6_alignment_bytes_64_enable_pad_True, test/inductor/test_padding.py::PaddingTest::test_perm_outer_dynamic_shape_padding_shape7_perm7_alignment_bytes_64_enable_pad_False, test/inductor/test_padding.py::PaddingTest::test_view
2025-12-04T12:35:59.9396284Z 
2025-12-04T12:35:59.9396630Z Finished inductor/test_padding 1/1 ... [2025-12-04 12:35:59.933676][12188.316571745], took 0.91min
2025-12-04T12:35:59.9575457Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_padding/inductor.test_padding-be250a10b53bb058.xml
2025-12-04T12:36:00.0467886Z Running dynamo/test_aot_compile 1/1 ... [2025-12-04 12:36:00.046401][12188.429295275]
2025-12-04T12:36:00.0468480Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:36:00.0471203Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_aot_compile.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:36:00.046858]
2025-12-04T12:37:50.1755930Z 
2025-12-04T12:37:50.1757700Z dynamo/test_aot_compile 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_aot_compile_1.1_232ed44e0e50b87e_.log
2025-12-04T12:37:50.1768687Z Running 25 items in this shard: test/dynamo/test_aot_compile.py::TestAOTCompile::test_aot_compile_basic_fn, test/dynamo/test_aot_compile.py::TestAOTCompile::test_aot_compile_basic_fn_inductor, test/dynamo/test_aot_compile.py::TestAOTCompile::test_aot_compile_basic_forward, test/dynamo/test_aot_compile.py::TestAOTCompile::test_aot_compile_disable_guard_check, test/dynamo/test_aot_compile.py::TestAOTCompile::test_aot_compile_grad_mode_after_prior_compile, test/dynamo/test_aot_compile.py::TestAOTCompile::test_aot_compile_graph_break_error_fmt, test/dynamo/test_aot_compile.py::TestAOTCompile::test_aot_compile_module, test/dynamo/test_aot_compile.py::TestAOTCompile::test_aot_compile_repeat_interleave, test/dynamo/test_aot_compile.py::TestAOTCompile::test_aot_compile_source_info, test/dynamo/test_aot_compile.py::TestAOTCompile::test_aot_compile_with_aoti, test/dynamo/test_aot_compile.py::TestAOTCompile::test_aot_compile_with_aoti_module, test/dynamo/test_aot_compile.py::TestAOTCompile::test_aot_compile_with_aoti_torch_compile, test/dynamo/test_aot_compile.py::TestAOTCompile::test_aot_compile_with_checkpoint, test/dynamo/test_aot_compile.py::TestAOTCompile::test_aot_compile_with_closure_save_and_load, test/dynamo/test_aot_compile.py::TestAOTCompile::test_aot_compile_with_default_args, test/dynamo/test_aot_compile.py::TestAOTCompile::test_aot_compile_with_global_tensor, test/dynamo/test_aot_compile.py::TestAOTCompile::test_aot_compile_with_super_call, test/dynamo/test_aot_compile.py::TestAOTCompile::test_aot_module_simplified_serializable_autograd, test/dynamo/test_aot_compile.py::TestAOTCompile::test_aot_module_simplified_serializable_inference, test/dynamo/test_aot_compile.py::TestAOTCompile::test_decorated_function_aot, test/dynamo/test_aot_compile.py::TestAOTCompile::test_decorated_function_with_functools_wrap_aot, test/dynamo/test_aot_compile.py::TestAOTCompile::test_external_refs_validation, 
test/dynamo/test_aot_compile.py::TestAOTCompile::test_fullgraph_capture_with_pytree_func, test/dynamo/test_aot_compile.py::TestAOTCompile::test_fullgraph_capture_with_pytree_module, test/dynamo/test_aot_compile.py::TestAOTCompile::test_guard_filter_override_aot
2025-12-04T12:37:50.1779725Z 
2025-12-04T12:37:50.1780093Z Finished dynamo/test_aot_compile 1/1 ... [2025-12-04 12:37:50.175336][12298.558230105], took 1.84min
2025-12-04T12:37:50.1993594Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_aot_compile/dynamo.test_aot_compile-10a88b68c9603fe3.xml
2025-12-04T12:37:50.2889723Z Running dynamo/test_sets 1/1 ... [2025-12-04 12:37:50.288629][12298.671523635]
2025-12-04T12:37:50.2890278Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:37:50.2893023Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_sets.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:37:50.289067]
2025-12-04T12:38:06.7776261Z 
2025-12-04T12:38:06.7777447Z dynamo/test_sets 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_sets_1.1_e77962cd1c25fe47_.log
2025-12-04T12:38:06.7815113Z Running 124 items in this shard: test/dynamo/test_sets.py::CustomSetTests::test_custom_add, test/dynamo/test_sets.py::CustomSetTests::test_custom_contains, test/dynamo/test_sets.py::MiscTests::test_isdisjoint_with_generator, test/dynamo/test_sets.py::TestSetGuards::test_in_guard, test/dynamo/test_sets.py::TestSetGuards::test_set_guard_on_keys_change, test/dynamo/test_sets.py::TestSetGuards::test_set_multiple_types, test/dynamo/test_sets.py::TestSetGuards::test_set_recompile_on_key_change, test/dynamo/test_sets.py::TestSetGuards::test_set_recompile_on_key_pop, test/dynamo/test_sets.py::TestSetGuards::test_set_with_function, test/dynamo/test_sets.py::TestSetGuards::test_set_with_tensors, test/dynamo/test_sets.py::FrozensetTests::test_binop_and, test/dynamo/test_sets.py::FrozensetTests::test_binop_or, test/dynamo/test_sets.py::FrozensetTests::test_binop_sub, test/dynamo/test_sets.py::FrozensetTests::test_binop_xor, test/dynamo/test_sets.py::FrozensetTests::test_cmp_eq, test/dynamo/test_sets.py::FrozensetTests::test_cmp_greater_than, test/dynamo/test_sets.py::FrozensetTests::test_cmp_greater_than_or_equal, test/dynamo/test_sets.py::FrozensetTests::test_cmp_less_than, test/dynamo/test_sets.py::FrozensetTests::test_cmp_less_than_or_equal, test/dynamo/test_sets.py::FrozensetTests::test_cmp_ne, test/dynamo/test_sets.py::FrozensetTests::test_constructor_iterable, test/dynamo/test_sets.py::FrozensetTests::test_contains, test/dynamo/test_sets.py::FrozensetTests::test_copy, test/dynamo/test_sets.py::FrozensetTests::test_difference, test/dynamo/test_sets.py::FrozensetTests::test_equality, test/dynamo/test_sets.py::FrozensetTests::test_in_frozenset, test/dynamo/test_sets.py::FrozensetTests::test_intersection, test/dynamo/test_sets.py::FrozensetTests::test_isdisjoint, test/dynamo/test_sets.py::FrozensetTests::test_issubset, test/dynamo/test_sets.py::FrozensetTests::test_issuperset, test/dynamo/test_sets.py::FrozensetTests::test_symmetric_difference, 
test/dynamo/test_sets.py::FrozensetTests::test_to_frozenset, test/dynamo/test_sets.py::FrozensetTests::test_to_set, test/dynamo/test_sets.py::FrozensetTests::test_union, test/dynamo/test_sets.py::SetTests::test_add, test/dynamo/test_sets.py::SetTests::test_binop_and, test/dynamo/test_sets.py::SetTests::test_binop_or, test/dynamo/test_sets.py::SetTests::test_binop_sub, test/dynamo/test_sets.py::SetTests::test_binop_xor, test/dynamo/test_sets.py::SetTests::test_clear, test/dynamo/test_sets.py::SetTests::test_cmp_eq, test/dynamo/test_sets.py::SetTests::test_cmp_greater_than, test/dynamo/test_sets.py::SetTests::test_cmp_greater_than_or_equal, test/dynamo/test_sets.py::SetTests::test_cmp_less_than, test/dynamo/test_sets.py::SetTests::test_cmp_less_than_or_equal, test/dynamo/test_sets.py::SetTests::test_cmp_ne, test/dynamo/test_sets.py::SetTests::test_constructor_iterable, test/dynamo/test_sets.py::SetTests::test_contains, test/dynamo/test_sets.py::SetTests::test_copy, test/dynamo/test_sets.py::SetTests::test_difference, test/dynamo/test_sets.py::SetTests::test_difference_update, test/dynamo/test_sets.py::SetTests::test_discard, test/dynamo/test_sets.py::SetTests::test_equality, test/dynamo/test_sets.py::SetTests::test_in_frozenset, test/dynamo/test_sets.py::SetTests::test_intersection, test/dynamo/test_sets.py::SetTests::test_intersection_update, test/dynamo/test_sets.py::SetTests::test_isdisjoint, test/dynamo/test_sets.py::SetTests::test_issubset, test/dynamo/test_sets.py::SetTests::test_issuperset, test/dynamo/test_sets.py::SetTests::test_pop, test/dynamo/test_sets.py::SetTests::test_remove, test/dynamo/test_sets.py::SetTests::test_symmetric_difference, test/dynamo/test_sets.py::SetTests::test_symmetric_difference_update, test/dynamo/test_sets.py::SetTests::test_to_frozenset, test/dynamo/test_sets.py::SetTests::test_to_set, test/dynamo/test_sets.py::SetTests::test_union, test/dynamo/test_sets.py::SetTests::test_update, 
test/dynamo/test_sets.py::UserDefinedSetTests::test_add, test/dynamo/test_sets.py::UserDefinedSetTests::test_binop_and, test/dynamo/test_sets.py::UserDefinedSetTests::test_binop_or, test/dynamo/test_sets.py::UserDefinedSetTests::test_binop_sub, test/dynamo/test_sets.py::UserDefinedSetTests::test_binop_xor, test/dynamo/test_sets.py::UserDefinedSetTests::test_clear, test/dynamo/test_sets.py::UserDefinedSetTests::test_cmp_eq, test/dynamo/test_sets.py::UserDefinedSetTests::test_cmp_greater_than, test/dynamo/test_sets.py::UserDefinedSetTests::test_cmp_greater_than_or_equal, test/dynamo/test_sets.py::UserDefinedSetTests::test_cmp_less_than, test/dynamo/test_sets.py::UserDefinedSetTests::test_cmp_less_than_or_equal, test/dynamo/test_sets.py::UserDefinedSetTests::test_cmp_ne, test/dynamo/test_sets.py::UserDefinedSetTests::test_constructor_iterable, test/dynamo/test_sets.py::UserDefinedSetTests::test_contains, test/dynamo/test_sets.py::UserDefinedSetTests::test_copy, test/dynamo/test_sets.py::UserDefinedSetTests::test_difference, test/dynamo/test_sets.py::UserDefinedSetTests::test_difference_update, test/dynamo/test_sets.py::UserDefinedSetTests::test_discard, test/dynamo/test_sets.py::UserDefinedSetTests::test_equality, test/dynamo/test_sets.py::UserDefinedSetTests::test_in_frozenset, test/dynamo/test_sets.py::UserDefinedSetTests::test_intersection, test/dynamo/test_sets.py::UserDefinedSetTests::test_intersection_update, test/dynamo/test_sets.py::UserDefinedSetTests::test_isdisjoint, test/dynamo/test_sets.py::UserDefinedSetTests::test_issubset, test/dynamo/test_sets.py::UserDefinedSetTests::test_issuperset, test/dynamo/test_sets.py::UserDefinedSetTests::test_pop, test/dynamo/test_sets.py::UserDefinedSetTests::test_remove, test/dynamo/test_sets.py::UserDefinedSetTests::test_symmetric_difference, test/dynamo/test_sets.py::UserDefinedSetTests::test_symmetric_difference_update, test/dynamo/test_sets.py::UserDefinedSetTests::test_to_frozenset, 
test/dynamo/test_sets.py::UserDefinedSetTests::test_to_set, test/dynamo/test_sets.py::UserDefinedSetTests::test_union, test/dynamo/test_sets.py::UserDefinedSetTests::test_update, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_binop_and, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_binop_or, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_binop_sub, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_binop_xor, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_cmp_eq, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_cmp_greater_than, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_cmp_greater_than_or_equal, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_cmp_less_than, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_cmp_less_than_or_equal, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_cmp_ne, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_constructor_iterable, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_contains, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_copy, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_difference, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_equality, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_in_frozenset, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_intersection, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_isdisjoint, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_issubset, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_issuperset, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_symmetric_difference, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_to_frozenset, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_to_set, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_union
2025-12-04T12:38:06.7852424Z 
2025-12-04T12:38:06.7852721Z Finished dynamo/test_sets 1/1 ... [2025-12-04 12:38:06.777519][12315.160414108], took 0.27min
2025-12-04T12:38:06.8015161Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_sets/dynamo.test_sets-f0cb58e83c4ea8ef.xml
2025-12-04T12:38:06.8806884Z Running dynamo/test_wrap_inductor_compiled_regions 1/1 ... [2025-12-04 12:38:06.880344][12315.263239371]
2025-12-04T12:38:06.8807593Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:38:06.8810082Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_wrap_inductor_compiled_regions.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:38:06.880752]
2025-12-04T12:38:40.9615757Z 
2025-12-04T12:38:40.9619305Z dynamo/test_wrap_inductor_compiled_regions 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_wrap_inductor_compiled_regions_1.1_1c64e72dd7c0888e_.log
2025-12-04T12:38:40.9633042Z Running 18 items in this shard: test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_flex_attention_with_sac_must_save, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_flex_attention_with_sac_prefer_recompute, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_flex_attention_with_wrapper_basic, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_flex_attention_wrapper_visible_in_debug_mode, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_flex_attention_wrapper_with_backward, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_flex_attention_wrapper_with_cache, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_sac_outer_compile_inner_basic, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_sac_outer_compile_inner_flex_attention, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_wrap_config_affects_cache_key, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_wrap_default_disabled, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_wrap_disabled_not_visible_in_debug_mode, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_wrap_enabled_visible_in_debug_mode, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_wrap_no_dispatch_mode_no_hop_invoked, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_wrap_option_type_validation, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_wrap_per_compilation, 
test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_wrap_with_backward, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_wrap_with_cache, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_wrap_with_multiple_ops
2025-12-04T12:38:40.9644384Z 
2025-12-04T12:38:40.9644832Z Finished dynamo/test_wrap_inductor_compiled_regions 1/1 ... [2025-12-04 12:38:40.961336][12349.344232968], took 0.57min
2025-12-04T12:38:40.9853355Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_wrap_inductor_compiled_regions/dynamo.test_wrap_inductor_compiled_regions-2f1d9c362e038030.xml
2025-12-04T12:38:41.0751344Z Running test_sparse 2/2 ... [2025-12-04 12:38:41.074786][12349.457680642]
2025-12-04T12:38:41.0752091Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:38:41.0754571Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_sparse.py', '--shard-id=2', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:38:41.075224]
2025-12-04T12:44:58.2663314Z 
2025-12-04T12:44:58.2664256Z test_sparse 2/2 was successful, full logs can be found in artifacts with path test/test-reports/test_sparse_2.2_a491ad82f72502f4_.log
2025-12-04T12:44:58.3334850Z Running 1574 items in this shard: test/test_sparse.py::TestSparseLegacyAndDeprecation::test_legacy_warnings, test/test_sparse.py::TestSparseOneOff::test_cuda_sparse_cpu_dense_add, test/test_sparse.py::TestSparseMeta::test_add_meta_SparseCSR_float64, test/test_sparse.py::TestSparseMeta::test_fake_SparseBSC_float64, test/test_sparse.py::TestSparseMeta::test_fake_SparseBSR_float64, test/test_sparse.py::TestSparseMeta::test_fake_SparseCSC_float64, test/test_sparse.py::TestSparseMeta::test_fake_SparseCSR_float64, test/test_sparse.py::TestSparseMeta::test_meta_SparseBSR_float64, test/test_sparse.py::TestSparseMeta::test_meta_SparseCOO_float64, test/test_sparse.py::TestSparseMeta::test_meta_SparseCSC_float64, test/test_sparse.py::TestSparseMeta::test_print_meta_SparseBSR_float64, test/test_sparse.py::TestSparseMeta::test_print_meta_SparseCSC_float64, test/test_sparse.py::TestSparseMeta::test_print_meta_SparseCSR_float64, test/test_sparse.py::TestSparseMeta::test_sum_meta_SparseCSR_float64, test/test_sparse.py::TestSparseMeta::test_to_meta_SparseBSC_float64, test/test_sparse.py::TestSparseMeta::test_to_meta_SparseBSR_float64, test/test_sparse.py::TestSparseMeta::test_to_meta_SparseCSC_float64, test/test_sparse.py::TestSparseMeta::test_to_meta_SparseCSR_float64, test/test_sparse.py::TestSparseMeta::test_zeros_like_fake_SparseBSC_float64, test/test_sparse.py::TestSparseMeta::test_zeros_like_fake_SparseCOO_float64, test/test_sparse.py::TestSparseMeta::test_zeros_like_fake_SparseCSR_float64, test/test_sparse.py::TestSparseMeta::test_zeros_like_meta_SparseBSC_float64, test/test_sparse.py::TestSparseMeta::test_zeros_like_meta_SparseBSR_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_float64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_ceil_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_ceil_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_ceil_cuda_uint8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_deg2rad_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_deg2rad_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_deg2rad_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_deg2rad_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erfinv_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erfinv_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erfinv_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erfinv_cuda_int8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erfinv_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_floor_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_floor_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_frac_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isneginf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isneginf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isneginf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isposinf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isposinf_cuda_uint8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nan_to_num_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nan_to_num_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nan_to_num_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nn_functional_relu_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nn_functional_relu_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nn_functional_relu_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_rad2deg_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_rad2deg_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_rad2deg_cuda_int8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_round_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_round_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_round_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_round_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sign_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sign_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sign_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sign_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sign_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_signbit_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_signbit_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_signbit_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_signbit_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_signbit_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_float64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_trunc_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_trunc_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_trunc_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_trunc_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_complex128, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_ceil_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_ceil_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_ceil_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_deg2rad_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erf_cuda_uint8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erfinv_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erfinv_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erfinv_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erfinv_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erfinv_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_floor_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_floor_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_floor_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_frac_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_int8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isneginf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isneginf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isneginf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isneginf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isneginf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isposinf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isposinf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isposinf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isposinf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nan_to_num_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nan_to_num_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nan_to_num_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nn_functional_relu_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nn_functional_relu_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nn_functional_relu_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nn_functional_relu_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nn_functional_relu_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_int8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_rad2deg_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_rad2deg_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_round_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sign_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sign_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sign_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_signbit_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_signbit_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_signbit_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_signbit_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_float64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_trunc_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_trunc_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_trunc_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_trunc_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_trunc_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_complex128, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_ceil_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_ceil_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_ceil_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_complex128, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_deg2rad_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_deg2rad_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_deg2rad_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_deg2rad_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erfinv_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_int8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_floor_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_floor_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_floor_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_floor_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_floor_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_floor_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_floor_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_frac_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isneginf_cuda_float32, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isneginf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isneginf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isneginf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isposinf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isposinf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isposinf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isposinf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nan_to_num_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nan_to_num_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nan_to_num_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nan_to_num_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nan_to_num_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nn_functional_relu_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nn_functional_relu_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nn_functional_relu_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nn_functional_relu_cuda_uint8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_rad2deg_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_rad2deg_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_rad2deg_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_rad2deg_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_rad2deg_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_round_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sign_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sign_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sign_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_signbit_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_signbit_cuda_float64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_signbit_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_signbit_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_trunc_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_trunc_cuda_uint8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_abs_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_abs_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_asinh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_atanh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_ceil_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_conj_physical_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_erf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_floor_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_frac_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_isnan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_isnan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_isneginf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_isposinf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_log1p_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_nan_to_num_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_neg_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_positive_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_rad2deg_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_round_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sgn_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sgn_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sin_cuda_complex128, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sin_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sinh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_tanh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_tanh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_trunc_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_int8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_ceil_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_ceil_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_ceil_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_ceil_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_deg2rad_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_deg2rad_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erf_cuda_uint8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erfinv_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erfinv_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erfinv_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_floor_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_floor_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_floor_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_floor_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_floor_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isneginf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isneginf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isposinf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isposinf_cuda_float64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isposinf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isposinf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nan_to_num_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nan_to_num_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nan_to_num_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nan_to_num_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nn_functional_relu_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nn_functional_relu_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nn_functional_relu_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nn_functional_relu_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_int32, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_rad2deg_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_rad2deg_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_rad2deg_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_rad2deg_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_rad2deg_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_round_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_round_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_round_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sign_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sign_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sign_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sign_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sign_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_signbit_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_signbit_cuda_int32, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_signbit_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_signbit_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_complex64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_trunc_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_trunc_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_trunc_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_uint8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_ceil_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_ceil_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_ceil_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_ceil_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_float64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_deg2rad_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_deg2rad_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_deg2rad_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_deg2rad_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_deg2rad_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erfinv_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erfinv_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erfinv_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_int8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_floor_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_floor_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isneginf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isneginf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isneginf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isneginf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isneginf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isposinf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isposinf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isposinf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_int8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nan_to_num_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nan_to_num_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nan_to_num_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nn_functional_relu_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nn_functional_relu_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nn_functional_relu_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nn_functional_relu_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_rad2deg_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_rad2deg_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_rad2deg_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_round_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_round_cuda_int16, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_round_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_round_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sign_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sign_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sign_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sign_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sign_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_signbit_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_signbit_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_signbit_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_signbit_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_int32, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_trunc_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_trunc_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_trunc_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_trunc_cuda_uint8, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_bfloat16, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_float16, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_int32, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_int64, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_uint8, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_bfloat16, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_float16, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_float64, 
test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_int8, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_uint8, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_bool, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_complex128, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_complex64, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_float32, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_int16, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_int8, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_uint8, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_bool, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_complex64, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_float16, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_float64, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_int32, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_uint8, test/test_sparse.py::TestSparseCUDA::test_Sparse_to_Sparse_copy__cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_Sparse_to_Sparse_copy_multi_gpu_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_add_sub_nnz_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_add_zeros_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_asin_arcsin_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_asin_arcsin_cuda_int16, test/test_sparse.py::TestSparseCUDA::test_basic_cuda_float64, 
test/test_sparse.py::TestSparseCUDA::test_bmm_oob_cuda, test/test_sparse.py::TestSparseCUDA::test_bmm_windows_error_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_cat_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_cat_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_change_tensor_metadata_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_coalesce_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_coalesce_reference_cycle_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_contig_hybrid_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_contig_hybrid_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_ctor_is_coalesced_with_gradcheck_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_ctor_large_sizes_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_ctor_size_checks_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_cuda_empty_cuda, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_bfloat16, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_complex64, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_bool, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_complex64, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_float16, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_int32, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_int8, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_uint8, test/test_sparse.py::TestSparseCUDA::test_empty_like_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_factory_copy_cuda, 
test/test_sparse.py::TestSparseCUDA::test_factory_cuda_complex64, test/test_sparse.py::TestSparseCUDA::test_factory_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_factory_dense_dim_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_factory_dense_dim_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_factory_device_type_inference_cuda, test/test_sparse.py::TestSparseCUDA::test_factory_empty_indices_cuda, test/test_sparse.py::TestSparseCUDA::test_factory_nnz_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_factory_size_check_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_factory_type_inference_cuda_complex64, test/test_sparse.py::TestSparseCUDA::test_factory_type_inference_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_factory_type_inference_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_factory_type_inference_cuda_int64, test/test_sparse.py::TestSparseCUDA::test_full_broadcast_to_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_full_broadcast_to_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_hsmm_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_index_select_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_index_select_empty_and_non_contiguous_index_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_index_select_empty_and_non_contiguous_index_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_index_select_exhaustive_index_large_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_index_select_exhaustive_index_small_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_index_select_exhaustive_index_small_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_index_select_parallelization_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_isnan_cuda, test/test_sparse.py::TestSparseCUDA::test_legacy_new_cuda, test/test_sparse.py::TestSparseCUDA::test_log1p_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_log1p_cuda_int16, 
test/test_sparse.py::TestSparseCUDA::test_log1p_cuda_int8, test/test_sparse.py::TestSparseCUDA::test_log_softmax_zero_nnz_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_mm_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_narrow_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_negative_indices_cuda, test/test_sparse.py::TestSparseCUDA::test_new_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_new_device_single_gpu_cuda, test/test_sparse.py::TestSparseCUDA::test_norm_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_permute_masked_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_permute_masked_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_permute_sparse_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_same_gpu_cuda, test/test_sparse.py::TestSparseCUDA::test_scalar_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_select_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_select_no_type_promotion_cuda_int64, test/test_sparse.py::TestSparseCUDA::test_select_no_type_promotion_cuda_int8, test/test_sparse.py::TestSparseCUDA::test_select_no_type_promotion_cuda_uint8, test/test_sparse.py::TestSparseCUDA::test_shared_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_small_nnz_coalesced_cuda, test/test_sparse.py::TestSparseCUDA::test_softmax_zero_nnz_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_spadd_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_add_coalesce_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_sparse_add_coalesce_cuda_complex64, test/test_sparse.py::TestSparseCUDA::test_sparse_add_coalesce_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_sparse_addmm_cuda_bfloat16, test/test_sparse.py::TestSparseCUDA::test_sparse_addmm_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_bool_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_sparse_bool_cuda_float64, 
test/test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_int32, test/test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_int64, test/test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_uint8, test/test_sparse.py::TestSparseCUDA::test_sparse_mask_hybrid_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_sparse_mask_hybrid_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_matmul_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_sparse_mul_masked_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_mul_sparse_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_float16, test/test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_int16, test/test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_int64, test/test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_int8, test/test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_bool, test/test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_complex64, test/test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_int16, test/test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_int64, test/test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_uint8, test/test_sparse.py::TestSparseCUDA::test_sspaddmm_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sum_cuda_complex64, test/test_sparse.py::TestSparseCUDA::test_sum_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_sum_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sum_cuda_int32, test/test_sparse.py::TestSparseCUDA::test_sum_cuda_int64, test/test_sparse.py::TestSparseCUDA::test_sum_cuda_int8, test/test_sparse.py::TestSparseCUDA::test_sum_cuda_uint8, test/test_sparse.py::TestSparseCUDA::test_t_empty_cuda_float64, 
test/test_sparse.py::TestSparseCUDA::test_to_dense_hybrid_masked_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_masked_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_masked_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_sparse_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_sparse_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_to_sparse_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_to_sparse_cuda_complex64, test/test_sparse.py::TestSparseCUDA::test_to_sparse_cuda_float16, test/test_sparse.py::TestSparseCUDA::test_to_sparse_cuda_int32, test/test_sparse.py::TestSparseCUDA::test_zeros_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_zeros_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_zeros_like_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_zeros_like_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseBSC_masked_slow_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseBSC_nonmasked_fast_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseBSC_nonmasked_slow_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseBSR_masked_fast_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseBSR_masked_slow_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseBSR_nonmasked_fast_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCOO_masked_fast_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCOO_nonmasked_fast_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCOO_nonmasked_slow_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCSC_nonmasked_fast_cuda, 
test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCSR_masked_fast_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCSR_masked_slow_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCSR_nonmasked_fast_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCSR_nonmasked_slow_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_complex32, 
test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_check_sparse_tensor_invariants_SparseBSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_check_sparse_tensor_invariants_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_check_sparse_tensor_invariants_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_check_sparse_tensor_invariants_SparseCSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_autograd_SparseBSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_autograd_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_autograd_SparseCSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_mismatched_pinned_memory_SparseBSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_mismatched_pinned_memory_SparseCSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_mismatched_pinned_memory_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_pin_memory_SparseBSC_cuda, 
test/test_sparse.py::TestSparseAnyCUDA::test_constructor_pin_memory_SparseCSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_pin_memory_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_pin_memory_Strided_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_pinned_memory_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_pinned_memory_SparseCSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_pinned_memory_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_pinned_memory_Strided_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSR_masked_fast_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSR_masked_slow_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSR_masked_slow_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSR_sparse_slow_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSR_sparse_slow_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCOO_masked_fast_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCOO_masked_fast_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCOO_sparse_slow_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCOO_sparse_slow_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSC_masked_fast_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSC_masked_slow_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSC_masked_slow_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSC_sparse_fast_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSR_masked_fast_cuda_complex128, 
test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSR_sparse_fast_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseBSC_int64_masked_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseBSC_int64_sparse_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseBSR_int64_masked_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseBSR_int64_sparse_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseBSR_int64_sparse_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCOO_int64_sparse_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCSC_int64_masked_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCSC_int64_sparse_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCSR_int64_masked_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCSR_int64_masked_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCSR_int64_sparse_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_invalid_blocksize_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSC_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSR_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSR_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCOO_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCOO_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCOO_cuda_complex32, 
test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCOO_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSC_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSC_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSC_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSC_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSR_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSR_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_complex32, 
test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_float16, 
test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_method_pin_memory_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_method_pin_memory_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_method_pin_memory_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_method_pin_memory_Strided_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseBSC_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseBSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseBSR_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseBSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCOO_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCOO_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCOO_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCSR_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCSR_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCSR_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_complex64, 
test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_bool, 
test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_int8, 
test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_complex128, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_complex128, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_int8, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_complex64, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_int8, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_int32, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_bool, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_int8, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_bool, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_uint8, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_int32, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_complex64, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_float16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_float32, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_int32, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_int16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_bfloat16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_float32, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_bfloat16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_float64, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_int8, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_identity_SparseBSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_identity_SparseBSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_identity_SparseCSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_identity_Strided_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_ccol_indices_SparseBSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_ccol_indices_SparseCSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_ccol_indices_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_col_indices_SparseCSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_col_indices_SparseCSR_cuda, 
test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_crow_indices_SparseCSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_indices_SparseBSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_indices_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_indices_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_indices_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_indices_Strided_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_is_coalesced_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_is_coalesced_SparseCSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_is_coalesced_Strided_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_row_indices_SparseBSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_row_indices_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_row_indices_SparseCSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_row_indices_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_row_indices_Strided_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_values_SparseCSC_cuda
2025-12-04T12:44:58.3984393Z 
2025-12-04T12:44:58.3984708Z Finished test_sparse 2/2 ... [2025-12-04 12:44:58.268347][12726.651240454], took 6.29min
2025-12-04T12:44:58.3985774Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_sparse/test_sparse-598e6683c5cfc22a.xml
2025-12-04T12:44:58.4325994Z Running test_decomp 3/17 ... [2025-12-04 12:44:58.432293][12726.815187852]
2025-12-04T12:44:58.4326484Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:44:58.4329511Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_decomp.py', '--shard-id=3', '--num-shards=17', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:44:58.432729]
2025-12-04T12:55:44.6226542Z 
2025-12-04T12:55:44.6227909Z test_decomp 3/17 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_3.17_3a5dd6feb399010e_.log
2025-12-04T12:55:44.6434986Z Running 547 items in this shard: test/test_decomp.py::TestDecompCUDA::test_comprehensive___getitem___cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive___getitem___cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive___getitem___cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rdiv___cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive__chunk_cat_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive__softmax_backward_data_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_abs_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addcdiv_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addmm_decomposed_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addmv_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addr_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_all_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_allclose_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_amax_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_amin_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argmax_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argwhere_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_scatter_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan2_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atanh_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_1d_cuda_bool, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_3d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bfloat16_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bfloat16_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bfloat16_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bincount_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bmm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bool_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bool_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bool_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_tensors_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_to_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_byte_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cartesian_prod_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cartesian_prod_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cat_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cdouble_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_chalf_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cholesky_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_chunk_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_max_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_min_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clone_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_column_stack_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_conj_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_contiguous_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_contiguous_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_copysign_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cos_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cosh_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_count_nonzero_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cov_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cummax_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumprod_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumsum_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumulative_trapezoid_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumulative_trapezoid_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumulative_trapezoid_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_deg2rad_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_embed_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diff_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diff_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diff_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_floor_rounding_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_trunc_rounding_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_double_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_dsplit_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dstack_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eq_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_equal_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erf_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfc_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfinv_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp2_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expm1_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exponential_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftn_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftshift_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfft2_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfft_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfft_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftn_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfftn_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfft2_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fill_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fill_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fliplr_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_floor_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_floor_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmin_cuda_int32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmod_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gather_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_geometric_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_half_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_half_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_histc_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hsplit_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hsplit_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hsplit_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_i0_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_i0_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_imag_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_add_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_add_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_add_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_fill_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_put_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_put_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_amax_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_amax_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_amin_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_mean_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_select_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_int_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_isfinite_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isinf_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isinf_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isnan_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isposinf_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isposinf_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isreal_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_2inputs_2outputs_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_binary_return_by_ref_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_binary_return_by_ref_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_kron_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_kthvalue_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cross_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_diagonal_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_eigvals_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_householder_product_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_inv_ex_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_ldl_factor_ex_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_matrix_norm_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_multi_dot_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_multi_dot_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_pinv_hermitian_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_slogdet_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_solve_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_solve_triangular_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_svd_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log2_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_softmax_with_dtype_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logaddexp_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logcumsumexp_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logdet_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_and_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_and_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_not_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logspace_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logspace_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logspace_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logspace_tensor_overload_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lu_solve_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mH_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mH_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mT_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_argmax_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_argmax_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_cumprod_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_cumprod_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_fill_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_fill_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_fill_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_log_softmax_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_mean_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_prod_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_prod_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_prod_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_scatter_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_std_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_var_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_var_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_no_dim_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_no_dim_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_maximum_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mean_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_reduction_no_dim_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_reduction_no_dim_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_msort_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_msort_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mv_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_5_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nansum_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ne_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ne_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_neg_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_avg_pool1d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_max_pool2d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_bilinear_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_channel_shuffle_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv3d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv3d_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv_transpose1d_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv_transpose1d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_dropout2d_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_embedding_bag_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_without_train_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_fractional_max_pool2d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_gelu_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardshrink_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardtanh_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_instance_norm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_nearest-exact_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_trilinear_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_kl_div_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_layer_norm_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_margin_ranking_loss_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_unpool3d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_multilabel_margin_loss_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_circular_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_constant_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pixel_shuffle_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_prelu_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_rms_norm_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_scaled_dot_product_attention_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softshrink_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softsign_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_tanhshrink_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_loss_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_loss_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_upsample_bilinear_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_upsample_nearest_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_norm_fro_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_normal_in_place_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_normal_number_mean_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ormqr_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ormqr_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_pinverse_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_2_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_put_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rand_like_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randint_like_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_randn_like_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_real_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reciprocal_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_renorm_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_renorm_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_interleave_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_interleave_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize__cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize_as__cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize_as__cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resolve_conj_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resolve_neg_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_roll_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rot90_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rot90_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_round_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rsub_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_amin_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signbit_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinc_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinc_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_softmax_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_softmax_with_dtype_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_softmax_with_dtype_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_y0_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_y0_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_y1_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_t_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_entr_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_hermite_polynomial_he_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i0e_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i0e_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i1_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i1e_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_laguerre_polynomial_l_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_legendre_polynomial_p_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_i1_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_i1_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_i1_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_ndtr_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_polygamma_special_polygamma_n_0_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_polygamma_special_polygamma_n_0_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_scaled_modified_bessel_k0_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_spherical_bessel_j0_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_xlog1py_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_zeta_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_square_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_square_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_multiple_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_multiple_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_stack_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_std_unbiased_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_to_size_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_to_size_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_svd_lowrank_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_take_along_dim_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tan_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tanh_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tensor_split_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tile_cuda_bool, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_tile_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tile_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tile_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tile_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_to_sparse_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_to_sparse_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_torch_ops_aten__flash_attention_forward_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trace_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trace_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trace_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapezoid_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tril_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_true_divide_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_true_divide_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_true_divide_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unflatten_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unflatten_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unflatten_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unfold_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unfold_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unique_consecutive_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unique_consecutive_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unique_cuda_uint16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_chunk_cuda_complex64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_split_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_split_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_split_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_mean_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_unbiased_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_as_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vsplit_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick__unsafe_masked_index_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick__unsafe_masked_index_put_accumulate_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_abs_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_abs_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_add_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_add_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_addcmul_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_addmm_decomposed_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_alias_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_alias_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_all_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_all_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_amax_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_amax_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_quick_arange_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_copy_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_asin_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_asinh_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_atan2_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_atan_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_baddbmm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_and_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_bucketize_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_bucketize_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_bucketize_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_cat_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_cauchy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_min_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_complex_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_constant_pad_nd_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_copysign_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_copysign_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_fill_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_mv_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_mvlgamma_mvlgamma_p_3_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_rsub_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_select_scatter_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_squeeze_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_unfold_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_count_nonzero_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_quick_cumprod_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_digamma_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_div_no_rounding_mode_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_div_trunc_rounding_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_empty_strided_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_erf_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_erfinv_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_exp_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_exp_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fftn_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfft2_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfft2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfftn_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifft2_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifft_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifftn_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifftn_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfft_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfftn_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfft2_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfft2_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_flip_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_floor_divide_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_floor_divide_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_fmax_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fmin_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fmod_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_quick_full_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_full_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_geometric_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_grid_sampler_2d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_gt_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_heaviside_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_i0_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_index_add_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_index_add_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_index_copy_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_index_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_isinf_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_isinf_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_isnan_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_isneginf_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_isposinf_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_lerp_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_lerp_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_lgamma_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_diagonal_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_diagonal_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_diagonal_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_vector_norm_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_linspace_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_log10_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_log1p_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_log1p_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_logaddexp2_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_quick_logical_and_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_logical_not_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_logical_xor_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_logical_xor_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_logit_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_mean_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_list_of_tensors_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_minimum_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_mul_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_1_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_3_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nan_to_num_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nan_to_num_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_ne_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_ne_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_strided_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_new_full_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_new_ones_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_new_ones_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_new_ones_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_new_zeros_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_embedding_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_hardshrink_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_huber_loss_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_leaky_relu_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_max_unpool3d_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_pad_constant_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_relu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_norm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_prod_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_rad2deg_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_rad2deg_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_reciprocal_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_reciprocal_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_remainder_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_remainder_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_roll_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_rot90_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_rsqrt_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_rsub_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_select_scatter_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_select_scatter_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_signbit_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_sin_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_sinc_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_special_ndtri_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_special_zeta_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_sqrt_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_multiple_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_multiple_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_stack_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_std_unbiased_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_sub_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_sum_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_t_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_t_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_tan_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_tan_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_trace_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_tril_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_triu_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_trunc_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_unsafe_split_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_var_mean_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_var_mean_unbiased_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_view_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_view_copy_cuda_uint8, 
test/test_decomp.py::TestDecompCUDA::test_quick_xlogy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_xlogy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_zeros_like_cuda_complex64
2025-12-04T12:55:44.6637261Z 
2025-12-04T12:55:44.6637737Z Finished test_decomp 3/17 ... [2025-12-04 12:55:44.623271][13373.006165549], took 10.77min
2025-12-04T12:55:44.6638783Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_decomp/test_decomp-5879e0e26736617e.xml
2025-12-04T12:55:45.7031190Z Uploading artifacts took 0.97 seconds
2025-12-04T12:55:45.7034458Z Running test_decomp 8/17 ... [2025-12-04 12:55:45.703295][13374.086189868]
2025-12-04T12:55:45.7034961Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:55:45.7039595Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_decomp.py', '--shard-id=8', '--num-shards=17', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:55:45.703754]
2025-12-04T13:09:24.4113437Z 
2025-12-04T13:09:24.4114802Z test_decomp 8/17 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_8.17_26b4abb8a1042a34_.log
2025-12-04T13:09:24.4321920Z Running 541 items in this shard: test/test_decomp.py::TestDecompCUDA::test_comprehensive___rdiv___cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmod___cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rpow___cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rsub___cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive__chunk_cat_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive__segment_reduce_lengths_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive__segment_reduce_offsets_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addcmul_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addmm_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addmm_decomposed_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addr_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addr_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addr_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_alias_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_alias_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_amin_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_aminmax_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_angle_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_angle_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_angle_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_angle_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argsort_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argwhere_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_partial_views_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_partial_views_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_partial_views_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asin_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asinh_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_1d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_1d_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_2d_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_2d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_or_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bmm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bool_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_to_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cdouble_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cdouble_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cdouble_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cfloat_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_chunk_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_min_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clone_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_column_stack_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_combinations_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_combinations_cuda_int32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_complex_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_complex_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_conj_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_conj_physical_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_conj_physical_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_copysign_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_corrcoef_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_corrcoef_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cos_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cos_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cosh_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_count_nonzero_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cov_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cov_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cov_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cross_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cummin_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumprod_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumprod_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumprod_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumsum_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_deg2rad_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_embed_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_embed_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_copy_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_scatter_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_no_rounding_mode_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_trunc_rounding_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dot_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_einsum_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_like_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_equal_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_equal_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_equal_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfc_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfinv_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_as_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eye_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eye_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fft_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftn_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftshift_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfft2_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft2_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftn_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftn_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftshift_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfftn_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfft2_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfft_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfftn_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flatten_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flatten_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flatten_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flip_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fliplr_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flipud_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flipud_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_floor_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_floor_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_floor_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_frac_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_frexp_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gt_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hash_tensor_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hsplit_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hsplit_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_fill_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_inner_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_isclose_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isclose_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isclose_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isfinite_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isin_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isinf_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isinf_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isposinf_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_item_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_item_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_item_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_2inputs_2outputs_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_4inputs_with_extra_args_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_unary_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_kron_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_kthvalue_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lcm_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ldexp_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_le_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lerp_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_det_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_diagonal_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_diagonal_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_householder_product_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lstsq_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lu_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lu_factor_ex_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lu_solve_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_norm_subgradients_at_zero_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_qr_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_solve_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_solve_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_solve_ex_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_vander_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linspace_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linspace_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_or_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_or_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logit_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logit_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logsumexp_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_long_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lt_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lu_solve_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mH_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_argmin_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_argmin_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_cumsum_cuda_complex128, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_cumsum_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_cumsum_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_fill_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_fill_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_log_softmax_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_log_softmax_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_logsumexp_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_median_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_prod_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_prod_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_prod_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_select_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_softmax_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_no_dim_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_median_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_reduction_with_dim_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_1_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_1_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nan_to_num_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nan_to_num_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nanquantile_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_copy_cuda_int32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_native_layer_norm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_native_layer_norm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_neg_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_strided_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_zeros_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_avg_pool3d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_max_pool1d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_max_pool2d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_avg_pool1d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_batch_norm_without_cudnn_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_binary_cross_entropy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv_transpose2d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv_transpose3d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv_transpose3d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_ctc_loss_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_with_train_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_without_train_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_fractional_max_pool2d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_group_norm_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardswish_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_linear_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_trilinear_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_kl_div_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_l1_loss_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_unpool1d_grad_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_unpool2d_grad_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_mse_loss_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_one_hot_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_circular_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_constant_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_reflect_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_reflect_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_negative_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pairwise_distance_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_poisson_nll_loss_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_rrelu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_silu_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_tanhshrink_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_tanhshrink_cuda_complex128, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_tanhshrink_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_tanhshrink_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_threshold_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_loss_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_with_distance_loss_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_unfold_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_upsample_bilinear_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_norm_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_norm_fro_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_norm_inf_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_outer_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_pca_lowrank_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_pinverse_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_0_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_0_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_positive_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_put_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_put_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_put_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randint_like_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_real_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_as_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rot90_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_round_decimals_0_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_round_decimals_neg_3_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rsqrt_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rsub_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scalar_tensor_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scalar_tensor_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_mean_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_scatter_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_scatter_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sigmoid_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_blackman_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_exponential_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_nuttall_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinc_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinh_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_sort_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_airy_ai_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_j1_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_j1_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_y1_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_v_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_v_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_w_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i0e_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_laguerre_polynomial_l_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_laguerre_polynomial_l_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_laguerre_polynomial_l_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_legendre_polynomial_p_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_k0_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_k0_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_scaled_modified_bessel_k0_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_scaled_modified_bessel_k1_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_w_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_zeta_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sqrt_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sqrt_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_square_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_std_mean_unbiased_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_std_unbiased_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_stft_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_take_along_dim_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_take_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tan_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tanh_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tanh_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tensor_split_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tensor_split_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tile_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_to_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_to_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trace_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapz_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_triangular_solve_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_triu_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_triu_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unflatten_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_unbiased_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vdot_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_as_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vstack_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_xlogy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_like_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_like_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick__chunk_cat_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick__chunk_cat_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick__native_batch_norm_legit_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_quick__softmax_backward_data_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_acosh_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_add_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_add_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_addcmul_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_alias_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_alias_copy_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_any_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_any_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_scatter_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_asinh_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_bernoulli_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_block_diag_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_bucketize_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_cat_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_max_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_min_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_clone_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_conj_physical_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_copysign_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_addcmul_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_diagonal_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_frac_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_index_add_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_linalg_cross_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_nn_functional_unfold_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_norm_fro_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_special_xlog1py_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_count_nonzero_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_cumprod_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_cumsum_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_cumsum_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_diag_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_diag_embed_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_scatter_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_digamma_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_dist_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_div_floor_rounding_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_empty_like_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_empty_strided_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_erf_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_erf_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_erfinv_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_exp_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_exp_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_expand_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_expand_cuda_int32, 
test/test_decomp.py::TestDecompCUDA::test_quick_expm1_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_exponential_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_eye_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fft2_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfft_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfftn_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfftn_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfftn_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifft2_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifft_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifftn_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfftn_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfftn_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft2_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfftn_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfftn_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfftn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fmax_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fmin_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_fmod_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_frac_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_full_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_ge_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_geometric_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_geometric_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_gt_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_heaviside_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_quick_heaviside_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_index_add_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_index_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_index_select_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_isnan_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_isnan_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_cross_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_cross_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_diagonal_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_linspace_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_log10_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_log10_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_log_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_log_normal_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_logical_xor_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_logspace_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_logspace_tensor_overload_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_logsumexp_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_masked_fill_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_masked_fill_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_masked_fill_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_maximum_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_maximum_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_maximum_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_list_of_tensors_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_native_dropout_backward_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_native_dropout_backward_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_quick_native_layer_norm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_new_full_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_nextafter_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_binary_cross_entropy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_binary_cross_entropy_with_logits_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_gelu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_hardsigmoid_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_hardsigmoid_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_hardtanh_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_max_unpool2d_grad_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_pad_constant_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_pad_constant_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_pad_constant_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_unfold_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_norm_inf_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_normal_number_mean_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_permute_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_pow_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_prod_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_reciprocal_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_repeat_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_roll_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_rot90_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_rot90_cuda_uint8, 
test/test_decomp.py::TestDecompCUDA::test_quick_rsub_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_select_scatter_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_sgn_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_sgn_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_sin_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_slice_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_special_entr_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_special_entr_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_special_entr_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_special_erfcx_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_special_i0e_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_special_i0e_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_special_i1_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_special_i1e_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_special_i1e_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_special_ndtr_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_special_ndtr_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_special_ndtri_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_special_xlog1py_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_special_zeta_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_split_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_split_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_split_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_sqrt_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_multiple_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_stack_cuda_bool, 
test/test_decomp.py::TestDecompCUDA::test_quick_std_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_std_mean_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_std_mean_unbiased_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_sub_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_sum_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_t_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_take_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_take_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_take_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_tanh_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_trace_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_copy_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_tril_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_tril_indices_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_uniform_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_view_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_view_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_view_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_where_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_xlogy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_zeros_like_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_zeros_like_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_rnn_decomp_module_nn_LSTM_eval_mode_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_uniform_cuda
2025-12-04T13:09:24.4527077Z 
2025-12-04T13:09:24.4527402Z Finished test_decomp 8/17 ... [2025-12-04 13:09:24.411846][14192.794741033], took 13.65min
2025-12-04T13:09:24.4528509Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_decomp/test_decomp-c4519c63d1395608.xml
2025-12-04T13:09:24.5566196Z Running test_decomp 13/17 ... [2025-12-04 13:09:24.556294][14192.939188006]
2025-12-04T13:09:24.5566772Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T13:09:24.5569943Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_decomp.py', '--shard-id=13', '--num-shards=17', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:09:24.556741]
2025-12-04T13:20:11.2757816Z 
2025-12-04T13:20:11.2759273Z test_decomp 13/17 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_13.17_a52400f805dcf5ec_.log
2025-12-04T13:20:11.2969619Z Running 552 items in this shard: test/test_decomp.py::TestDecompCUDA::test_arange_graph_cuda, test/test_decomp.py::TestDecompCUDA::test_comprehensive_H_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_H_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_T_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rdiv___cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rdiv___cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmod___cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmul___cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmul___cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmul___cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive__chunk_cat_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive__segment_reduce_offsets_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive__softmax_backward_data_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_abs_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_acosh_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_acosh_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addcdiv_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_alias_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_alias_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_allclose_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_amin_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_aminmax_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_any_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_arange_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argmin_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argsort_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_partial_views_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_partial_views_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_partial_views_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asinh_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan2_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atanh_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_baddbmm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bernoulli_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bernoulli_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bincount_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_left_shift_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_not_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_block_diag_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_block_diag_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bool_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_tensors_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_to_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bucketize_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bucketize_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_byte_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cat_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cat_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cdouble_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cdouble_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ceil_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_char_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cholesky_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cholesky_solve_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_max_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_min_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clone_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_column_stack_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_combinations_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_conj_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_conj_physical_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_constant_pad_nd_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_contiguous_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_contiguous_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cosh_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_count_nonzero_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_count_nonzero_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cross_cuda_complex64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_cummax_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cummin_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumsum_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_deg2rad_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_scatter_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_digamma_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dist_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_double_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dsplit_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dsplit_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dstack_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_einsum_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_permuted_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eq_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfc_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfc_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfinv_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_as_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_as_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_eye_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fft2_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fft_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfft2_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfft2_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfft_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfftn_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftshift_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftshift_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfft2_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfft2_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfft2_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfftn_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfft2_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfft2_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfftn_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_floor_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmin_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_geometric_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_geometric_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_half_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hash_tensor_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_histc_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_histc_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hsplit_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_imag_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_put_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_prod_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_prod_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_select_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isin_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isneginf_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isposinf_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isreal_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_2inputs_2outputs_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_4inputs_with_extra_args_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_le_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lerp_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lgamma_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lgamma_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_det_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_diagonal_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_ldl_factor_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_ldl_solve_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lstsq_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lu_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lu_factor_ex_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_matrix_rank_hermitian_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_multi_dot_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_norm_subgradients_at_zero_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_norm_subgradients_at_zero_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_qr_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_svdvals_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linspace_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log10_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log10_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log2_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log2_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log2_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_softmax_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logaddexp2_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_not_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_or_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_xor_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logspace_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logspace_tensor_overload_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lu_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mH_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_mH_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_amax_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_amax_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_argmax_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_cumsum_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_cumsum_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_log_softmax_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_mean_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_norm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_prod_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_select_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_select_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_std_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_var_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_binary_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_binary_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_with_dim_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_reduction_with_dim_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_minimum_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mode_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_movedim_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mul_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_1_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_1_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_3_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nan_to_num_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nanmean_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_native_dropout_backward_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_strided_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_strided_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_ones_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_ones_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_zeros_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_avg_pool2d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_avg_pool3d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_binary_cross_entropy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_binary_cross_entropy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv_transpose3d_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_cosine_embedding_loss_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_cross_entropy_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_instance_norm_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_linear_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_nearest_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_l1_loss_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_l1_loss_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_layer_norm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_local_response_norm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_logsigmoid_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_margin_ranking_loss_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_unpool3d_grad_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_multilabel_soft_margin_loss_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_nll_loss_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_circular_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_constant_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_negative_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_poisson_nll_loss_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_prelu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_relu_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_silu_complex_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_smooth_l1_loss_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softmin_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_loss_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_norm_inf_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_normal_in_place_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_like_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_like_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_outer_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_0_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_0_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_1_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_1_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_2_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_2_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_3_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_3_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_positive_cuda_int32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_pow_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_prod_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_put_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rad2deg_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rad2deg_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rad2deg_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randint_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randn_like_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randn_like_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ravel_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ravel_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ravel_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_renorm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_interleave_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize__cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize_as__cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resolve_neg_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_roll_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rot90_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_round_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_round_decimals_0_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_round_decimals_neg_3_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rsqrt_cuda_complex128, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_scalar_tensor_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scalar_tensor_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_mean_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_prod_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_scatter_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sgn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_short_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_bartlett_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_hann_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinh_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_scatter_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_softmax_with_dtype_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sort_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_y0_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_y1_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_u_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_v_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_w_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_hermite_polynomial_he_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_hermite_polynomial_he_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_laguerre_polynomial_l_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_i1_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_k1_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_ndtr_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_ndtr_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_scaled_modified_bessel_k0_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_spherical_bessel_j0_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_zeta_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_list_args_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_stack_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sub_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_to_size_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tan_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tanh_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tensor_split_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tensordot_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tile_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_to_sparse_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_to_sparse_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_torch__scaled_mm_cuda_float8_e4m3fn, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tril_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_true_divide_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trunc_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trunc_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_copy_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unflatten_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unfold_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unfold_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unfold_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unique_consecutive_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unique_consecutive_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_chunk_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_mean_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_mean_unbiased_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_mean_unbiased_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_unbiased_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_as_complex_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_as_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vsplit_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vsplit_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_where_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_xlogy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_masked_fill_cuda, test/test_decomp.py::TestDecompCUDA::test_quick__chunk_cat_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick__chunk_cat_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick__native_batch_norm_legit_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick__unsafe_masked_index_put_accumulate_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_abs_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_abs_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_acos_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_acosh_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_acosh_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_addmm_decomposed_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_addr_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_alias_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_any_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_arange_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_copy_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_scatter_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_scatter_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_asin_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_atan2_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_atanh_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_and_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_left_shift_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_xor_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_cat_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_min_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_min_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_clone_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_clone_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_conj_physical_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_copysign_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_masked_fill_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_rad2deg_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_cos_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_cos_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_count_nonzero_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_count_nonzero_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_cumprod_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_cumprod_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_diag_embed_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_diag_embed_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_diag_embed_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_quick_diag_embed_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_diag_embed_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_dot_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_dot_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_empty_like_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_empty_like_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_empty_strided_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_empty_strided_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_empty_strided_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_exp_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_exp_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_expand_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_eye_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfftn_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft2_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfft2_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfft_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfftn_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfftn_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_fill_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_flip_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fmax_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_fmod_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_full_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_ge_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_quick_gt_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_igamma_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_index_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_index_fill_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_index_fill_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_isinf_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_isneginf_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_lcm_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_le_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_lerp_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_lgamma_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_cross_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_diagonal_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_linspace_tensor_overload_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_log10_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_log_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_log_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_logaddexp_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_logaddexp_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_logical_or_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_logical_or_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_logical_or_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_logical_xor_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_logit_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_logspace_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_logspace_tensor_overload_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_logsumexp_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_logsumexp_cuda_uint8, 
test/test_decomp.py::TestDecompCUDA::test_quick_lt_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_lt_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_maximum_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_list_of_tensors_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_list_of_tensors_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_variadic_tensors_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_variadic_tensors_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_minimum_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_minimum_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_mul_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_mv_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_narrow_copy_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_narrow_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_narrow_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_native_layer_norm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_neg_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_neg_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nextafter_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_prelu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_relu6_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_relu6_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_silu_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_unfold_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_norm_fro_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_normal_in_place_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_ones_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_ones_like_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_permute_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_permute_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_prod_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_randn_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_reciprocal_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_repeat_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_repeat_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_roll_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_round_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_rsqrt_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_select_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_select_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_sigmoid_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_sign_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_sin_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_slice_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_slice_scatter_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_slice_scatter_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_special_entr_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_special_log_ndtr_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_special_xlog1py_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_special_xlog1py_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_special_zeta_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_quick_split_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_split_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_std_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_std_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_std_mean_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_std_unbiased_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_sub_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_sub_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_t_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_t_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_take_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_take_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_tan_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_tan_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_tanh_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_tanh_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_trunc_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_copy_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_unsafe_split_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_unsafe_split_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_cuda_complex64, 
test/test_decomp.py::TestDecompCUDA::test_quick_var_mean_unbiased_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_view_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_view_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_view_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_xlogy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_zeros_cuda_int8, test/test_decomp.py::DecompOneOffTestsCUDA::test_amp_batch_norm_backward_cuda
2025-12-04T13:20:11.3178805Z 
2025-12-04T13:20:11.3179120Z Finished test_decomp 13/17 ... [2025-12-04 13:20:11.276365][14839.659259266], took 10.78min
2025-12-04T13:20:11.3180207Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_decomp/test_decomp-fd1a91e45a41098b.xml
2025-12-04T13:20:12.3499653Z Uploading artifacts took 0.96 seconds
2025-12-04T13:20:12.3505483Z Running test_ops_fwd_gradients 1/2 ... [2025-12-04 13:20:12.350297][14840.733191155]
2025-12-04T13:20:12.3506347Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T13:20:12.3512105Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_ops_fwd_gradients.py', '--shard-id=1', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:20:12.350860]
2025-12-04T13:30:01.7149662Z 
2025-12-04T13:30:01.7150630Z test_ops_fwd_gradients 1/2 was successful, full logs can be found in artifacts with path test/test-reports/test_ops_fwd_gradients_1.2_4abfc4ee1bccdea9_.log
2025-12-04T13:30:01.7991127Z Running 1619 items in this shard: test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_H_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_T_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad___getitem___cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad___radd___cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad___radd___cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad___rmatmul___cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad___rmod___cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad___rmul___cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad___rpow___cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad__segment_reduce_lengths_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad__segment_reduce_offsets_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad__unsafe_masked_index_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad__unsafe_masked_index_put_accumulate_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_abs_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_acos_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_acosh_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_add_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_addbmm_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_addcdiv_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_addmm_decomposed_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_addmv_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_addr_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_alias_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_alias_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_all_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_allclose_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_amin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_aminmax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_angle_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_angle_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_any_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_argmin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_argsort_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_argwhere_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_as_strided_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_as_strided_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_as_strided_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_as_strided_partial_views_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_as_strided_scatter_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_asin_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_asinh_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_asinh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_atan2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_atan_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_atan_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_atanh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_atleast_1d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_atleast_2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_bfloat16_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_block_diag_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_bmm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_bool_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_bool_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_broadcast_to_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_broadcast_to_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_byte_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cartesian_prod_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cat_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cat_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cauchy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cdouble_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cfloat_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_chalf_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cholesky_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cholesky_inverse_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cholesky_solve_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_clamp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_clamp_max_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_clamp_min_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_clone_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_clone_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_column_stack_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_column_stack_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_combinations_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_conj_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_constant_pad_nd_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_copysign_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_corrcoef_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cov_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cov_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cross_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cross_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cummax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cumprod_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cumprod_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cumsum_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_diag_embed_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_diag_embed_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_diagonal_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_diagonal_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_diagonal_scatter_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_diff_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_dist_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_div_no_rounding_mode_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_double_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_double_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_dsplit_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_dsplit_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_empty_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_empty_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_empty_like_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_empty_like_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_empty_permuted_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_empty_strided_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_eq_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_equal_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_erfc_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_exp2_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_exp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_expand_as_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_expand_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_expm1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_exponential_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_eye_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_eye_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_fft2_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_fft2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_fft_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_fftn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_fftshift_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_fftshift_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_hfft2_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_hfftn_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_hfftn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_ifftn_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_ifftn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_ifftshift_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_ihfftn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_irfft2_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_irfft2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_irfft_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_rfft2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_rfft_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fill_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_flip_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fliplr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_flipud_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fmax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_frexp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_full_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_full_like_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_full_like_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_gather_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_gather_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_grid_sampler_3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_gt_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_half_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_histc_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_hsplit_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_hstack_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_hstack_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_hypot_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_igamma_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_igammac_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_index_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_index_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_index_fill_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_index_fill_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_index_put_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_index_put_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_index_reduce_amax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_index_reduce_amin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_index_reduce_mean_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_index_reduce_prod_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_index_select_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_inner_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_inner_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_int_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_int_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_isclose_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_isclose_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_isfinite_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_isin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_isinf_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_isinf_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_isnan_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_isneginf_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_isposinf_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_istft_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_jiterator_2inputs_2outputs_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_jiterator_4inputs_with_extra_args_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_jiterator_binary_return_by_ref_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_jiterator_binary_return_by_ref_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_jiterator_unary_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_kthvalue_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_ldexp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_lerp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_lgamma_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_cholesky_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_cholesky_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_cholesky_ex_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_cond_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_det_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_eig_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_eigh_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_eigh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_eigvals_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_householder_product_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_inv_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_inv_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_inv_ex_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_ldl_factor_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_ldl_factor_ex_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_ldl_factor_ex_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_ldl_solve_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_lstsq_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_lstsq_grad_oriented_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_lstsq_grad_oriented_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_lu_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_lu_factor_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_lu_factor_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_lu_factor_ex_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_lu_solve_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_lu_solve_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_matrix_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_matrix_power_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_matrix_rank_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_matrix_rank_hermitian_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_multi_dot_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_multi_dot_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_norm_subgradients_at_zero_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_pinv_hermitian_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_pinv_hermitian_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_pinv_singular_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_qr_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_qr_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_slogdet_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_solve_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_solve_triangular_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_svd_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_svd_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_svdvals_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_svdvals_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_tensorinv_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_vecdot_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_vecdot_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_vector_norm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_log10_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_log10_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_log1p_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_log2_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_log_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_log_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_log_softmax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_logaddexp2_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_logaddexp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_logcumsumexp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_logcumsumexp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_logdet_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_logical_and_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_logical_not_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_logical_xor_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_logspace_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_logspace_tensor_overload_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_logspace_tensor_overload_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_logsumexp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_logsumexp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_long_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_lt_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_lu_unpack_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_mH_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_mT_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_amax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_argmin_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_cumprod_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_cumsum_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_fill_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_log_softmax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_logaddexp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_logsumexp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_logsumexp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_mean_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_mean_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_median_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_prod_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_scatter_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_scatter_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_select_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_select_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_softmax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_std_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_std_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_sum_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_var_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_var_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_matmul_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_matmul_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_matrix_exp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_max_pool2d_with_indices_backward_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_max_reduction_no_dim_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_max_reduction_with_dim_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_mean_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_median_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_meshgrid_variadic_tensors_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_meshgrid_variadic_tensors_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_min_binary_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_min_reduction_no_dim_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_minimum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_mm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_movedim_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_msort_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_mul_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_multinomial_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_mvlgamma_mvlgamma_p_1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_mvlgamma_mvlgamma_p_5_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nanquantile_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nansum_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nansum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_narrow_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_narrow_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_native_dropout_backward_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_ne_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_neg_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_new_empty_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_new_empty_strided_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nextafter_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_adaptive_avg_pool1d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_adaptive_avg_pool3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_adaptive_max_pool1d_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_avg_pool1d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_avg_pool2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_avg_pool3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_binary_cross_entropy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_celu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_channel_shuffle_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_channel_shuffle_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_conv1d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_conv1d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_conv3d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_conv3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_conv_transpose1d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_conv_transpose1d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_conv_transpose2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_conv_transpose3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_cross_entropy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_ctc_loss_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_dropout2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_dropout_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_elu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_embedding_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_feature_alpha_dropout_with_train_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_feature_alpha_dropout_without_train_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_feature_alpha_dropout_without_train_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_fractional_max_pool3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_gaussian_nll_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_gelu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_hardtanh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_hinge_embedding_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_interpolate_bicubic_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_interpolate_bilinear_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_interpolate_linear_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_interpolate_nearest-exact_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_l1_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_leaky_relu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_linear_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_local_response_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_margin_ranking_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_max_unpool1d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_max_unpool2d_grad_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_max_unpool3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_mse_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_multi_head_attention_forward_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_multi_margin_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_multilabel_margin_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_nll_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_pad_circular_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_pad_constant_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_pad_reflect_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_pad_replicate_negative_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_pairwise_distance_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_pairwise_distance_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_pdist_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_pixel_shuffle_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_pixel_unshuffle_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_poisson_nll_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_prelu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_relu6_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_rms_norm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_rrelu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_selu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_smooth_l1_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_softmin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_softshrink_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_softsign_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_tanhshrink_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_triplet_margin_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_triplet_margin_with_distance_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_unfold_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_unfold_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_upsample_nearest_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nonzero_static_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nonzero_static_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_norm_fro_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_norm_inf_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_norm_nuc_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_normal_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_normal_in_place_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_normal_in_place_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_ones_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_ones_like_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_ones_like_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_ormqr_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_outer_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_outer_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_pca_lowrank_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_pca_lowrank_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_permute_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_permute_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_positive_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_prod_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_put_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_qr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_rad2deg_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_rand_like_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_rand_like_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_randint_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_randint_like_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_randn_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_randn_like_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_ravel_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_real_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_reciprocal_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_renorm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_renorm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_repeat_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_repeat_interleave_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_reshape_as_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_reshape_as_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_reshape_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_reshape_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_resize__cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_resolve_neg_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_resolve_neg_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_roll_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_rot90_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_round_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_round_decimals_3_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_rsub_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_scalar_tensor_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_scatter_add_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_scatter_reduce_amin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_scatter_reduce_mean_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_scatter_reduce_sum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_select_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_select_scatter_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_sgn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_sigmoid_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_signal_windows_bartlett_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_signal_windows_blackman_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_signal_windows_exponential_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_signal_windows_general_cosine_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_signal_windows_hann_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_signal_windows_nuttall_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_signbit_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_sinc_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_sinh_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_sinh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_slice_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_softmax_with_dtype_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_softmax_with_dtype_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_sort_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_sparse_sampled_addmm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_airy_ai_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_bessel_j1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_bessel_y0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_bessel_y1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_chebyshev_polynomial_t_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_chebyshev_polynomial_v_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_entr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_i0e_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_i1e_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_legendre_polynomial_p_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_modified_bessel_k0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_modified_bessel_k1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_ndtr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_ndtri_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_scaled_modified_bessel_k0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_scaled_modified_bessel_k1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_xlog1py_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_zeta_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_split_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_split_with_sizes_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_split_with_sizes_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_split_with_sizes_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_sqrt_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_square_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_squeeze_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_squeeze_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_stack_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_std_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_std_mean_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_std_mean_unbiased_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_std_unbiased_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_std_unbiased_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_stft_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_stft_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_sub_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_svd_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_svd_lowrank_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_t_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_t_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_take_along_dim_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_take_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_tan_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_tan_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_tanh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_tile_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_to_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_to_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_to_sparse_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_torch_ops_aten__safe_softmax_default_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_trace_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_transpose_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_trapezoid_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_trapz_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_trapz_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_tril_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_tril_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_triu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_true_divide_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_true_divide_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_unbind_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_unbind_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_unflatten_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_unfold_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_unfold_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_unsafe_split_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_unsafe_split_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_unsqueeze_copy_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_unsqueeze_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_unsqueeze_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_var_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_var_mean_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_var_mean_unbiased_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_var_unbiased_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_vdot_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_view_as_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_view_as_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_view_as_real_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_view_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_vsplit_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_vsplit_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_vstack_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_where_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_zero__cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_zero__cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_zeros_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_zeros_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_zeros_like_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_H_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_T_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD___radd___cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD___radd___cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD___rmatmul___cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD___rmatmul___cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD___rmul___cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD___rpow___cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD___rpow___cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD___rsub___cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD__segment_reduce_lengths_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD__segment_reduce_offsets_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD__unsafe_masked_index_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_abs_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_acos_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_add_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_addbmm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_addcdiv_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_addcmul_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_addcmul_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_addmm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_addmm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_addmm_decomposed_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_addmm_decomposed_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_addr_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_alias_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_alias_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_allclose_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_amax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_angle_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_angle_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_any_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_arange_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_argmax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_argmin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_argsort_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_argwhere_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_argwhere_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_as_strided_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_as_strided_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_as_strided_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_as_strided_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_as_strided_partial_views_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_as_strided_scatter_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_asin_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_asinh_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_atleast_2d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_atleast_2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_baddbmm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_bfloat16_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_block_diag_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_bmm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_bool_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_bucketize_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_byte_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cat_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cdist_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cdouble_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cdouble_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cfloat_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cfloat_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_chalf_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cholesky_inverse_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cholesky_inverse_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cholesky_solve_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_chunk_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_clamp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_clone_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_column_stack_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_combinations_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_complex_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_conj_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_conj_physical_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_conj_physical_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_copysign_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_corrcoef_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cos_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cosh_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cov_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cov_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cross_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cummax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cummin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cumulative_trapezoid_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cumulative_trapezoid_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_diag_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_diag_embed_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_diagflat_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_diagflat_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_diagonal_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_diagonal_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_diagonal_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_diff_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_digamma_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_dist_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_dot_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_double_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_double_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_dsplit_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_dstack_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_empty_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_empty_like_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_empty_strided_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_eq_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_equal_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_erfc_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_exp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_exp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_expand_as_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_expand_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_expand_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_exponential_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_fft2_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_fft2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_fft_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_fftn_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_fftn_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_fftshift_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_hfft_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_hfftn_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_hfftn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_ifft2_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_ifft_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_ifft_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_ifftshift_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_ihfftn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_irfft2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_irfft_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_irfftn_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_rfft_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fill_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fliplr_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_flipud_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_float_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_float_power_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_floor_divide_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fmin_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_frac_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_full_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_full_like_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_gradient_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_grid_sampler_2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_grid_sampler_3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_histc_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_hsplit_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_hstack_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_hstack_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_hypot_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_igammac_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_index_add_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_index_add_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_index_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_index_reduce_amax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_index_reduce_mean_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_index_reduce_prod_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_index_select_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_inner_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_int_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_isclose_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_isclose_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_isfinite_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_isinf_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_isnan_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_isnan_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_isneginf_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_jiterator_4inputs_with_extra_args_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_jiterator_binary_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_jiterator_binary_return_by_ref_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_jiterator_binary_return_by_ref_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_jiterator_unary_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_kron_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_kthvalue_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_ldexp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_lerp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_lgamma_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_cholesky_ex_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_cholesky_ex_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_cross_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_cross_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_det_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_det_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_eig_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_eigh_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_eigvals_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_eigvalsh_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_eigvalsh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_householder_product_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_householder_product_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_inv_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_inv_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_inv_ex_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_ldl_factor_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_ldl_factor_ex_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_ldl_solve_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_ldl_solve_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_lstsq_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_lstsq_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_lstsq_grad_oriented_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_lu_factor_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_lu_factor_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_lu_factor_ex_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_lu_solve_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_lu_solve_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_matrix_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_matrix_power_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_matrix_power_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_matrix_rank_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_matrix_rank_hermitian_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_matrix_rank_hermitian_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_multi_dot_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_norm_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_norm_subgradients_at_zero_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_norm_subgradients_at_zero_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_pinv_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_pinv_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_pinv_hermitian_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_pinv_hermitian_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_pinv_singular_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_pinv_singular_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_qr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_slogdet_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_solve_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_svd_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_svd_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_tensorinv_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_tensorsolve_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_vander_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_vecdot_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_vecdot_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_vector_norm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_vector_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linspace_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linspace_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linspace_tensor_overload_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linspace_tensor_overload_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_log10_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_log10_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_log1p_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_log1p_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_log2_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_log_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_logaddexp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_logcumsumexp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_logical_and_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_logical_not_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_logical_or_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_logical_or_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_logical_xor_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_logspace_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_logspace_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_logsumexp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_lu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_lu_solve_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_lu_unpack_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_mH_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_mT_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_mT_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_amax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_amin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_argmax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_cumprod_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_cumsum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_fill_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_mean_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_median_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_scatter_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_select_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_select_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_softmax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_softmin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_sum_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_var_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_matrix_exp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_matrix_exp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_max_binary_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_max_reduction_with_dim_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_maximum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_meshgrid_list_of_tensors_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_min_binary_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_min_reduction_with_dim_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_minimum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_mode_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_movedim_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_msort_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_mul_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_multinomial_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_mv_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_mvlgamma_mvlgamma_p_1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_mvlgamma_mvlgamma_p_5_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nanmean_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nanmedian_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_narrow_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_narrow_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_native_dropout_backward_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_native_layer_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_ne_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_neg_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_new_empty_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_new_empty_strided_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_new_ones_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_new_zeros_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_new_zeros_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_adaptive_avg_pool2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_adaptive_max_pool1d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_adaptive_max_pool2d_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_adaptive_max_pool3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_alpha_dropout_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_avg_pool1d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_avg_pool2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_batch_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_celu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_channel_shuffle_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_conv1d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_conv1d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_conv3d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_conv3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_conv_transpose3d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_conv_transpose3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_cosine_embedding_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_cosine_similarity_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_dropout2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_dropout3d_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_dropout_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_feature_alpha_dropout_without_train_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_fractional_max_pool3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_gaussian_nll_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_hardsigmoid_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_hardtanh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_instance_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_interpolate_nearest-exact_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_interpolate_nearest_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_l1_loss_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_linear_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_linear_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_local_response_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_max_pool3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_max_unpool1d_grad_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_max_unpool3d_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_mish_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_multi_margin_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_multilabel_soft_margin_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_nll_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_normalize_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_pad_circular_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_pad_constant_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_pad_reflect_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_pad_reflect_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_pairwise_distance_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_pixel_shuffle_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_pixel_shuffle_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_pixel_unshuffle_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_poisson_nll_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_prelu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_relu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_rms_norm_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_rrelu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_selu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_silu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_smooth_l1_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_soft_margin_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_softmin_with_dtype_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_softmin_with_dtype_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_softplus_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_tanhshrink_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_tanhshrink_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_triplet_margin_loss_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_triplet_margin_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_unfold_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_unfold_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_upsample_bilinear_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nonzero_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nonzero_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nonzero_static_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_norm_fro_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_norm_inf_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_norm_nuc_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_normal_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_normal_in_place_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_ones_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_ones_like_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_ones_like_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_ormqr_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_ormqr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_pca_lowrank_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_permute_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_permute_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_polar_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_polygamma_polygamma_n_1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_positive_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_pow_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_put_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_qr_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_quantile_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_rad2deg_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_rand_like_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_randn_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_randn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_randn_like_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_real_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_real_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_reciprocal_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_remainder_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_renorm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_repeat_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_repeat_interleave_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_repeat_interleave_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_reshape_as_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_reshape_as_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_resize_as__cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_resolve_conj_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_resolve_neg_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_resolve_neg_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_rot90_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_round_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_round_decimals_3_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_rsqrt_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_scatter_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_scatter_reduce_amax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_scatter_reduce_amin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_scatter_reduce_mean_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_scatter_reduce_prod_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_scatter_reduce_sum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_select_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sgn_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_short_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_short_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sigmoid_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sign_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_signal_windows_bartlett_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_signal_windows_exponential_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_signal_windows_gaussian_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_signal_windows_general_cosine_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_signal_windows_hann_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_signal_windows_kaiser_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_signal_windows_nuttall_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sin_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sinc_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sinc_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sinh_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sinh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_slice_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_slice_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_softmax_with_dtype_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_airy_ai_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_bessel_j1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_bessel_y0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_bessel_y1_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_chebyshev_polynomial_t_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_chebyshev_polynomial_u_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_chebyshev_polynomial_w_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_entr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_hermite_polynomial_h_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_i0e_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_laguerre_polynomial_l_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_log_ndtr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_modified_bessel_i1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_modified_bessel_k0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_ndtri_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_scaled_modified_bessel_k1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_spherical_bessel_j0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_split_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_split_list_args_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_split_with_sizes_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sqrt_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_square_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_squeeze_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_squeeze_multiple_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_std_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_std_mean_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_std_mean_unbiased_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_std_unbiased_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_stft_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sub_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_t_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_t_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_t_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_take_along_dim_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_take_along_dim_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_take_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_take_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_tan_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_tanh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_tensor_split_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_tile_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_to_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_to_sparse_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_to_sparse_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_topk_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_trace_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_trace_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_transpose_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_transpose_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_trapezoid_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_trapz_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_triangular_solve_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_tril_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_triu_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_triu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_trunc_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_unbind_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_unflatten_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_unfold_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_unfold_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_unfold_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_uniform_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_unique_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_unsafe_split_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_unsqueeze_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_var_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_var_mean_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_var_mean_unbiased_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_var_unbiased_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_var_unbiased_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_view_as_complex_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_view_as_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_view_as_real_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_view_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_view_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_vsplit_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_vsplit_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_vstack_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_vstack_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_where_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_where_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_xlogy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_zero__cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_zeros_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_zeros_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_zeros_like_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_H_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_H_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_T_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_T_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD___getitem___cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD___radd___cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD___rdiv___cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD___rmatmul___cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD___rmatmul___cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD___rpow___cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD___rsub___cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD___rsub___cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD__segment_reduce_lengths_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD__unsafe_masked_index_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD__unsafe_masked_index_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_abs_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_abs_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_acos_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_acosh_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_acosh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_addbmm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_addbmm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_addcmul_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_addmm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_addmm_decomposed_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_addmm_decomposed_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_addr_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_allclose_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_amax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_aminmax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_angle_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_any_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_argmax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_argmin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_argsort_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_as_strided_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_as_strided_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_as_strided_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_as_strided_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_as_strided_scatter_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_asinh_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_atanh_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_atleast_2d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_atleast_2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_atleast_3d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_baddbmm_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_bernoulli_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_block_diag_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_bmm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_bmm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_bool_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_broadcast_to_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_bucketize_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_byte_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_byte_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cat_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cat_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cauchy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cdouble_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cfloat_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_chalf_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_char_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_char_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cholesky_inverse_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cholesky_solve_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_chunk_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_clone_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_combinations_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_complex_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_conj_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_conj_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_conj_physical_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_conj_physical_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_contiguous_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_copysign_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cosh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_count_nonzero_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cov_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cross_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cumprod_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cumsum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cumulative_trapezoid_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cumulative_trapezoid_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_deg2rad_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_diag_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_diag_embed_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_diagflat_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_diagonal_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_diagonal_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_diagonal_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_diagonal_scatter_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_digamma_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_dist_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_div_no_rounding_mode_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_div_no_rounding_mode_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_div_trunc_rounding_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_dot_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_double_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_double_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_dsplit_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_dsplit_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_einsum_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_einsum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_empty_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_empty_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_empty_like_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_empty_permuted_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_empty_permuted_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_eq_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_erfinv_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_exp2_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_exp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_expand_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_expand_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_expm1_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_expm1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_eye_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_eye_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_fft2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_fft_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_fftn_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_fftshift_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_hfftn_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_ifft2_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_ifft2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_ifft_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_ifftn_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_ifftn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_ihfftn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_irfft2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_irfft_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_rfft2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_rfftn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fill_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_flatten_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_flip_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fliplr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_flipud_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_float_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fmax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fmin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_frac_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_full_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_full_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_ge_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_geqrf_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_gradient_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_gt_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_half_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_half_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_hash_tensor_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_heaviside_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_histc_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_hsplit_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_hstack_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_hstack_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_hypot_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_i0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_igammac_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_index_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_index_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_index_fill_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_index_fill_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_index_put_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_index_put_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_index_reduce_amin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_index_reduce_prod_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_index_select_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_isclose_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_isin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_isinf_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_isnan_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_isposinf_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_item_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_jiterator_4inputs_with_extra_args_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_jiterator_binary_return_by_ref_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_ldexp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_le_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_lerp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_lerp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_lgamma_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_cholesky_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_cholesky_ex_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_cholesky_ex_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_cond_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_cond_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_cross_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_det_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_diagonal_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_diagonal_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_eigvals_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_householder_product_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_householder_product_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_inv_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_inv_ex_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_ldl_factor_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_ldl_factor_ex_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_ldl_factor_ex_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_ldl_solve_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_lstsq_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_lstsq_grad_oriented_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_lstsq_grad_oriented_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_lu_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_lu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_lu_factor_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_lu_factor_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_lu_factor_ex_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_lu_solve_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_matrix_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_matrix_power_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_matrix_power_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_matrix_rank_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_matrix_rank_hermitian_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_matrix_rank_hermitian_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_multi_dot_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_norm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_norm_subgradients_at_zero_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_norm_subgradients_at_zero_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_pinv_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_pinv_singular_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_slogdet_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_slogdet_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_solve_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_solve_triangular_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_solve_triangular_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_svd_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_svd_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_tensorinv_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_tensorinv_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_tensorsolve_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_vander_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_vecdot_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_vector_norm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linspace_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_log10_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_log2_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_log_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_log_normal_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_log_softmax_with_dtype_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_log_softmax_with_dtype_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_logaddexp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_logcumsumexp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_logcumsumexp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_logical_and_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_logspace_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_logspace_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_logspace_tensor_overload_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_logspace_tensor_overload_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_logsumexp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_logsumexp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_lu_unpack_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_mH_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_mT_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_amax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_argmin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_cumsum_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_cumsum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_logsumexp_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_mean_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_median_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_normalize_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_prod_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_softmin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_std_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_sum_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_var_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_matmul_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_matrix_exp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_matrix_exp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_maximum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_mean_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_meshgrid_variadic_tensors_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_min_binary_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_min_reduction_no_dim_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_min_reduction_with_dim_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_minimum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_mm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_mm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_movedim_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_msort_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_mul_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_mvlgamma_mvlgamma_p_1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_mvlgamma_mvlgamma_p_3_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_mvlgamma_mvlgamma_p_5_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nan_to_num_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nanmean_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nanmean_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nanmedian_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nanquantile_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_native_batch_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_native_layer_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_ne_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_new_empty_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_new_empty_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_new_full_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_new_ones_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_new_ones_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_new_zeros_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_adaptive_avg_pool2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_adaptive_avg_pool3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_adaptive_max_pool2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_alpha_dropout_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_avg_pool1d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_avg_pool3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_batch_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_batch_norm_without_cudnn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_channel_shuffle_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_channel_shuffle_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_conv2d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_conv3d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_conv3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_conv_transpose1d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_conv_transpose2d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_conv_transpose2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_conv_transpose3d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_ctc_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_dropout2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_dropout_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_elu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_embedding_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_feature_alpha_dropout_without_train_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_fractional_max_pool3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_gelu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_glu_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_grid_sample_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_group_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_huber_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_instance_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_interpolate_bilinear_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_interpolate_linear_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_interpolate_nearest_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_interpolate_trilinear_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_kl_div_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_l1_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_leaky_relu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_linear_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_logsigmoid_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_max_pool2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_max_unpool1d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_max_unpool2d_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_max_unpool2d_grad_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_max_unpool3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_multilabel_margin_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_multilabel_soft_margin_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_nll_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_normalize_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_pad_constant_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_pad_replicate_negative_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_pad_replicate_negative_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_pairwise_distance_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_pdist_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_pixel_shuffle_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_pixel_unshuffle_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_prelu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_relu6_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_relu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_rms_norm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_rrelu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_scaled_dot_product_attention_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_silu_complex_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_smooth_l1_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_softmin_with_dtype_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_softshrink_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_softsign_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_tanhshrink_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_threshold_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_unfold_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nonzero_static_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_norm_fro_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_norm_inf_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_normal_number_mean_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_ones_like_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_ones_like_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_outer_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_pca_lowrank_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_permute_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_permute_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_permute_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_pinverse_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_polar_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_polygamma_polygamma_n_0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_polygamma_polygamma_n_3_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_positive_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_positive_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_pow_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_put_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_qr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_quantile_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_rand_like_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_randint_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_randint_like_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_randn_like_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_ravel_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_ravel_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_remainder_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_renorm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_repeat_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_repeat_interleave_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_repeat_interleave_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_reshape_as_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_reshape_as_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_reshape_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_reshape_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_resize__cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_resize__cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_resize_as__cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_resolve_conj_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_resolve_conj_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_resolve_neg_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_resolve_neg_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_roll_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_roll_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_rot90_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_rot90_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_round_decimals_0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_round_decimals_3_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_rsqrt_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_rsub_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_scatter_add_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_scatter_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_scatter_reduce_mean_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_scatter_reduce_prod_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_scatter_reduce_sum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_sgn_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_short_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_short_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_sigmoid_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_sign_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_signal_windows_bartlett_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_signal_windows_blackman_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_signal_windows_gaussian_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_signal_windows_general_cosine_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_signal_windows_hamming_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_signal_windows_nuttall_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_sin_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_sin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_sinc_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_slice_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_slice_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_slice_scatter_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_softmax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_softmax_with_dtype_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_bessel_j0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_bessel_y1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_chebyshev_polynomial_u_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_chebyshev_polynomial_v_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_entr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_erfcx_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_hermite_polynomial_h_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_i1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_legendre_polynomial_p_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_modified_bessel_i1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_modified_bessel_k0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_ndtri_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_polygamma_special_polygamma_n_0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_scaled_modified_bessel_k1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_split_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_split_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_split_list_args_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_split_list_args_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_split_with_sizes_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_split_with_sizes_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_sqrt_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_squeeze_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_squeeze_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_squeeze_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_squeeze_multiple_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_stack_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_std_mean_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_std_unbiased_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_std_unbiased_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_stft_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_stft_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_sub_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_sum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_svd_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_svd_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_svd_lowrank_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_svd_lowrank_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_t_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_t_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_take_along_dim_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_tan_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_tan_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_tensordot_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_tensordot_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_tile_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_to_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_to_sparse_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_topk_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_torch_ops_aten__safe_softmax_default_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_trace_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_transpose_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_transpose_copy_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_transpose_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_transpose_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_trapz_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_triangular_solve_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_tril_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_true_divide_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_trunc_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_unbind_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_unflatten_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_unfold_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_unfold_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_unfold_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_unfold_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_unique_consecutive_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_unsafe_chunk_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_unsafe_chunk_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_unsafe_split_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_unsafe_split_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_unsqueeze_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_var_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_var_mean_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_var_mean_unbiased_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_var_mean_unbiased_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_vdot_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_view_as_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_view_as_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_vsplit_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_vstack_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_vstack_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_where_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_xlogy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_zero__cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_zero__cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_zeros_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_zeros_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_zeros_like_cuda_complex128
2025-12-04T13:30:01.8810860Z 
2025-12-04T13:30:01.8811241Z Finished test_ops_fwd_gradients 1/2 ... [2025-12-04 13:30:01.717653][15430.10054539], took 9.82min
2025-12-04T13:30:01.8812510Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_ops_fwd_gradients/test_ops_fwd_gradients-dac273fbaf67ad10.xml
2025-12-04T13:30:01.8820076Z Running test_meta 2/5 ... [2025-12-04 13:30:01.881677][15430.264571133]
2025-12-04T13:30:01.8820676Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T13:30:01.8824258Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_meta.py', '--shard-id=2', '--num-shards=5', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:30:01.882150]
2025-12-04T13:56:15.6860335Z 
2025-12-04T13:56:15.6861327Z test_meta 2/5 was successful, full logs can be found in artifacts with path test/test-reports/test_meta_2.5_dad2a564d06ce93f_.log
2025-12-04T13:56:16.0272795Z Running 8088 items in this shard: test/test_meta.py::TestMetaConverter::test_imag, test/test_meta.py::TestMetaConverter::test_inplace_set_storage, test/test_meta.py::TestMetaConverter::test_non_leaf, test/test_meta.py::TestMetaConverter::test_view_mutate, test/test_meta.py::TestMetaConverter::test_weakref, test/test_meta.py::TestMetaCUDA::test_batch_norm_backward_output_mask0_cuda, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype___rmod___cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_fmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_logical_and_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_nextafter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_pow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_special_xlog1py_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_clamp_min_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_copysign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_div_trunc_rounding_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_jiterator_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_max_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_mul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_polar_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_pow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_remainder_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_special_chebyshev_polynomial_t_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_special_chebyshev_polynomial_w_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_H_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_H_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_H_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_T_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_T_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_T_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___getitem___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___radd___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___radd___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___radd___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rand___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmatmul___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmatmul___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmod___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmod___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmod___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmod___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmul___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___ror___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___ror___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___ror___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rpow___cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rpow___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rsub___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rxor___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rxor___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_abs_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_abs_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_acos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_add_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_add_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_addcmul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_addcmul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_addcmul_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_asin_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_asin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_atan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_atan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_atan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_atan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_ceil_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_ceil_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_ceil_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_clamp_min_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_clamp_min_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_clamp_min_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_clamp_min_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_clamp_min_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_cos_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_cos_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_cosh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_cosh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_div_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_div_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_div_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_erf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_erf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_erf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_erfc_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_exp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_exp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_expm1_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_expm1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_floor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_floor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_frac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_frac_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_frac_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lerp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lerp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lerp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lgamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lgamma_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log10_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log1p_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log1p_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_max_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_maximum_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_maximum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_maximum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_minimum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_minimum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_minimum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_minimum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_mul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_mul_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_neg_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_neg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_norm_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_pow_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_pow_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_reciprocal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_reciprocal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_reciprocal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_reciprocal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_reciprocal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_reciprocal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_reciprocal_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_rsqrt_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_rsqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_rsqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_rsqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_rsqrt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sigmoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sigmoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sign_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sin_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sinh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sinh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sinh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sqrt_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sub_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sub_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sub_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tan_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tanh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tanh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_zero_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_zero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_zero_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_zero_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__native_batch_norm_legit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__native_batch_norm_legit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__softmax_backward_data_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__softmax_backward_data_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__unsafe_masked_index_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__unsafe_masked_index_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__unsafe_masked_index_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__unsafe_masked_index_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__unsafe_masked_index_put_accumulate_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__unsafe_masked_index_put_accumulate_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__upsample_bilinear2d_aa_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_abs_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_abs_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_abs_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_acos_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_acos_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_acosh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_acosh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_acosh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_add_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_add_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addbmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addbmm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addcdiv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addcmul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addcmul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addcmul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addcmul_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addmm_decomposed_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addmm_decomposed_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addmm_decomposed_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addmm_decomposed_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addmv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_alias_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_alias_copy_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_alias_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_all_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_all_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_all_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_allclose_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_allclose_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_amax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_amax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_aminmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_aminmax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_angle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_angle_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_angle_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_angle_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_any_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_any_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_any_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_any_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_arange_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_arange_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_arange_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_arange_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_arange_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argmax_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argmin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argmin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argsort_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argsort_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argsort_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argwhere_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argwhere_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argwhere_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argwhere_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_partial_views_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_partial_views_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_partial_views_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_scatter_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asin_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asin_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asinh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asinh_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asinh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asinh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atan2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atan_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atanh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_2d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_3d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_3d_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_3d_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_baddbmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_baddbmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bernoulli_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bernoulli_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bfloat16_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bincount_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_and_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_and_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_and_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_not_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_or_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_or_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_right_shift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_right_shift_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_xor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_block_diag_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_block_diag_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_block_diag_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_block_diag_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bmm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bool_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bool_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bool_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bool_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bool_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_tensors_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_to_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_to_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_to_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bucketize_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_byte_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cartesian_prod_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cartesian_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cartesian_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cartesian_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cartesian_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cauchy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cauchy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cdouble_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cdouble_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ceil_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ceil_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ceil_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cfloat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_chalf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_chalf_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_chalf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_char_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_char_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cholesky_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_chunk_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_chunk_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_chunk_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_max_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_max_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_min_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clone_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_column_stack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_column_stack_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_combinations_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_combinations_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_combinations_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_physical_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_physical_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_physical_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_physical_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_constant_pad_nd_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_constant_pad_nd_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_contiguous_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_contiguous_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_copysign_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_corrcoef_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_corrcoef_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_corrcoef_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_corrcoef_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cos_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cos_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cos_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cos_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cos_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cosh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cosh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cov_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cov_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cross_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cross_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cross_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cummax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cummax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cummin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cummin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumprod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumprod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumprod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumprod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumsum_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumsum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumsum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumulative_trapezoid_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_deg2rad_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_deg2rad_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_deg2rad_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_deg2rad_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_deg2rad_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diag_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diag_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diag_embed_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diff_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_digamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_digamma_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_digamma_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_digamma_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_floor_rounding_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_floor_rounding_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_no_rounding_mode_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dot_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_double_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_double_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dsplit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dstack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dstack_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_einsum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_permuted_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_permuted_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_permuted_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_strided_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_strided_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_strided_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eq_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eq_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eq_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_equal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_equal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erfc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erfinv_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erfinv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erfinv_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erfinv_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_as_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_as_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_as_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_as_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expm1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expm1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expm1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exponential_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eye_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eye_cuda_float8_e4m3fnuz, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eye_cuda_float8_e5m2fnuz, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eye_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fft2_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fft_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fftshift_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfftn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifftshift_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifftshift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifftshift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ihfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ihfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ihfft2_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ihfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ihfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfftn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fill_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fill_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fill_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fill_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fill_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fill_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flatten_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flip_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flip_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flip_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fliplr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fliplr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flipud_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flipud_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_power_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_power_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_floor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_floor_divide_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_frexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_like_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gather_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gather_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gather_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gcd_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ge_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ge_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ge_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ge_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_geometric_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_geometric_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gradient_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_grid_sampler_3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_grid_sampler_3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_half_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hash_tensor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hash_tensor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_heaviside_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_heaviside_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_histc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hsplit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hsplit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hsplit_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hsplit_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hsplit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hstack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hstack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hstack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hstack_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_i0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_i0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_igamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_igammac_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_imag_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_fill_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_put_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_put_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_put_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_put_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_mean_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_select_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_select_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_select_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_select_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_inner_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_int_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_int_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isclose_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isfinite_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isfinite_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isinf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isinf_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isinf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isnan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isnan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isneginf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isneginf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isneginf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isneginf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isreal_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isreal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_istft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_item_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_item_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_2inputs_2outputs_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_2inputs_2outputs_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_4inputs_with_extra_args_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_4inputs_with_extra_args_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_4inputs_with_extra_args_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_return_by_ref_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_return_by_ref_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_unary_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_unary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_kron_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_kron_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_kron_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_kthvalue_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lcm_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ldexp_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_le_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lerp_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lgamma_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cholesky_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cond_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cross_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cross_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cross_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_det_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_det_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_diagonal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_diagonal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_diagonal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_diagonal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_eig_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_eigh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_eigvalsh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_inv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_ldl_factor_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_ldl_factor_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_ldl_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_lstsq_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_lu_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_matrix_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_matrix_rank_hermitian_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_matrix_rank_hermitian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_multi_dot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_multi_dot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_norm_subgradients_at_zero_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_pinv_singular_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_qr_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_qr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_solve_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_solve_triangular_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_svd_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_svdvals_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_vecdot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_tensor_overload_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_tensor_overload_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_tensor_overload_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_tensor_overload_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_tensor_overload_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log10_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log10_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log10_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log10_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log1p_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log1p_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log1p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log1p_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_softmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_softmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_softmax_with_dtype_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_softmax_with_dtype_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_softmax_with_dtype_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logaddexp2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logaddexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logdet_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_and_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_and_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_not_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_not_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_or_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_xor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_xor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logspace_tensor_overload_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logspace_tensor_overload_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logspace_tensor_overload_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logspace_tensor_overload_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logsumexp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_long_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lu_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lu_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mH_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mH_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mT_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mT_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mT_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_amax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_amax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_amin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_argmax_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_argmin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumprod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumprod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumsum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumsum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumsum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_fill_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_fill_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_fill_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_fill_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_fill_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_logsumexp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_logsumexp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_median_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_median_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_normalize_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_select_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_select_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_select_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_select_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_std_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_std_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_std_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_var_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_matmul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_matrix_exp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_matrix_exp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_binary_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_binary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_pool2d_with_indices_backward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_reduction_no_dim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_reduction_no_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_reduction_no_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_reduction_with_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_reduction_with_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_maximum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_maximum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_meshgrid_list_of_tensors_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_meshgrid_list_of_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_meshgrid_list_of_tensors_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_meshgrid_list_of_tensors_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_meshgrid_variadic_tensors_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_min_binary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_min_reduction_no_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_min_reduction_with_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_minimum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_minimum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mode_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mode_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mode_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mode_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_movedim_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_movedim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_msort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_msort_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_multinomial_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mvlgamma_mvlgamma_p_1_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mvlgamma_mvlgamma_p_5_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mvlgamma_mvlgamma_p_5_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nan_to_num_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nan_to_num_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nanmean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nanmedian_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nanmedian_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nanquantile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nansum_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_native_batch_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_native_dropout_backward_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_native_layer_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ne_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ne_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ne_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ne_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_neg_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_neg_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_neg_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_strided_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_strided_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_strided_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_full_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_full_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_full_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_ones_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_zeros_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_avg_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_avg_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_max_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_max_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_max_pool2d_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_max_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_alpha_dropout_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_alpha_dropout_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_avg_pool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_avg_pool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_avg_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_avg_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_avg_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_avg_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_avg_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_bilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_channel_shuffle_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_channel_shuffle_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_channel_shuffle_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv2d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv3d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv_transpose1d_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv_transpose2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv_transpose2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv_transpose3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_cosine_embedding_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_cosine_embedding_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_cosine_embedding_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_cosine_embedding_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_cosine_similarity_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_cosine_similarity_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_cross_entropy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_dropout2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_embedding_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_feature_alpha_dropout_with_train_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_fractional_max_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_fractional_max_pool3d_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_gelu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_gelu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_grid_sample_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_group_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_hardsigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_hardswish_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_hardswish_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_huber_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_interpolate_area_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_interpolate_bicubic_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_interpolate_trilinear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_logsigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_unpool1d_grad_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_unpool2d_grad_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_unpool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_unpool3d_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_unpool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_mish_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_mse_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_multi_head_attention_forward_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_multi_head_attention_forward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_nll_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_nll_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_normalize_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_normalize_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_circular_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_constant_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_constant_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_constant_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_reflect_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_reflect_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_replicate_negative_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_replicate_negative_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_replicate_negative_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_replicate_negative_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pairwise_distance_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pairwise_distance_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pdist_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_shuffle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_shuffle_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_unshuffle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_unshuffle_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_relu6_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_relu6_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_relu_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_relu_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_rms_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_rrelu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_scaled_dot_product_attention_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_scaled_dot_product_attention_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_selu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_selu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_silu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_smooth_l1_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_soft_margin_loss_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softmin_with_dtype_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softmin_with_dtype_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softshrink_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softshrink_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softsign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softsign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softsign_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softsign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_tanhshrink_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_tanhshrink_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_tanhshrink_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_threshold_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_threshold_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_triplet_margin_loss_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_triplet_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_upsample_bilinear_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_upsample_nearest_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_upsample_nearest_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nonzero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nonzero_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nonzero_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nonzero_static_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_norm_inf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_norm_nuc_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_normal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_normal_in_place_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_normal_in_place_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ones_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ones_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ones_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ones_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ones_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ones_like_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ones_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ormqr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_copy_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_3_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_3_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_3_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_4_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_4_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_4_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_positive_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_positive_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_positive_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_pow_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_pow_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_pow_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_prod_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_put_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_qr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rad2deg_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rad2deg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rad2deg_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rand_like_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randn_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randn_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randn_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randn_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ravel_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_real_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_real_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_real_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_real_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reciprocal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_remainder_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_remainder_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_remainder_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_repeat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_repeat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_repeat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_repeat_interleave_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reshape_as_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reshape_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reshape_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reshape_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resize__cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resize__cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resize_as__cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resize_as__cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resize_as__cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resize_as__cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_conj_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_conj_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_conj_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_conj_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_neg_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_neg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_neg_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_roll_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_roll_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rot90_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rot90_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_decimals_0_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_decimals_3_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_decimals_neg_3_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_decimals_neg_3_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsqrt_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsub_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsub_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsub_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scalar_tensor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scalar_tensor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scalar_tensor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_amax_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_amax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_mean_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_sum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_sum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_sum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_searchsorted_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_searchsorted_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_searchsorted_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sgn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sgn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sgn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sgn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_short_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sigmoid_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sign_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sign_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signal_windows_bartlett_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signal_windows_cosine_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signal_windows_gaussian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signal_windows_general_hamming_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signal_windows_hann_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signal_windows_hann_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signbit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signbit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signbit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signbit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sinc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sinc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sinh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_slice_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_slice_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_slice_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_softmax_with_dtype_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_softmax_with_dtype_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_softmax_with_dtype_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sort_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sort_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sort_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sort_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sparse_mm_reduce_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sparse_sampled_addmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_j1_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_y0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_y1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_t_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_t_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_u_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_u_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_v_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_entr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_erfcx_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_erfcx_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_erfcx_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_hermite_polynomial_h_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_hermite_polynomial_he_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i0e_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i0e_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_laguerre_polynomial_l_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_laguerre_polynomial_l_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_laguerre_polynomial_l_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_laguerre_polynomial_l_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_legendre_polynomial_p_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_legendre_polynomial_p_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_log_ndtr_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_i1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_k0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_k0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_ndtr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_ndtr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_ndtr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_ndtr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_ndtr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_ndtr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_ndtr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_ndtri_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_ndtri_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_scaled_modified_bessel_k1_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_spherical_bessel_j0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_xlog1py_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_list_args_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_list_args_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sqrt_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sqrt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sqrt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_square_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_square_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_multiple_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_multiple_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_stack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_stack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_stack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_std_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_std_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_std_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_std_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_std_mean_unbiased_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_std_unbiased_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_stft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_stft_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sub_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sub_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sub_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sum_to_size_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sum_to_size_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sum_to_size_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sum_to_size_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_svd_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_svd_lowrank_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_svd_lowrank_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_t_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_t_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_t_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_t_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_along_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_along_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_along_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tan_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tan_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tanh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tanh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tensor_split_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tensor_split_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tensor_split_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tensor_split_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tile_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tile_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_sparse_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_sparse_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_sparse_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_topk_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_topk_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_torch_ops_aten__efficient_attention_forward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trace_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trace_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trace_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapezoid_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapezoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapezoid_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapz_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapz_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapz_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_triangular_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tril_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tril_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tril_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_triu_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_triu_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_triu_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_true_divide_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_true_divide_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trunc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unbind_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unbind_copy_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unbind_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unflatten_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unique_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unique_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unique_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unique_cuda_uint16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unique_cuda_uint64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_chunk_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_chunk_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_chunk_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_split_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_split_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_split_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsqueeze_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsqueeze_copy_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsqueeze_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsqueeze_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsqueeze_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsqueeze_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsqueeze_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsqueeze_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_var_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_var_mean_unbiased_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_var_unbiased_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vdot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vdot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_as_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_as_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_as_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vsplit_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vsplit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vsplit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vsplit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vsplit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vstack_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vstack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vstack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vstack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_where_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_where_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_xlogy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_xlogy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_xlogy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zeros_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zeros_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zeros_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_H_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_H_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_H_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_H_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_H_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_T_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_T_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_T_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___getitem___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___getitem___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___getitem___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___getitem___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___radd___cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___radd___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rdiv___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rdiv___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmatmul___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmatmul___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmod___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmul___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmul___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmul___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmul___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___ror___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___ror___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rpow___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rsub___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rsub___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rxor___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rxor___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__chunk_cat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_abs_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_abs_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_abs_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_acos_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_acos_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_add_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_add_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_add_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_addcdiv_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_addcdiv_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_addcdiv_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_addcdiv_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_addcmul_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_atan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_atan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_atan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_ceil_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_clamp_max_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_clamp_max_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_clamp_max_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_clamp_min_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_clamp_min_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_cos_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_cos_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_cos_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_cos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_cos_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_cosh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_cosh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_div_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_div_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_div_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_erf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_erf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_erfc_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_exp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_expm1_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_expm1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_expm1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_expm1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_floor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_frac_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_frac_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_frac_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_frac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_frac_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_frac_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_lerp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_lgamma_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_lgamma_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_lgamma_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_lgamma_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log10_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log1p_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log1p_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log1p_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_maximum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_maximum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_maximum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_neg_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_neg_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_neg_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_norm_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_norm_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_pow_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_pow_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_pow_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_reciprocal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_reciprocal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_reciprocal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_reciprocal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_reciprocal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_round_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_round_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_rsqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_rsqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_rsqrt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sigmoid_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sigmoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sign_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sinh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sinh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sinh_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sinh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sqrt_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sub_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tanh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tanh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tanh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_trunc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_trunc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_trunc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_zero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__native_batch_norm_legit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__native_batch_norm_legit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__segment_reduce_lengths_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_put_accumulate_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_put_accumulate_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_acos_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_acos_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_acos_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_acos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_acosh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_acosh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_acosh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_acosh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_add_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addbmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addbmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addbmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addcdiv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addcmul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addcmul_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addcmul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addcmul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addmm_decomposed_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addmv_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addmv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addmv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addmv_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_alias_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_alias_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_alias_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_all_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_all_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_all_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_all_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_all_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_allclose_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_amax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_aminmax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_aminmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_aminmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_aminmax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_aminmax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_angle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_any_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_any_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_any_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_any_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_arange_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_arange_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argmin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argsort_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argsort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argsort_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argwhere_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argwhere_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_partial_views_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_scatter_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asinh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asinh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asinh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atan2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atan2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atanh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atanh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_1d_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_1d_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_2d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_2d_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_3d_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_baddbmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_baddbmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bernoulli_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bfloat16_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bfloat16_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bfloat16_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bfloat16_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bfloat16_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bincount_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_and_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_and_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_left_shift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_not_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_or_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_xor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_xor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_xor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_block_diag_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_block_diag_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_block_diag_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_block_diag_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_block_diag_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_block_diag_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_block_diag_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bool_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bool_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bool_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bool_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_broadcast_tensors_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_broadcast_tensors_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_broadcast_tensors_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_broadcast_to_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_broadcast_to_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_broadcast_to_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bucketize_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bucketize_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_byte_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_byte_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cartesian_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cartesian_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cauchy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cdouble_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cdouble_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cdouble_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cdouble_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cdouble_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ceil_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cfloat_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cfloat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cfloat_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_chalf_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_chalf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_chalf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_char_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_char_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_char_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_char_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_chunk_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_chunk_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_max_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_max_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_max_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_min_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clone_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clone_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clone_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clone_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clone_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_column_stack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_column_stack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_column_stack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_column_stack_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_column_stack_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_combinations_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_conj_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_conj_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_conj_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_conj_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_conj_physical_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_conj_physical_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_conj_physical_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_constant_pad_nd_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_constant_pad_nd_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_constant_pad_nd_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_contiguous_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_contiguous_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_contiguous_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_copysign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cos_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cos_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cosh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cosh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cosh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cosh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cosh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cosh_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cosh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_count_nonzero_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_count_nonzero_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cov_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cov_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cross_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cummax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cummax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cummin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cummin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cummin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cummin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cummin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumprod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumprod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumsum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumsum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumulative_trapezoid_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumulative_trapezoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumulative_trapezoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumulative_trapezoid_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_deg2rad_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_deg2rad_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_deg2rad_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diag_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diag_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diag_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diag_embed_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagflat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagflat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diff_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diff_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_digamma_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_digamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_digamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_digamma_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_digamma_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_floor_rounding_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_no_rounding_mode_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_no_rounding_mode_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_trunc_rounding_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_trunc_rounding_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dot_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_double_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_double_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_double_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_double_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dsplit_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dsplit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dsplit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dstack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dstack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_einsum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_einsum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_like_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_permuted_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_permuted_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_permuted_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_strided_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_strided_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eq_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eq_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_equal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_equal_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_equal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erfc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_as_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_as_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expm1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expm1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expm1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eye_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eye_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eye_cuda_float8_e5m2fnuz, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fft2_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fftshift_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fftshift_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfft2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfft_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifftn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifftshift_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifftshift_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfftn_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfftn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fill_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fill_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fill_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flatten_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flatten_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flip_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fliplr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fliplr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fliplr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flipud_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flipud_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_float_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_float_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_float_power_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_float_power_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_float_power_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_floor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_floor_divide_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_floor_divide_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_floor_divide_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_frac_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_full_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_full_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_full_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_full_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_full_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_full_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_full_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_full_like_cuda_uint32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gather_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gather_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gather_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gcd_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ge_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ge_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ge_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ge_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_geometric_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_geometric_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_geqrf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_grid_sampler_2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_grid_sampler_3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_half_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hash_tensor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hash_tensor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hash_tensor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hash_tensor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_heaviside_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_heaviside_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_histc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hsplit_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hsplit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hstack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hstack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hstack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_i0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_i0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_add_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_add_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_add_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_add_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_fill_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_fill_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_put_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_put_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_put_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_put_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_amax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_amax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_amin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_amin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_prod_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_prod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_select_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_select_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_int_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_int_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isclose_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isclose_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isclose_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isclose_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isfinite_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isneginf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isneginf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isneginf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isposinf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isposinf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isreal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_istft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_item_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_item_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_2inputs_2outputs_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_2inputs_2outputs_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_2inputs_2outputs_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_2inputs_2outputs_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_4inputs_with_extra_args_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_4inputs_with_extra_args_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_4inputs_with_extra_args_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_4inputs_with_extra_args_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_4inputs_with_extra_args_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_binary_return_by_ref_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_binary_return_by_ref_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_binary_return_by_ref_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_unary_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_unary_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_unary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_unary_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_kron_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_kron_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_kron_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_kron_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_kthvalue_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lcm_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lcm_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ldexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ldexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ldexp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ldexp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_le_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lerp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lerp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lgamma_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lgamma_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_cond_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_cross_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_cross_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_cross_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_det_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_diagonal_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_diagonal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_diagonal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_eigh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_inv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_inv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_inv_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_ldl_factor_ex_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_ldl_factor_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lstsq_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lstsq_grad_oriented_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lu_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lu_factor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lu_factor_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_matrix_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_matrix_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_matrix_power_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_matrix_power_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_matrix_rank_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_matrix_rank_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_matrix_rank_hermitian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_multi_dot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_norm_subgradients_at_zero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_pinv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_slogdet_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_solve_ex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_solve_ex_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_solve_triangular_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_solve_triangular_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_svd_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_vander_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_vander_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_vecdot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_vecdot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_vecdot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_vecdot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linspace_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linspace_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linspace_tensor_overload_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log10_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log1p_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log1p_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log1p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log1p_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_normal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_normal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_softmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_softmax_with_dtype_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logaddexp2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logaddexp2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logaddexp_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logaddexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logaddexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logcumsumexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_and_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_not_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_not_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_not_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_not_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_or_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_xor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_xor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_xor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logspace_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logsumexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_long_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_long_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lu_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lu_unpack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mT_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_amax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_amax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_amax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_amin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_amin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_amin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_argmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_argmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_argmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_argmax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_argmin_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_cumprod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_cumsum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_cumsum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_cumsum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_cumsum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_cumsum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_fill_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_log_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_logaddexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_logaddexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_logsumexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_logsumexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_logsumexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_logsumexp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_median_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_normalize_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_normalize_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_prod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_scatter_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_select_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_select_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_select_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_std_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_std_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_sum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_sum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_var_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_var_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_var_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_binary_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_binary_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_binary_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_reduction_no_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_reduction_no_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_reduction_with_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_maximum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_maximum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_maximum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_median_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_median_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_median_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_median_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_meshgrid_variadic_tensors_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_meshgrid_variadic_tensors_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_meshgrid_variadic_tensors_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_meshgrid_variadic_tensors_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_binary_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_binary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_binary_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_reduction_no_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_reduction_no_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_reduction_with_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_reduction_with_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_minimum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mode_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_movedim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_movedim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_movedim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_msort_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_msort_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mul_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_multinomial_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_5_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nan_to_num_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nan_to_num_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nanmean_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nanmean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nanmedian_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nanmedian_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nanmedian_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nanmedian_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nanmedian_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nanquantile_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nansum_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nansum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nansum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nansum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nansum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_native_dropout_backward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_native_layer_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ne_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_neg_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_neg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_empty_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_empty_strided_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_empty_strided_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_ones_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_zeros_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_zeros_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_zeros_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_adaptive_avg_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_adaptive_max_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_adaptive_max_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_avg_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_avg_pool2d_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_bilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_bilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_channel_shuffle_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_channel_shuffle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv2d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv2d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv3d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv_transpose1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv_transpose1d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv_transpose2d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv_transpose2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv_transpose3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv_transpose3d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv_transpose3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_cosine_embedding_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_cosine_embedding_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_cosine_similarity_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_cosine_similarity_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_cross_entropy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_dropout2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_dropout2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_elu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_embedding_bag_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_embedding_bag_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_fractional_max_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_fractional_max_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_grid_sample_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_group_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hardshrink_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hardsigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hardswish_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hardtanh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hinge_embedding_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_instance_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_area_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_area_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_bicubic_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_linear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_nearest-exact_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_nearest_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_trilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_kl_div_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_layer_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_leaky_relu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_leaky_relu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_linear_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_local_response_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_logsigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_margin_ranking_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_margin_ranking_loss_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_unpool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_unpool2d_grad_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_unpool2d_grad_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_unpool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_unpool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_mse_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_multi_head_attention_forward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_multi_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_multilabel_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_multilabel_soft_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_nll_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_nll_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_normalize_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_one_hot_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_circular_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_circular_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_constant_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_constant_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_constant_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_reflect_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_reflect_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_reflect_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_reflect_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_replicate_negative_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_replicate_negative_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pairwise_distance_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pairwise_distance_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pdist_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_shuffle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_shuffle_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_shuffle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_shuffle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_shuffle_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_shuffle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_unshuffle_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_unshuffle_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_unshuffle_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_unshuffle_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_poisson_nll_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_prelu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_relu6_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_relu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_relu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_rrelu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_rrelu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_selu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_silu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_silu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_smooth_l1_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softplus_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softplus_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softsign_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softsign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softsign_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softsign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_tanhshrink_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_tanhshrink_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_tanhshrink_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_threshold_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_threshold_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_threshold_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_triplet_margin_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_unfold_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_unfold_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_upsample_bilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_upsample_nearest_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nonzero_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nonzero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nonzero_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nonzero_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nonzero_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nonzero_static_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nonzero_static_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_norm_fro_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_norm_inf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_normal_in_place_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_normal_number_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ones_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ones_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ones_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ones_like_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ones_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ormqr_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_outer_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_pca_lowrank_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_permute_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_permute_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_permute_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_permute_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_permute_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_permute_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_0_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_3_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_4_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_4_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_4_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_4_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_positive_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_pow_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_pow_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_pow_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_pow_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_pow_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_put_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_put_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_put_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rand_like_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randint_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randn_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ravel_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ravel_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_real_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_real_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_real_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_real_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reciprocal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_remainder_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_remainder_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_renorm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_renorm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_interleave_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_interleave_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_interleave_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_interleave_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_interleave_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_as_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_as_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_as_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resize__cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resize__cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resize_as__cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resize_as__cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resize_as__cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_conj_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_conj_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_conj_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_conj_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_neg_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_neg_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_roll_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_round_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_round_decimals_0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_round_decimals_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rsqrt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rsqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rsqrt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rsqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rsqrt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rsub_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rsub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scalar_tensor_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_amax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_amin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_amin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_sum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_sum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_searchsorted_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sgn_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sgn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sgn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sgn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sgn_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sgn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_short_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_short_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_short_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_short_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sigmoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sigmoid_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sigmoid_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signal_windows_bartlett_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signal_windows_bartlett_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signal_windows_exponential_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signal_windows_nuttall_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signbit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signbit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinc_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinh_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_slice_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_slice_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_slice_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sort_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sort_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sort_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sparse_mm_reduce_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_airy_ai_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_j0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_j1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_j1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_y0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_y1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_y1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_y1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_t_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_t_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_v_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_v_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_w_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_w_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_w_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_entr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_entr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_entr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_erfcx_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_hermite_polynomial_h_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_hermite_polynomial_h_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_i0e_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_laguerre_polynomial_l_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_legendre_polynomial_p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_legendre_polynomial_p_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_legendre_polynomial_p_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_legendre_polynomial_p_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_legendre_polynomial_p_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_i0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_i1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_i1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_k0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_k1_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_k1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_ndtr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_ndtr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_ndtri_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_ndtri_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_ndtri_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_scaled_modified_bessel_k0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_scaled_modified_bessel_k0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_scaled_modified_bessel_k1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_scaled_modified_bessel_k1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_spherical_bessel_j0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_xlog1py_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_zeta_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_list_args_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_list_args_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_list_args_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_with_sizes_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_with_sizes_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_with_sizes_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sqrt_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sqrt_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sqrt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_square_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_square_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_square_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_multiple_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_multiple_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_multiple_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_stack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_stack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_mean_unbiased_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_unbiased_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_unbiased_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_stft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sub_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sum_to_size_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sum_to_size_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_t_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_t_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_t_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_t_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_take_along_dim_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_take_along_dim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_take_along_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_take_along_dim_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_take_along_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tanh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tensor_split_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tensordot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tile_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tile_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_sparse_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_sparse_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_sparse_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_sparse_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_topk_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_torch_ops_aten__efficient_attention_forward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trace_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trace_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trapezoid_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trapz_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trapz_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_triangular_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tril_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tril_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tril_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tril_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_triu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_triu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_triu_indices_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_triu_indices_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_true_divide_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_true_divide_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_true_divide_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unflatten_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unflatten_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unflatten_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unfold_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unfold_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unfold_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unfold_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unfold_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_uniform_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_uniform_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unique_consecutive_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unique_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unique_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unique_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unique_cuda_uint64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsafe_chunk_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsafe_chunk_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsafe_split_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsafe_split_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsqueeze_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsqueeze_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsqueeze_copy_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsqueeze_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsqueeze_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_var_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_var_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_var_mean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_as_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_as_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_as_real_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vsplit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vsplit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vstack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_where_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_where_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_where_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zero__cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zero__cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zero__cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zero__cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zero__cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_like_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_H_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_H_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_H_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_H_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_T_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_T_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_T_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_T_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_T_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___getitem___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___radd___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___radd___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___radd___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___radd___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rand___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rand___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rdiv___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rdiv___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmatmul___cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmod___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmod___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmul___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___ror___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rpow___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rxor___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__chunk_cat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__chunk_cat_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_abs_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_add_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_add_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_add_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcdiv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcdiv_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcdiv_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcmul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcmul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcmul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_asin_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_asin_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_asin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_asin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_asin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_atan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_ceil_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_ceil_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_clamp_max_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_clamp_min_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_clamp_min_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_clamp_min_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_cosh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_div_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_div_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_div_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_erf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_erfc_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_erfc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_exp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_expm1_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_expm1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_expm1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_floor_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_frac_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_frac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_lerp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_lerp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_lgamma_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_lgamma_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_lgamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_lgamma_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log10_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log10_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log1p_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log1p_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log1p_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log2_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_max_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_max_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_maximum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_maximum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_maximum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_minimum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_minimum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_minimum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_mul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_mul_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_neg_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_neg_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_neg_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_norm_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_pow_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_reciprocal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_reciprocal_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_reciprocal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_rsqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sigmoid_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sigmoid_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sign_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sinh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sinh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sinh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sinh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sqrt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sqrt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sqrt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sub_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_tan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_tan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_tanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_tanh_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_tanh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_trunc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_trunc_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_trunc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_trunc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_zero_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__native_batch_norm_legit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__segment_reduce_offsets_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__softmax_backward_data_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__unsafe_masked_index_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__unsafe_masked_index_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__unsafe_masked_index_put_accumulate_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__unsafe_masked_index_put_accumulate_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__unsafe_masked_index_put_accumulate_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__upsample_bilinear2d_aa_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_abs_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_abs_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_abs_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acos_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acos_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acosh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acosh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acosh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acosh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acosh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acosh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_add_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_add_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addbmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addbmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addcdiv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addcdiv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addcdiv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addcmul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides___rdiv___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides___rmul___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides___ror___cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides___rsub___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_clamp_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_clamp_min_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_floor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_minimum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_rsqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_sign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__unsafe_masked_index_put_accumulate_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_addcmul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_addmm_decomposed_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_any_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_argwhere_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_atan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_atleast_2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_atleast_3d_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_bernoulli_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_bfloat16_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_bitwise_and_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_bitwise_right_shift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_cdouble_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_chalf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_cholesky_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_chunk_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_clamp_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_eq_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_expm1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_fft_hfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_fft_ifft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_fft_irfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_flip_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_flipud_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_float_power_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_full_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_half_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_histc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_isclose_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_isin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_isnan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_isneginf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_isposinf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_jiterator_4inputs_with_extra_args_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_jiterator_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_kthvalue_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_lcm_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_lerp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_cholesky_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_inv_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_lu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_lu_factor_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_qr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_slogdet_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_svdvals_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_vector_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linspace_tensor_overload_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_log10_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_logdet_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_logit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_logspace_tensor_overload_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_logsumexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_lt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_lu_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_masked_argmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_masked_cumsum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_masked_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_masked_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_masked_select_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_matrix_exp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_meshgrid_list_of_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_mul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_native_batch_norm_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_new_empty_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_conv_transpose3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_cosine_similarity_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_gelu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_grid_sample_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_interpolate_trilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_kl_div_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_linear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_margin_ranking_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_max_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_max_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_max_unpool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_max_unpool2d_grad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_mish_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_multilabel_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_pad_replicate_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_poisson_nll_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_prelu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_softmin_with_dtype_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_softshrink_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_triplet_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_normal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_normal_in_place_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_outer_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_permute_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_positive_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_randint_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_randn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_real_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_repeat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_repeat_interleave_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_resolve_conj_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_rot90_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_round_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_scatter_reduce_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_select_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_signal_windows_cosine_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_sparse_sampled_addmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_bessel_j0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_bessel_y0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_chebyshev_polynomial_u_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_chebyshev_polynomial_v_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_i1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_laguerre_polynomial_l_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_scaled_modified_bessel_k0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_spherical_bessel_j0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_zeta_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_squeeze_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_sub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_t_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_to_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_to_sparse_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_tril_indices_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_true_divide_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_trunc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_unsafe_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_unsqueeze_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_var_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_vdot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_view_as_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_zero__cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_allclose_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_allclose_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_allclose_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_allclose_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_amin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_amin_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_aminmax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_aminmax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_angle_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_any_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_any_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_any_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_any_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_arange_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_arange_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_arange_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argmax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argmax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argmin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argmin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argsort_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argsort_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argsort_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argsort_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argwhere_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argwhere_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_copy_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_partial_views_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_partial_views_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_partial_views_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_partial_views_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_partial_views_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asinh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asinh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asinh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atan2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atan_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atanh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atanh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atanh_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_1d_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_1d_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_2d_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_3d_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_3d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_3d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_baddbmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_baddbmm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bfloat16_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bfloat16_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bincount_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_left_shift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_not_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_not_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_not_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_or_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_or_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_right_shift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_xor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bmm_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bool_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bool_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bool_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bool_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bool_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bool_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_tensors_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_to_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_to_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bucketize_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bucketize_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_byte_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_byte_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_byte_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cartesian_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cartesian_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cat_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cat_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cat_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cdist_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cdouble_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cdouble_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ceil_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ceil_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ceil_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cfloat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cfloat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cfloat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cfloat_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cfloat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chalf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chalf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chalf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_char_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_char_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cholesky_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chunk_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chunk_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chunk_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chunk_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_max_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_max_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_min_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_min_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_min_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clone_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clone_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_column_stack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_column_stack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_combinations_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_complex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_physical_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_constant_pad_nd_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_constant_pad_nd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_constant_pad_nd_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_contiguous_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_contiguous_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_contiguous_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_contiguous_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_contiguous_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_copysign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_corrcoef_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cos_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cosh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cosh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cosh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cosh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_count_nonzero_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_count_nonzero_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_count_nonzero_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cov_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cov_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cov_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cov_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cummax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cummax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cummin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cummin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cummin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cummin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumprod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumprod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumsum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumsum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumsum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumulative_trapezoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumulative_trapezoid_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumulative_trapezoid_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_deg2rad_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_embed_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_embed_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_embed_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_embed_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagflat_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagflat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagflat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagflat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagflat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_scatter_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diff_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diff_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_digamma_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_digamma_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dist_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_floor_rounding_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_floor_rounding_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_floor_rounding_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_no_rounding_mode_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_trunc_rounding_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_trunc_rounding_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_trunc_rounding_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dsplit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dstack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dstack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dstack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_einsum_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_like_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_permuted_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_permuted_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_strided_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_strided_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_strided_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eq_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eq_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eq_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eq_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eq_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_equal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_equal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erf_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erfc_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erfc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erfinv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erfinv_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_as_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expm1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expm1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eye_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eye_cuda_float8_e4m3fn, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eye_cuda_float8_e5m2fnuz, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftshift_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftshift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifft2_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifftn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfft2_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfft_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfftn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_rfft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_rfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_rfft_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fill_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fill_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flatten_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flatten_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flatten_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flatten_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flatten_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flip_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fliplr_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fliplr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fliplr_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flipud_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flipud_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flipud_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flipud_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flipud_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_float_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_float_power_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_float_power_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_float_power_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_floor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_floor_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_frac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_full_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_full_like_cuda_uint16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gather_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gather_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gather_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gather_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gather_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gcd_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ge_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ge_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_geometric_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_geometric_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gradient_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_grid_sampler_2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_grid_sampler_2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_grid_sampler_3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_grid_sampler_3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_half_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_half_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hash_tensor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hash_tensor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_heaviside_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_heaviside_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_heaviside_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_heaviside_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_heaviside_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hsplit_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hsplit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hsplit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hstack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hstack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hypot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hypot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_igamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_igamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_imag_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_fill_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_fill_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_put_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_put_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_put_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_mean_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_inner_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_inner_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_inner_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_int_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_int_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_int_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isfinite_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isfinite_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isfinite_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isinf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isinf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isinf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isnan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isnan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isneginf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isneginf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isreal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_item_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_2inputs_2outputs_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_4inputs_with_extra_args_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_4inputs_with_extra_args_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_binary_return_by_ref_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_kron_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_kron_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_kthvalue_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_kthvalue_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_kthvalue_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_kthvalue_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_kthvalue_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lcm_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ldexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ldexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ldexp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_le_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lerp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lgamma_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lgamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lgamma_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lgamma_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cholesky_ex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cond_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cross_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cross_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_diagonal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_diagonal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_diagonal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_eigh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_eigvals_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_eigvalsh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_householder_product_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_householder_product_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_inv_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_ldl_factor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_lstsq_grad_oriented_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_lu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_lu_factor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_lu_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_matrix_norm_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_matrix_power_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_multi_dot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_multi_dot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_multi_dot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_norm_subgradients_at_zero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_pinv_singular_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_pinv_singular_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_slogdet_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_solve_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_svd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_svd_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_tensorinv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_tensorsolve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_tensorsolve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_vander_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_vander_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_vander_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_vander_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_vecdot_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_vector_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_vector_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_vector_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_tensor_overload_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_tensor_overload_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_tensor_overload_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_tensor_overload_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log10_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log10_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log10_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log10_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log1p_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log1p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log1p_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log2_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_softmax_with_dtype_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_softmax_with_dtype_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_softmax_with_dtype_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_softmax_with_dtype_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_softmax_with_dtype_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logaddexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logaddexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logdet_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_and_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_and_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_and_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_not_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_not_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_not_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_not_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_or_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_or_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_or_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_xor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logspace_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logspace_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logspace_tensor_overload_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logspace_tensor_overload_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logspace_tensor_overload_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logsumexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logsumexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logsumexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_long_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_long_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_long_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_long_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lt_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lu_unpack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mH_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mH_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mH_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mT_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mT_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mT_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mT_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_amax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_amax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_argmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_argmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_cumprod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_cumprod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_cumprod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_cumprod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_cumprod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_cumprod_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_cumsum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_cumsum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_fill_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_fill_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_fill_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_normalize_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_normalize_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_normalize_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_select_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_select_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_select_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_select_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_std_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_sum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_sum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_sum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_var_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_binary_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_reduction_no_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_reduction_with_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_maximum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_maximum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_median_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_median_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_median_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_list_of_tensors_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_list_of_tensors_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_variadic_tensors_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_binary_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_binary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_reduction_no_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_reduction_no_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_minimum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_minimum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_minimum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mode_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_movedim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_msort_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_msort_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_msort_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mul_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mvlgamma_mvlgamma_p_1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mvlgamma_mvlgamma_p_1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mvlgamma_mvlgamma_p_5_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mvlgamma_mvlgamma_p_5_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mvlgamma_mvlgamma_p_5_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nan_to_num_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nan_to_num_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nan_to_num_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nanmean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nansum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_narrow_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_narrow_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_narrow_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_narrow_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_narrow_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_narrow_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_narrow_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_native_dropout_backward_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ne_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ne_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ne_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_neg_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_neg_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_neg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_strided_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_strided_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_strided_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_full_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_full_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_ones_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_ones_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_ones_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_zeros_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_zeros_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_zeros_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nextafter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_adaptive_avg_pool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_adaptive_avg_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_adaptive_avg_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_adaptive_max_pool2d_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_alpha_dropout_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_avg_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_avg_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_avg_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_batch_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_batch_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_binary_cross_entropy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_binary_cross_entropy_with_logits_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_binary_cross_entropy_with_logits_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv1d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv2d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv3d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv_transpose2d_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv_transpose3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_cosine_similarity_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_cosine_similarity_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_cosine_similarity_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_cross_entropy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_cross_entropy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_dropout3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_dropout3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_dropout_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_embedding_bag_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_embedding_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_fractional_max_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_gelu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_glu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_glu_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_group_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_hardshrink_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_hardsigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_hardtanh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_hardtanh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_huber_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_instance_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_area_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_linear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_nearest_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_nearest_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_nearest_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_kl_div_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_layer_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_layer_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_layer_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_linear_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_linear_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_local_response_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_margin_ranking_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_margin_ranking_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_pool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool2d_grad_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_multi_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_multilabel_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_multilabel_soft_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_multilabel_soft_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_normalize_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_one_hot_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_constant_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_constant_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_reflect_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_reflect_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_reflect_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_reflect_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_reflect_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_negative_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_negative_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_negative_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pairwise_distance_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pixel_shuffle_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pixel_shuffle_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pixel_shuffle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_poisson_nll_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_relu6_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_relu6_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_relu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_rrelu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_silu_complex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softmin_with_dtype_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softmin_with_dtype_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softmin_with_dtype_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softplus_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softplus_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softplus_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softshrink_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softshrink_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softsign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softsign_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softsign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_tanhshrink_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_tanhshrink_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_threshold_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_threshold_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_triplet_margin_loss_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_triplet_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_triplet_margin_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_triplet_margin_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_upsample_bilinear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_upsample_bilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_upsample_bilinear_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_upsample_nearest_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_upsample_nearest_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nonzero_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nonzero_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nonzero_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nonzero_static_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nonzero_static_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_norm_fro_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_norm_fro_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_norm_nuc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_normal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_outer_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_pinverse_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_pinverse_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_3_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_3_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_4_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_4_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_positive_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_positive_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_positive_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_positive_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_pow_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_prod_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_put_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_put_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_quantile_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rad2deg_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rand_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randint_like_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randint_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randn_like_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ravel_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ravel_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ravel_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_real_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reciprocal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reciprocal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reciprocal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_renorm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_renorm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_interleave_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_interleave_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_as_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_as_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_as_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_as_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resize__cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resize__cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resize_as__cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resize_as__cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resize_as__cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_conj_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_conj_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_conj_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_neg_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_roll_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_round_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_round_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_round_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_round_decimals_neg_3_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rsqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rsqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rsub_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rsub_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rsub_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scalar_tensor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scalar_tensor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scalar_tensor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scalar_tensor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scalar_tensor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_amax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_amin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_amin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_mean_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_mean_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_sum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_searchsorted_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_select_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_select_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_select_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_select_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_select_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_select_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_select_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_short_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_short_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_short_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_short_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signal_windows_bartlett_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signal_windows_blackman_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signal_windows_hamming_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signal_windows_hann_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signal_windows_nuttall_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signal_windows_nuttall_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signbit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sin_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sinc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sinh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_softmax_with_dtype_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_softmax_with_dtype_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_softmax_with_dtype_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_softmax_with_dtype_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_softmax_with_dtype_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sort_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sort_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sparse_mm_reduce_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sparse_sampled_addmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sparse_sampled_addmm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_airy_ai_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_airy_ai_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_j1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_y0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_y0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_y0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_y1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_y1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_t_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_t_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_t_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_entr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_entr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_entr_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_erfcx_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_hermite_polynomial_h_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_hermite_polynomial_h_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_hermite_polynomial_he_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i0e_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i0e_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i1e_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i1e_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i1e_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_laguerre_polynomial_l_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_legendre_polynomial_p_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_log_ndtr_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_i0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_i0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_i1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_i1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_i1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_k0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_k1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_k1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_ndtr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_ndtr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_scaled_modified_bessel_k0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_scaled_modified_bessel_k0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_scaled_modified_bessel_k1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_spherical_bessel_j0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_spherical_bessel_j0_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_list_args_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_list_args_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sqrt_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sqrt_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sqrt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_square_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_square_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_square_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_square_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_square_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_copy_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_multiple_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_multiple_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_multiple_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_stack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_std_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_std_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_std_mean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_std_mean_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_stft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sub_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_to_size_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_to_size_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_svd_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_svd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_t_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_t_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_t_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_t_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_t_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_take_along_dim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_take_along_dim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_take_along_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_take_along_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tan_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tanh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tanh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tensor_split_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tensordot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tile_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tile_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tile_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_to_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_to_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_to_sparse_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_topk_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_topk_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_torch__scaled_mm_v2_cuda_float8_e4m3fn, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_torch_ops_aten__flash_attention_forward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trace_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trace_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trace_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapezoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapezoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapezoid_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapezoid_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapezoid_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapz_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_triangular_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tril_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tril_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tril_indices_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tril_indices_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_triu_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_triu_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_triu_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_triu_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_true_divide_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_true_divide_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trunc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trunc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unflatten_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unflatten_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unflatten_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_copy_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_uniform_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_uniform_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_uniform_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unique_consecutive_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unique_consecutive_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unique_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unravel_index_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsafe_chunk_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsafe_chunk_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsafe_chunk_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsafe_split_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsafe_split_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsafe_split_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_copy_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_unbiased_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_unbiased_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vdot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_as_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vsplit_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vsplit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vsplit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vstack_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vstack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vstack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_where_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_where_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_where_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_where_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_xlogy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_xlogy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_xlogy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zero__cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zeros_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zeros_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_H_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_H_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_H_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_H_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_T_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_T_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_T_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_T_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_T_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___radd___cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rdiv___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rdiv___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rdiv___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rdiv___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmod___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmul___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmul___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___ror___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rpow___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rsub___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rsub___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rsub___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rxor___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__chunk_cat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_abs_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_abs_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_acos_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_acos_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_acos_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_addcdiv_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_addcdiv_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_addcdiv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_addcdiv_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_addcdiv_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_addcdiv_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_addcmul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_addcmul_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_atan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_atan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_atan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_atan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_ceil_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_ceil_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_ceil_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_ceil_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_max_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_max_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_min_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_min_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_min_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_cosh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_cosh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_div_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erfc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_exp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_exp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_exp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_exp_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_exp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_expm1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_expm1_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_expm1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_frac_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_lgamma_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log10_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log10_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log1p_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log1p_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log1p_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log1p_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_max_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_maximum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_minimum_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_minimum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_minimum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_mul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_mul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_mul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_neg_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_norm_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_norm_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_pow_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_pow_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_pow_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_pow_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_pow_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_pow_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_reciprocal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_reciprocal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_reciprocal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_round_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_round_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_rsqrt_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_rsqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_rsqrt_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sigmoid_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sigmoid_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sign_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sin_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sin_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sinh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sub_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sub_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tanh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tanh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_trunc_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_trunc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_zero_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__native_batch_norm_legit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__softmax_backward_data_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__unsafe_masked_index_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__unsafe_masked_index_put_accumulate_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_abs_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_abs_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_abs_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acos_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acos_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acos_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acosh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acosh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acosh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acosh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addbmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addbmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addcdiv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addcdiv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addcmul_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addcmul_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addmm_decomposed_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addmm_decomposed_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_alias_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_addcmul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_div_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_exp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_expm1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_floor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_log_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_minimum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_pow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__unsafe_masked_index_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_acos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_addcdiv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_all_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_allclose_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_bool_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_broadcast_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_cartesian_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_ceil_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_char_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_clamp_min_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_clone_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_combinations_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_cross_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_cummax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_cumprod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_double_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_expand_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_eye_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_fft_fftn_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_fliplr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_flipud_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_fmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_full_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_geometric_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_histc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_hypot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_i0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_igamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_index_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_index_fill_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_index_reduce_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_isposinf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_jiterator_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_jiterator_unary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_lerp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_inv_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_ldl_factor_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_lu_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_matrix_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_multi_dot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_pinv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_pinv_singular_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_solve_triangular_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_vander_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_vecdot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_log1p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_log_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_logdet_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_logical_or_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_logspace_tensor_overload_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_long_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_mH_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_masked_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_max_pool2d_with_indices_backward_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_median_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_meshgrid_variadic_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_min_reduction_no_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_minimum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_movedim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_mul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nan_to_num_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_narrow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_ne_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_new_ones_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_bilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_conv_transpose3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_cosine_embedding_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_dropout_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_elu_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_embedding_bag_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_fractional_max_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_gaussian_nll_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_glu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_group_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_huber_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_interpolate_bicubic_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_interpolate_nearest_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_interpolate_trilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_l1_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_max_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_max_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_max_unpool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_mish_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_nll_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_pad_circular_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_prelu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_rms_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_selu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_softmin_with_dtype_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_unfold_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_normal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_pca_lowrank_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_permute_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_polygamma_polygamma_n_1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_rad2deg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_randn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_ravel_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_reciprocal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_renorm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_reshape_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_scatter_reduce_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_select_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_signal_windows_cosine_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_signal_windows_general_hamming_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_signal_windows_nuttall_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_signbit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_sinh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_softmax_with_dtype_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_sort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_sparse_sampled_addmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_chebyshev_polynomial_u_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_hermite_polynomial_h_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_i0e_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_modified_bessel_k0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_ndtr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_spherical_bessel_j0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_squeeze_copy_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_squeeze_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_stack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_svd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_svd_lowrank_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_t_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_tanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_tile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_torch__scaled_mm_v2_cuda_float8_e4m3fn, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_tril_indices_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_unbind_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_unflatten_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_unique_consecutive_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_unsafe_chunk_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_var_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_view_as_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_view_as_real_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_view_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_vstack_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_xlogy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_allclose_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_amax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_amin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_amin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_amin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_aminmax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_angle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_angle_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_angle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_any_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_any_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_any_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_arange_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_arange_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argsort_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argwhere_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argwhere_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_copy_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_partial_views_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_partial_views_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_asin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_asin_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_asin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_asin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_asinh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_asinh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atan2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atan2_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atan2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atanh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atanh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_1d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_1d_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_1d_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_1d_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_2d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_2d_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_3d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_baddbmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bernoulli_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bernoulli_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bfloat16_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bfloat16_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bfloat16_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_and_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_left_shift_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_or_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_right_shift_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_xor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_xor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_block_diag_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_block_diag_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_block_diag_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_block_diag_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_block_diag_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bool_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_tensors_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_tensors_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_tensors_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_tensors_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_to_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bucketize_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bucketize_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_byte_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_byte_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_byte_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_byte_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_byte_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_byte_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cartesian_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cartesian_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cartesian_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cauchy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cauchy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cauchy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ceil_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ceil_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cfloat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cfloat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cfloat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cfloat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cfloat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chalf_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chalf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chalf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_char_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_char_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_char_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_char_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chunk_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chunk_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chunk_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_max_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_max_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_max_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_min_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_min_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clone_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clone_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_column_stack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_column_stack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_combinations_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_combinations_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_complex_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_physical_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_physical_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_physical_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_physical_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_constant_pad_nd_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_constant_pad_nd_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_constant_pad_nd_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_constant_pad_nd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_constant_pad_nd_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_contiguous_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_copysign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_copysign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_copysign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_copysign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_corrcoef_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_corrcoef_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cos_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cos_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cos_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cos_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cos_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cosh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cosh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cosh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_count_nonzero_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_count_nonzero_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cov_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cov_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cov_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cov_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cummin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cummin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cummin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cummin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumprod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumprod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumprod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumsum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumulative_trapezoid_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumulative_trapezoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumulative_trapezoid_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_deg2rad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_deg2rad_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_embed_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_embed_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagflat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagflat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagflat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagflat_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_scatter_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_scatter_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diff_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diff_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diff_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_digamma_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dist_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dist_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dist_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_floor_rounding_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_trunc_rounding_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_trunc_rounding_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_trunc_rounding_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_double_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_double_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dsplit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dsplit_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dstack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dstack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dstack_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_einsum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_like_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_permuted_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_permuted_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_strided_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_strided_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eq_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eq_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_equal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_equal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_equal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_equal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erfc_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erfc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_as_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_as_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_as_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_as_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expm1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expm1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expm1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exponential_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eye_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eye_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eye_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eye_cuda_float8_e4m3fn, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eye_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fft2_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftshift_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftshift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfftn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifftn_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifftshift_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifftshift_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifftshift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_rfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_rfft_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_rfftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_rfftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fill_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flatten_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flatten_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flatten_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flatten_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flip_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flip_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flip_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fliplr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flipud_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_float_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_float_power_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_float_power_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_float_power_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_float_power_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_float_power_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_floor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_floor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_floor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_floor_divide_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmod_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_frexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gather_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gcd_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ge_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ge_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_geometric_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_geometric_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_geometric_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gradient_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_grid_sampler_2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_grid_sampler_2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_half_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_heaviside_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_histc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_histc_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hsplit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hsplit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hstack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hypot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hypot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_i0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_add_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_fill_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_fill_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_fill_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_fill_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_put_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_put_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_amax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_amax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_amin_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_mean_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_mean_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_select_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_select_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_int_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_int_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_int_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isclose_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isclose_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isclose_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isfinite_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isfinite_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isinf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isinf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isinf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isinf_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isinf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isnan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isneginf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isneginf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isposinf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isposinf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isposinf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isposinf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isreal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isreal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_istft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_istft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_item_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_item_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_2inputs_2outputs_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_2inputs_2outputs_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_return_by_ref_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_return_by_ref_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_return_by_ref_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_unary_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_unary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_kron_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_kthvalue_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lcm_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_le_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lerp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lgamma_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lgamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lgamma_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cholesky_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cross_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cross_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cross_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_diagonal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_diagonal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_diagonal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_diagonal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_eig_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_eigvals_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_eigvalsh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_eigvalsh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_householder_product_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_ldl_factor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_ldl_factor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_ldl_factor_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_ldl_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lstsq_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lstsq_grad_oriented_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lstsq_grad_oriented_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lu_factor_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lu_factor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lu_factor_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lu_factor_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lu_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_matrix_power_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_matrix_rank_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_matrix_rank_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_matrix_rank_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_matrix_rank_hermitian_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_multi_dot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_norm_subgradients_at_zero_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_pinv_hermitian_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_qr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_solve_triangular_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_svd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_svdvals_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_vander_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_vecdot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_vecdot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linspace_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linspace_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linspace_tensor_overload_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linspace_tensor_overload_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linspace_tensor_overload_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log10_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log10_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log1p_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_normal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_softmax_with_dtype_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_softmax_with_dtype_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_softmax_with_dtype_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logaddexp2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logaddexp2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logcumsumexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logdet_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_and_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_and_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_and_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_not_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_not_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_or_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_or_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_xor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_xor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_xor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_xor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_xor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logspace_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logspace_tensor_overload_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logsumexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logsumexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logsumexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logsumexp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_long_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lt_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lu_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lu_unpack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mH_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mH_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mH_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mH_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mT_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mT_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_amin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_argmax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_cumsum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_cumsum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_fill_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_fill_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_fill_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_fill_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_fill_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_log_softmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_logaddexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_logsumexp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_mean_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_median_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_median_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_select_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_select_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_softmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_softmin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_std_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_std_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_std_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_std_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_std_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_sum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_sum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_sum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_var_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_var_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_matmul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_matrix_exp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_binary_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_binary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_binary_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_pool2d_with_indices_backward_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_reduction_no_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_reduction_no_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_maximum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_maximum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_maximum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_median_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_median_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_list_of_tensors_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_list_of_tensors_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_list_of_tensors_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_list_of_tensors_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_list_of_tensors_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_variadic_tensors_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_variadic_tensors_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_variadic_tensors_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_binary_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_reduction_no_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_reduction_no_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_reduction_no_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_reduction_with_dim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_reduction_with_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_minimum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mode_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mode_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_movedim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_movedim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_movedim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_movedim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_msort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_msort_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_msort_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mul_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mul_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_multinomial_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_5_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_5_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nan_to_num_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nan_to_num_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nan_to_num_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nan_to_num_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nanmean_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nanquantile_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nansum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nansum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nansum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_copy_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_native_batch_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_native_dropout_backward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_native_layer_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_native_layer_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_native_layer_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ne_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_empty_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_empty_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_empty_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_empty_strided_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_empty_strided_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_full_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_full_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_full_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_zeros_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_avg_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_avg_pool2d_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_avg_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_max_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_max_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_alpha_dropout_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_avg_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_avg_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_bilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_celu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_celu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_channel_shuffle_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_channel_shuffle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv1d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv3d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv3d_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv_transpose2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv_transpose2d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv_transpose3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv_transpose3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_cosine_embedding_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_cosine_embedding_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_cosine_similarity_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_ctc_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_dropout_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_elu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_embedding_bag_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_fractional_max_pool2d_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_gelu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_glu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_group_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_hardshrink_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_hardsigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_hardswish_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_hardtanh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_hinge_embedding_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_huber_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_huber_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_bicubic_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_bicubic_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_linear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_trilinear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_kl_div_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_kl_div_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_leaky_relu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_linear_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_linear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_local_response_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_local_response_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_margin_ranking_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_margin_ranking_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_unpool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_unpool2d_grad_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_unpool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_unpool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_unpool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_unpool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_unpool3d_grad_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_unpool3d_grad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_mse_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_multi_head_attention_forward_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_multi_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_multi_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_multilabel_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_multilabel_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_multilabel_soft_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_nll_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_nll_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_normalize_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_circular_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_circular_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_constant_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_constant_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_constant_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_reflect_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_replicate_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_replicate_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_replicate_negative_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pairwise_distance_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_shuffle_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_shuffle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_shuffle_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_shuffle_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_unshuffle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_unshuffle_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_unshuffle_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_poisson_nll_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_poisson_nll_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_prelu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_prelu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_relu6_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_relu6_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_relu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_relu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_relu_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_rms_norm_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_scaled_dot_product_attention_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_silu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_smooth_l1_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_soft_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softmin_with_dtype_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softmin_with_dtype_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softmin_with_dtype_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softmin_with_dtype_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softsign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softsign_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softsign_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_threshold_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_threshold_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_threshold_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_triplet_margin_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_triplet_margin_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_unfold_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_upsample_bilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_static_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_static_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_norm_fro_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_norm_inf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_normal_in_place_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_normal_in_place_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_normal_in_place_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ones_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ones_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ones_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_outer_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_permute_copy_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_permute_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_4_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_4_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_4_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_positive_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_positive_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_pow_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_pow_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_pow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_pow_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_put_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_put_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rad2deg_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rand_like_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randint_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randint_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randint_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randn_like_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ravel_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ravel_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ravel_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_real_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_real_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_real_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_real_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_real_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reciprocal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_renorm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_renorm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_repeat_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_repeat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_repeat_interleave_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_repeat_interleave_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_repeat_interleave_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reshape_as_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reshape_as_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reshape_as_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reshape_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reshape_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize__cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize_as__cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize_as__cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resolve_conj_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resolve_neg_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resolve_neg_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resolve_neg_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resolve_neg_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resolve_neg_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_roll_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_round_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_round_decimals_0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_round_decimals_3_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_round_decimals_3_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsqrt_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsub_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scalar_tensor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_add_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_add_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_add_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_add_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_amax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_amax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_amin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_mean_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_mean_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_sum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_sum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_searchsorted_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_searchsorted_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sgn_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_short_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_short_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sigmoid_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sigmoid_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sign_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sign_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signal_windows_cosine_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signal_windows_gaussian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signal_windows_gaussian_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signal_windows_hamming_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signal_windows_hamming_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signbit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinc_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_softmax_with_dtype_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_softmax_with_dtype_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_softmax_with_dtype_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sort_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sort_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sort_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sparse_mm_reduce_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sparse_mm_reduce_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sparse_sampled_addmm_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_airy_ai_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_airy_ai_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_airy_ai_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_airy_ai_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_j1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_j1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_y0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_y1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_y1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_y1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_y1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_t_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_t_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_u_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_u_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_v_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_v_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_v_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_v_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_w_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_w_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_entr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_entr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_erfcx_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_hermite_polynomial_h_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_hermite_polynomial_h_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_hermite_polynomial_h_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_hermite_polynomial_he_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i0e_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i0e_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i0e_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i1e_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i1e_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_laguerre_polynomial_l_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_legendre_polynomial_p_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_log_ndtr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_log_ndtr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_i0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_i0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_i1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_k0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_k1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_k1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_ndtr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_ndtr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_ndtr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_ndtri_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_ndtri_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_ndtri_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_ndtri_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_scaled_modified_bessel_k0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_scaled_modified_bessel_k0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_scaled_modified_bessel_k1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_spherical_bessel_j0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_spherical_bessel_j0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_zeta_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_zeta_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_zeta_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_list_args_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_list_args_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_list_args_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_list_args_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_list_args_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_with_sizes_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_with_sizes_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_with_sizes_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_with_sizes_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sqrt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_square_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_square_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_square_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_multiple_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_stack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_stack_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_stack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_stack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_std_mean_unbiased_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_std_mean_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sub_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sub_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sum_to_size_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sum_to_size_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_svd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_svd_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_svd_lowrank_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_t_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_t_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_t_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_t_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_t_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_t_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_along_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_along_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_along_dim_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tan_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tanh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tanh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tanh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tensor_split_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tensordot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tile_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tile_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_to_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_to_sparse_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_to_sparse_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_to_sparse_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_topk_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_topk_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_topk_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trace_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trace_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trace_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trapezoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trapz_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trapz_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trapz_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trapz_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tril_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tril_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tril_indices_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_triu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_triu_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_triu_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_triu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_triu_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_triu_indices_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_true_divide_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_true_divide_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_true_divide_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trunc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trunc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unflatten_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unflatten_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unflatten_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_consecutive_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_consecutive_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_consecutive_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_cuda_uint32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unravel_index_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_chunk_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_chunk_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_split_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_split_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_split_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_split_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_split_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_split_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_mean_unbiased_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_mean_unbiased_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_unbiased_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_unbiased_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vdot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_as_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_as_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_as_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_as_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_as_real_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vsplit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vsplit_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vsplit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vstack_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vstack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vstack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vstack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_where_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_where_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_where_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_where_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_xlogy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_xlogy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_xlogy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zero__cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zero__cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zeros_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zeros_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zeros_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zeros_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zeros_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zeros_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zeros_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zeros_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zeros_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zeros_like_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_group_norm_backward_output_mask0_cuda, test/test_meta.py::TestMetaCUDA::test_group_norm_backward_output_mask5_cuda, test/test_meta.py::TestMetaCUDA::test_index_select_out_cuda, test/test_meta.py::TestMetaCUDA::test_layer_norm_backward_output_mask6_cuda, test/test_meta.py::TestMetaCUDA::test_meta_inplace_H_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_H_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_T_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_T_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_T_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace___getitem___cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace___getitem___cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace___radd___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace___radd___cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rand___cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rand___cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rand___cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rdiv___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rmatmul___cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rmod___cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rmod___cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rmod___cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rmod___cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rmul___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rmul___cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace___ror___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rpow___cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rpow___cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace___rpow___cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rsub___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rsub___cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rsub___cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rsub___cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rxor___cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__batch_norm_with_update_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_abs_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_abs_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_abs_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_abs_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_acos_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_acos_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_acos_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_add_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_add_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_addcdiv_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_addcmul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_asin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_asin_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_asin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_asin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_asin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_atan_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_atan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_ceil_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_ceil_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_max_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_min_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_min_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_min_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_cos_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_cos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_cosh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_cosh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_cosh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_cosh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_cosh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_cosh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_div_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_div_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_div_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_erf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_erf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_erf_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_erf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_erfc_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_erfc_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_erfc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_erfc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_erfc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_erfc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_exp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_exp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_exp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_expm1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_expm1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_expm1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_floor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_frac_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_frac_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_frac_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_lerp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_lerp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_lgamma_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_lgamma_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log10_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log10_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log1p_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log1p_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_max_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_max_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_max_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_maximum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_maximum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_minimum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_minimum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_minimum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_mul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_mul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_neg_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_neg_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_neg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_norm_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_pow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_pow_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_reciprocal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_round_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_round_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_round_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_rsqrt_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_rsqrt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_rsqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_rsqrt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sigmoid_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sigmoid_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sign_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sin_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sin_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sinh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sinh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sqrt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sub_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sub_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tan_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tanh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tanh_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tanh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_trunc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_trunc_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_trunc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_trunc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__native_batch_norm_legit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__segment_reduce_offsets_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__softmax_backward_data_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__unsafe_masked_index_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__unsafe_masked_index_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__unsafe_masked_index_put_accumulate_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__unsafe_masked_index_put_accumulate_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__upsample_bilinear2d_aa_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_acos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_acos_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_acos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_add_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_add_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_add_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addcmul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addcmul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addmm_decomposed_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addmm_decomposed_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addr_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_alias_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_all_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_all_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_all_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_all_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_all_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_allclose_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_allclose_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_allclose_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_amax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_amax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_amin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_aminmax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_angle_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_angle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_any_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_arange_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_arange_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argsort_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argsort_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argwhere_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argwhere_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argwhere_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_copy_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_partial_views_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_partial_views_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_partial_views_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_partial_views_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_asin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_asinh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_asinh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atan2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atanh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atanh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atanh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atanh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_1d_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_2d_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_2d_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_3d_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_baddbmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bfloat16_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bincount_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_bincount_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bincount_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_and_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_left_shift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_not_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_not_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_not_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_not_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_not_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_or_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_right_shift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_xor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_xor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_block_diag_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_block_diag_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_block_diag_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bool_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bool_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_tensors_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_tensors_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_tensors_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_to_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_to_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bucketize_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_bucketize_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_byte_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_byte_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_byte_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_byte_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cartesian_prod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cartesian_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cartesian_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cartesian_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cdouble_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cdouble_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cdouble_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ceil_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cfloat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cfloat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cfloat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_chalf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_chalf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_chalf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_char_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_char_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_char_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_char_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_char_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cholesky_inverse_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cholesky_inverse_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_chunk_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_max_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clone_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_column_stack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_column_stack_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_combinations_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_combinations_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_combinations_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_complex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_conj_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_conj_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_conj_physical_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_conj_physical_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_constant_pad_nd_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_constant_pad_nd_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_contiguous_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_contiguous_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_copysign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_copysign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cosh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cosh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cosh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_count_nonzero_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cov_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cov_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cross_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_cross_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cummax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cummax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cummin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cummin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumprod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumprod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumsum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumsum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumulative_trapezoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumulative_trapezoid_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumulative_trapezoid_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_deg2rad_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagflat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_scatter_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diff_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diff_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diff_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diff_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_digamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dist_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dist_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_no_rounding_mode_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_no_rounding_mode_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_double_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_double_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dsplit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dsplit_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dsplit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dstack_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dstack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dstack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dstack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dstack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dstack_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dstack_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_permuted_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_permuted_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_permuted_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_permuted_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_strided_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_strided_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eq_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eq_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eq_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eq_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eq_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_equal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_equal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_equal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_equal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_equal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erfc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erfc_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erfc_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_erfc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erfinv_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erfinv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erfinv_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exp2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_as_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_as_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expm1_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eye_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eye_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fft2_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftshift_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftshift_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfft2_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfft2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftshift_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftshift_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftshift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftshift_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftshift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ihfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ihfftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft2_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_rfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_rfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_rfft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_rfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_rfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_rfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_rfftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fill_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fill_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fill_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fill_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flatten_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flatten_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flip_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flip_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flip_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flip_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fliplr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fliplr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fliplr_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fliplr_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_flipud_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flipud_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_float_power_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_floor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_floor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_floor_divide_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_frac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_full_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_full_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_full_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_full_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gather_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gather_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gather_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gather_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gather_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gcd_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gcd_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ge_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ge_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ge_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_geometric_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_geometric_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_geometric_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gradient_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gradient_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_grid_sampler_3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_grid_sampler_3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_half_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hash_tensor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hash_tensor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hash_tensor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hash_tensor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_heaviside_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_heaviside_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_heaviside_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_heaviside_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hsplit_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hstack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hstack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hstack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hstack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_igamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_imag_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_imag_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_add_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_fill_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_fill_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_put_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_put_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_put_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_amin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_amin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_select_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_select_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_inner_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_int_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_int_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isclose_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isclose_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isfinite_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isfinite_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isinf_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_isinf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isnan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isnan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isneginf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isneginf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isposinf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isposinf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isreal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isreal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isreal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_item_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_item_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_item_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_2inputs_2outputs_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_2inputs_2outputs_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_2inputs_2outputs_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_4inputs_with_extra_args_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_4inputs_with_extra_args_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_return_by_ref_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_return_by_ref_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_return_by_ref_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_return_by_ref_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_unary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_kron_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_kron_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_kron_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_kthvalue_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_kthvalue_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lcm_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ldexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ldexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_le_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_le_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lgamma_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lgamma_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lgamma_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_cholesky_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_cross_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_cross_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_diagonal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_diagonal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_diagonal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_diagonal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_diagonal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_diagonal_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_eig_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_eigh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_eigvals_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_eigvalsh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_householder_product_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_householder_product_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_inv_ex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_ldl_factor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_ldl_factor_ex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_ldl_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_lstsq_grad_oriented_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_lu_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_lu_factor_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_matrix_power_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_matrix_rank_hermitian_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_matrix_rank_hermitian_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_norm_subgradients_at_zero_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_norm_subgradients_at_zero_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_norm_subgradients_at_zero_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_pinv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_pinv_hermitian_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_slogdet_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_solve_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_svd_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_svd_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_svdvals_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_tensorsolve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_vander_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_vander_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_vander_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_vecdot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_vector_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_vector_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linspace_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linspace_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linspace_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linspace_tensor_overload_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linspace_tensor_overload_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log10_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log10_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log1p_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log2_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_log2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_softmax_with_dtype_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logaddexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logdet_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logdet_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logdet_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_and_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_and_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_and_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_not_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_not_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_not_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_not_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_not_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_or_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_or_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logspace_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logspace_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logspace_tensor_overload_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_logspace_tensor_overload_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logspace_tensor_overload_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logspace_tensor_overload_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logspace_tensor_overload_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logspace_tensor_overload_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logspace_tensor_overload_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logsumexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logsumexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logsumexp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_long_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_long_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_long_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lu_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lu_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lu_unpack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mH_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mH_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mH_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mH_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mH_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mH_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mH_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mT_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mT_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mT_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mT_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mT_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_argmax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_argmax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_argmin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_argmin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_cumprod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_cumsum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_cumsum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_cumsum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_fill_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_fill_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_fill_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_fill_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_log_softmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_logsumexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_normalize_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_normalize_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_prod_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_prod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_scatter_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_select_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_softmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_softmin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_std_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_std_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_std_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_std_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_sum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_sum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_sum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_var_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_matmul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_matmul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_matrix_exp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_matrix_exp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_binary_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_reduction_no_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_reduction_no_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_reduction_with_dim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_maximum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_maximum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_maximum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_maximum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mean_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_median_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_meshgrid_variadic_tensors_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_meshgrid_variadic_tensors_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_binary_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_binary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_reduction_no_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_reduction_no_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_reduction_no_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_reduction_with_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_minimum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_minimum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_minimum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_minimum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_minimum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mode_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mode_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mode_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_movedim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_movedim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_movedim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_movedim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_movedim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_msort_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_msort_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_msort_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mul_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_mul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_multinomial_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_multinomial_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_multinomial_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_5_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_5_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_5_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nan_to_num_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nanmean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nanmean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nanmedian_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nanmedian_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nanmedian_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nanmedian_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nansum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nansum_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nansum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_native_dropout_backward_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_native_dropout_backward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_native_layer_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ne_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ne_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ne_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ne_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_neg_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_neg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_empty_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_empty_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_empty_strided_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_empty_strided_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_full_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_full_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_ones_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_zeros_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_zeros_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_zeros_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nextafter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_adaptive_avg_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_adaptive_max_pool3d_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_avg_pool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_avg_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_avg_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_batch_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_bilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_binary_cross_entropy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_binary_cross_entropy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_celu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_channel_shuffle_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_channel_shuffle_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_channel_shuffle_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_channel_shuffle_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv2d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv3d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv3d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv_transpose1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv_transpose1d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv_transpose1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv_transpose2d_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv_transpose2d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv_transpose2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv_transpose3d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_cross_entropy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_cross_entropy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_dropout3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_dropout3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_elu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_elu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_embedding_bag_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_embedding_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_feature_alpha_dropout_with_train_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_feature_alpha_dropout_with_train_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_fractional_max_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_fractional_max_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_fractional_max_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_gelu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_glu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_group_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_hardsigmoid_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_hardtanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_hardtanh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_area_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_bicubic_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_bilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_nearest_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_l1_loss_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_l1_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_leaky_relu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_linear_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_linear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_logsigmoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_logsigmoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_logsigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_margin_ranking_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_margin_ranking_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_pool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_unpool1d_grad_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_unpool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_unpool2d_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_mse_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_multi_head_attention_forward_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_multi_head_attention_forward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_multi_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_multilabel_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_nll_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_normalize_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_normalize_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_circular_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_circular_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_circular_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_circular_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_constant_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_constant_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_reflect_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_reflect_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_reflect_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_replicate_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_replicate_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_replicate_negative_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_replicate_negative_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_replicate_negative_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pairwise_distance_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pairwise_distance_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pairwise_distance_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_shuffle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_shuffle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_unshuffle_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_unshuffle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_poisson_nll_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_poisson_nll_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_poisson_nll_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_relu6_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_relu_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_rms_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_selu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_selu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_selu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_silu_complex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_smooth_l1_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softmin_with_dtype_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softmin_with_dtype_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softplus_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softshrink_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softsign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softsign_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_tanhshrink_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_tanhshrink_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_threshold_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_threshold_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_threshold_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_unfold_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_upsample_bilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_upsample_bilinear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_static_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_static_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_fro_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_nuc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_outer_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_outer_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_outer_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_outer_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_pca_lowrank_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_pinverse_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_pinverse_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_0_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_1_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_3_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_3_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_4_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_4_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_4_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_positive_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_positive_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_positive_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_positive_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_positive_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_positive_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_positive_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_pow_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_pow_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_prod_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_prod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_qr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_quantile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rand_like_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rand_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randint_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randint_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randint_like_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randint_like_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_randn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randn_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randn_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randn_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randn_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ravel_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_real_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reciprocal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reciprocal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_remainder_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_repeat_interleave_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_as_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_as_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_as_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_as_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize__cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize__cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize_as__cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize_as__cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize_as__cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_conj_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_conj_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_neg_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_neg_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_neg_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_roll_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_roll_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_roll_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rot90_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rot90_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_round_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_round_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_round_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_round_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_round_decimals_0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rsqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rsub_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scalar_tensor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scalar_tensor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scalar_tensor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_add_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_add_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_add_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_amin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_mean_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_mean_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_mean_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_sum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_sum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_sum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_searchsorted_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_searchsorted_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_searchsorted_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_searchsorted_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_select_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_select_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_select_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sgn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sgn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_short_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_short_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_short_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sigmoid_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sigmoid_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sigmoid_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sign_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signal_windows_exponential_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signal_windows_gaussian_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signal_windows_general_hamming_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_signal_windows_hamming_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signal_windows_nuttall_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signbit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signbit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sinc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sinh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sinh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sinh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_slice_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_slice_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_slice_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_slice_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_slice_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_softmax_with_dtype_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_softmax_with_dtype_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_softmax_with_dtype_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_softmax_with_dtype_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sort_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sort_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sparse_mm_reduce_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_airy_ai_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_airy_ai_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_airy_ai_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_j1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_j1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_j1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_y0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_y0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_t_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_u_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_u_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_v_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_w_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_w_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_w_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_entr_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_entr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_hermite_polynomial_h_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_hermite_polynomial_h_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_hermite_polynomial_he_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i1e_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i1e_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_laguerre_polynomial_l_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_laguerre_polynomial_l_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_laguerre_polynomial_l_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_legendre_polynomial_p_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_log_ndtr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_k0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_k0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_k1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_ndtr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_scaled_modified_bessel_k0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_scaled_modified_bessel_k0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_scaled_modified_bessel_k1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_scaled_modified_bessel_k1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_spherical_bessel_j0_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_xlog1py_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_xlog1py_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_zeta_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_zeta_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_zeta_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_list_args_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_list_args_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_list_args_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_list_args_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_square_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_square_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_square_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_multiple_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_multiple_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_multiple_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_stack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_stack_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_std_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_std_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_std_mean_unbiased_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_std_unbiased_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_stft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_stft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_stft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sub_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sub_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sub_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sub_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sum_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sum_to_size_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sum_to_size_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_svd_lowrank_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_along_dim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_along_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_along_dim_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_along_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_along_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_along_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tan_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tanh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tensor_split_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tensor_split_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tensor_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tensor_split_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tensordot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tile_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_sparse_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_topk_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_topk_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_torch_ops_aten__efficient_attention_forward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_trace_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trace_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trace_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trapezoid_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trapz_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trapz_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trapz_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_triangular_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tril_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tril_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tril_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_triu_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_triu_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_triu_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_triu_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_true_divide_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trunc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_copy_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unflatten_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unflatten_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unflatten_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unflatten_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unique_consecutive_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unique_consecutive_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unique_consecutive_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unique_consecutive_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unravel_index_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsafe_chunk_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsafe_chunk_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsafe_split_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsafe_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsafe_split_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsafe_split_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_mean_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_mean_unbiased_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vdot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_as_complex_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_as_complex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vsplit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vsplit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vstack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_where_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_where_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_xlogy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_xlogy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_xlogy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_xlogy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zero__cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zero__cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_zero__cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zeros_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zeros_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zeros_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zeros_like_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_H_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_H_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_T_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_T_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_T_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace___getitem___cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace___getitem___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace___radd___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace___radd___cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rand___cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rand___cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rdiv___cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rdiv___cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmatmul___cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmod___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmod___cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmod___cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmod___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmul___cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmul___cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmul___cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmul___cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace___rpow___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rsub___cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rsub___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__chunk_cat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_abs_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_abs_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_abs_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_abs_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_add_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_addcdiv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_addcmul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_addcmul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_asin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_asin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_asin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_atan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_atan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_atan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_atan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_atan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_ceil_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_ceil_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_ceil_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_clamp_max_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_clamp_min_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_clamp_min_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_cos_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_cos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_cos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_cosh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_cosh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_cosh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_div_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_div_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_erf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_exp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_expm1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_expm1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_floor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_frac_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_frac_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_frac_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_frac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_frac_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_frac_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lerp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lerp_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lerp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lgamma_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lgamma_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lgamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log10_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log10_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log10_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log1p_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log1p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log1p_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log1p_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_max_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_max_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_max_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_max_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_max_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_maximum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_maximum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_maximum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_maximum_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_minimum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_minimum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_mul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_mul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_mul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_mul_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_mul_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_mul_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_neg_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_norm_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_pow_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_pow_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_pow_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_pow_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_reciprocal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_reciprocal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_reciprocal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_round_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_round_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_round_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_round_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_rsqrt_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_rsqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_rsqrt_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sigmoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sigmoid_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sigmoid_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sign_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sign_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sinh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sinh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sinh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sub_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sub_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_tan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_tan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_tanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_tanh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_tanh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_trunc_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_trunc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_zero_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_zero_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__segment_reduce_lengths_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__segment_reduce_offsets_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__segment_reduce_offsets_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__segment_reduce_offsets_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__unsafe_masked_index_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__unsafe_masked_index_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__unsafe_masked_index_put_accumulate_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__unsafe_masked_index_put_accumulate_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__unsafe_masked_index_put_accumulate_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__unsafe_masked_index_put_accumulate_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__unsafe_masked_index_put_accumulate_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_abs_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_abs_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_abs_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_acos_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_acos_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_acos_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_acos_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_acos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_acosh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_add_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addbmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addcmul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addmm_decomposed_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addmv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addr_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_alias_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_alias_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_alias_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_alias_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_alias_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_all_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_all_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_all_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_amin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_amin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_amin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_aminmax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_aminmax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_aminmax_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_angle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_angle_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_any_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_any_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_any_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_arange_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_arange_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argmin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argsort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argsort_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argsort_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argwhere_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argwhere_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argwhere_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_partial_views_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_partial_views_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_partial_views_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_scatter_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_asin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_asinh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_asinh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atan2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atanh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atanh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atanh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_1d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_2d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_2d_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_2d_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_3d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_3d_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_baddbmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bernoulli_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bfloat16_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_and_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_not_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_not_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_right_shift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_xor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_xor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_xor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_xor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_block_diag_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_block_diag_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_block_diag_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bool_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bool_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bool_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_broadcast_shapes_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_broadcast_tensors_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_broadcast_tensors_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_broadcast_to_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_broadcast_to_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bucketize_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_byte_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_byte_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cartesian_prod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cartesian_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cartesian_prod_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_cat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cauchy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cdouble_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cdouble_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cdouble_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ceil_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ceil_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cfloat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cfloat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cfloat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cfloat_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_chalf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_chalf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_chalf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_char_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_char_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_char_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_char_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cholesky_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cholesky_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cholesky_inverse_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cholesky_inverse_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_chunk_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_chunk_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_max_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_max_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_max_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_max_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_max_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clone_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clone_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_column_stack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_column_stack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_column_stack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_column_stack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_column_stack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_column_stack_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_combinations_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_combinations_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_combinations_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_complex_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_complex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_complex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_conj_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_conj_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_conj_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_conj_physical_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_constant_pad_nd_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_constant_pad_nd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_contiguous_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_contiguous_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_copysign_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_copysign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_copysign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_copysign_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_copysign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_corrcoef_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_corrcoef_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_corrcoef_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_corrcoef_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cos_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cos_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cos_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cosh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cosh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cosh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_count_nonzero_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_count_nonzero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_count_nonzero_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_count_nonzero_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cov_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cov_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cov_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cross_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cross_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cross_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cummax_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_cummin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cummin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumprod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumprod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumprod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumprod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumsum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumsum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumulative_trapezoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_deg2rad_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_deg2rad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_deg2rad_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diag_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diag_embed_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diag_embed_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diag_embed_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagflat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagflat_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diff_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diff_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_digamma_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_digamma_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dist_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_dist_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dist_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_floor_rounding_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_no_rounding_mode_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_no_rounding_mode_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_trunc_rounding_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dsplit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dsplit_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dsplit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dsplit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_permuted_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_strided_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_equal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_equal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_equal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erfc_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erfc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp2_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_as_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_as_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_as_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_as_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expm1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expm1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eye_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eye_cuda_float8_e4m3fn, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eye_cuda_float8_e5m2fnuz, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eye_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftshift_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftshift_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftshift_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfft2_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfft_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifft2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifft2_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifftn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifftshift_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifftshift_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifftshift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifftshift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfft2_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fill_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fill_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fill_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fill_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fill_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flatten_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flatten_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fliplr_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fliplr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fliplr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fliplr_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flipud_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flipud_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flipud_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_power_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_power_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_power_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_power_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_power_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_floor_divide_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_frac_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_frac_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_like_cuda_uint16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gather_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gcd_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ge_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_geometric_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_geometric_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_geometric_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_grid_sampler_2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_half_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_half_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_half_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_half_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hash_tensor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_heaviside_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_heaviside_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hsplit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hstack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hstack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hstack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_i0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_i0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_igamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_igammac_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_add_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_add_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_add_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_fill_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_fill_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_fill_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_put_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_put_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_put_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_amin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_amin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_select_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_select_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_select_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_select_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_inner_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_int_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isclose_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isfinite_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isfinite_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isinf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isinf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isinf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isnan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isnan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isneginf_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_isneginf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isneginf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isposinf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isposinf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isposinf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isreal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_item_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_item_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_4inputs_with_extra_args_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_4inputs_with_extra_args_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_4inputs_with_extra_args_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_4inputs_with_extra_args_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_binary_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_binary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_binary_return_by_ref_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_binary_return_by_ref_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_binary_return_by_ref_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_unary_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_unary_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_unary_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_unary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_kron_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_kron_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_kthvalue_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_kthvalue_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ldexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_le_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_le_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_le_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lerp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lgamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lgamma_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lgamma_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_cholesky_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_cholesky_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_cond_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_cross_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_det_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_diagonal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_diagonal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_diagonal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_diagonal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_eig_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_eig_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_eigh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_eigvals_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_householder_product_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_lstsq_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_lstsq_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_lstsq_grad_oriented_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_lu_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_lu_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_lu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_matrix_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_matrix_rank_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_matrix_rank_hermitian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_matrix_rank_hermitian_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_multi_dot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_pinv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_pinv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_pinv_singular_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_pinv_singular_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_svd_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_svd_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_tensorinv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_tensorsolve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_tensorsolve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_tensor_overload_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log10_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log10_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log10_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log1p_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_softmax_with_dtype_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_softmax_with_dtype_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_softmax_with_dtype_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logaddexp2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logaddexp2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logaddexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logaddexp_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logaddexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logdet_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_and_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_not_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_not_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_not_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_not_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_not_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_or_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_or_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_or_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_xor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_xor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logspace_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logspace_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logspace_tensor_overload_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logspace_tensor_overload_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logspace_tensor_overload_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logspace_tensor_overload_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logsumexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logsumexp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logsumexp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logsumexp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_long_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_long_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_long_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lu_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lu_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lu_unpack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mH_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mH_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mH_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_mH_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mH_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mT_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_argmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_argmin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_cumprod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_cumprod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_cumsum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_fill_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_fill_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_fill_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_logaddexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_logsumexp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_normalize_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_prod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_scatter_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_select_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_softmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_softmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_softmin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_sum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_sum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_var_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_var_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_binary_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_binary_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_binary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_pool2d_with_indices_backward_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_reduction_no_dim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_reduction_no_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_reduction_no_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_reduction_with_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_maximum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_maximum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_median_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_median_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_median_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_list_of_tensors_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_list_of_tensors_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_list_of_tensors_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_variadic_tensors_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_variadic_tensors_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_variadic_tensors_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_binary_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_reduction_no_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_reduction_no_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_reduction_with_dim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_reduction_with_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_minimum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_minimum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_minimum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_minimum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mode_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mode_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mode_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_movedim_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_movedim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_movedim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_msort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_msort_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_msort_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mv_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_mv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nan_to_num_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nanmean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nanmedian_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nanmedian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nanmedian_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nanmedian_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nanquantile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_narrow_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_narrow_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_narrow_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_narrow_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_narrow_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_native_batch_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_native_layer_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ne_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_neg_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_neg_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_neg_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_empty_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_empty_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_empty_strided_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_empty_strided_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_empty_strided_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_empty_strided_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_full_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_full_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_full_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_ones_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_ones_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_ones_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_ones_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_zeros_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_zeros_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_zeros_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_zeros_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_zeros_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_zeros_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_zeros_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_adaptive_avg_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_adaptive_avg_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_adaptive_max_pool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_adaptive_max_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_adaptive_max_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_alpha_dropout_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_avg_pool3d_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_binary_cross_entropy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_binary_cross_entropy_with_logits_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_channel_shuffle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv1d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv1d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv2d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv2d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv3d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose1d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose3d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_cosine_embedding_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_cosine_embedding_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_cosine_embedding_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_cosine_similarity_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_dropout2d_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_dropout2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_dropout3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_dropout3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_elu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_embedding_bag_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_embedding_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_feature_alpha_dropout_with_train_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_feature_alpha_dropout_with_train_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_fractional_max_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_fractional_max_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_gelu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_grid_sample_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_group_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hardsigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hardswish_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hardtanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hardtanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hardtanh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hardtanh_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_huber_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_instance_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_instance_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_area_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_bicubic_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_bicubic_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_bicubic_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_nearest_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_kl_div_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_l1_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_l1_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_leaky_relu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_linear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_logsigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_logsigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_margin_ranking_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_margin_ranking_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_margin_ranking_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool1d_grad_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool3d_grad_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool3d_grad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_mish_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_mse_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_mse_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_multi_head_attention_forward_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_multilabel_soft_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_normalize_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_circular_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_circular_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_constant_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_constant_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_reflect_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_reflect_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_reflect_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_reflect_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_reflect_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_negative_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_negative_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_negative_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_negative_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_negative_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pairwise_distance_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pdist_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pixel_shuffle_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pixel_shuffle_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pixel_shuffle_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pixel_unshuffle_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pixel_unshuffle_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_poisson_nll_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_relu6_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_relu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_relu_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_rms_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_rms_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softmin_with_dtype_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softmin_with_dtype_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softmin_with_dtype_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softmin_with_dtype_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softshrink_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softsign_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softsign_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softsign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softsign_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softsign_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softsign_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_tanhshrink_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_tanhshrink_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_tanhshrink_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_threshold_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_triplet_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_triplet_margin_loss_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_unfold_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_unfold_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_upsample_bilinear_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_upsample_nearest_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_static_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_static_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_norm_nuc_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_normal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_normal_in_place_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ones_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ones_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ones_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ones_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_outer_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_outer_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_outer_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_outer_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_outer_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_outer_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_permute_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_permute_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polar_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_1_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_3_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_positive_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_positive_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_positive_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_pow_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_pow_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_pow_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_pow_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_put_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_qr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_quantile_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rad2deg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rad2deg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rand_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rand_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randint_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randint_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randint_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randint_like_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randn_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randn_like_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_ravel_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ravel_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ravel_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ravel_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ravel_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_real_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_real_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_real_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_remainder_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_remainder_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_remainder_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_renorm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_repeat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_repeat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_repeat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_repeat_interleave_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_repeat_interleave_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_repeat_interleave_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reshape_as_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reshape_as_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reshape_as_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reshape_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reshape_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reshape_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resize__cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resize__cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_resize__cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resize__cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resize__cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resize_as__cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resolve_conj_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resolve_conj_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resolve_conj_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resolve_conj_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resolve_conj_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resolve_neg_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_roll_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_roll_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_roll_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rot90_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_round_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_round_decimals_0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rsqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rsub_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rsub_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scalar_tensor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scalar_tensor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scalar_tensor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scalar_tensor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scalar_tensor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_add_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_add_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_add_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_amax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_amin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_mean_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_sum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_searchsorted_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_searchsorted_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_select_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_select_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_select_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sgn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_short_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sigmoid_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sign_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sign_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_bartlett_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_bartlett_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_hann_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_kaiser_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_nuttall_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signbit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinc_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_slice_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_slice_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_slice_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_slice_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_softmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_softmax_with_dtype_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_softmax_with_dtype_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_softmax_with_dtype_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sparse_mm_reduce_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sparse_mm_reduce_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sparse_sampled_addmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_airy_ai_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_j0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_j0_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_y0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_y0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_y1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_y1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_t_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_u_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_v_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_v_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_w_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_w_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_entr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_entr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_entr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_erfcx_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_erfcx_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_hermite_polynomial_h_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i0e_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i1e_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_laguerre_polynomial_l_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_legendre_polynomial_p_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_i0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_i1_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_i1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_i1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_k0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_k0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_k0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_ndtri_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_ndtri_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_scaled_modified_bessel_k0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_scaled_modified_bessel_k1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_scaled_modified_bessel_k1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_scaled_modified_bessel_k1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_spherical_bessel_j0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_xlog1py_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_xlog1py_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_xlog1py_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_zeta_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_zeta_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_list_args_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_list_args_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sqrt_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sqrt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sqrt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_square_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_multiple_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_stack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_stack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_stack_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_std_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_std_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_std_unbiased_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sub_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sub_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sub_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_to_size_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_to_size_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_svd_lowrank_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_take_along_dim_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_take_along_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_take_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_take_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tensor_split_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tensor_split_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tensor_split_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tensor_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tensor_split_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tensor_split_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tensordot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tile_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tile_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tile_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_topk_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_torch_ops_aten__efficient_attention_forward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trace_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_trace_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trapezoid_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trapz_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trapz_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tril_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tril_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tril_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tril_indices_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_triu_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_triu_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_triu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_true_divide_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trunc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trunc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trunc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trunc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trunc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unbind_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unbind_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unbind_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unbind_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unbind_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unbind_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_unbind_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unflatten_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unflatten_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unflatten_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unique_consecutive_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unique_consecutive_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unique_consecutive_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unique_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unravel_index_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unravel_index_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsafe_chunk_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsafe_chunk_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsafe_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_var_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_var_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_var_mean_unbiased_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_var_unbiased_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_var_unbiased_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_as_complex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_as_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_as_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_as_real_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vsplit_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vstack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_where_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_where_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_xlogy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_xlogy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zero__cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_like_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_mixed_dtype_for_native_layer_norm_backward_float16_float32_cuda
2025-12-04T13:56:16.3629243Z 
2025-12-04T13:56:16.3629556Z Finished test_meta 2/5 ... [2025-12-04 13:56:15.698116][17004.081006297], took 26.23min
2025-12-04T13:56:16.3630622Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_meta/test_meta-cbc50d7c3e0b1b6a.xml
2025-12-04T13:56:17.1390137Z Uploading artifacts took 1.11 seconds
2025-12-04T13:56:17.1394393Z Running test_ops_jit 2/2 ... [2025-12-04 13:56:17.139266][17005.522161083]
2025-12-04T13:56:17.1394911Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T13:56:17.1399269Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_ops_jit.py', '--shard-id=2', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:56:17.139713]
2025-12-04T14:08:06.1254035Z 
2025-12-04T14:08:06.1255012Z test_ops_jit 2/2 was successful, full logs can be found in artifacts with path test/test-reports/test_ops_jit_2.2_814c1a8715769c60_.log
2025-12-04T14:08:06.1510377Z Running 594 items in this shard: test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_acos_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_asinh_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_atan_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_atanh_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_div_floor_rounding_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_erf_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_erfc_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_exp2_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_expm1_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_ge_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_gt_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_igammac_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_linalg_det_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_linalg_inv_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_log_softmax_with_dtype_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_logit_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_logsumexp_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_lt_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_mH_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_matrix_exp_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_max_binary_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_neg_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_nn_functional_conv2d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_nn_functional_conv3d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_outer_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_round_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_round_decimals_0_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_round_decimals_neg_3_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_transpose_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_trunc_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_H_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_T_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_T_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit___getitem___cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit___radd___cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit___radd___cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit___rdiv___cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit___rmod___cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit___rmul___cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit___rmul___cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit___rsub___cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit___rsub___cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit__batch_norm_with_update_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit__chunk_cat_cuda_complex64, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit__native_batch_norm_legit_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit__segment_reduce_lengths_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit__segment_reduce_offsets_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit__softmax_backward_data_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit__unsafe_masked_index_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit__unsafe_masked_index_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit__unsafe_masked_index_put_accumulate_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit__upsample_bilinear2d_aa_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_abs_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_abs_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_acos_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_acos_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_add_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_addbmm_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_addcmul_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_addmm_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_addmm_decomposed_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_addmv_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_addr_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_addr_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_alias_copy_cuda_complex64, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_alias_copy_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_all_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_all_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_allclose_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_allclose_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_amin_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_angle_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_any_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_any_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_arange_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_argmax_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_as_strided_copy_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_as_strided_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_as_strided_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_as_strided_partial_views_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_as_strided_partial_views_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_as_strided_scatter_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_as_strided_scatter_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_asinh_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_atan_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_atanh_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_atanh_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_atleast_1d_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_atleast_1d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_atleast_2d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_bfloat16_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_bfloat16_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_block_diag_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_bmm_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_bmm_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_bool_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_broadcast_tensors_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_broadcast_to_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_byte_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_cartesian_prod_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_cdist_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_cdouble_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_ceil_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_chalf_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_chalf_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_char_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_cholesky_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_cholesky_inverse_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_cholesky_solve_cuda_complex64, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_chunk_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_chunk_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_clamp_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_clamp_min_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_column_stack_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_combinations_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_complex_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_conj_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_conj_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_conj_physical_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_contiguous_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_cummin_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_cumprod_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_cumsum_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_cumulative_trapezoid_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_deg2rad_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_diag_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_diag_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_diagflat_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_diagonal_copy_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_diagonal_copy_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_diagonal_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_diagonal_scatter_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_diagonal_scatter_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_diff_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_diff_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_dist_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_dist_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_div_floor_rounding_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_div_trunc_rounding_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_double_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_dstack_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_einsum_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_einsum_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_empty_like_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_empty_permuted_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_equal_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_erfc_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_erfinv_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_exp2_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_exp2_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_exp_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_expand_as_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_expand_copy_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_expand_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_expm1_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_eye_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_fft2_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_fft_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_fft_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_fftn_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_hfft2_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_ifft_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_ifftn_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_ifftn_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_ihfft2_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_ihfft_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_irfft2_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_irfft_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_irfftn_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_rfft_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_rfftn_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fill_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fill_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_flatten_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_flip_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fliplr_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_flipud_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_float_power_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_floor_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_floor_divide_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fmax_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_frac_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_frexp_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_full_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_full_like_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_full_like_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_gather_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_gather_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_ge_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_geometric_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_geqrf_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_geqrf_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_grid_sampler_3d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_half_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_hash_tensor_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_heaviside_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_hstack_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_hstack_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_hypot_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_igamma_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_index_add_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_index_add_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_index_copy_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_index_copy_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_index_put_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_index_put_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_index_reduce_amax_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_index_reduce_amin_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_index_reduce_prod_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_index_select_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_index_select_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_inner_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_isclose_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_isfinite_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_isin_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_isinf_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_isneginf_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_isposinf_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_isreal_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_isreal_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_istft_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_item_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_item_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_jiterator_2inputs_2outputs_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_jiterator_4inputs_with_extra_args_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_jiterator_4inputs_with_extra_args_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_jiterator_binary_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_jiterator_unary_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_kron_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_kron_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_ldexp_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_le_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_lerp_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_cholesky_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_cholesky_ex_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_cholesky_ex_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_cond_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_cross_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_cross_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_diagonal_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_diagonal_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_eig_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_eigvals_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_eigvalsh_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_eigvalsh_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_householder_product_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_inv_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_ldl_factor_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_ldl_factor_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_ldl_solve_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_lstsq_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_lstsq_grad_oriented_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_lu_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_lu_factor_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_lu_factor_ex_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_lu_solve_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_lu_solve_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_matrix_norm_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_matrix_norm_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_matrix_power_cuda_complex64, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_matrix_rank_hermitian_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_multi_dot_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_multi_dot_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_norm_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_norm_subgradients_at_zero_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_pinv_hermitian_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_pinv_singular_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_pinv_singular_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_qr_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_solve_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_solve_ex_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_solve_triangular_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_solve_triangular_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_svd_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_svdvals_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_tensorinv_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_tensorinv_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_vander_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_vander_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_vecdot_cuda_complex64, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linspace_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linspace_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linspace_tensor_overload_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linspace_tensor_overload_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_log1p_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_log_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_log_softmax_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_logaddexp2_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_logaddexp_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_logcumsumexp_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_logdet_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_logical_and_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_logical_not_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_logical_or_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_logical_xor_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_logical_xor_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_logit_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_logspace_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_logspace_tensor_overload_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_logsumexp_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_long_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_long_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_lu_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_lu_solve_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_lu_solve_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_lu_unpack_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_mH_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_mH_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_mT_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_mT_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_amax_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_argmin_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_cumsum_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_cumsum_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_fill_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_fill_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_log_softmax_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_logaddexp_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_logsumexp_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_mean_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_mean_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_prod_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_prod_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_select_cuda_complex64, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_select_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_softmax_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_softmin_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_std_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_std_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_sum_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_var_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_matmul_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_matmul_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_matrix_exp_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_matrix_exp_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_max_binary_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_maximum_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_median_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_meshgrid_list_of_tensors_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_meshgrid_variadic_tensors_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_min_reduction_no_dim_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_min_reduction_with_dim_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_mm_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_mode_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_multinomial_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nanmean_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nanmedian_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nansum_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_narrow_copy_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_native_batch_norm_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_native_dropout_backward_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_new_empty_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_new_ones_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_new_ones_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_new_zeros_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_new_zeros_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nextafter_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_avg_pool2d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_batch_norm_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_batch_norm_without_cudnn_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_bilinear_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_celu_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_channel_shuffle_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_conv1d_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_conv1d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_conv_transpose1d_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_conv_transpose1d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_conv_transpose2d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_conv_transpose3d_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_ctc_loss_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_dropout2d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_dropout3d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_embedding_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_feature_alpha_dropout_without_train_cuda_complex64, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_gelu_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_glu_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_grid_sample_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_hardshrink_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_interpolate_area_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_interpolate_bicubic_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_interpolate_linear_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_interpolate_nearest_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_kl_div_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_layer_norm_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_leaky_relu_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_linear_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_logsigmoid_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_max_pool1d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_max_pool2d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_max_unpool1d_grad_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_max_unpool3d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_mish_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_mse_loss_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_multi_head_attention_forward_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_multi_margin_loss_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_nll_loss_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_pad_circular_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_pad_constant_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_pad_replicate_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_pairwise_distance_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_pairwise_distance_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_pixel_unshuffle_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_relu6_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_relu_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_rms_norm_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_rms_norm_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_silu_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_soft_margin_loss_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_softsign_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_tanhshrink_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_unfold_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_upsample_bilinear_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nonzero_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nonzero_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nonzero_static_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_norm_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_norm_fro_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_norm_nuc_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_norm_nuc_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_normal_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_normal_in_place_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_ones_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_ones_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_ones_like_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_ormqr_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_outer_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_outer_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_pca_lowrank_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_pca_lowrank_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_permute_copy_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_permute_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_pinverse_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_polygamma_polygamma_n_4_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_positive_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_positive_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_pow_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_prod_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_qr_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_quantile_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_randint_like_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_ravel_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_ravel_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_real_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_reciprocal_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_remainder_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_renorm_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_renorm_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_repeat_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_repeat_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_repeat_interleave_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_repeat_interleave_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_reshape_as_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_reshape_as_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_reshape_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_resize_as__cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_resolve_conj_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_resolve_neg_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_resolve_neg_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_roll_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_rot90_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_rot90_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_round_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_round_decimals_neg_3_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_rsqrt_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_scalar_tensor_cuda_complex64, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_scatter_add_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_scatter_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_scatter_reduce_amax_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_scatter_reduce_amin_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_scatter_reduce_mean_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_select_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_select_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_sgn_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_sgn_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_short_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_sigmoid_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_signal_windows_exponential_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_signal_windows_general_cosine_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_signal_windows_general_hamming_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_signal_windows_hamming_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_sin_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_sin_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_sinc_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_sinh_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_slice_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_slice_scatter_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_softmax_with_dtype_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_sparse_sampled_addmm_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_sparse_sampled_addmm_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_bessel_j1_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_bessel_y1_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_chebyshev_polynomial_t_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_chebyshev_polynomial_v_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_chebyshev_polynomial_w_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_hermite_polynomial_he_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_i1_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_laguerre_polynomial_l_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_log_ndtr_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_modified_bessel_i0_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_ndtri_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_scaled_modified_bessel_k0_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_shifted_chebyshev_polynomial_w_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_spherical_bessel_j0_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_zeta_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_split_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_split_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_split_list_args_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_split_list_args_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_split_with_sizes_copy_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_split_with_sizes_copy_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_split_with_sizes_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_sqrt_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_square_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_stack_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_stack_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_std_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_std_mean_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_std_unbiased_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_std_unbiased_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_sub_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_sub_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_sum_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_sum_to_size_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_svd_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_svd_lowrank_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_svd_lowrank_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_t_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_take_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_take_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_tanh_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_tanh_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_tensor_split_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_topk_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_trace_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_trace_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_transpose_copy_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_transpose_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_trapezoid_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_trapz_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_triangular_solve_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_tril_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_triu_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_true_divide_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_trunc_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_unbind_copy_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_unbind_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_unbind_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_unflatten_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_uniform_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_uniform_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_unique_consecutive_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_unique_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_unsafe_chunk_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_unsafe_chunk_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_unsqueeze_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_unsqueeze_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_var_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_var_unbiased_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_view_as_complex_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_view_as_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_view_as_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_view_as_real_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_view_copy_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_view_copy_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_view_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_view_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_vsplit_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_where_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_zeros_cuda_complex64
2025-12-04T14:08:06.1762185Z 
2025-12-04T14:08:06.1762485Z Finished test_ops_jit 2/2 ... [2025-12-04 14:08:06.126153][17714.509047369], took 11.82min
2025-12-04T14:08:06.1763566Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_ops_jit/test_ops_jit-2f4faab6a29e642c.xml
2025-12-04T14:08:06.2472064Z Running test_nestedtensor 3/4 ... [2025-12-04 14:08:06.246889][17714.629784328]
2025-12-04T14:08:06.2472605Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:08:06.2475929Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_nestedtensor.py', '--shard-id=3', '--num-shards=4', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:08:06.247332]
2025-12-04T14:16:12.1978081Z 
2025-12-04T14:16:12.1979045Z test_nestedtensor 3/4 was successful, full logs can be found in artifacts with path test/test-reports/test_nestedtensor_3.4_8e55fc0245a5aec0_.log
2025-12-04T14:16:12.2207909Z Running 397 items in this shard: test/test_nestedtensor.py::TestNestedTensor::test_2d_nested_tensor_batch_size_2_max_seq_len_3_vocab_size_10, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_batch_size_2_max_seq_len_3_vocab_size_10, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_float_batch_size_4_max_seq_len_5_vocab_size_10, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_float_batch_size_4_max_seq_len_5_vocab_size_20, test/test_nestedtensor.py::TestNestedTensor::test_is_contiguous, test/test_nestedtensor.py::TestNestedTensor::test_like_functions_zeros_like, test/test_nestedtensor.py::TestNestedTensor::test_unbind_3, test/test_nestedtensor.py::TestNestedInt::test_comparisons, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_clone_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_contiguous_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_dropout_jagged_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_dropout_jagged_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_dropout_noncontiguous_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_dropout_strided_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_embedding_jagged_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_empty_like_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_amax_dtypes_cuda_bfloat16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_amax_dtypes_cuda_uint8, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_amin_dtypes_cuda_uint8, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmax_dtypes_cuda_int16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_max_dtypes_cuda_int32, 
test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_min_dtypes_cuda_int32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_min_dtypes_cuda_uint8, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_layer_norm_breaking_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_linear_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_masked_fill_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_masked_fill_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_matmul_noncontiguous_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_narrow_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_narrow_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_masked_select_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_add_transpose_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_add_transpose_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_add_transpose_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_chunk_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_chunk_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_dense_elementwise_embedding_dim_256_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_dense_elementwise_embedding_dim_8_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_indexing_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_mul_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_sub_transpose_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_sub_transpose_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_reshape_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_scaled_dot_product_attention_input_dim_4_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_False_weights_only_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_False_weights_only_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_False_weights_only_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_True_weights_only_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_softmax_noncontiguous_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_dim2_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_dim2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_dim2_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_dim3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_dim4_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_noncontiguous_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_zero_numel_errors_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_cos_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_isposinf_cuda, 
test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_relu_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_sgn_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_sqrt_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_tanh_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unbind_noncontiguous_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_view_inference_mode_interaction_cuda_float32, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_backward_sub_strided_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_dropout_backward_strided_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_indexing_backward_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_5d_size_2_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_5d_size_32_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_edge_case_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_size_2_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_generates_leaf_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_linear_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_reshape_gradcheck_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_set_requires_grad_from_list_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_set_requires_grad_from_mask_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_to_buffer_series_ops_grad_with_broadcast_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_False_contiguous_False_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_False_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_True_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_strided_requires_grad_False_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_jagged_requires_grad_False_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_jagged_requires_grad_True_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_False_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_False_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_True_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_strided_requires_grad_False_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_strided_requires_grad_False_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_jagged_requires_grad_False_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_jagged_requires_grad_True_contiguous_False_cuda_float64, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_True_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_True_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_False_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_False_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_False_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_True_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_False_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_False_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_True_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_True_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_autograd_function_with_None_grad_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_binary_pointwise_with_nested_int_second_arg_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_compile_with_propagated_dynamic_max_seq_len_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_composite_op_in_inference_mode_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_construction_from_list_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_copy__cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_as_nested_tensor_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_nested_tensor_requires_grad_False_components_require_grad_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_nested_tensor_requires_grad_False_components_require_grad_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_nested_tensor_requires_grad_True_components_require_grad_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_nested_tensor_requires_grad_True_components_require_grad_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_mean_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_mean_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_mean_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_mean_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_sum_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_sum_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_sum_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_False_values_is_view_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_False_values_is_view_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_True_values_is_view_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_True_values_is_view_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_2d_input_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_2d_input_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_operate_on_batch_dim_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_operate_on_batch_dim_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_reduce_ragged_idx_1_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_with_lengths_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_linear_nt_dim_3_cuda, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_linear_nt_dim_4_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_narrow_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_nested_tensor_from_jagged_pass_min_max_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_mean_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_sum_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_sum_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_mean_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_sum_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_sum_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_1_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_1_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_1_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_2_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_2_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_1_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_1_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_2_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_2_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_mean_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_sum_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_mean_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_mean_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_sum_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_sum_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_profiler_sequence_nr_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_record_stream_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_reshape_decomp_requires_grad_True_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_autocast_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_compile_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_with_constant_sequence_length_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_with_packed_in_proj_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_serialization_noncontig_transposed_weights_only_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_reduce_ragged_idx_1_requires_grad_False_components_require_grad_False_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_reduce_ragged_idx_greater_than_1_same_output_shape_transpose_offset_1_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_reduce_ragged_idx_greater_than_1_same_output_shape_transpose_offset_2_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_requires_grad_True_components_require_grad_False_log_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_reduce_batch_dim_requires_grad_True_components_require_grad_False_log_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_reduce_batch_dim_requires_grad_True_components_require_grad_True_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_specialize_dynamic_shape_recompile_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_squeeze_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_batch_and_non_batch_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_batch_and_non_batch_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_batch_and_non_batch_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_ragged_and_non_batch_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_2_requires_grad_False_cuda_float64, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_4_requires_grad_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_4_requires_grad_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_2_requires_grad_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_2_requires_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_2_requires_grad_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_3_requires_grad_False_cuda_bool, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_4_requires_grad_False_cuda_bool, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_4_requires_grad_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unbind_lengths_ragged_idx_1_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_view_ragged_idx_not_one_cuda, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward___rmod___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward___rpow___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_abs_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_amin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_bmm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_complex_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_cosh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_digamma_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_div_no_rounding_mode_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_div_trunc_rounding_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_double_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_erfc_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_erfinv_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_exp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_fill_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_i0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_ldexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_log2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_log_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_logaddexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_masked_norm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_masked_std_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_matmul_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_min_reduction_with_dim_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_embedding_bag_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_hardtanh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_silu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_threshold_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_polygamma_polygamma_n_1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_polygamma_polygamma_n_2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_polygamma_polygamma_n_4_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_remainder_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_round_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_round_decimals_neg_3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_rsub_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_special_i1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_special_log_ndtr_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_special_ndtri_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_split_with_sizes_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_sqrt_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_square_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_std_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_std_unbiased_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_tanh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_to_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_var_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward___rmul___cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_abs_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_atanh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_chalf_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_clamp_min_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_clone_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_complex_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_copysign_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_div_trunc_rounding_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_float_power_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_fmod_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_frac_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_frexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_half_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_index_put_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_lgamma_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_log2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_masked_amax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_max_reduction_with_dim_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_maximum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_minimum_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_embedding_bag_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_linear_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_mish_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_rrelu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_silu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_softplus_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_tanhshrink_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_threshold_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_polar_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_polygamma_polygamma_n_1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_polygamma_polygamma_n_2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_positive_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_round_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_round_decimals_neg_3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_sinh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_special_erfcx_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_special_i1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_special_log_ndtr_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_special_ndtr_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_special_ndtri_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_std_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_to_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_unflatten_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_xlogy_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward___radd___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_asin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_asinh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_clamp_max_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_copysign_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_cos_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_digamma_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_div_floor_rounding_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_erfinv_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_fill_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_float_power_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_floor_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_floor_divide_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_fmax_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_half_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_igammac_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_isinf_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_isneginf_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_isreal_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_amin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_mean_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_norm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_matmul_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_celu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_embedding_bag_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_hardtanh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_linear_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_rrelu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_tanhshrink_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_polygamma_polygamma_n_0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_reciprocal_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_round_decimals_0_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_rsqrt_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_rsub_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_sgn_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_short_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_signbit_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_bessel_j1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_bessel_y1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_chebyshev_polynomial_t_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_hermite_polynomial_he_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_legendre_polynomial_p_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_modified_bessel_i0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_scaled_modified_bessel_k1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_split_with_sizes_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_std_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_trunc_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_unflatten_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_unsqueeze_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_var_unbiased_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_xlogy_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward___rpow___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_acos_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_add_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_argmin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_bfloat16_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_chalf_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_chunk_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_clone_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_complex_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_conj_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_copysign_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_count_nonzero_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_div_no_rounding_mode_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_div_trunc_rounding_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_exp2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_frac_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_hash_tensor_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_isclose_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_isreal_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_jiterator_binary_return_by_ref_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_linalg_vector_norm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_logical_and_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_logical_not_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_logical_xor_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_logit_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_long_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_lt_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_masked_argmin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_matmul_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_maximum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_min_binary_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_min_reduction_with_dim_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nansum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_neg_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nextafter_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_embedding_bag_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_hardtanh_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_logsigmoid_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_prelu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_relu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_softsign_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_tanhshrink_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_rsub_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_select_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_sigmoid_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_sign_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_bessel_j0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_chebyshev_polynomial_v_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_entr_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_erfcx_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_i0e_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_i1e_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_laguerre_polynomial_l_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_legendre_polynomial_p_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_log_ndtr_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_modified_bessel_i0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_sqrt_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_squeeze_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_std_unbiased_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_unsqueeze_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_var_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_xlogy_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_nested_tensor_input_mutation_backward_cuda
2025-12-04T14:16:12.2433134Z 
2025-12-04T14:16:12.2433474Z Finished test_nestedtensor 3/4 ... [2025-12-04 14:16:12.198371][18200.581265167], took 8.10min
2025-12-04T14:16:12.2434955Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_nestedtensor/test_nestedtensor-8372b6917771ca4c.xml
2025-12-04T14:16:12.3104307Z Running test_ops 2/11 ... [2025-12-04 14:16:12.310099][18200.692992794]
2025-12-04T14:16:12.3104829Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:16:12.3108038Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_ops.py', '--shard-id=2', '--num-shards=11', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:16:12.310549]
2025-12-04T14:37:58.1952279Z 
2025-12-04T14:37:58.1953173Z test_ops 2/11 was successful, full logs can be found in artifacts with path test/test-reports/test_ops_2.11_06c992f175cc3a27_.log
2025-12-04T14:37:58.3227785Z Running 3122 items in this shard: test/test_ops.py::TestCommonCUDA::test_compare_cpu_H_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu___rmod___cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_cauchy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_diagonal_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_expand_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_eye_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_hstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_igamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_igammac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_new_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nextafter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_repeat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_reshape_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_rot90_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_unsqueeze_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_var_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_view_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_xlogy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_atan2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_bernoulli_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_combinations_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cummax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_dist_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu_div_floor_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_dsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_geometric_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_eig_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_householder_product_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_logdet_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_max_pool2d_with_indices_backward_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_mm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_narrow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_native_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_new_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nextafter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_interpolate_bicubic_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_multi_head_attention_forward_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nonzero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_norm_fro_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_norm_nuc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_normal_number_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_permute_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_quantile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_resolve_neg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_scatter_reduce_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_select_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_uniform_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_unsqueeze_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_var_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_abs_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_as_strided_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_asin_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_char_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_dstack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_ifft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_hstack_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_index_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_log_softmax_with_dtype_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_masked_fill_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_nn_functional_conv2d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_nn_functional_conv_transpose3d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_prod_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_rsqrt_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_sub_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_unsafe_chunk_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_view_as_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_view_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_dtypes_T_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes___getitem___cuda, test/test_ops.py::TestCommonCUDA::test_dtypes___rmul___cuda, test/test_ops.py::TestCommonCUDA::test_dtypes___ror___cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__batch_norm_with_update_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_char_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_alias_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_allclose_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_arange_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_as_strided_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_atan2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_bitwise_or_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_div_trunc_rounding_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes__refs_dot_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_equal_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_erf_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_expand_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_rfft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_isfinite_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_lgamma_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_log1p_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_log2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_log_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_log_normal_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_meshgrid_variadic_tensors_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_hinge_embedding_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_relu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_softshrink_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_tanhshrink_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_triplet_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_remainder_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_sigmoid_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_i0e_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_log_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_multigammaln_mvlgamma_p_1_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_split_with_sizes_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_square_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_sub_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_triu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_trunc_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes__refs_xlogy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_addcmul_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_addmm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_amax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_amin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_asin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bincount_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cat_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cdist_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_char_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cholesky_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_count_nonzero_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_erfc_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_erfinv_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_rfft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_flip_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_flipud_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_half_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_hash_tensor_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_heaviside_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_hstack_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_index_reduce_amax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_inner_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_isneginf_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_jiterator_binary_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_lcm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_ldl_factor_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_logical_not_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_amin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_var_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_median_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_minimum_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_mm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_mv_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_native_dropout_backward_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_neg_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_alpha_dropout_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_instance_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_unpool2d_grad_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_mish_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_normalize_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_rrelu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_norm_inf_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_ormqr_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_permute_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_resolve_neg_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_rot90_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_round_decimals_3_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_scatter_reduce_mean_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_select_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_laguerre_polynomial_l_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_modified_bessel_i1_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_ndtr_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_scaled_modified_bessel_k1_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_square_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_torch__scaled_mm_v2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_unfold_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_uniform_cuda, 
test/test_ops.py::TestCommonCUDA::test_errors___rand___cuda, test/test_ops.py::TestCommonCUDA::test_errors___rdiv___cuda, test/test_ops.py::TestCommonCUDA::test_errors_aminmax_cuda, test/test_ops.py::TestCommonCUDA::test_errors_as_strided_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_errors_complex_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_fftn_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_irfft2_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_rfftn_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fmax_cuda, test/test_ops.py::TestCommonCUDA::test_errors_gradient_cuda, test/test_ops.py::TestCommonCUDA::test_errors_kthvalue_cuda, test/test_ops.py::TestCommonCUDA::test_errors_median_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_l1_loss_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_multi_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_errors_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_errors_special_chebyshev_polynomial_t_cuda, test/test_ops.py::TestCommonCUDA::test_errors_special_shifted_chebyshev_polynomial_u_cuda, test/test_ops.py::TestCommonCUDA::test_errors_triu_cuda, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_addmm_decomposed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_as_strided_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_cholesky_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_cholesky_inverse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_fft_ifft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_fft_irfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_float_power_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_histc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_ldexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_multi_dot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_logit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_logspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_multinomial_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_native_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_ne_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_nextafter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_norm_fro_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_norm_inf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_normal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_outer_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_polygamma_polygamma_n_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_pow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_round_decimals_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_sin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_i0e_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_scaled_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_xlog1py_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_zeta_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_take_along_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices___rdiv___cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices__chunk_cat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_allclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_amax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_argwhere_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_as_strided_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_as_strided_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_block_diag_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_bmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cdist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ceil_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cholesky_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_combinations_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_corrcoef_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cummin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cumulative_trapezoid_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_diagonal_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_erfc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_fftshift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ifft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ihfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_rfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fmax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_frexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_gather_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_igammac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_reduce_mean_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_binary_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_unary_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_lu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_qr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_vander_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_vecdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_long_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_amin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_std_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_max_reduction_no_dim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_meshgrid_list_of_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_min_reduction_with_dim_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_mode_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_narrow_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nextafter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_celu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_conv_transpose3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_unpool3d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_normalize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pad_circular_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pad_replicate_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_prelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_triplet_margin_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_upsample_nearest_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_permute_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_rad2deg_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ravel_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_resize_as__cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_resolve_neg_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_prod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_gaussian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_slice_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_chebyshev_polynomial_u_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_i0e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_modified_bessel_k0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_polygamma_special_polygamma_n_0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_tanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_tile_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_to_sparse_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_torch_ops_aten__efficient_attention_forward_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_unique_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unsafe_split_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_view_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_vsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_vstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_zeros_like_cuda_int64, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values___radd___cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_addr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_all_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_angle_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_diag_embed_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_digamma_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_fft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_ifftshift_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_irfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fmax_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_isclose_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_isfinite_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_jiterator_2inputs_2outputs_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_masked_fill_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_max_binary_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_msort_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nan_to_num_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_new_empty_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_permute_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_polygamma_polygamma_n_3_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sigmoid_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_chebyshev_polynomial_t_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_hermite_polynomial_h_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_legendre_polynomial_p_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_squeeze_multiple_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sum_to_size_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_t_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_to_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_tril_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_true_divide_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_unique_consecutive_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_view_as_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_zeros_like_cuda_bool, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_T_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples__unsafe_masked_index_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples__unsafe_masked_index_put_accumulate_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_scatter_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_asin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_3d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bfloat16_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_block_diag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bmm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bool_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_broadcast_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_chunk_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_chunk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clamp_max_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_combinations_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_constant_pad_nd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cov_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cov_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cross_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cummax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumprod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumulative_trapezoid_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_deg2rad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diag_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_div_floor_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_double_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_einsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_empty_strided_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_eq_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_exp2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_hfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifftshift_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ihfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ihfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_rfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_rfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flatten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fliplr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fliplr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_floor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fmin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_full_like_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_histc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_int_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isfinite_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_4inputs_with_extra_args_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_binary_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_det_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_ldl_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lu_factor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_multi_dot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_pinv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_tensorinv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_tensorinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_vander_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linspace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linspace_tensor_overload_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log10_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logdet_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_and_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_not_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lu_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_argmin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_cumprod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_cumprod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_cumsum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_logsumexp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_var_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_var_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_max_binary_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_meshgrid_variadic_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_min_reduction_with_dim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_msort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nanmean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nansum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_neg_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_neg_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_adaptive_max_pool1d_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_embedding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_grid_sample_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_linear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_constant_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_replicate_negative_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_relu_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_threshold_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_triplet_margin_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_triplet_margin_with_distance_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_upsample_nearest_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ormqr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_4_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_randn_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ravel_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_remainder_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_repeat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_roll_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_round_decimals_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_round_decimals_neg_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rsub_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_amin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_mean_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_select_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_cosine_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_general_hamming_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sparse_sampled_addmm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_airy_ai_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_bessel_y0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_entr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_erfcx_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_i0e_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_i1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_i1e_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_modified_bessel_i1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_scaled_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_with_sizes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_squeeze_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_mean_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_unbiased_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unfold_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unfold_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_vstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_xlogy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zero__cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zeros_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_numpy_ref_argwhere_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_cat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_clamp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_diff_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_equal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_jiterator_2inputs_2outputs_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_cross_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_meshgrid_variadic_tensors_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_meshgrid_variadic_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_gelu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_smooth_l1_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_roll_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_roll_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_blackman_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_general_cosine_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_general_hamming_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_squeeze_multiple_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_out_H_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out___getitem___cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_bool_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_long_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_asin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_bitwise_left_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out__refs_bucketize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_cat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_constant_pad_nd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_count_nonzero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_dstack_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out__refs_erf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_ihfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_flipud_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_float_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_hsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_istft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out__refs_item_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_linalg_vecdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_log2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_logspace_tensor_overload_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_neg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_pow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_repeat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_round_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_select_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_sinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_entr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_multigammaln_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_zeta_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_split_with_sizes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_squeeze_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out__refs_sub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_trace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_unflatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_vstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_alias_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_all_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_arange_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_argmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_bernoulli_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_bitwise_right_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out_cauchy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_clamp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_combinations_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_dstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_fftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_hfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_ifftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_irfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_flip_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_full_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_gather_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_grid_sampler_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_grid_sampler_3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_gt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_half_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_igamma_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_index_reduce_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_isnan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_isreal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_jiterator_binary_return_by_ref_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_ldexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_ldl_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_lu_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_lt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_argmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_normalize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_var_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_mm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_msort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nanmedian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_new_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_elu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_max_unpool1d_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_nn_functional_max_unpool3d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_silu_complex_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_pca_lowrank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_put_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_acos_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_acosh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_addcdiv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_addcmul_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_addmv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_alias_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_atan2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_cat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_cummin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_erfinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_hfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_ifft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_ihfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_rfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_index_reduce_prod_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_cholesky_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_eig_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_lu_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_pinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_log_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_logspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_masked_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_mm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_mv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_normal_number_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_quantile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_round_decimals_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_scatter_reduce_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_sgn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_special_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_special_ndtri_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_special_xlog1py_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_square_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_sub_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_sub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_take_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_transpose_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_transpose_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_unsqueeze_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_where_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_roll_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_searchsorted_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_signal_windows_general_cosine_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_split_with_sizes_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_squeeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_svd_lowrank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_tanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_vdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_warning___rxor___cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_broadcast_to_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_diag_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_dot_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_expm1_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_fftshift_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_ifftshift_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fill_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fliplr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fmax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_linalg_svd_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_linspace_tensor_overload_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_log_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_logspace_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_narrow_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_dropout_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_mish_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_pow_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_randn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_reshape_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_rot90_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_entr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_log_ndtr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_logit_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_t_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_to_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_trunc_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_var_mean_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_vstack_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_atleast_1d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_bitwise_and_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_block_diag_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_bool_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_byte_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_cauchy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_char_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cholesky_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_dist_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_empty_strided_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_expand_as_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_expand_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_fftn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_hfft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_hfftn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_ifftshift_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_rfft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_geometric_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_gradient_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_histc_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_igamma_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_index_reduce_mean_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_isreal_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_istft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_lgamma_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_ldl_factor_ex_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_log_softmax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_logaddexp2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_logical_not_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_lu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_logaddexp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_softmax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_softmin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_max_binary_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_meshgrid_variadic_tensors_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_min_binary_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nansum_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_narrow_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_adaptive_max_pool1d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_celu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_dropout3d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_embedding_bag_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_embedding_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_pool1d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_pool2d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_pool3d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_unpool3d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_scaled_dot_product_attention_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_softplus_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_norm_fro_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_normal_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_prod_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_reshape_as_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_resolve_conj_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_select_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_hamming_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_hann_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_signbit_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_sinh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_slice_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_sparse_sampled_addmm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_modified_bessel_i0_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_sqrt_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_to_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_torch__scaled_mm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_torch_ops_aten__safe_softmax_default_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_tril_indices_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_triu_indices_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_view_cuda, test/test_ops.py::TestCommonCUDA::test_out_zero__cuda_float32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_asin_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_asinh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_atanh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_cos_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_div_no_rounding_mode_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_div_no_rounding_mode_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_erfinv_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_erfinv_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_erfinv_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_expm1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_i0_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_i0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_i0_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_ldexp_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_lgamma_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_log1p_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_log1p_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_logit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_masked_var_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_mvlgamma_mvlgamma_p_5_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_3_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_rad2deg_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_rad2deg_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_chebyshev_polynomial_t_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_chebyshev_polynomial_v_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_hermite_polynomial_he_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_legendre_polynomial_p_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sqrt_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_tan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_tanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_tanh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_xlogy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcdiv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcdiv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_alias_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_alias_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_allclose_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_and_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_not_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_or_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_right_shift_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bucketize_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_count_nonzero_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_deg2rad_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_equal_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exponential_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_float8_e5m2fnuz, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gcd_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_geometric_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hypot_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_igamma_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lcm_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lerp_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_cross_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_diagonal_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_diagonal_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_matrix_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_tensor_overload_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_tensor_overload_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_normal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_normal_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logaddexp_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_alpha_dropout_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_elu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_layer_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mish_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_shuffle_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_shuffle_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_unshuffle_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_selu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softplus_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softshrink_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_normal_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rad2deg_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_randn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_remainder_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_erfcx_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_erfcx_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_1_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_split_with_sizes_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_split_with_sizes_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_split_with_sizes_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_take_along_dim_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_take_along_dim_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_take_along_dim_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_var_mean_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_T_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_add_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_diagonal_copy_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_diagonal_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_div_trunc_rounding_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_ifftn_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_ihfftn_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_irfft_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_ge_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_igammac_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_index_add_cuda, 
test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_logaddexp_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_normal__in_place_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_copy_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_copy_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_shapes_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cauchy_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_count_nonzero_executor_aten_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumprod_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dot_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_strided_executor_aten_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_strided_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_equal_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_copy_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_geometric_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_item_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lcm_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lcm_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_cross_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_diagonal_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_diagonal_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_tensor_overload_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_celu_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_channel_shuffle_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_channel_shuffle_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_gelu_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_glu_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_group_norm_executor_aten_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mse_loss_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pdist_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_shuffle_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_unshuffle_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softplus_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_normal_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_normal_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_copy_executor_aten_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_copy_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_copy_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rad2deg_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rad2deg_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_select_scatter_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_split_with_sizes_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_copy_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_copy_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_copy_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_copy_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_take_along_dim_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_take_along_dim_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_copy_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_copy_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_copy_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcdiv_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_alias_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_alias_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_and_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_or_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_right_shift_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_block_diag_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_block_diag_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bucketize_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cauchy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_count_nonzero_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_floor_rounding_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_floor_rounding_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_trunc_rounding_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_trunc_rounding_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_strided_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_equal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exponential_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exponential_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_frexp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hypot_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_igammac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_matrix_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_svd_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_vecdot_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_normal_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logaddexp_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_tensor_overload_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_tensor_overload_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_dropout_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_glu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardshrink_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardtanh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hinge_embedding_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_mish_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu6_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu6_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_normal__in_place_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_normal_number_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_randn_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_round_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_select_scatter_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_select_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtri_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtri_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtri_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_zeta_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_multiple_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_multiple_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_mean_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_take_along_dim_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_take_along_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_copy_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_mean_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_complex_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_alias_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_alias_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_and_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_left_shift_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_not_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_block_diag_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumprod_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumprod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_floor_rounding_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_trunc_rounding_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dot_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_strided_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_equal_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_equal_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_equal_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_frac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_geometric_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_geometric_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_imag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_item_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_diagonal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_diagonal_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_tensor_overload_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logaddexp2_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_alpha_dropout_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_dropout_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_gelu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_glu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_glu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardtanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardtanh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_l1_loss_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_l1_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_l1_loss_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_layer_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_layer_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_layer_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_mish_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pixel_shuffle_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pixel_shuffle_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pixel_unshuffle_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pixel_unshuffle_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_poisson_nll_loss_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_smooth_l1_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_normal_number_mean_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rad2deg_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_randn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_select_scatter_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_erfcx_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i0e_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1e_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_1_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_3_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_zeta_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_zeta_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_zeta_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_split_with_sizes_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_multiple_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_mean_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_take_along_dim_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_take_along_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_take_along_dim_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vdot_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vdot_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_float64, test/test_ops.py::TestCommonCUDA::test_reduction_ops_reduce_argmin_cuda, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_H_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager___getitem___cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager___radd___cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rsub___cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager__batch_norm_with_update_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addcdiv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addcmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_block_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bool_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_broadcast_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cosh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cross_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diagonal_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diff_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_dist_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_double_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_fft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_fftshift_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_hfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_irfft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_flatten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_float_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_frac_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_hsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_igamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_imag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_put_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_unary_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ldexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_diagonal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_eig_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_householder_product_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_inv_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_ldl_factor_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lu_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_matrix_rank_hermitian_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_solve_ex_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_solve_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_svd_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_tensorinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logaddexp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logdet_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_xor_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_xor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lu_unpack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mT_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_logsumexp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_normalize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_std_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_maximum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_movedim_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_native_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_native_dropout_backward_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_empty_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv3d_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_multi_head_attention_forward_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_rms_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nonzero_static_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ormqr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ormqr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_outer_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_pow_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_qr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rand_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_randn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_resize_as__cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rot90_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rot90_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_short_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sinh_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sparse_sampled_addmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_entr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_split_with_sizes_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sqrt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_std_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_std_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_trapezoid_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unsafe_chunk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_var_mean_unbiased_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_view_as_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward__segment_reduce_offsets_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_acosh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_angle_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_atleast_2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_atleast_3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_bernoulli_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_bmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_broadcast_to_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_digamma_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_dist_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_div_floor_rounding_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_expand_as_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_ifftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_index_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_index_reduce_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_cumsum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_fill_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_median_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_select_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_minimum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_native_batch_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_celu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_glu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_rot90_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_round_decimals_0_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_squeeze_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_trapz_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_T_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input__segment_reduce_lengths_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_acos_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_add_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_cdist_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_constant_pad_nd_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_dsplit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_empty_strided_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_exp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_expand_as_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_expand_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_fft_ifftshift_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_frac_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_hash_tensor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_i0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_jiterator_unary_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_kron_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_kthvalue_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_matrix_rank_hermitian_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_slogdet_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_svdvals_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_log_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_lu_unpack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_masked_argmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_masked_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_masked_var_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nanmean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_embedding_bag_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_max_unpool3d_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_silu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_norm_nuc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_permute_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_polar_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_randint_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_rsub_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_select_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_select_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_signal_windows_cosine_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_signbit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_sparse_mm_reduce_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_bessel_y0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_bessel_y1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_erfcx_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_hermite_polynomial_he_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_ndtri_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_square_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_svd_lowrank_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_tensordot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_true_divide_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_var_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_view_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad___getitem___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad___rsub___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_addcdiv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_aminmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_atleast_2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cos_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cross_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cummin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_diagonal_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_empty_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_eye_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_hsplit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_index_put_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_det_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_lstsq_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_multi_dot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_vander_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_log10_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_log_softmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_logsumexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_var_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_new_empty_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_new_empty_strided_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_new_ones_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_conv_transpose1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_kl_div_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_silu_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_polygamma_polygamma_n_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_randint_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_resolve_conj_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_rsqrt_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_select_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_bartlett_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sparse_mm_reduce_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_stack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_std_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_topk_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_transpose_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_unflatten_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_H_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator___rmatmul___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator___rmod___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator__segment_reduce_offsets_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_addmv_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_argwhere_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_as_strided_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_as_strided_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cdist_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cummin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_exp2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_fft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_ifft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_index_fill_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_isclose_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_eig_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_householder_product_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_inv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_matrix_power_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_slogdet_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linspace_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_log_normal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_logcumsumexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_mT_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_amax_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_argmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_var_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_matrix_exp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_native_dropout_backward_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_new_zeros_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_conv_transpose3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_cross_entropy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_silu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_quantile_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_rand_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_real_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_reshape_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_roll_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_rsub_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_scatter_add_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_select_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_select_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_sign_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_general_hamming_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_slice_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_sparse_mm_reduce_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_entr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_ndtr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_scaled_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_scaled_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_zeta_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_split_with_sizes_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_take_along_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_tensor_split_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_transpose_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_tril_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_unique_consecutive_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_vstack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_zero__cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_zeros_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay___radd___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_aminmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_argmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_argsort_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_cdist_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_diagflat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_div_floor_rounding_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_einsum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_fft_ifft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_fft_ifft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_fill_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_fliplr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_flipud_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_hypot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_index_reduce_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_isneginf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_le_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_cond_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_lstsq_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_lu_factor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_qr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_tensorinv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_log_softmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_logdet_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_logical_and_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_mT_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_median_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_std_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_maximum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_median_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_mode_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_msort_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nan_to_num_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_narrow_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_adaptive_max_pool2d_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_conv3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_conv_transpose3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_dropout3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_embedding_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_kl_div_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_max_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_max_unpool1d_grad_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_softmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_softsign_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_outer_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_polar_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_ravel_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_renorm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_resize_as__cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_roll_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_scatter_add_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_sign_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_slice_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_entr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_split_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_stack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_std_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_std_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_std_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_stft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_trapezoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_unsqueeze_copy_cuda_float32, test/test_ops.py::TestMathBitsCUDA::test_conj_view___radd___cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__chunk_cat_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_bfloat16_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_bool_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_acos_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_addr_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_broadcast_to_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_diag_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_empty_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_eq_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_eye_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_ifft2_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_index_add_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_index_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_item_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_linspace_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_log10_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_logical_and_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_new_empty_strided_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_nn_functional_pixel_shuffle_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_nn_functional_tanhshrink_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_split_with_sizes_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_squeeze_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_sub_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_true_divide_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_unbind_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_unsqueeze_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_vstack_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_as_strided_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_as_strided_scatter_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_broadcast_to_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_cholesky_solve_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_count_nonzero_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_empty_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_fft_hfftn_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_fill_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_index_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_int_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_jiterator_binary_return_by_ref_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_lerp_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_norm_subgradients_at_zero_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_qr_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_logcumsumexp_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_logical_and_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_logical_or_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_masked_fill_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_masked_prod_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_masked_sum_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_matrix_exp_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_meshgrid_list_of_tensors_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_channel_shuffle_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_put_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_resolve_neg_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_select_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_sinc_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_square_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_take_along_dim_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_take_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_tile_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_unfold_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_vsplit_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_byte_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_cdouble_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_chalf_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_abs_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_as_strided_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_asinh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_cumsum_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_diagonal_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_empty_like_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_empty_strided_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_ifft2_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_ifftn_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_index_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_isinf_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_linalg_vector_norm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_narrow_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_softmin_with_dtype_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_randn_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_stack_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_stft_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_triu_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_atleast_3d_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_byte_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_combinations_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_conj_physical_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cumulative_trapezoid_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_dist_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_dsplit_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_empty_permuted_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_eye_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_geqrf_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_hsplit_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_isreal_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_cond_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_ldl_factor_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_logcumsumexp_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_sum_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_pixel_unshuffle_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_silu_complex_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_norm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_positive_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_pow_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_randn_like_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_repeat_interleave_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_rsqrt_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_select_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_short_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sub_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_t_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_unsafe_chunk_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_where_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_view___rpow___cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_chalf_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_long_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_all_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_as_strided_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_asin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_block_diag_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_broadcast_tensors_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_ceil_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_conj_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_erfinv_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_hfftn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_ihfft2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_ge_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_hsplit_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_igammac_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_linspace_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_logical_and_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_ne_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_new_empty_strided_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_elu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_glu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_smooth_l1_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_normal_number_mean_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_prod_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_round_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sinh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_tanh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_tril_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_atan_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cat_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_combinations_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_diag_embed_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_div_no_rounding_mode_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_div_trunc_rounding_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_equal_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_rfft2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_rfft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_flip_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_geqrf_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_gt_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_hypot_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_igamma_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_index_reduce_mean_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_isin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_jiterator_binary_return_by_ref_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_eig_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_matrix_rank_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_pinv_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_vecdot_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linspace_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logaddexp2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logical_not_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logical_xor_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nanmean_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_ne_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_new_ones_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_batch_norm_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_binary_cross_entropy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_binary_cross_entropy_with_logits_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_conv1d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_conv_transpose1d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_grid_sample_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_interpolate_nearest_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_leaky_relu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_max_unpool1d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_max_unpool3d_grad_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_smooth_l1_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_softsign_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_normal_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_normal_number_mean_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_polygamma_polygamma_n_2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_prod_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_rand_like_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_resolve_conj_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_round_decimals_0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_scatter_reduce_sum_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_sinc_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_bessel_j1_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_chebyshev_polynomial_t_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_i0e_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_i1e_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_stack_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_std_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_stft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_trapz_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_view_as_complex_cuda_float64, test/test_ops.py::TestFakeTensorCUDA::test_fake___radd___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_addmm_decomposed_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_argsort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_as_strided_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_as_strided_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_atan2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_atan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast___radd___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_acos_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_aminmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_as_strided_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_atan2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_baddbmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_broadcast_shapes_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cartesian_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cholesky_inverse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_clamp_min_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_complex_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_copysign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_corrcoef_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_div_trunc_rounding_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_exp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_hfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_grid_sampler_2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_hash_tensor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_jiterator_binary_return_by_ref_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_eigh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logsumexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_argmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_native_layer_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_gaussian_nll_loss_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_unpool1d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_relu6_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_relu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_silu_complex_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_norm_nuc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_real_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sinh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_bessel_y1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_legendre_polynomial_p_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_ndtr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_scaled_modified_bessel_k1_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_xlog1py_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_zeta_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sub_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_tan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_to_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_torch__scaled_mm_cuda_float8_e4m3fn, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_transpose_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_trunc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_vsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_bitwise_or_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_block_diag_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_chalf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_copysign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp__unsafe_masked_index_put_accumulate_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_as_strided_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cholesky_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cov_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_diff_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_fftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_rfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_frexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_index_reduce_amax_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_cholesky_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_diagonal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_lu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_svd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_log_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_lu_unpack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_matmul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_msort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_channel_shuffle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pdist_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pixel_unshuffle_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_relu6_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_polar_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_round_decimals_neg_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_i0e_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_squeeze_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_std_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_take_along_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_tensordot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_tile_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_unbind_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_unsafe_chunk_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_atan2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_corrcoef_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_diag_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_diagonal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_erf_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_fftshift_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_i0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_index_reduce_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_index_reduce_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_lu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_qr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_tensorinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_median_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nanmedian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_native_layer_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_adaptive_avg_pool2d_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_normalize_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_prelu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_relu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_permute_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_resolve_conj_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_resolve_neg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sinc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_split_with_sizes_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_squeeze_multiple_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_svd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_trace_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_var_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_vstack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_empty_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_empty_strided_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_erfc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_exp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_ifftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_flip_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_index_reduce_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_jiterator_unary_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_lcm_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_le_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_eig_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_ldl_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_solve_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_log2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_matmul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_mm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_mv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nan_to_num_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_narrow_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_channel_shuffle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_conv3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_unpool1d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_silu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_normal_number_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_quantile_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_randint_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_randn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_scatter_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_scatter_reduce_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_sigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_exponential_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_sin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_softmax_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_sort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_hermite_polynomial_h_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_modified_bessel_i1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_modified_bessel_k1_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_triangular_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_triu_indices_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_unfold_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_where_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops__softmax_backward_data_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_abs_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_all_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_angle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_as_strided_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_baddbmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bfloat16_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bitwise_left_shift_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_clamp_max_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cummax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_diag_embed_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_diff_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_fft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_ihfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_flipud_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_frexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_full_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_geqrf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_gradient_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_histc_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_igammac_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_imag_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_index_put_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_index_reduce_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_index_reduce_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_isnan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_jiterator_4inputs_with_extra_args_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_cholesky_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_ldl_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_vecdot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linspace_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linspace_tensor_overload_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logical_and_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_logsumexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_median_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_matmul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mode_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_narrow_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_native_dropout_backward_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_new_empty_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_new_zeros_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_multilabel_soft_margin_loss_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_norm_fro_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_ones_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_put_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_reshape_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_rot90_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_select_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_blackman_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_softmax_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_squeeze_multiple_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_t_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_tanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_tensor_split_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_trapz_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unravel_index_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unsafe_chunk_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_arange_cuda_uint8, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_linspace_cuda_complex128, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_linspace_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_linspace_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_linspace_tensor_overload_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_ones_cuda_int32, 
test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_zeros_cuda_float16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_zeros_cuda_uint8, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_full_cuda_bfloat16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_logspace_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_logspace_tensor_overload_cuda_int16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_logspace_tensor_overload_cuda_int32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_ones_cuda_bfloat16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_zeros_cuda_int16, test/test_ops.py::TestTagsCUDA::test_tags___rdiv___cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags___rxor___cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_chalf_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_as_strided_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_atan_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_broadcast_shapes_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_bucketize_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_diagonal_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_empty_strided_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_equal_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_expand_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_expm1_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_fft_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_ihfftn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fill_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_floor_divide_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_gcd_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags__refs_i0_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags__refs_lcm_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags__refs_logical_and_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_masked_fill_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nan_to_num_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_ne_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_elu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_glu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_prelu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_relu6_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_threshold_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_ones_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_pow_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_reciprocal_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_softmax_with_dtype_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_ndtri_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_square_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_std_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_tril_indices_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags__refs_triu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_vdot_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_view_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__segment_reduce_offsets_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_addcmul_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_addmv_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_argwhere_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_block_diag_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cholesky_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_column_stack_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_contiguous_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cross_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_diag_embed_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_dot_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_expand_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_expand_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_fftshift_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_hfftn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_flatten_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_grid_sampler_2d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_index_reduce_amax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_istft_cuda_complex64, test/test_ops.py::TestTagsCUDA::test_tags_jiterator_unary_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_lcm_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags_linalg_eig_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_pinv_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_logdet_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_logical_and_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_lu_solve_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_amin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_max_pool2d_with_indices_backward_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nan_to_num_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nanmedian_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_narrow_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_native_dropout_backward_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_ne_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nextafter_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_grid_sample_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_unpool1d_grad_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_pad_replicate_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_prelu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nonzero_static_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_normal_number_mean_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_randint_like_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_resolve_conj_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_scatter_reduce_mean_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_softmax_with_dtype_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_hermite_polynomial_h_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_zeta_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_split_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_split_with_sizes_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_std_mean_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_trapezoid_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_triangular_solve_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_unbind_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_unique_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_var_mean_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_view_cuda_float32
2025-12-04T14:37:58.4472281Z 
2025-12-04T14:37:58.4472619Z Finished test_ops 2/11 ... [2025-12-04 14:37:58.199705][19506.582596802], took 21.76min
2025-12-04T14:37:58.4473659Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_ops/test_ops-d95bfbe57b5d2d89.xml
2025-12-04T14:37:59.7542973Z Uploading artifacts took 1.37 seconds
2025-12-04T14:37:59.7546682Z Running test_ops 7/11 ... [2025-12-04 14:37:59.754483][19508.137376493]
2025-12-04T14:37:59.7547186Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:37:59.7551823Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_ops.py', '--shard-id=7', '--num-shards=11', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:37:59.754953]
2025-12-04T14:57:29.2582061Z 
2025-12-04T14:57:29.2583281Z test_ops 7/11 was successful, full logs can be found in artifacts with path test/test-reports/test_ops_7.11_97114ebb7b0ad963_.log
2025-12-04T14:57:29.3860347Z Running 3090 items in this shard: test/test_ops.py::TestCommonCUDA::test_compare_cpu___ror___cuda_int64, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_bool_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_half_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_int_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_addcdiv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_addcmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_alias_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_diagonal_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_index_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_index_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_logaddexp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_new_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_split_with_sizes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_as_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_as_strided_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_bfloat16_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_copysign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_expand_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_fft_fftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_fft_ifftshift_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu_flip_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_full_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_hstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_hypot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_index_put_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_index_reduce_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_int_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_cond_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_eigvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_pinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_slogdet_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_logcumsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_lu_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_lu_unpack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_normalize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_mul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_dropout3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_interpolate_nearest_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_linear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_normalize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_put_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_rot90_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_scatter_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_sparse_mm_reduce_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_legendre_polynomial_p_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_std_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_take_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_tensordot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_view_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_xlogy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_T_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_alias_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_cfloat_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_conj_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_diagonal_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_eq_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_fftshift_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fill_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_index_put_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_index_select_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_lerp_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_long_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_neg_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_positive_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_resolve_neg_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_sgn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_sin_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_split_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_trace_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_zeros_like_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_bool_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_float_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_long_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_addcmul_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_bitwise_and_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_conj_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_conj_physical_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_contiguous_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_div_no_rounding_mode_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_exp2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_eye_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_fft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_ifftshift_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_ihfftn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_irfftn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_flatten_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_isnan_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_isneginf_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_item_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_linspace_tensor_overload_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_logspace_tensor_overload_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_dropout_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_hardshrink_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_ones_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_rad2deg_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_bessel_j0_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_multigammaln_mvlgamma_p_3_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_t_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_vdot_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_view_copy_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes__unsafe_masked_index_put_accumulate_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_allclose_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_aminmax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_as_strided_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bfloat16_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bitwise_and_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_ceil_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_corrcoef_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cov_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cummax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cummin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cumprod_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_diff_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_rfftn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fill_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_flatten_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_full_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_gather_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_grid_sampler_3d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_histogramdd_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_imag_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_index_add_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_index_reduce_amin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_int_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_isposinf_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_eigvals_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_eigvalsh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_solve_triangular_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_vander_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_logical_and_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_mT_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_norm_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_matmul_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nansum_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_narrow_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_adaptive_max_pool1d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_bilinear_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_celu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_feature_alpha_dropout_without_train_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_fractional_max_pool2d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_gelu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_hardtanh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_hinge_embedding_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_interpolate_area_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_l1_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_unpool3d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_prelu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_relu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_scaled_dot_product_attention_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_silu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_soft_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_softmin_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_softplus_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_triplet_margin_with_distance_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_ones_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_pinverse_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_polygamma_polygamma_n_3_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_put_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_randn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_round_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_scatter_reduce_amin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_general_hamming_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sinc_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_bessel_y0_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_i1e_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sqrt_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_std_unbiased_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sub_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_tile_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_torch_ops_aten__efficient_attention_forward_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_unique_consecutive_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_var_mean_unbiased_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_vsplit_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_vstack_cuda, test/test_ops.py::TestCommonCUDA::test_errors___rsub___cuda, test/test_ops.py::TestCommonCUDA::test_errors_amin_cuda, test/test_ops.py::TestCommonCUDA::test_errors_bucketize_cuda, test/test_ops.py::TestCommonCUDA::test_errors_cat_cuda, test/test_ops.py::TestCommonCUDA::test_errors_diag_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_fft_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_ifft_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_irfftn_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fliplr_cuda, test/test_ops.py::TestCommonCUDA::test_errors_float_power_cuda, test/test_ops.py::TestCommonCUDA::test_errors_gather_cuda, test/test_ops.py::TestCommonCUDA::test_errors_jiterator_binary_cuda, test/test_ops.py::TestCommonCUDA::test_errors_linalg_lstsq_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_gaussian_nll_loss_cuda, 
test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_margin_ranking_loss_cuda, test/test_ops.py::TestCommonCUDA::test_errors_pow_cuda, test/test_ops.py::TestCommonCUDA::test_errors_signal_windows_hamming_cuda, test/test_ops.py::TestCommonCUDA::test_errors_sparse_mul_layout4_cuda, test/test_ops.py::TestCommonCUDA::test_errors_special_hermite_polynomial_he_cuda, test/test_ops.py::TestCommonCUDA::test_errors_sum_to_size_cuda, test/test_ops.py::TestCommonCUDA::test_errors_trace_cuda, test/test_ops.py::TestCommonCUDA::test_errors_view_cuda, test/test_ops.py::TestCommonCUDA::test_errors_xlogy_cuda, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch__batch_norm_with_update_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_addcdiv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_addcmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_addr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_alias_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_argmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_asin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_cross_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_diagonal_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_fft_ihfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_fft_rfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_full_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_igamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_igammac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_isin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_lgamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_eig_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_householder_product_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_pinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_log_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_logical_or_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_max_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_mul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_nanmean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_polygamma_polygamma_n_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_rsqrt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_scatter_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_sign_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_erfcx_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_tensordot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_where_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices___radd___cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices___rdiv___cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices___rmatmul___cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices___rsub___cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices__chunk_cat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices__unsafe_masked_index_put_accumulate_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_addbmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_amin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_aminmax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_as_strided_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_as_strided_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_broadcast_shapes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_broadcast_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_broadcast_to_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_constant_pad_nd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_count_nonzero_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_cov_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cumprod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cumulative_trapezoid_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_double_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_empty_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_empty_permuted_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_fftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ifftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_irfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_float_power_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_geometric_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_histc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_hsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_reduce_amin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_int_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_4inputs_with_extra_args_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_le_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_lgamma_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_lstsq_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_matrix_rank_hermitian_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logaddexp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logical_xor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logspace_tensor_overload_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_cumprod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_select_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mul_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_multinomial_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nanmedian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_narrow_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_new_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_new_empty_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_new_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_channel_shuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_conv_transpose1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_instance_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_unpool3d_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_mish_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_silu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_threshold_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ones_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_permute_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_3_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_pow_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_rand_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_real_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_reshape_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_resize__cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_resize__cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sgn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_bessel_j0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_bessel_y0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_bessel_y0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_modified_bessel_i1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_modified_bessel_k0_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_ndtri_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_squeeze_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_squeeze_multiple_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_std_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sum_to_size_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_take_along_dim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_trapz_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unbind_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unique_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_where_cuda_int64, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_asin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_block_diag_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_count_nonzero_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_deg2rad_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_diag_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_diagonal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_eq_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_erfinv_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_exp2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_expand_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_ifftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_ihfftn_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_rfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_gt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_half_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_hstack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_lgamma_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_masked_select_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_min_reduction_with_dim_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_narrow_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nn_functional_softsign_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_ones_like_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_permute_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_roll_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_scatter_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_short_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_bessel_y0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_hermite_polynomial_he_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_i1e_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_scaled_modified_bessel_k1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_squeeze_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_tanh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_transpose_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_triu_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_unsafe_split_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_unsqueeze_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_vstack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___radd___cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rmul___cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples__chunk_cat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples__softmax_backward_data_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples__unsafe_masked_index_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addcmul_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addcmul_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addmm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_alias_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_all_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_all_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_argmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_argwhere_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_2d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bfloat16_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bitwise_left_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_block_diag_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cartesian_prod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cdouble_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cdouble_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cholesky_inverse_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cholesky_inverse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_chunk_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_constant_pad_nd_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_constant_pad_nd_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_count_nonzero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cross_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cummax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumsum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumsum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagonal_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_div_no_rounding_mode_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_double_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_empty_permuted_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_erf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_erfc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_erfc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_expand_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_hfft2_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flip_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flip_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_float_power_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_i0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_reduce_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_select_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isclose_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isnan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_kron_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lgamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cholesky_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cross_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_ldl_factor_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lu_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_matrix_rank_hermitian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_norm_subgradients_at_zero_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_pinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_pinv_singular_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_solve_triangular_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_vecdot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linspace_tensor_overload_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_or_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_xor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logspace_tensor_overload_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_argmax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_softmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_std_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_maximum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_median_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_minimum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mul_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv1d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_dropout3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_kl_div_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_multilabel_margin_loss_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_circular_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softmin_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_tanhshrink_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_triplet_margin_with_distance_loss_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_normal_number_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ones_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_3_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_qr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rand_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_randint_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_randn_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_add_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_kaiser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_bessel_y1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_chebyshev_polynomial_w_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_entr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_laguerre_polynomial_l_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_polygamma_special_polygamma_n_0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_scaled_modified_bessel_k1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_with_sizes_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_squeeze_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_svd_lowrank_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_take_along_dim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_take_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tensor_split_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_to_sparse_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_torch_ops_aten__safe_softmax_default_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_transpose_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_transpose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tril_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_triu_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_true_divide_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_uniform_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unique_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unsqueeze_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unsqueeze_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_var_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_var_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_as_real_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_vsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_vstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_where_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zero__cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zeros_like_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_argwhere_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_cat_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_diag_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_diff_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_equal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_jiterator_4inputs_with_extra_args_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_cross_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_conv_transpose1d_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_conv_transpose3d_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_group_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_rms_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_searchsorted_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_numpy_ref_tensor_split_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_transpose_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_out___rxor___cuda_int64, test/test_ops.py::TestCommonCUDA::test_out__batch_norm_with_update_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_double_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_atleast_1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_bitwise_xor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out__refs_ceil_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_clamp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_cosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_exp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_expand_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_hypot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_linalg_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_log10_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_logical_and_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_narrow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_real_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_renorm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_tensor_split_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out__refs_trunc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_unfold_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_view_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_atleast_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_atleast_3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_broadcast_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_clamp_min_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_column_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_double_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_equal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_expand_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_expand_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_expand_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_fftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_ifft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_flatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_heaviside_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_histc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_hsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_index_reduce_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_integral_dtype__refs_sum_cuda_int16, test/test_ops.py::TestCommonCUDA::test_out_jiterator_unary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_kron_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_cholesky_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_inv_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_vecdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linspace_tensor_overload_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_logical_not_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_logspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_max_pool2d_with_indices_backward_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_mv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nan_to_num_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_embedding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_multi_head_attention_forward_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_pad_replicate_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_softplus_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_threshold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_rad2deg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_randint_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_abs_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_addbmm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_alias_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_atanh_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_ceil_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_diff_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_exp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_fft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_irfft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_irfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_frexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_lu_factor_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_matrix_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_pinv_hermitian_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_logspace_tensor_overload_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_lu_unpack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_max_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_nansum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_nn_functional_gelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_qr_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_square_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_triangular_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_triu_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_xlogy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_reshape_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_round_decimals_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_scalar_tensor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sparse_sampled_addmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_airy_ai_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_split_with_sizes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_t_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_tril_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_uniform_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_unique_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_var_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_view_as_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_view_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_warning___radd___cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__batch_norm_with_update_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_acosh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_addcdiv_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_amax_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning__refs_atan2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_atleast_2d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_cosh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_diagonal_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_div_floor_rounding_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_empty_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_eye_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_fft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_irfft2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_rfft2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_floor_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fmin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_gcd_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_i0_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_index_fill_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_index_select_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_istft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_linalg_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_linalg_svdvals_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_log10_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_glu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_margin_ranking_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_pdist_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_pixel_shuffle_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_normal_number_mean_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_erfcx_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_multigammaln_mvlgamma_p_3_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_xlog1py_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_sub_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_tan_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_unflatten_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_vdot_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_vsplit_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__segment_reduce_lengths_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__segment_reduce_offsets_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_alias_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_any_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_argsort_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_atanh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_chunk_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cumsum_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_diagonal_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_eye_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_ihfft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_flatten_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_frexp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_half_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_histogram_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_hstack_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_index_fill_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_isneginf_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_kron_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_lerp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_cholesky_ex_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_det_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_lstsq_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_lu_factor_ex_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_pinv_hermitian_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_solve_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_log2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_logical_and_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_cumsum_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_meshgrid_list_of_tensors_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_native_batch_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_alpha_dropout_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_avg_pool2d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_binary_cross_entropy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_conv1d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_conv2d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_l1_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_leaky_relu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_unpool1d_grad_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_relu6_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nonzero_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_polygamma_polygamma_n_4_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_real_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_repeat_interleave_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_scatter_reduce_mean_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_select_scatter_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_blackman_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_general_hamming_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_softmax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_i1_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_split_with_sizes_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_tensordot_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_to_sparse_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_true_divide_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_unfold_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_unsqueeze_copy_cuda, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_acos_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_acosh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_acosh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_asin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_asinh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_asinh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_atan2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_atan2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_div_no_rounding_mode_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_erf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_erfc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_exp2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_ldexp_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_lgamma_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_log_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_log_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_masked_var_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_mvlgamma_mvlgamma_p_1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_0_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_rsqrt_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_rsqrt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sigmoid_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sigmoid_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sinh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_chebyshev_polynomial_t_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_chebyshev_polynomial_t_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_chebyshev_polynomial_u_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_chebyshev_polynomial_v_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_chebyshev_polynomial_w_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_hermite_polynomial_he_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_laguerre_polynomial_l_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_shifted_chebyshev_polynomial_w_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_xlog1py_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_zeta_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_true_divide_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcdiv_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_alias_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_left_shift_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_not_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_or_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_xor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_block_diag_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_block_diag_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_shapes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bucketize_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_deg2rad_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_divide_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_frac_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_geometric_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_imag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_item_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lerp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_diagonal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_diagonal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_diagonal_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_tensor_overload_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_tensor_overload_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_normal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logaddexp2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nextafter_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_celu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_dropout_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_glu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_glu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardshrink_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardshrink_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hinge_embedding_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_huber_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_l1_loss_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_l1_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_layer_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mish_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mish_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_nll_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pdist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pdist_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu6_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_selu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_normal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rad2deg_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rad2deg_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_renorm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_select_scatter_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j0_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j0_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j0_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j1_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_erfcx_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_multiple_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_multiple_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_take_along_dim_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_var_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vdot_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_bitwise_and_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_copysign_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_dot_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_dsplit_cuda, 
test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_fft2_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_fftn_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_hfft_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_ihfft2_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fmod_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_index_select_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_item_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_linalg_cross_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_movedim_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_l1_loss_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_special_xlog1py_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_t_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_trace_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_aten_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_left_shift_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_block_diag_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_block_diag_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_aten_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_count_nonzero_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_count_nonzero_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_strided_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_strided_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_copy_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_frexp_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_frexp_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_frexp_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_aten_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_igammac_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_imag_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lcm_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_cross_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svd_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_normal_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_tensor_overload_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_tensor_overload_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_tensor_overload_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_gelu_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_huber_loss_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_leaky_relu_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mish_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mse_loss_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_unshuffle_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_unshuffle_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softplus_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softplus_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_normal__in_place_executor_aten_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_normal_number_mean_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_select_scatter_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_copy_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_take_along_dim_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_indices_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_indices_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_indices_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_copy_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_copy_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_aten_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcdiv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcdiv_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_alias_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_allclose_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_allclose_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_or_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_right_shift_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ceil_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumprod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumprod_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumprod_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_deg2rad_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_deg2rad_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_deg2rad_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_trunc_rounding_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_strided_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_strided_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_equal_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_float8_e5m2, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_frexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_geometric_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hypot_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_item_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_cross_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_diagonal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_diagonal_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_diagonal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_svd_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logaddexp_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_tensor_overload_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_alpha_dropout_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_dropout_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_dropout_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_glu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardshrink_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_l1_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_mse_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pixel_unshuffle_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pixel_unshuffle_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softshrink_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_normal__in_place_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_normal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rad2deg_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_renorm_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j0_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_ndtr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_1_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_1_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_3_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_5_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_split_with_sizes_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_split_with_sizes_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_multiple_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_multiple_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_take_along_dim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_polar_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcdiv_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_left_shift_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_not_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_or_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_xor_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_block_diag_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_block_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cauchy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_count_nonzero_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumprod_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_deg2rad_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_strided_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_divide_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_geometric_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_geometric_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_imag_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isposinf_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_item_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_item_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lerp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_diagonal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_diagonal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_matrix_norm_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_svdvals_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_vector_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_tensor_overload_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_tensor_overload_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mean_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mean_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_celu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_channel_shuffle_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_dropout_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardshrink_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_l1_loss_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_leaky_relu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_mse_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pixel_unshuffle_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pixel_unshuffle_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_prelu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_selu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softplus_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_normal__in_place_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_normal__in_place_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_normal_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_normal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_randn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_renorm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_erfcx_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_mean_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_reduction_ops_reduce_all_cuda, test/test_ops.py::TestCommonCUDA::test_reduction_ops_reduce_amin_cuda, 
test/test_ops.py::TestCommonCUDA::test_reduction_ops_reduce_max_reduction_with_dim_cuda, test/test_ops.py::TestCommonCUDA::test_reduction_ops_reduce_mean_cuda, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_abs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_argmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_argwhere_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_asinh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atleast_1d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bfloat16_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_block_diag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cartesian_prod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_char_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_chunk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_clamp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_combinations_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_constant_pad_nd_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_corrcoef_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cos_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diag_embed_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diagonal_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_dist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_div_floor_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_div_trunc_rounding_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_dot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_empty_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_exp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ifftshift_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_irfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_rfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fliplr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_float_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_float_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_floor_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_gather_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_gather_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_igammac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isclose_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isfinite_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isnan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_item_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_2inputs_2outputs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_binary_return_by_ref_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_ldl_factor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lu_factor_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_norm_subgradients_at_zero_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_qr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_vector_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log10_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_or_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_var_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_min_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_movedim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_celu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_embedding_bag_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pdist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_rms_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_silu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_softsign_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_normal_number_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_qr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_renorm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_reshape_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_reshape_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_resize__cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_resolve_conj_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scalar_tensor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scatter_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_short_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_slice_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_erfcx_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_scaled_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_split_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_squeeze_multiple_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_squeeze_multiple_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_t_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_take_along_dim_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_take_along_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tanh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_trace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_trace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_trapz_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_true_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unbind_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_zero__cuda_complex64, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward___getitem___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward___rmod___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward__upsample_bilinear2d_aa_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_add_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_atanh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_clamp_min_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_column_stack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_copysign_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_deg2rad_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_diagonal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_double_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_exp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_hfft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_flip_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fliplr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_hstack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_lgamma_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_cross_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_inv_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_log1p_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_log2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_lu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_mH_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_std_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nanmean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_native_dropout_backward_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_conv_transpose1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_embedding_bag_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_softplus_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_positive_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_ravel_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_remainder_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_select_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_sign_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_slice_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_sum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_sum_to_size_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_to_sparse_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_trace_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input___rsub___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_argwhere_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_cfloat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_cummin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_cumprod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_dist_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_gradient_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_histc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_hstack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_index_add_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_index_reduce_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_index_select_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_cholesky_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_logcumsumexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_mH_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_mT_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_masked_sum_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_matrix_exp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_max_binary_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_mm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_msort_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_new_empty_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_new_full_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_gelu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_linear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_norm_inf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_ones_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_repeat_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_resolve_neg_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_roll_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_rot90_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_rsqrt_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_signal_windows_kaiser_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_bessel_j0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_modified_bessel_i0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_sub_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_to_sparse_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_topk_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_transpose_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_trapezoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_trapz_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_trunc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_unbind_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_abs_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_acos_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_ceil_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cholesky_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_count_nonzero_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_dstack_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_empty_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_expm1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_fft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_rfft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_flatten_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_flip_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_full_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_geqrf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_hypot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_index_fill_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_lgamma_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_ldl_factor_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_matrix_rank_hermitian_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logdet_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logical_not_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_long_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_matrix_exp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_minimum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mv_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_native_dropout_backward_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_native_layer_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_channel_shuffle_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_glu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nonzero_static_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_permute_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_polygamma_polygamma_n_2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_remainder_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_resize__cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_round_decimals_0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_rsub_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sigmoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sinh_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_slice_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sort_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_bessel_j0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_hermite_polynomial_h_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_square_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sub_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_t_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_t_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_tile_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_vstack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator__batch_norm_with_update_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator__segment_reduce_lengths_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator__softmax_backward_data_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_abs_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_amin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_as_strided_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_atan2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cdouble_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_clamp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cosh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cumsum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_diag_embed_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_diagonal_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_dot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_full_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_histc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_hstack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_hypot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_igamma_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_ldl_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_logaddexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_logdet_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_long_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_select_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nansum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_narrow_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_native_layer_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_channel_shuffle_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_multi_head_attention_forward_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_randint_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_reshape_as_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_rot90_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_round_decimals_0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_bessel_y0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_hermite_polynomial_he_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_laguerre_polynomial_l_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_split_list_args_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_stack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_tensordot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_triangular_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_unsqueeze_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_add_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_addmm_decomposed_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_addmv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_bernoulli_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_broadcast_shapes_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_byte_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_conj_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_conj_physical_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_constant_pad_nd_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_contiguous_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_empty_permuted_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_eq_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_fft_fftshift_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_float_power_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_floor_divide_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_ge_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_grid_sampler_2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_grid_sampler_3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_int_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_isinf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_isnan_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_jiterator_binary_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_lerp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_det_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_ldl_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_matrix_rank_hermitian_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_log1p_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_logical_not_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_logical_xor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_argmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_cumprod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_log_softmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_logaddexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_softmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_softmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_matmul_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_min_binary_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_minimum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_native_batch_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_new_empty_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_new_ones_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_hinge_embedding_loss_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_pad_replicate_negative_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nonzero_static_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_pinverse_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_rad2deg_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_randn_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_round_decimals_neg_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_take_along_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_var_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_view_as_complex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_vstack_cuda_float32, test/test_ops.py::TestMathBitsCUDA::test_conj_view___rmul___cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view___rpow___cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_cfloat_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_abs_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_acosh_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_as_strided_partial_views_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_broadcast_tensors_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_cos_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_isfinite_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_linalg_cross_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_linalg_svd_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_log1p_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_masked_fill_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_nn_functional_pairwise_distance_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_nn_functional_pixel_unshuffle_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_nn_functional_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_repeat_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_t_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_tan_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_tril_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_view_as_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_view_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_where_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__unsafe_masked_index_put_accumulate_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_alias_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_cumprod_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_diag_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_full_like_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_gather_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_eigvals_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_lu_factor_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_slogdet_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_logical_not_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_logsumexp_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_mH_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_masked_select_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_movedim_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_mul_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_pad_circular_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_unfold_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_randn_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_resize_as__cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_scatter_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_squeeze_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_stack_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_std_mean_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_t_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_unsqueeze_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view___rmatmul___cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_float_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_all_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_atleast_2d_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_block_diag_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_clone_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_constant_pad_nd_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_cumprod_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_irfft2_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_flip_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_index_add_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_istft_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_linalg_diagonal_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_linalg_svdvals_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_linalg_vecdot_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_logspace_tensor_overload_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_new_empty_strided_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_repeat_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_special_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_special_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_tan_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_tanh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_tril_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_block_diag_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cartesian_prod_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_chalf_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cholesky_solve_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_contiguous_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cos_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cross_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_empty_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_empty_strided_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_equal_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_expand_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_expand_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_fftshift_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fliplr_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_eigvals_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_lstsq_grad_oriented_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_pinv_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_solve_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_vector_norm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_select_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_matmul_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nanmean_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_ne_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_conv1d_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_conv_transpose3d_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_l1_loss_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_normalize_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_pad_circular_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_pad_replicate_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_triplet_margin_loss_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_triplet_margin_with_distance_loss_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_permute_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_permute_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_real_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_reciprocal_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_repeat_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_reshape_as_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_slice_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_split_with_sizes_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_svd_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_tensor_split_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_trapz_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_tril_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_true_divide_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_view_as_real_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_view___rmatmul___cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__chunk_cat_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_bfloat16_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_acosh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_atan2_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_atanh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_copysign_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_exp2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_exponential_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_flip_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fmod_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_linalg_diagonal_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_log_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_logical_xor_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_lt_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_meshgrid_list_of_tensors_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_new_zeros_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_dropout_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_threshold_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_ones_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_permute_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_reshape_as_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sgn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_square_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sum_to_size_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_t_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_triu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_view_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_alias_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_bfloat16_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_cdouble_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_chunk_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_clamp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_conj_physical_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_constant_pad_nd_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cross_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cumprod_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_deg2rad_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_diagonal_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_einsum_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_ifftshift_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_ihfft2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_ihfftn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_rfftn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_floor_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_full_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_half_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_inner_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_isfinite_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_isnan_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_isneginf_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_jiterator_unary_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_lerp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_det_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_eigvals_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_inv_ex_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_norm_subgradients_at_zero_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_solve_triangular_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logical_and_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logspace_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_fill_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_max_pool2d_with_indices_backward_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_max_reduction_no_dim_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_mul_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_new_empty_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_new_empty_strided_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_new_full_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_adaptive_avg_pool2d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_adaptive_max_pool3d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_batch_norm_without_cudnn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_channel_shuffle_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_cosine_embedding_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_dropout_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_embedding_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_hinge_embedding_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_margin_ranking_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_max_pool2d_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_prelu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_relu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_threshold_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_polygamma_polygamma_n_4_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_rad2deg_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_scalar_tensor_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_scatter_add_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_sigmoid_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_sparse_sampled_addmm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_modified_bessel_i0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_spherical_bessel_j0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_to_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_transpose_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_triangular_solve_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_triu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_var_mean_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_vdot_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_view_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_zeros_cuda_float64, test/test_ops.py::TestFakeTensorCUDA::test_fake__upsample_bilinear2d_aa_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_argmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_atleast_1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast___getitem___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast___rsub___cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_all_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_clone_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_contiguous_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_hfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_hfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_ifftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_irfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_irfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fmod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_hypot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_index_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_index_put_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_isclose_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_isinf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_isreal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_istft_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_kron_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_cholesky_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_diagonal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_ldl_factor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_log1p_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logical_xor_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logspace_tensor_overload_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_logaddexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_glu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_hardsigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_instance_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_multi_head_attention_forward_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pad_replicate_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nonzero_static_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_ormqr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_polygamma_polygamma_n_0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_polygamma_polygamma_n_2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_rot90_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_scalar_tensor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sgn_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_general_hamming_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signbit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_log_ndtr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_squeeze_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_squeeze_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_stft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_take_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_tile_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_trapz_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_true_divide_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unflatten_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unsqueeze_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_var_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_bincount_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_bucketize_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_ceil_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_conj_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp__segment_reduce_lengths_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_as_strided_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_clone_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_diag_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_div_floor_rounding_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_dot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_dsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_ifft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_ihfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_rfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_flipud_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_ldexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_inv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_log10_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_log_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mT_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mode_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_dropout3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_instance_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_pool2d_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_unpool1d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_put_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_repeat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_repeat_interleave_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_reshape_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_select_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sgn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sinc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_entr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_i1e_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_ndtr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_t_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_t_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_tanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_to_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_var_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_xlogy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp___getitem___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_addmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_addr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cholesky_inverse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cosh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_dist_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_dsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_einsum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_irfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_irfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_float_power_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_inner_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_diagonal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_eig_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_inv_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_lu_factor_ex_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_multi_dot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_logaddexp2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_median_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_narrow_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_neg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_channel_shuffle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_conv3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_pca_lowrank_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_reciprocal_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_reshape_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_rot90_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_i0e_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_i1e_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_squeeze_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_std_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sum_to_size_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_triangular_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_var_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_cummax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_div_floor_rounding_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_expand_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_hfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_full_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_index_reduce_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_index_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_isnan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_isreal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_diagonal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_eigvals_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_slogdet_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_mH_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_logaddexp_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_logsumexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_std_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_matrix_exp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_minimum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nanmean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_new_ones_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_celu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_gelu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_softplus_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_pow_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_rad2deg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_resize__cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_round_decimals_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_select_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_hamming_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_slice_scatter_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_special_bessel_j1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_modified_bessel_i0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_scaled_modified_bessel_k1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_square_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_squeeze_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_var_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_var_mean_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_view_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___getitem___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_atan2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bernoulli_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bitwise_not_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cholesky_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_corrcoef_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_dstack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_empty_permuted_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_erf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_exp2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_hfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fliplr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_frac_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_hstack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_igamma_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_inner_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_isin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_isreal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_kron_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_kthvalue_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_diagonal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_inv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_solve_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_log10_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logaddexp2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_softmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_minimum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_channel_shuffle_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_dropout3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_embedding_bag_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_pad_replicate_negative_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_prelu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_relu6_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_relu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_normal_number_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_quantile_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_bessel_j0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_entr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_split_list_args_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_square_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_std_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_stft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_tensordot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_to_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_torch_ops_aten__flash_attention_forward_cuda_float16, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_transpose_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_true_divide_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_where_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_zero__cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_zeros_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_arange_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_linspace_cuda_uint8, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_logspace_tensor_overload_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_ones_cuda_complex128, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_arange_cuda_float64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_cuda_float64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_tensor_overload_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_tensor_overload_cuda_int16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_ones_cuda_bool, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_ones_cuda_uint8, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_zeros_cuda_complex32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_zeros_cuda_float16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_zeros_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_zeros_cuda_int8, test/test_ops.py::TestTagsCUDA::test_tags__batch_norm_with_update_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_short_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_addcmul_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_any_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags__refs_atleast_1d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_atleast_3d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_cauchy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_eq_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_irfftn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_flatten_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_flip_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_ge_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_isclose_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_isposinf_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_lerp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_linspace_tensor_overload_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_neg_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_new_empty_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nextafter_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_channel_shuffle_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_gelu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_rad2deg_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_repeat_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_zeta_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_split_with_sizes_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_unfold_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_vsplit_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__unsafe_masked_index_put_accumulate_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_acosh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_addr_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_aminmax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_argsort_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_bernoulli_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_bitwise_xor_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags_broadcast_tensors_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cov_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cummin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_diag_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_dsplit_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_equal_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_erfinv_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_irfft_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_rfft2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_float_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_floor_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_full_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_gt_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_hsplit_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_imag_cuda_complex64, test/test_ops.py::TestTagsCUDA::test_tags_index_fill_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_isnan_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_kron_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_det_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_householder_product_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_ldl_factor_ex_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_matrix_power_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_solve_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_svd_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_tensorinv_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_log10_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_logspace_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_logsumexp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_select_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_std_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_new_empty_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_channel_shuffle_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_dropout3d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_kl_div_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_pdist_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_silu_complex_cuda_complex64, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_threshold_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_rand_like_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_real_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_repeat_interleave_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_resolve_neg_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_softmax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_airy_ai_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_bessel_j1_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_special_modified_bessel_i0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_scaled_modified_bessel_k0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_sum_to_size_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_t_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_torch_ops_aten__flash_attention_forward_cuda_float16, test/test_ops.py::TestTagsCUDA::test_tags_trace_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_triu_indices_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags_true_divide_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_unfold_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_vdot_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_vsplit_cuda_float32, test/test_ops.py::TestForwardADWithScalarsCUDA::test_0d_tensor_with_python_scalar_div_floor_rounding_cuda_float32
2025-12-04T14:57:29.5099231Z 
2025-12-04T14:57:29.5099548Z Finished test_ops 7/11 ... [2025-12-04 14:57:29.262559][20677.645451895], took 19.49min
2025-12-04T14:57:29.5100582Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_ops/test_ops-75f8d45594e24741.xml
2025-12-04T14:57:29.5101641Z Running functorch/test_dims 1/1 ... [2025-12-04 14:57:29.479650][20677.862543093]
2025-12-04T14:57:29.5102180Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:57:29.5103412Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_dims.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:57:29.480125]
2025-12-04T14:58:19.8684571Z 
2025-12-04T14:58:19.8685920Z functorch/test_dims 1/1 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_dims_1.1_a45bb86ae199f167_.log
2025-12-04T14:58:19.8706397Z Running 68 items in this shard: test/functorch/test_dims.py::TestMin::test_adapt, test/functorch/test_dims.py::TestMin::test_attn, test/functorch/test_dims.py::TestMin::test_attn_cuda, test/functorch/test_dims.py::TestMin::test_big_split, test/functorch/test_dims.py::TestMin::test_compare_dims, test/functorch/test_dims.py::TestMin::test_diag, test/functorch/test_dims.py::TestMin::test_dim_args, test/functorch/test_dims.py::TestMin::test_dims_with_size, test/functorch/test_dims.py::TestMin::test_dir, test/functorch/test_dims.py::TestMin::test_doc, test/functorch/test_dims.py::TestMin::test_embed, test/functorch/test_dims.py::TestMin::test_eq, test/functorch/test_dims.py::TestMin::test_expand, test/functorch/test_dims.py::TestMin::test_functorch, test/functorch/test_dims.py::TestMin::test_hello, test/functorch/test_dims.py::TestMin::test_index, test/functorch/test_dims.py::TestMin::test_index_placement, test/functorch/test_dims.py::TestMin::test_inplace, test/functorch/test_dims.py::TestMin::test_manual_stuff, test/functorch/test_dims.py::TestMin::test_mask, test/functorch/test_dims.py::TestMin::test_max, test/functorch/test_dims.py::TestMin::test_mm, test/functorch/test_dims.py::TestMin::test_mm_fuse, test/functorch/test_dims.py::TestMin::test_monkey, test/functorch/test_dims.py::TestMin::test_network, test/functorch/test_dims.py::TestMin::test_order, test/functorch/test_dims.py::TestMin::test_order_keyword, test/functorch/test_dims.py::TestMin::test_permute_orig, test/functorch/test_dims.py::TestMin::test_seg, test/functorch/test_dims.py::TestMin::test_simple, test/functorch/test_dims.py::TestMin::test_softmax_split, test/functorch/test_dims.py::TestMin::test_stack, test/functorch/test_dims.py::TestMin::test_time_mm_fuse, test/functorch/test_dims.py::TestMin::test_with_dims_split, test/functorch/test_dims.py::TestMinFunctorchOnly::test_adapt, test/functorch/test_dims.py::TestMinFunctorchOnly::test_attn, 
test/functorch/test_dims.py::TestMinFunctorchOnly::test_attn_cuda, test/functorch/test_dims.py::TestMinFunctorchOnly::test_big_split, test/functorch/test_dims.py::TestMinFunctorchOnly::test_compare_dims, test/functorch/test_dims.py::TestMinFunctorchOnly::test_diag, test/functorch/test_dims.py::TestMinFunctorchOnly::test_dim_args, test/functorch/test_dims.py::TestMinFunctorchOnly::test_dims_with_size, test/functorch/test_dims.py::TestMinFunctorchOnly::test_dir, test/functorch/test_dims.py::TestMinFunctorchOnly::test_doc, test/functorch/test_dims.py::TestMinFunctorchOnly::test_embed, test/functorch/test_dims.py::TestMinFunctorchOnly::test_eq, test/functorch/test_dims.py::TestMinFunctorchOnly::test_expand, test/functorch/test_dims.py::TestMinFunctorchOnly::test_functorch, test/functorch/test_dims.py::TestMinFunctorchOnly::test_hello, test/functorch/test_dims.py::TestMinFunctorchOnly::test_index, test/functorch/test_dims.py::TestMinFunctorchOnly::test_index_placement, test/functorch/test_dims.py::TestMinFunctorchOnly::test_inplace, test/functorch/test_dims.py::TestMinFunctorchOnly::test_manual_stuff, test/functorch/test_dims.py::TestMinFunctorchOnly::test_mask, test/functorch/test_dims.py::TestMinFunctorchOnly::test_max, test/functorch/test_dims.py::TestMinFunctorchOnly::test_mm, test/functorch/test_dims.py::TestMinFunctorchOnly::test_mm_fuse, test/functorch/test_dims.py::TestMinFunctorchOnly::test_monkey, test/functorch/test_dims.py::TestMinFunctorchOnly::test_network, test/functorch/test_dims.py::TestMinFunctorchOnly::test_order, test/functorch/test_dims.py::TestMinFunctorchOnly::test_order_keyword, test/functorch/test_dims.py::TestMinFunctorchOnly::test_permute_orig, test/functorch/test_dims.py::TestMinFunctorchOnly::test_seg, test/functorch/test_dims.py::TestMinFunctorchOnly::test_simple, test/functorch/test_dims.py::TestMinFunctorchOnly::test_softmax_split, test/functorch/test_dims.py::TestMinFunctorchOnly::test_stack, 
test/functorch/test_dims.py::TestMinFunctorchOnly::test_time_mm_fuse, test/functorch/test_dims.py::TestMinFunctorchOnly::test_with_dims_split
2025-12-04T14:58:19.8726235Z 
2025-12-04T14:58:19.8726562Z Finished functorch/test_dims 1/1 ... [2025-12-04 14:58:19.868288][20728.251183012], took 0.84min
2025-12-04T14:58:19.8940876Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/functorch.test_dims/functorch.test_dims-e2a9e671430fd99e.xml
2025-12-04T14:58:21.2486504Z Uploading artifacts took 1.27 seconds
2025-12-04T14:58:21.2490604Z Running functorch/test_ops 1/7 ... [2025-12-04 14:58:21.248849][20729.631742543]
2025-12-04T14:58:21.2491448Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:58:21.2495714Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_ops.py', '--shard-id=1', '--num-shards=7', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:58:21.249305]
2025-12-04T15:09:59.9473859Z 
2025-12-04T15:09:59.9474870Z functorch/test_ops 1/7 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_ops_1.7_2b66798f0700c47b_.log
2025-12-04T15:10:00.0170070Z Running 1492 items in this shard: test/functorch/test_ops.py::TestOperatorsCUDA::test_extremal_numerics_l1_loss_cuda, test/functorch/test_ops.py::TestOperatorsCUDA::test_extremal_numerics_softmax_cuda, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad___rmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_acosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_addcmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_addr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_all_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_asinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_bucketize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_byte_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_clone_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_combinations_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_diagonal_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_dstack_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_expand_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_expand_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_hfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_ifftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_ihfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_irfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_index_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_index_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_isclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_isin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_jiterator_2inputs_2outputs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_jiterator_4inputs_with_extra_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_kron_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_inv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_matrix_rank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_norm_subgradients_at_zero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_pinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_log10_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_log_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_lt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_matrix_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_max_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_msort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_mvlgamma_mvlgamma_p_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_mvlgamma_mvlgamma_p_5_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nan_to_num_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_narrow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_native_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_conv2d_stride_padding_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_conv2d_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_conv_transpose2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_conv_transpose3d_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_embedding_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_glu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_interpolate_bicubic_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_interpolate_nearest-exact_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_pdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_scaled_dot_product_attention_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_softshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_upsample_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_polygamma_polygamma_n_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_randint_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_randn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_reciprocal_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_rsqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_searchsorted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_signal_windows_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_signal_windows_kaiser_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_airy_ai_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_hermite_polynomial_h_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_hermite_polynomial_he_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_split_with_sizes_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_std_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_std_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_take_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_to_sparse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_topk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_transpose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_trunc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_unsafe_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_view_as_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_vstack_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp___getitem___functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp__native_batch_norm_legit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp__segment_reduce_lengths_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp__unsafe_masked_index_put_accumulate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_acos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_addmv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_addr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_angle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_arange_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_as_strided_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_as_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_as_strided_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_atan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_atanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_byte_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cdouble_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_ceil_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cholesky_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_clone_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_column_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cov_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cummax_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cumulative_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_deg2rad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_diagonal_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_div_floor_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_div_trunc_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_empty_permuted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_hfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_irfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_rfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_floor_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_ge_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_hypot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_index_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_int_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_isfinite_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_isneginf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_lerp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_lgamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_inv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_ldl_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_lstsq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_lu_factor_ex_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_norm_subgradients_at_zero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_svdvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_vecdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_log2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_log_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_logcumsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_logit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_long_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_lt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_mT_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_matmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_max_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_max_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_mul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nanmedian_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_native_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_native_dropout_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_adaptive_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_adaptive_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_adaptive_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_conv2d_stride_padding_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_embedding_bag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_interpolate_area_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_margin_ranking_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_max_unpool1d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_mish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_pad_reflect_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_pad_replicate_negative_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_rrelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_upsample_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_norm_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_normal_in_place_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_ormqr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_pca_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_permute_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_polygamma_polygamma_n_4_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_ravel_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_reciprocal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_remainder_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_resize__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_rsqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_signal_windows_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_signal_windows_gaussian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_bessel_j1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_i0e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_split_list_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_svd_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_tensordot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_to_sparse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_topk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_unflatten_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_var_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_view_as_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_NumpySortAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_ZeroGradientsGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp___rdiv___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp___rmatmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp__unsafe_masked_index_put_accumulate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_allclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_argsort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_argwhere_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_as_strided_partial_views_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_as_strided_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_atleast_1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_baddbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_bfloat16_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_cdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_clamp_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_diagflat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_diagonal_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_div_floor_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_dsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_einsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_exp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_expand_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_expand_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_ifft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_ifftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_ihfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_floor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_gradient_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_grid_sampler_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_heaviside_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_index_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_isclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_isneginf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_jiterator_binary_return_by_ref_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_le_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_eigvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_householder_product_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_ldl_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_lstsq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_lu_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_pinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_pinv_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_solve_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_solve_triangular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_tensorinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_logdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_logical_not_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_max_reduction_with_dim_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_min_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_movedim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_narrow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_native_dropout_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_conv1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_conv2d_strided_padding_dilation_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_ctc_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_dropout3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_hardtanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_hinge_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_instance_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_interpolate_area_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_interpolate_bicubic_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_interpolate_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_interpolate_trilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_kl_div_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_local_response_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_max_unpool3d_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_multi_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_pad_constant_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_pad_replicate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_pixel_unshuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_tanhshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_triplet_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_polar_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_polygamma_polygamma_n_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_remainder_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_repeat_interleave_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_resize_as__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_roll_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_round_decimals_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_short_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_sinc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_sinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_slice_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_slice_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_softmax_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_sparse_mm_reduce_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_squeeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_squeeze_multiple_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_unique_consecutive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_unsqueeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_zero__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjpvmap_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmap_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmap_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmap_ZeroGradientsGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmapvmap_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmapvmap_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmapvmap_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_bool_raises_argmin_cuda_bool, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_amin_cuda_complex32, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_clamp_cuda_complex128, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_ge_cuda_complex128, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_lt_cuda_complex128, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_lt_cuda_complex64, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_sort_cuda_complex128, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_topk_cuda_complex64, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_list_return_dsplit_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_list_return_dsplit_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_list_return_unbind_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_list_return_vsplit_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_list_return_vsplit_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_positive_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_real_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_special_grad_op_vjp_cuda, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_unflatten_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_unflatten_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_view_as_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_view_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp___getitem___functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp___rmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp__unsafe_masked_index_put_accumulate_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_acos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_addmv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_angle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_arange_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_as_strided_partial_views_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_atanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_bernoulli_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_bool_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_cdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_char_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_copysign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_diagonal_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_dstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_empty_permuted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_exp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_fftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_hfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_irfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_floor_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_ge_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_geqrf_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_grid_sampler_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_hstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_index_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_int_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_jiterator_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_kthvalue_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_eigh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_householder_product_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_lu_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_log1p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_log2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_log_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_logaddexp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_logical_or_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_long_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_long_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_min_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_mvlgamma_mvlgamma_p_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_mvlgamma_mvlgamma_p_5_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_new_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_new_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nextafter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_binary_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_channel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_hardswish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_hardtanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_instance_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_interpolate_bicubic_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_interpolate_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_margin_ranking_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_max_unpool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_max_unpool3d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_softsign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_normal_in_place_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_ones_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_pca_lowrank_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_permute_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_polygamma_polygamma_n_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_repeat_interleave_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_reshape_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_rsqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_scatter_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_short_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_signal_windows_gaussian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_signal_windows_general_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_bessel_y1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_hermite_polynomial_h_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_ndtri_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_shifted_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_sqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_squeeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_unflatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_unsafe_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_var_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp___getitem___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp___rsub___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_as_strided_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_atleast_1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_bool_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_byte_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cartesian_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cfloat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_column_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_dstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_expand_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_expm1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_fftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_irfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_rfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_floor_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_hash_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_isneginf_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_isreal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_cond_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_eigh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_inv_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_slogdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_solve_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_vecdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_log_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_logit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_matmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_matrix_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_min_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_msort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_mvlgamma_mvlgamma_p_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_new_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_new_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nextafter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_adaptive_avg_pool2d_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_adaptive_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_celu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_embedding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_hardswish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_interpolate_area_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_interpolate_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_interpolate_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_interpolate_nearest-exact_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_kl_div_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_logsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_mse_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_pixel_unshuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_prelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_softshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_triplet_margin_with_distance_loss_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_norm_nuc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_ones_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_permute_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_permute_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_rad2deg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_randint_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_repeat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_scatter_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_signal_windows_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_signal_windows_gaussian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_signal_windows_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_sort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_shifted_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_xlog1py_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_zeta_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_squeeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_std_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_topk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_torch_ops_aten__efficient_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_transpose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_unique_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_view_as_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_view_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap___getitem___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap__segment_reduce_offsets_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_addcmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_allclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_as_strided_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_atan2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_broadcast_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cartesian_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cholesky_inverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_combinations_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cumulative_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_diff_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_div_no_rounding_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_eq_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_eye_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_fftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_flip_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_float_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fmod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_grid_sampler_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_half_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_hash_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_index_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_index_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_jiterator_unary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_lerp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_cond_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_eigvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_ldl_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_lu_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_lu_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_multi_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_log_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_logical_or_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_logsumexp_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_maximum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_mvlgamma_mvlgamma_p_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_new_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_adaptive_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_adaptive_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_celu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_conv2d_stride_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_conv3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_conv_transpose3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_elu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_interpolate_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_kl_div_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_pad_circular_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_softshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_threshold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_norm_inf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_normal_in_place_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_normal_number_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_ops_aten_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_randint_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_renorm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_repeat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_round_decimals_neg_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_scatter_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_scatter_reduce_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_short_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_signal_windows_general_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_sin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_sinc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_sort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_airy_ai_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_bessel_y0_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_log_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_xlog1py_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_squeeze_multiple_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_std_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_svd_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_unfold_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_unsafe_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_var_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_view_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_view_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_where_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmapvmap_NumpySortAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_MulGenVmapAutogradFunction_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_NumpyCubeAutogradFunction_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_SelectAutogradFunction_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_SelectGenVmapAutogradFunction_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ZeroGradientsGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___getitem___functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___rdiv___cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__segment_reduce_offsets_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__unsafe_masked_index_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__unsafe_masked_index_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_abs_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_addbmm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_addr_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_alias_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_aminmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_angle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_any_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_argsort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_as_strided_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_as_strided_partial_views_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_asinh_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_atleast_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_atleast_3d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_bfloat16_functorch_no_channels_last_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ceil_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_char_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_char_functorch_no_channels_last_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_clamp_max_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_clamp_min_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_clamp_min_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_column_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_conj_physical_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_copysign_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cosh_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cummax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_deg2rad_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diagonal_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_empty_permuted_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_eq_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_equal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_erfc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_erfinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_exp2_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_expand_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_expm1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_exponential_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_fft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_fft_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_hfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_ifftshift_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_ihfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_ihfftn_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_irfft2_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_flip_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fliplr_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_full_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_geometric_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_gradient_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_half_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_heaviside_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_put_functorch_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_reduce_mean_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_reduce_prod_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_int_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_int_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_isfinite_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_isin_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_item_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_jiterator_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_kron_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_eigh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_eigvalsh_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_inv_ex_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_lstsq_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_lu_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_lu_factor_ex_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_multi_dot_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_solve_triangular_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_log_normal_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logsumexp_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_long_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_long_functorch_no_channels_last_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_lt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_lu_solve_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_lu_unpack_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_fill_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_mean_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_matrix_exp_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_max_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_max_reduction_with_dim_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_meshgrid_list_of_tensors_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_minimum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_msort_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_multinomial_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mvlgamma_mvlgamma_p_5_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nanmean_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nanquantile_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nansum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ne_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ne_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_neg_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_new_zeros_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nextafter_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_adaptive_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_binary_cross_entropy_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_stride_depthwise_with_bias_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_stride_with_bias_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv_transpose1d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv_transpose3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_cosine_embedding_loss_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_cosine_similarity_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_embedding_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_embedding_functorch_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_fractional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_gaussian_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_grid_sample_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_hardswish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_interpolate_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_interpolate_trilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_interpolate_trilinear_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_leaky_relu_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_logsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_unpool3d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_pad_reflect_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_pixel_shuffle_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_poisson_nll_loss_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_selu_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_softshrink_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_tanhshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_threshold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_norm_nuc_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_normal_number_mean_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ops_aten_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ormqr_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_pca_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_polygamma_polygamma_n_4_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_positive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_quantile_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ravel_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_reciprocal_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_round_decimals_neg_3_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scatter_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scatter_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scatter_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scatter_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_select_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_general_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_general_cosine_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sinc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sinc_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_airy_ai_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_bessel_y1_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_hermite_polynomial_h_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_hermite_polynomial_he_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_i1e_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_laguerre_polynomial_l_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_scaled_modified_bessel_k0_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_shifted_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_shifted_chebyshev_polynomial_v_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_split_with_sizes_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_split_with_sizes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_squeeze_multiple_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_std_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sub_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_svd_lowrank_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_tensor_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_topk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_torch_ops_aten__safe_softmax_default_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_triangular_solve_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_tril_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_trunc_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unsqueeze_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_var_mean_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_var_mean_unbiased_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_var_unbiased_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_vdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_vdot_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_vstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall___rdiv___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall___rmod___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall__unsafe_masked_index_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_addcdiv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_asinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_atan2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_atleast_1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_block_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cartesian_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cauchy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cfloat_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_chalf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_diagflat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_einsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_erfc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_fft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_ihfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fmod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_frexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_grid_sampler_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_half_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule___getitem___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule___getitem___functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule___rpow___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule__chunk_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule__softmax_backward_data_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_acosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_addcdiv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_addmv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_allclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_atan2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_bfloat16_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_block_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cartesian_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_clamp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_clone_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_combinations_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_diag_embed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_einsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_empty_like_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_eq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_equal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_erf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_ifft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_irfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_rfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_flatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_gt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_half_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_hash_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_lgamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_inv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_matrix_rank_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_solve_triangular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_logaddexp2_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_logical_or_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_matmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_max_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_maximum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_movedim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nanmean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv2d_stride_groups_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv2d_stride_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv2d_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_cosine_similarity_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_hardshrink_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_interpolate_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_logsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_max_unpool1d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_relu6_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_tanhshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_upsample_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_ormqr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_permute_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_permute_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_polygamma_polygamma_n_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_polygamma_polygamma_n_4_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_rad2deg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_randn_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_resize__cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_resolve_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_resolve_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_roll_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_rot90_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_round_decimals_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_signal_windows_bartlett_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_signal_windows_nuttall_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_slice_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_slice_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_sort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_bessel_y0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_entr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_i0e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_i1e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_laguerre_polynomial_l_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_scaled_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_shifted_chebyshev_polynomial_t_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_shifted_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_std_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_to_sparse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_tril_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_trunc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_uniform_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_view_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_hsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_index_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_isposinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_lgamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_eigh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_ldl_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_ldl_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_solve_triangular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_vector_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_log2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_log_softmax_with_dtype_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_long_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_mT_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_matmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_max_pool2d_with_indices_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_min_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_mv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nanmedian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_native_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_native_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_new_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_new_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_batch_norm_without_cudnn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv2d_stride_no_bias_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_dropout3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_glu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_instance_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_interpolate_trilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_max_unpool1d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_max_unpool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_max_unpool3d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_mse_loss_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_multi_head_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_multi_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_pad_replicate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_pdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_softshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_normal_in_place_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_normal_number_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_polygamma_polygamma_n_4_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_put_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_repeat_interleave_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_resolve_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_signal_windows_hann_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_entr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_laguerre_polynomial_l_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_shifted_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_spherical_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_zeta_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_sub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_t_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_tanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_torch_ops_aten__efficient_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_tril_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_true_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_unflatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_uniform_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_var_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_NumpyExpMarkDirtyAutogradFunction_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp__unsafe_masked_index_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_abs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_addcdiv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_any_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_atanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_atleast_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_bmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_broadcast_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_byte_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_clamp_min_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_contiguous_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_double_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_double_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_empty_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_erf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_expm1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_fftn_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_hfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_hfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_irfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_rfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_flip_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_full_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_geqrf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_grid_sampler_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_index_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_inner_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_isfinite_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_isneginf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_le_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_cholesky_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_inv_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_ldl_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_lstsq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_norm_subgradients_at_zero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_slogdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_logsumexp_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_max_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_maximum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_mm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_msort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_native_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_ne_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_adaptive_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_conv2d_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_conv2d_stride_depthwise_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_hardsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_instance_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_interpolate_nearest-exact_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_kl_div_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_max_unpool2d_grad_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_mish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_mse_loss_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_multilabel_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_pad_reflect_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_pairwise_distance_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_relu6_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_softmin_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nonzero_static_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_norm_inf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_ones_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_pca_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_rad2deg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_randint_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_randn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_randn_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_reshape_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_resolve_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_scatter_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_sgn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_sigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_signal_windows_general_hamming_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_signal_windows_kaiser_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_sort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_hermite_polynomial_he_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_legendre_polynomial_p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_modified_bessel_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_shifted_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_spherical_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_std_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_take_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_tan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_uniform_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvmap_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_NumpyMulAutogradFunction_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp___rmatmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_addcmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_allclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_baddbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_bool_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_char_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_cholesky_inverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_clamp_max_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_conj_physical_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_contiguous_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_corrcoef_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_cummax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_cummin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_diagonal_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_einsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_eq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_hfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_ifft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_ihfft_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_ihfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fmod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_gather_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_geqrf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule___getitem___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule__segment_reduce_offsets_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule__upsample_bilinear2d_aa_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_addbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_addmm_decomposed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_alias_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_any_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_as_strided_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_atleast_2d_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_atleast_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_bernoulli_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_block_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_chalf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_clone_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_diagflat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_div_floor_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_double_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_expand_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_fft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_fftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_rfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_flip_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_floor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fmod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_frac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_half_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_heaviside_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_hsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_igamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_index_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_index_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_inner_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_isin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_log_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_logit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_logspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_movedim_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nanquantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nansum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_alpha_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_conv2d_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_conv2d_stride_padding_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_conv2d_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_ctc_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_elu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_group_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_hinge_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_max_unpool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_pad_reflect_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_pairwise_distance_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_relu6_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_selu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nonzero_static_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_ops_aten_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_permute_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_pinverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_polygamma_polygamma_n_4_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_pow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_real_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_renorm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_repeat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_reshape_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_resize_as__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_roll_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_rsub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_scatter_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_short_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_signal_windows_bartlett_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_sin_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_slice_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_hermite_polynomial_h_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_scaled_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_shifted_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_xlog1py_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_std_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_sub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_take_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_var_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_vdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_where_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_index_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_inner_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_isclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_isin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_isneginf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_isreal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_jiterator_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_eigvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_pinv_hermitian_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_solve_triangular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_svdvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_lu_unpack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_maximum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_min_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_msort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_mv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_narrow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_native_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_conv2d_stride_groups_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_conv2d_stride_padding_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_conv2d_stride_padding_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_conv2d_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_conv_transpose2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_cosine_embedding_loss_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_cosine_similarity_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_embedding_bag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_fractional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_gaussian_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_gelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_hardsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_interpolate_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_logsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_max_unpool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_mish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_pixel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_upsample_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_normal_number_mean_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_ones_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_randint_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_randn_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_remainder_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_reshape_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_resize_as__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_resolve_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_resolve_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_round_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_round_decimals_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_scatter_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_scatter_reduce_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_select_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_signal_windows_general_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_sort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_bessel_y0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_entr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_legendre_polynomial_p_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_ndtri_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_sum_to_size_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_tensor_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_topk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_transpose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_trunc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_var_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_view_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_vstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_zero__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_zeros_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp__softmax_backward_data_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_acos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_addbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_angle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_as_strided_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_asin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_atanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_atleast_3d_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_baddbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_bernoulli_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_broadcast_shapes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cartesian_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_ceil_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_constant_pad_nd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_copysign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cov_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cummax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_diagonal_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_diff_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_dist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_einsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_empty_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_equal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_irfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_gt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_index_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_index_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_index_select_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_isclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_cholesky_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_inv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_ldl_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_ldl_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_lstsq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_lstsq_grad_oriented_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_multi_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_pinv_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_solve_triangular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_log_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_logical_or_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_logit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_meshgrid_list_of_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_mm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_mvlgamma_mvlgamma_p_1_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_mvlgamma_mvlgamma_p_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nan_to_num_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_new_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_new_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_alpha_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_channel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_conv2d_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_conv_transpose1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_group_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_huber_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_interpolate_nearest-exact_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_max_unpool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_max_unpool3d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_pixel_unshuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_relu6_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_rms_norm_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_softsign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_triplet_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_upsample_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_quantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_remainder_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_resize_as__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_round_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_signal_windows_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_entr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_erfcx_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_scaled_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_tensordot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_tile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_topk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_torch_ops_aten__safe_softmax_default_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_trace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_transpose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_unbind_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_unique_consecutive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_var_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_vdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_view_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_vstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_where_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvmap_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvmap_NumpySortAutogradFunction_cuda_float32
2025-12-04T15:10:00.0849530Z 
2025-12-04T15:10:00.0849901Z Finished functorch/test_ops 1/7 ... [2025-12-04 15:09:59.949482][21428.332373553], took 11.65min
2025-12-04T15:10:00.0851101Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/functorch.test_ops/functorch.test_ops-caabf5583dae6043.xml
2025-12-04T15:10:00.1449865Z Running functorch/test_ops 6/7 ... [2025-12-04 15:10:00.144694][21428.52758652]
2025-12-04T15:10:00.1450415Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:10:00.1454349Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_ops.py', '--shard-id=6', '--num-shards=7', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:10:00.145197]
2025-12-04T15:20:28.9988402Z 
2025-12-04T15:20:28.9989454Z functorch/test_ops 6/7 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_ops_6.7_b2e5f87489ea3e61_.log
2025-12-04T15:20:29.0676878Z Running 1471 items in this shard: test/functorch/test_ops.py::TestOperatorsCUDA::test_extremal_numerics_nll_loss_cuda, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad___rpow___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad__segment_reduce_offsets_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_addmv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cartesian_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_ceil_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_chalf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_clamp_max_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cov_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_eye_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_fftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fliplr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_float_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_float_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fmod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_frexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_ge_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_hash_tensor_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_heaviside_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_hsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_hypot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_index_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_isneginf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_jiterator_binary_return_by_ref_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_ldexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_le_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_cholesky_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_cond_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_eigh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_svdvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_tensorinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_vector_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_logical_not_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_fill_functorch_Scalar_only_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_matmul_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_max_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_min_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_min_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_min_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_new_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_adaptive_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_conv2d_stride_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_conv2d_stride_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_cosine_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_elu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_fractional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_gaussian_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_gelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_hardswish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_hinge_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_instance_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_interpolate_trilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_kl_div_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_linear_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_logsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_mish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_multilabel_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_pad_constant_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_rms_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_rrelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_silu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_smooth_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_tanhshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_triplet_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_upsample_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_permute_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_polar_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_reshape_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_resolve_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_resolve_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_round_decimals_neg_3_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_scalar_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_scatter_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_scatter_reduce_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_signal_windows_gaussian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_signal_windows_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_signal_windows_nuttall_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_slice_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_bessel_j1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_laguerre_polynomial_l_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_log_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_xlog1py_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_tensor_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_tile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_unique_consecutive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_unique_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_var_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_view_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_vsplit_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_xlogy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp___rmod___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp__chunk_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_alias_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_all_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_atan2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_byte_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_chalf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_char_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_clamp_min_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_corrcoef_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_diagflat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_diff_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_digamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_double_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_fft_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_ifft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_ihfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_irfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_flip_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_float_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_float_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_floor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fmod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_geqrf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_grid_sampler_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_histc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_index_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_isnan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_jiterator_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_kron_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_eig_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_lu_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_pinv_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_logical_and_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_long_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_mH_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_min_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_mvlgamma_mvlgamma_p_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_narrow_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_new_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_batch_norm_without_cudnn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_conv2d_stride_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_conv2d_stride_padding_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_dropout2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_dropout3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_fractional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_glu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_hardtanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_interpolate_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_multi_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_pad_constant_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_relu6_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_rms_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_triplet_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_norm_nuc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_normal_number_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_ops_aten_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_permute_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_polygamma_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_renorm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_resolve_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_roll_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_scatter_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_entr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_erfcx_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_polygamma_special_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_split_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_split_with_sizes_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_sum_to_size_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_tan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_trapz_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_unbind_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_view_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_vstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpjvpvmap_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpjvpvmap_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_acosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_addmm_decomposed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_addmv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_atan2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_bmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_broadcast_shapes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_copysign_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_count_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_digamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_double_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_erf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_irfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_flatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_flipud_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_half_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_histc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_isfinite_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_isposinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_isreal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_ldexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_cholesky_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_multi_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_norm_subgradients_at_zero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_vector_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_log2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_log_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_logical_or_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_logical_xor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_logspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_long_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_max_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_maximum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_minimum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_mm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_multinomial_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_ne_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_conv2d_stride_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_conv2d_stride_padding_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_conv_transpose1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_conv_transpose2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_elu_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_embedding_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_hardshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_max_unpool3d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_multi_head_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_multilabel_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_prelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_rrelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_softsign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_norm_nuc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_normal_in_place_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_normal_number_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_quantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_randint_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_reciprocal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_round_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_scatter_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_select_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_sgn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_signal_windows_bartlett_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_signal_windows_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_sin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_bessel_y0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_entr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_legendre_polynomial_p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_zeta_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_split_with_sizes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_std_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_sub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_svd_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_tile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_torch_ops_aten__efficient_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_torch_ops_aten__safe_softmax_default_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_trunc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_var_mean_unbiased_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_vsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjpvmap_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmap_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmapvmap_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_bool_raises_topk_cuda_bool, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_amax_cuda_complex64, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_argmin_cuda_complex64, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_gt_cuda_complex128, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_le_cuda_complex128, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_conj_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_contiguous_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_list_return_split_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_movedim_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_permute_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_permute_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_positive_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_real_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_special_grad_op_jvp_cuda, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_squeeze_grad_op_vjp_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_squeeze_multiple_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp___rpow___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp__segment_reduce_lengths_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_addr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_aminmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_argsort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_asin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_atan2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_atan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_broadcast_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_cartesian_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_chalf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_clamp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_column_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_conj_physical_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_corrcoef_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_cosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_diagonal_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_digamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_double_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_dsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_einsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_erfinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_expand_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_ihfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fliplr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_float_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_half_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_heaviside_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_index_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_index_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_isinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_jiterator_2inputs_2outputs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_eigvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_matrix_rank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_matrix_rank_hermitian_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_pinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_tensorsolve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_logspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_matmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_mm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_movedim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_new_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_adaptive_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_adaptive_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_alpha_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv2d_stride_depthwise_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv2d_stride_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv2d_stride_with_bias_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_dropout2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_dropout3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_embedding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_interpolate_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_interpolate_trilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_kl_div_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_mse_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_multi_head_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_pad_circular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_pad_constant_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_pad_replicate_negative_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_pdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_softmin_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_softplus_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_softshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_tanhshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_threshold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_permute_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_quantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_remainder_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_renorm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_round_decimals_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_scalar_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_sign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_signal_windows_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_signal_windows_general_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_slice_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_bessel_j1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_i1e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_log_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_polygamma_special_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_scaled_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_squeeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_std_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_sub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_tensor_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_to_sparse_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_triangular_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_var_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp___getitem___functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp___rpow___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp__native_batch_norm_legit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_aminmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_argwhere_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_as_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_atleast_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cdouble_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_corrcoef_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_diag_embed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_digamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_div_trunc_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_double_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_ifft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_ihfftn_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_flip_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_float_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fmod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_full_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_gather_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_grid_sampler_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_igammac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_index_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_index_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_inner_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_isin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_item_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_jiterator_4inputs_with_extra_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_ldexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_householder_product_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_lu_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_norm_subgradients_at_zero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_svdvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_logaddexp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_logical_not_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_mH_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_amin_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_max_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_meshgrid_list_of_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_mm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_mvlgamma_mvlgamma_p_5_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_native_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_native_dropout_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_native_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_new_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_new_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_batch_norm_without_cudnn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_embedding_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_hardsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_hardtanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_hinge_embedding_loss_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_max_unpool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_pad_reflect_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_pad_replicate_negative_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_rms_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_norm_fro_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_normal_in_place_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_normal_number_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_ops_aten_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_polygamma_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_positive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_randn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_ravel_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_repeat_interleave_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_resolve_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_searchsorted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_signal_windows_bartlett_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_signal_windows_blackman_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_sinc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_sparse_mm_reduce_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_log_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_spherical_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_squeeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_stft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_triangular_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_trunc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_unflatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_vstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_zero__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjpvmap_ZeroGradientsGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_T_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap___rmod___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap___rmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_acosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_addcdiv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_addmm_decomposed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_addr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_amin_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_angle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_atleast_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_baddbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_block_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_bmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_bool_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_bool_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_bucketize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cdouble_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cfloat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_char_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_clamp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_conj_physical_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_constant_pad_nd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_deg2rad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_digamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_empty_permuted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_erfc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_ifft2_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_irfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_rfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_rfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fliplr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_floor_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_ge_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_hstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_index_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_isfinite_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_eig_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_ldl_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_svdvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_tensorinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_vecdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_log_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_lt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_matrix_exp_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_max_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_max_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_movedim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_mv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nanmedian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_adaptive_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_alpha_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_conv2d_stride_groups_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_conv2d_strided_padding_dilation_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_hardswish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_logsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_mse_loss_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_multilabel_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_pixel_unshuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_rms_norm_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_scaled_dot_product_attention_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_softmin_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_upsample_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nonzero_static_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_ones_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_polar_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_positive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_round_decimals_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_scatter_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_searchsorted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_select_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_sgn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_sigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_signal_windows_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_hermite_polynomial_he_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_square_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_sub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_sum_to_size_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_take_along_dim_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_take_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_tan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_tanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_to_sparse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_unique_consecutive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_view_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_zeros_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmapvmap_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmapvmap_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_H_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_NumpyExpMarkDirtyAutogradFunction_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___radd___cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___rmod___cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__unsafe_masked_index_put_accumulate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_abs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_addcdiv_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_allclose_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_amax_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_arange_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_argmin_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_atanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_baddbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_baddbmm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_bernoulli_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_block_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_block_diag_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_broadcast_shapes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_broadcast_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_broadcast_tensors_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_broadcast_to_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_byte_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cfloat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_chalf_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cholesky_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cholesky_inverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cholesky_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_clone_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cos_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_count_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cummin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cummin_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cumprod_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diag_embed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diagflat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diagonal_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diff_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_digamma_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_dist_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_einsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_empty_strided_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_erfc_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_exp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_fftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_fftn_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_hfftn_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_ifft2_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_ifft_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_irfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_flip_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_float_functorch_no_channels_last_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_floor_divide_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_frac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ge_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_geqrf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_grid_sampler_2d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_half_functorch_no_channels_last_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_add_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_isinf_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_isneginf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_isreal_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_jiterator_2inputs_2outputs_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_jiterator_binary_return_by_ref_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ldexp_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_lerp_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_lgamma_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_cholesky_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_lu_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_matrix_rank_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_multi_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_pinv_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_pinv_hermitian_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_slogdet_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_solve_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_solve_ex_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_solve_triangular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_tensorsolve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linspace_tensor_overload_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_log10_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_log2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_log_softmax_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_log_softmax_with_dtype_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logaddexp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logcumsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logcumsumexp_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logical_not_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logical_or_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mT_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_amax_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_argmin_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_scatter_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_select_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_matmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_max_binary_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_max_pool2d_with_indices_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_min_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_min_reduction_no_dim_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mm_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mode_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mv_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nan_to_num_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nanquantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_narrow_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_native_batch_norm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_new_full_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_adaptive_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_adaptive_max_pool3d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_alpha_dropout_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_batch_norm_without_cudnn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_bilinear_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv1d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_stride_depthwise_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_stride_padding_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_stride_padding_with_bias_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_strided_padding_dilation_no_bias_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_strided_padding_dilation_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv_transpose3d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_ctc_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_dropout2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_fractional_max_pool3d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_gelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_hardsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_hardsigmoid_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_instance_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_interpolate_bilinear_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_interpolate_nearest_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_local_response_norm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_multi_head_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_pad_circular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_pad_reflect_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_rms_norm_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_scaled_dot_product_attention_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_selu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_softplus_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_softshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_upsample_nearest_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_normal_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ops_aten_index_put_functorch_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_permute_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_polar_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_polygamma_polygamma_n_1_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_polygamma_polygamma_n_3_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_polygamma_polygamma_n_4_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_pow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_qr_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_randint_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_repeat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_resize__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_resolve_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_resolve_conj_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_round_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scatter_reduce_mean_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sigmoid_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_blackman_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_general_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_general_hamming_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_hamming_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_hann_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sin_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_slice_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_bessel_y0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_chebyshev_polynomial_u_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_chebyshev_polynomial_v_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_hermite_polynomial_h_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_i0e_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_legendre_polynomial_p_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_log_ndtr_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_modified_bessel_i0_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_modified_bessel_k0_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_scaled_modified_bessel_k1_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_shifted_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_split_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sqrt_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_std_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_take_along_dim_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_tanh_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_tensor_split_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_torch_ops_aten__safe_softmax_default_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_transpose_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_triu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_triu_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unbind_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unflatten_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unsafe_chunk_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_var_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_where_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_zeros_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall___getitem___functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall__segment_reduce_lengths_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_allclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_as_strided_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_bmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_broadcast_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cdouble_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_char_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_char_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cholesky_solve_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_clone_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_combinations_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_corrcoef_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cummin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_diag_embed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_dsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_erfinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_exp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_hfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_ifftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_rfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_float_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_frac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_full_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_geometric_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_geqrf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule___rmod___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule___rmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule__segment_reduce_lengths_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule__unsafe_masked_index_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_addr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_any_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_argsort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_as_strided_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_atleast_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_baddbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_byte_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cdouble_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cfloat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_corrcoef_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_fftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_geqrf_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_gradient_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_half_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_heaviside_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_histc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_hsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_index_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_index_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_index_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_index_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_index_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_isin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_isposinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_eig_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_eigh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_lstsq_grad_oriented_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_norm_subgradients_at_zero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_tensorsolve_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_lu_unpack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_minimum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_multinomial_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_mvlgamma_mvlgamma_p_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_native_dropout_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_new_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_adaptive_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_adaptive_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_adaptive_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_batch_norm_without_cudnn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_binary_cross_entropy_with_logits_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv2d_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv2d_strided_padding_dilation_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_glu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_hardswish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_instance_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_interpolate_nearest-exact_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_kl_div_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_pad_circular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_pairwise_distance_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_pixel_unshuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_selu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_normal_in_place_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_real_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_reciprocal_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_repeat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_resize_as__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_round_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_rsub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_scatter_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_scatter_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_searchsorted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_signal_windows_gaussian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_signal_windows_general_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_signal_windows_hann_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_erfcx_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_hermite_polynomial_h_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_hermite_polynomial_he_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_scaled_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_split_list_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_split_with_sizes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_sqrt_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_tensordot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_trace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_trapz_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_hstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_isreal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_eig_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_inv_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_multi_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_pinv_singular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_slogdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_tensorinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_vander_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_logdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_maximum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_meshgrid_list_of_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_mm_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_mvlgamma_mvlgamma_p_5_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_narrow_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nextafter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_adaptive_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_alpha_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv2d_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv_transpose1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_ctc_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_elu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_gaussian_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_grid_sample_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_hinge_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_interpolate_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_local_response_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_multilabel_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_pixel_unshuffle_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_relu6_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_selu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_pca_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_permute_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_positive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_randn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_renorm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_resolve_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_scatter_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_sort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_bessel_j1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_i1_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_i1e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_modified_bessel_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_ndtri_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_shifted_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_square_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_squeeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_std_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_sum_to_size_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_topk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_unsafe_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_unsafe_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_unsqueeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_view_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_where_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_zeros_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_H_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_SortGenVmapAutogradFunction_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp___rmatmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp__chunk_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_addmv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_all_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_asin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_baddbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_block_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_bool_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cartesian_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_chalf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cholesky_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_conj_physical_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_diagonal_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_erfc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_erfinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_eye_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_ifft2_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_ifft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_rfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_flipud_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_floor_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_frexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_igamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_igammac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_index_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_isin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_item_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_jiterator_4inputs_with_extra_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_lgamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_cond_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_eigvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_ldl_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_matrix_rank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_matrix_rank_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_solve_triangular_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_logical_not_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_long_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_mH_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_max_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_meshgrid_variadic_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_min_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_mv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nanquantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_narrow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_new_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_new_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_conv2d_stride_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_embedding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_gaussian_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_leaky_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_logsigmoid_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_multi_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_silu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_polygamma_polygamma_n_4_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_real_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_remainder_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_resolve_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_roll_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_scalar_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_scatter_reduce_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_sparse_mm_reduce_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_i0e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_polygamma_special_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_squeeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_sub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_unbind_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_view_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_vsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_xlogy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvmap_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_H_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_ZeroGradientsGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp___getitem___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp___rpow___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp__native_batch_norm_legit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp__segment_reduce_lengths_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp__upsample_bilinear2d_aa_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_arange_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_atleast_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_atleast_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_copysign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_double_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_dstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_empty_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_ifft2_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_ihfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_irfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_NumpySortAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule___getitem___functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule___rmatmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule___rmod___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule__chunk_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_acosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_addcmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_angle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_arange_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_asin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_baddbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_bfloat16_functorch_no_channels_last_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_broadcast_shapes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cauchy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cholesky_inverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cholesky_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_constant_pad_nd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_copysign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cov_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cumulative_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_diagonal_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_dist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_hfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_ifftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_ifftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_flatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_float_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_floor_divide_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_gradient_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_isclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_isposinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_eig_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_eigh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_lu_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_matrix_rank_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_tensorinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_tensorsolve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_vector_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_logaddexp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_logdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_logspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_lt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_argmax_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_fill_functorch_Scalar_only_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_max_pool2d_with_indices_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_min_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_mvlgamma_mvlgamma_p_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_narrow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_native_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_new_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_new_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_adaptive_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_batch_norm_without_cudnn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_conv2d_stride_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_conv_transpose3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_dropout3d_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_fractional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_gaussian_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_logsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_max_unpool1d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_pad_circular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_tanhshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_norm_fro_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_normal_number_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_resolve_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_sparse_mm_reduce_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_bessel_y1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_modified_bessel_i1_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_zeta_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_squeeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_tile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_torch_ops_aten__efficient_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_trapz_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_unflatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_unfold_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_unique_consecutive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_unsqueeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_unsqueeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_view_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_view_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_zeros_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_hash_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_index_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_le_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_eig_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_lu_solve_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_matrix_rank_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_tensorinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_log2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_log_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_logdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_logical_or_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_logical_xor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_logspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_long_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_lt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_fill_functorch_Scalar_only_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_max_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_max_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nanquantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_ne_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_new_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_conv2d_strided_padding_dilation_no_bias_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_elu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_glu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_hardtanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_kl_div_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_mse_loss_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_pad_reflect_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_prelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_relu6_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_tanhshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_norm_fro_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_norm_nuc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_ones_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_quantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_reciprocal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_scatter_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_scatter_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_scatter_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_short_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_sign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_signal_windows_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_sin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_sinc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_bessel_y1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_hermite_polynomial_h_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_split_with_sizes_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_split_with_sizes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_tan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_tensordot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_to_sparse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_torch_ops_aten__efficient_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_torch_ops_aten__safe_softmax_default_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_transpose_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_unbind_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_unflatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_unique_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_unsafe_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_var_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_view_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_H_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_addcdiv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_addmv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_asinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_atan2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_byte_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cholesky_inverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cumprod_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_div_no_rounding_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_double_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_expand_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_eye_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_fftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_ifft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fliplr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_flipud_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_float_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_floor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_floor_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_frac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_frexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_grid_sampler_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_half_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_hstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_index_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_index_reduce_mean_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_index_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_isin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_jiterator_binary_return_by_ref_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_eigh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_eigvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_matrix_rank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_pinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_pinv_singular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_vector_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_log_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_logcumsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_long_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_long_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_fill_functorch_Scalar_only_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nanmean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_adaptive_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_adaptive_max_pool3d_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_celu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_conv1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_embedding_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_fractional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_gaussian_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_hardshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_interpolate_bicubic_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_leaky_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_multi_head_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_pad_constant_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_pdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_softshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nonzero_static_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_norm_fro_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_ormqr_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_outer_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_polar_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_polygamma_polygamma_n_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_reciprocal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_resize__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_resolve_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_resolve_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_rsub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_scatter_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_signal_windows_general_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_signal_windows_hann_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_signbit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_bessel_y0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_legendre_polynomial_p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_ndtri_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_scaled_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_shifted_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_sub_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_svd_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_take_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_to_sparse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_torch_ops_aten__efficient_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_true_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_unflatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_unfold_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_view_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvmapjvp_linalg_solve_cuda
2025-12-04T15:20:29.1353537Z 
2025-12-04T15:20:29.1354235Z Finished functorch/test_ops 6/7 ... [2025-12-04 15:20:29.000777][22057.383669017], took 10.48min
2025-12-04T15:20:29.1355429Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/functorch.test_ops/functorch.test_ops-b6190fae5240f1fb.xml
2025-12-04T15:20:30.3031787Z Uploading artifacts took 1.16 seconds
2025-12-04T15:20:30.3036054Z Running inductor/test_select_algorithm 1/1 ... [2025-12-04 15:20:30.303413][22058.686307177]
2025-12-04T15:20:30.3036657Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:20:30.3040943Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_select_algorithm.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:20:30.303861]
2025-12-04T15:20:40.7214234Z 
2025-12-04T15:20:40.7215392Z inductor/test_select_algorithm 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_select_algorithm_1.1_7db4d246e17eb863_.log
2025-12-04T15:20:40.7216393Z 
2025-12-04T15:20:40.7216801Z Finished inductor/test_select_algorithm 1/1 ... [2025-12-04 15:20:40.721200][22069.104096027], took 0.17min
2025-12-04T15:20:40.7479381Z Running inductor/test_cpu_repro 1/3 ... [2025-12-04 15:20:40.747662][22069.130557131]
2025-12-04T15:20:40.7479965Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:20:40.7483256Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_cpu_repro.py', '--shard-id=1', '--num-shards=3', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:20:40.748096]
2025-12-04T15:35:02.5880011Z 
2025-12-04T15:35:02.5881534Z inductor/test_cpu_repro 1/3 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_cpu_repro_1.3_45e7fcc9d89e84f9_.log
2025-12-04T15:35:02.6107579Z Running 233 items in this shard: test/inductor/test_cpu_repro.py::CPUReproTests::test_add_layernorm, test/inductor/test_cpu_repro.py::CPUReproTests::test_aten_normal_dtype, test/inductor/test_cpu_repro.py::CPUReproTests::test_auto_zvec_vsx_simd, test/inductor/test_cpu_repro.py::CPUReproTests::test_avx2_bool_constant_pad_nd, test/inductor/test_cpu_repro.py::CPUReproTests::test_bf16_zeros, test/inductor/test_cpu_repro.py::CPUReproTests::test_bitwise_logical_op_bool, test/inductor/test_cpu_repro.py::CPUReproTests::test_bitwise_shift_corner_inputs, test/inductor/test_cpu_repro.py::CPUReproTests::test_channels_last_view_as_complex, test/inductor/test_cpu_repro.py::CPUReproTests::test_consistent_remove_buffers, test/inductor/test_cpu_repro.py::CPUReproTests::test_constant_bool_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_conv1d_strided_weight_torch_compile, test/inductor/test_cpu_repro.py::CPUReproTests::test_conv2d_autocast, test/inductor/test_cpu_repro.py::CPUReproTests::test_conv_in_channel_1_dynamic_shapes, test/inductor/test_cpu_repro.py::CPUReproTests::test_convert_fp32_int64_oob_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_convert_fp32_to_int64_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_convert_int32_to_int64_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_decomposed_fake_quant_per_channel, test/inductor/test_cpu_repro.py::CPUReproTests::test_dequant_quant_lowering_int8, test/inductor/test_cpu_repro.py::CPUReproTests::test_dequant_quant_lowering_uint8, test/inductor/test_cpu_repro.py::CPUReproTests::test_double_pointwise_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_for_loop_collapsed, test/inductor/test_cpu_repro.py::CPUReproTests::test_full_bits_lowp, test/inductor/test_cpu_repro.py::CPUReproTests::test_fused_node, test/inductor/test_cpu_repro.py::CPUReproTests::test_group_norm_backward_symint_divisible_channels, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_index_add, test/inductor/test_cpu_repro.py::CPUReproTests::test_index_propagation_issue_102065, test/inductor/test_cpu_repro.py::CPUReproTests::test_inplace_add_alpha, test/inductor/test_cpu_repro.py::CPUReproTests::test_int32_pointwise_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_int64_reduction_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_issue_148058, test/inductor/test_cpu_repro.py::CPUReproTests::test_linear_with_reshape, test/inductor/test_cpu_repro.py::CPUReproTests::test_load_half, test/inductor/test_cpu_repro.py::CPUReproTests::test_load_inf_bf16, test/inductor/test_cpu_repro.py::CPUReproTests::test_load_same_bool_tensor_twice, test/inductor/test_cpu_repro.py::CPUReproTests::test_low_fp_index_expr_issue_147279, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_7, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_1, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_7, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_False_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_7_seq_len_1, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_1, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_7_seq_len_1, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_7, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_False_batch_first_False_batch_size_7_seq_len_1, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_1, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_7, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_7, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_1, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_1, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_1, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_7, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_1, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_False_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_7, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_7, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_masked_fill_softmax, test/inductor/test_cpu_repro.py::CPUReproTests::test_masked_fill_with_inf_or_nan_value, test/inductor/test_cpu_repro.py::CPUReproTests::test_masked_load_int64_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_memory_copy_with_fusion, test/inductor/test_cpu_repro.py::CPUReproTests::test_mkl_linear, test/inductor/test_cpu_repro.py::CPUReproTests::test_nn_param_assign_wrapped, test/inductor/test_cpu_repro.py::CPUReproTests::test_no_redundant_to_dtypes_between_fused_scheduler_node, test/inductor/test_cpu_repro.py::CPUReproTests::test_non_contiguous_index_with_constant_stride, test/inductor/test_cpu_repro.py::CPUReproTests::test_non_contiguous_load_buf_quant_int8, test/inductor/test_cpu_repro.py::CPUReproTests::test_outer_loop_fusion, test/inductor/test_cpu_repro.py::CPUReproTests::test_outer_loop_fusion_buffer_remove, test/inductor/test_cpu_repro.py::CPUReproTests::test_pad_with_nan_value, test/inductor/test_cpu_repro.py::CPUReproTests::test_per_channel_fake_quant_int8, test/inductor/test_cpu_repro.py::CPUReproTests::test_per_channel_fake_quant_uint8, test/inductor/test_cpu_repro.py::CPUReproTests::test_per_tensor_fake_quant_int8, test/inductor/test_cpu_repro.py::CPUReproTests::test_randint_symint_input, test/inductor/test_cpu_repro.py::CPUReproTests::test_reduction_with_dynamic_threads, test/inductor/test_cpu_repro.py::CPUReproTests::test_relu_with_inf_value, test/inductor/test_cpu_repro.py::CPUReproTests::test_scalar_sign_with_min, test/inductor/test_cpu_repro.py::CPUReproTests::test_scatter_using_atomic_add, test/inductor/test_cpu_repro.py::CPUReproTests::test_sign_cpu_only, test/inductor/test_cpu_repro.py::CPUReproTests::test_slice_scatter_default_end_value, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_tile2d_store_channel_shuffle_cl_quant_output_uint8, test/inductor/test_cpu_repro.py::CPUReproTests::test_timed_cpu_only, test/inductor/test_cpu_repro.py::CPUReproTests::test_to_channels_last_fp8, test/inductor/test_cpu_repro.py::CPUReproTests::test_to_channels_last_lowp_fp, test/inductor/test_cpu_repro.py::CPUReproTests::test_to_dtype_float_bool, test/inductor/test_cpu_repro.py::CPUReproTests::test_to_uint8_rounding_method, test/inductor/test_cpu_repro.py::CPUReproTests::test_transpose_mxn_16_16_bf16_fp16, test/inductor/test_cpu_repro.py::CPUReproTests::test_transpose_non_contiguous, test/inductor/test_cpu_repro.py::CPUReproTests::test_transpose_vertical_sum_cpu_only, test/inductor/test_cpu_repro.py::CPUReproTests::test_two_local_buffers_in_outer_loop_fusion, test/inductor/test_cpu_repro.py::CPUReproTests::test_uint64_pointwise_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_unrolled_bool_prod_vectorized, test/inductor/test_cpu_repro.py::CPUReproTests::test_unsupported_conv_transpose, test/inductor/test_cpu_repro.py::CPUReproTests::test_vec_compare_op_cpu_only, test/inductor/test_cpu_repro.py::CPUReproTests::test_vec_logical, test/inductor/test_cpu_repro.py::CPUReproTests::test_vec_randn, test/inductor/test_cpu_repro.py::CPUReproTests::test_view_dtype
2025-12-04T15:35:02.6287030Z 
2025-12-04T15:35:02.6287419Z Finished inductor/test_cpu_repro 1/3 ... [2025-12-04 15:35:02.588412][22930.971297542], took 14.36min
2025-12-04T15:35:02.6288692Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cpu_repro/inductor.test_cpu_repro-e45fcbaf6c1a2b2c.xml
2025-12-04T15:35:02.7181129Z Running inductor/test_custom_lowering 1/1 ... [2025-12-04 15:35:02.717691][22931.100584221]
2025-12-04T15:35:02.7181791Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:35:02.7184702Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_custom_lowering.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:35:02.718182]
2025-12-04T15:35:26.3188745Z 
2025-12-04T15:35:26.3189867Z inductor/test_custom_lowering 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_custom_lowering_1.1_b51e0c13dc286ed6_.log
2025-12-04T15:35:26.3193427Z Running 6 items in this shard: test/inductor/test_custom_lowering.py::TestCustomLowering::test_constant_creation, test/inductor/test_custom_lowering.py::TestCustomLowering::test_jagged_to_padded_dense_sanity_cuda, test/inductor/test_custom_lowering.py::TestCustomLowering::test_jagged_to_padded_dense_zero_size, test/inductor/test_custom_lowering.py::TestCustomLowering::test_multi_inp_asm, test/inductor/test_custom_lowering.py::TestCustomLowering::test_register_lowering_custom_dict, test/inductor/test_custom_lowering.py::TestCustomLowering::test_tanh_approx
2025-12-04T15:35:26.3196259Z 
2025-12-04T15:35:26.3196640Z Finished inductor/test_custom_lowering 1/1 ... [2025-12-04 15:35:26.318667][22954.701563649], took 0.39min
2025-12-04T15:35:26.3449880Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_custom_lowering/inductor.test_custom_lowering-f90a8c2a1b7dd9b0.xml
2025-12-04T15:35:26.4286308Z Running inductor/test_perf 1/1 ... [2025-12-04 15:35:26.428372][22954.811266278]
2025-12-04T15:35:26.4286840Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:35:26.4290699Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_perf.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:35:26.428845]
2025-12-04T15:36:27.7444889Z 
2025-12-04T15:36:27.7445877Z inductor/test_perf 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_perf_1.1_8b1dd16368b2df6e_.log
2025-12-04T15:36:27.7471341Z Running 66 items in this shard: test/inductor/test_perf.py::NumBytesMetricTests::test_cat, test/inductor/test_perf.py::NumBytesMetricTests::test_cat_pointwise, test/inductor/test_perf.py::NumBytesMetricTests::test_cat_pointwise_config_option, test/inductor/test_perf.py::NumBytesMetricTests::test_cat_pointwise_many_complex_inputs, test/inductor/test_perf.py::NumBytesMetricTests::test_cat_pointwise_many_simple_inputs, test/inductor/test_perf.py::NumBytesMetricTests::test_extern, test/inductor/test_perf.py::NumBytesMetricTests::test_index, test/inductor/test_perf.py::NumBytesMetricTests::test_pointwise, test/inductor/test_perf.py::NumBytesMetricTests::test_reduction, test/inductor/test_perf.py::FusionTests::test_create_block_mask, test/inductor/test_perf.py::FusionTests::test_double_softmax, test/inductor/test_perf.py::FusionTests::test_factory_reduction, test/inductor/test_perf.py::FusionTests::test_horizontal_reduction_outer_pointwise, test/inductor/test_perf.py::FusionTests::test_horizontal_reduction_pointwise, test/inductor/test_perf.py::FusionTests::test_horizontal_reduction_pointwise2, test/inductor/test_perf.py::FusionTests::test_horizontal_reduction_reduction, test/inductor/test_perf.py::FusionTests::test_horizontal_sum_pw_broadcast, test/inductor/test_perf.py::FusionTests::test_index_pointwise, test/inductor/test_perf.py::FusionTests::test_index_reduction, test/inductor/test_perf.py::FusionTests::test_layer_norm, test/inductor/test_perf.py::FusionTests::test_mutation_fusion, test/inductor/test_perf.py::FusionTests::test_neighbor, test/inductor/test_perf.py::FusionTests::test_norm_chain, test/inductor/test_perf.py::FusionTests::test_pointwise_multi_level_reduction, test/inductor/test_perf.py::FusionTests::test_reduction_pointwise_multi_level_reduction, test/inductor/test_perf.py::FusionTests::test_softmax_backward, test/inductor/test_perf.py::FusionTests::test_softmax_inner, 
test/inductor/test_perf.py::FusionTests::test_vertical_sum_pw, test/inductor/test_perf.py::SchedulerFusionTests::test_fusion_choice1, test/inductor/test_perf.py::SchedulerFusionTests::test_fusion_choice2, test/inductor/test_perf.py::SchedulerFusionTests::test_fusion_choice3, test/inductor/test_perf.py::SchedulerFusionTests::test_fusion_choice4_cpu, test/inductor/test_perf.py::TilingTests::test_tiling_simple, test/inductor/test_perf.py::TilingTests::test_tiling_three, test/inductor/test_perf.py::MinCutPartitioningTests::test_partitioning_cat, test/inductor/test_perf.py::MinCutPartitioningTests::test_partitioning_dtype, test/inductor/test_perf.py::MinCutPartitioningTests::test_partitioning_full_remat, test/inductor/test_perf.py::MinCutPartitioningTests::test_partitioning_keops, test/inductor/test_perf.py::MinCutPartitioningTests::test_partitioning_long_chain_add, test/inductor/test_perf.py::MinCutPartitioningTests::test_partitioning_partial_remat, test/inductor/test_perf.py::MinCutPartitioningTests::test_partitioning_relu, test/inductor/test_perf.py::MinCutPartitioningTests::test_partitioning_unremat_bw, test/inductor/test_perf.py::MinCutPartitioningTests::test_partitioning_unremat_bw2, test/inductor/test_perf.py::MinCutPartitioningTests::test_partitioning_with_view, test/inductor/test_perf.py::NoopTests::test_noop_cat, test/inductor/test_perf.py::NoopTests::test_noop_clones, test/inductor/test_perf.py::NoopTests::test_noop_device_conversion, test/inductor/test_perf.py::NoopTests::test_noop_dtype_conversion, test/inductor/test_perf.py::NoopTests::test_noop_int_ops, test/inductor/test_perf.py::NoopTests::test_noop_slice_scatter, test/inductor/test_perf.py::InplacingTests::test_inplace_custom_op, test/inductor/test_perf.py::InplacingTests::test_inplace_custom_op_intermediate, test/inductor/test_perf.py::InplacingTests::test_inplace_custom_op_training, test/inductor/test_perf.py::InplacingTests::test_inplace_custom_op_training_two_mutated_inputs, 
test/inductor/test_perf.py::InplacingTests::test_inplace_custom_op_two_mutated_inputs, test/inductor/test_perf.py::InplacingTests::test_inplace_randperm_scatter, test/inductor/test_perf.py::InplacingTests::test_inplace_scatter, test/inductor/test_perf.py::InplacingTests::test_inplace_scatter_noop_view, test/inductor/test_perf.py::InplacingTests::test_inplace_triton_kernel_training, test/inductor/test_perf.py::InplacingTests::test_inplace_triton_kernel_v1, test/inductor/test_perf.py::InplacingTests::test_inplace_triton_kernel_v2, test/inductor/test_perf.py::InplacingTests::test_inplace_triton_kernel_v3, test/inductor/test_perf.py::InplacingTests::test_inplace_triton_kernel_v4, test/inductor/test_perf.py::InplacingTests::test_inplace_triton_kernel_v5, test/inductor/test_perf.py::InplacingTests::test_inplace_triton_kernel_v6, test/inductor/test_perf.py::InplacingTests::test_triton_kernel_not_fusable_with_users
2025-12-04T15:36:27.7495006Z 
2025-12-04T15:36:27.7495331Z Finished inductor/test_perf 1/1 ... [2025-12-04 15:36:27.744639][23016.127530282], took 1.02min
2025-12-04T15:36:27.7713563Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_perf/inductor.test_perf-34de9a09a2935f8d.xml
2025-12-04T15:36:27.8555522Z Running inductor/test_binary_folding 1/1 ... [2025-12-04 15:36:27.855173][23016.23806694]
2025-12-04T15:36:27.8556211Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:36:27.8559427Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_binary_folding.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:36:27.855634]
2025-12-04T15:38:20.0471537Z 
2025-12-04T15:38:20.0472841Z inductor/test_binary_folding 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_binary_folding_1.1_181cb55db6266036_.log
2025-12-04T15:38:20.0476323Z Running 6 items in this shard: test/inductor/test_binary_folding.py::FreezingCpuTests::test_conv_binary_folding_cpu, test/inductor/test_binary_folding.py::FreezingCpuTests::test_conv_bn_folding_cpu, test/inductor/test_binary_folding.py::FreezingCpuTests::test_linear_binary_folding_cpu, test/inductor/test_binary_folding.py::FreezingGpuTests::test_conv_binary_folding_cuda, test/inductor/test_binary_folding.py::FreezingGpuTests::test_conv_bn_folding_cuda, test/inductor/test_binary_folding.py::FreezingGpuTests::test_linear_binary_folding_cuda
2025-12-04T15:38:20.0479083Z 
2025-12-04T15:38:20.0479460Z Finished inductor/test_binary_folding 1/1 ... [2025-12-04 15:38:20.046900][23128.429796054], took 1.87min
2025-12-04T15:38:20.0736656Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_binary_folding/inductor.test_binary_folding-0c797ad2be676af7.xml
2025-12-04T15:38:20.1619444Z Running inductor/test_mkldnn_pattern_matcher 3/3 ... [2025-12-04 15:38:20.161640][23128.544533143]
2025-12-04T15:38:20.1620118Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:38:20.1623208Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_mkldnn_pattern_matcher.py', '--shard-id=3', '--num-shards=3', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:38:20.162060]
2025-12-04T15:46:32.2305180Z 
2025-12-04T15:46:32.2306362Z inductor/test_mkldnn_pattern_matcher 3/3 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_mkldnn_pattern_matcher_3.3_de8f963f0fd4260a_.log
2025-12-04T15:46:32.2377458Z Running 95 items in this shard: test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_conv2d_binary_inplace_fusion_failed_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_False_reshape_a_False_M_1_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_False_reshape_a_False_M_1_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_False_reshape_a_False_M_1_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_False_reshape_a_False_M_32_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_False_reshape_a_False_M_32_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_False_reshape_a_False_M_32_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_False_reshape_a_False_M_32_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_False_reshape_a_True_M_1_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_False_reshape_a_True_M_32_inplace_add_True_expand_a_scale_True, 
test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_True_reshape_a_False_M_1_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_True_reshape_a_False_M_32_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_True_reshape_a_False_M_32_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_True_reshape_a_True_M_1_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_True_reshape_a_True_M_32_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_False_reshape_a_False_M_1_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_False_reshape_a_False_M_32_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_False_reshape_a_True_M_32_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_True_reshape_a_False_M_1_inplace_add_False_expand_a_scale_False, 
test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_True_reshape_a_False_M_1_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_True_reshape_a_False_M_1_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_True_reshape_a_False_M_32_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_True_reshape_a_False_M_32_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_True_reshape_a_True_M_1_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_True_reshape_a_True_M_1_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_True_reshape_a_True_M_32_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_False_reshape_a_False_M_1_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_False_reshape_a_False_M_1_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_False_reshape_a_True_M_1_inplace_add_True_expand_a_scale_False, 
test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_False_reshape_a_True_M_32_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_False_reshape_a_True_M_32_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_False_reshape_a_True_M_32_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_False_reshape_a_True_M_32_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_True_reshape_a_False_M_1_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_True_reshape_a_False_M_32_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_True_reshape_a_False_M_32_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_True_reshape_a_True_M_1_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_True_reshape_a_True_M_1_inplace_add_True_expand_a_scale_False, 
test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_True_reshape_a_True_M_32_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_True_reshape_a_True_M_32_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_False_reshape_a_False_M_1_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_False_reshape_a_False_M_1_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_False_reshape_a_False_M_1_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_False_reshape_a_False_M_32_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_False_reshape_a_False_M_32_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_False_reshape_a_True_M_1_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_False_reshape_a_True_M_32_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_False_reshape_a_True_M_32_inplace_add_True_expand_a_scale_True, 
test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_True_reshape_a_False_M_1_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_True_reshape_a_False_M_32_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_dynamic_qlinear_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_dynamic_qlinear_qat_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_leaky_relu_pattern_fallback, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_linear_binary_broadcast_shapes, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_linear_fp32, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_linear_relu_dynamic_fp16, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qat_qconv2d_add_relu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qat_qconv2d_silu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv1d_relu_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_add_2, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_add_int8_mixed_bf16, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_add_relu_int8_mixed_bf16, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_add_relu_int8_mixed_bf16_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_hardswish_int8_mixed_bf16_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_hardswish_int8_mixed_bf16_xpu, 
test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_hardtanh_int8_mixed_bf16_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_hardtanh_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_int8_mixed_bf16_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_relu_int8_mixed_bf16_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_add_cpu_use_relu_False_is_qat_False_is_dynamic_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_add_cpu_use_relu_False_is_qat_False_is_dynamic_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_add_cpu_use_relu_False_is_qat_True_is_dynamic_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_add_cpu_use_relu_True_is_qat_True_is_dynamic_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_add_cpu_use_relu_True_is_qat_True_is_dynamic_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_add_int8_mixed_bf16_use_relu_True_is_qat_True_is_dynamic_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_add_xpu_use_relu_True_is_qat_False_is_dynamic_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_dequant_promotion_cpu_input_dim_exceeds_2, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_dequant_promotion_input_dim_exceeds_2_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_dequant_promotion_int8_mixed_bf16_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_dequant_promotion_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_input_dim_exceeds_2_xpu, 
test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_int8_mixed_bf16, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_int8_mixed_bf16_input_dim_exceeds_2_and_not_contiguous_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_int8_mixed_bf16_input_dim_exceeds_2_use_autocast, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_mul, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_relu_int8_mixed_bf16_input_dim_exceeds_2, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_smooth_quant_with_int_mm_has_bias_False_float32_per_channel_quant_True_dynamic_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_smooth_quant_with_int_mm_has_bias_True_bfloat16_per_channel_quant_True_dynamic_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_smooth_quant_with_int_mm_has_bias_True_bfloat16_per_channel_quant_True_dynamic_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_smooth_quant_with_int_mm_has_bias_True_float32_per_channel_quant_True_dynamic_True, test/inductor/test_mkldnn_pattern_matcher.py::TestDynamicPatternMatcher::test_linear_input_non_contiguous_3D_wo_bias_dynamic_shapes, test/inductor/test_mkldnn_pattern_matcher.py::TestDynamicPatternMatcher::test_linear_unary_dynamic_shapes, test/inductor/test_mkldnn_pattern_matcher.py::TestDynamicPatternMatcher::test_qat_bn_conv2d, test/inductor/test_mkldnn_pattern_matcher.py::TestDynamicPatternMatcher::test_qconv2d_maxpool2d_linear_dynamic_cpu
2025-12-04T15:46:32.2447340Z 
2025-12-04T15:46:32.2447759Z Finished inductor/test_mkldnn_pattern_matcher 3/3 ... [2025-12-04 15:46:32.230039][23620.612930542], took 8.20min
2025-12-04T15:46:32.2575093Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mkldnn_pattern_matcher/inductor.test_mkldnn_pattern_matcher-c93031a5b8f8293d.xml
2025-12-04T15:46:34.6055491Z Uploading artifacts took 2.28 seconds
2025-12-04T15:46:34.6059878Z Running inductor/test_cutlass_backend 1/1 ... [2025-12-04 15:46:34.605776][23622.988671298]
2025-12-04T15:46:34.6060518Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:46:34.6064845Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_cutlass_backend.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:46:34.606229]
2025-12-04T15:46:44.9169026Z 
2025-12-04T15:46:44.9170163Z inductor/test_cutlass_backend 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_cutlass_backend_1.1_15c862b0fcbdbc05_.log
2025-12-04T15:46:44.9171223Z 
2025-12-04T15:46:44.9171623Z Finished inductor/test_cutlass_backend 1/1 ... [2025-12-04 15:46:44.916666][23633.299562898], took 0.17min
2025-12-04T15:46:44.9442153Z Running inductor/test_ck_backend 1/1 ... [2025-12-04 15:46:44.943893][23633.326787523]
2025-12-04T15:46:44.9442720Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:46:44.9445750Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_ck_backend.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:46:44.944323]
2025-12-04T15:46:55.2772575Z 
2025-12-04T15:46:55.2773759Z inductor/test_ck_backend 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_ck_backend_1.1_578c7dfc11700a2c_.log
2025-12-04T15:46:55.2774608Z 
2025-12-04T15:46:55.2774975Z Finished inductor/test_ck_backend 1/1 ... [2025-12-04 15:46:55.277018][23643.659914217], took 0.17min
2025-12-04T15:46:55.3040869Z Running inductor/test_gpu_cpp_wrapper 1/1 ... [2025-12-04 15:46:55.303811][23643.686706801]
2025-12-04T15:46:55.3041485Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:46:55.3044858Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_gpu_cpp_wrapper.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:46:55.304229]
2025-12-04T15:53:29.6305490Z 
2025-12-04T15:53:29.6306588Z inductor/test_gpu_cpp_wrapper 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_gpu_cpp_wrapper_1.1_e2281895ade7355a_.log
2025-12-04T15:53:29.6453343Z Running 259 items in this shard: test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_add_complex4_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_add_complex_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_adding_tensor_offsets_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_addmm_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_aoti_debug_printer_works_on_constants, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_as_strided_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_batch_norm_2d_2_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_bernoulli1_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_bitwise_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_bmm1_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_bmm2_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_buffer_use_after_remove_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_cat_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_cat_slice_cat_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_consecutive_split_cumprod_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_conv_backward_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_convolution1_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_custom_op_1_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_custom_op_2_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_custom_op_3_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float16_float16_cuda_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float16_float32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float16_float64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float16_int16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float16_int32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float16_int64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float16_int8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float16_uint8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float32_float16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float32_float32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float32_float64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float32_int16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float32_int32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float32_int64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float32_int8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float32_uint8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float64_float16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float64_float32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float64_float64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float64_int16_cuda_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float64_int32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float64_int64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float64_int8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float64_uint8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_fusion_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int16_float16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int16_float32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int16_float64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int16_int16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int16_int32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int16_int64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int16_int8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int16_uint8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int32_float16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int32_float32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int32_float64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int32_int16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int32_int32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int32_int64_cuda_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int32_int8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int32_uint8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int64_float16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int64_float32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int64_float64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int64_int16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int64_int32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int64_int64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int64_int8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int64_uint8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int8_float16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int8_float32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int8_float64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int8_int16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int8_int32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int8_int64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int8_int8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int8_uint8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_uint8_float16_cuda_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_uint8_float32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_uint8_float64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_uint8_int16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_uint8_int32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_uint8_int64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_uint8_int8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_uint8_uint8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dynamic_shapes_persistent_reduction_mixed_x_dim_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_embedding_bag_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_enable_dynamic_shapes_cpp_wrapper_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_fft_real_input_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_fft_real_input_real_output_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_foreach_cpp_wrapper_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_index_put_deterministic_fallback_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_index_tensor_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_inductor_layout_optimization_input_mutations_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_insignificant_strides_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_layer_norm_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_linear1_cuda_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_linear2_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_linear_relu_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_mm_plus_mm2_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_mm_plus_mm3_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_mm_views_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_multi_device_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_multi_threading_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_non_tensor_args_wrapped_on_cpu, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_pointwise_hermite_polynomial_h_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_pointwise_hermite_polynomial_he_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_pow3_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_profiler_mark_wrapper_call_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_randint_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_reduction1_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_relu_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_repeat_interleave_2_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_roi_align_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_scalar_input_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_scaled_dot_product_attention_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_scaled_dot_product_efficient_attention_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_silu_cuda_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_sort_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_sum_dtype_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_sum_int_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_transpose_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_unspec_inputs_float16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_unspec_inputs_float32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_unspec_inputs_float64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_unspec_inputs_int16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_unspec_inputs_int32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_unspec_inputs_int64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_unspec_inputs_int8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_unspec_inputs_uint8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_add_complex4_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_add_complex_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_adding_tensor_offsets_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_addmm_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_annotation_training, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_as_strided_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_batch_norm_2d_2_cuda_dynamic_shapes_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_bernoulli1_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_bitwise_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_bmm1_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_bmm2_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_buffer_use_after_remove_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_cat_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_cat_slice_cat_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_consecutive_split_cumprod_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_conv_backward_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_convolution1_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_custom_op_1_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_custom_op_2_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_custom_op_3_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float16_float16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float16_float32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float16_float64_cuda_dynamic_shapes_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float16_int16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float16_int32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float16_int64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float16_int8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float16_uint8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float32_float16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float32_float32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float32_float64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float32_int16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float32_int32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float32_int64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float32_int8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float32_uint8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float64_float16_cuda_dynamic_shapes_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float64_float32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float64_float64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float64_int16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float64_int32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float64_int64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float64_int8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float64_uint8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_fusion_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int16_float16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int16_float32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int16_float64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int16_int16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int16_int32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int16_int64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int16_int8_cuda_dynamic_shapes_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int16_uint8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int32_float16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int32_float32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int32_float64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int32_int16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int32_int32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int32_int64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int32_int8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int32_uint8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int64_float16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int64_float32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int64_float64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int64_int16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int64_int32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int64_int64_cuda_dynamic_shapes_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int64_int8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int64_uint8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int8_float16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int8_float32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int8_float64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int8_int16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int8_int32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int8_int64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int8_int8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int8_uint8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_uint8_float16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_uint8_float32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_uint8_float64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_uint8_int16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_uint8_int32_cuda_dynamic_shapes_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_uint8_int64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_uint8_int8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_uint8_uint8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dynamic_shapes_persistent_reduction_mixed_x_dim_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_embedding_bag_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_enable_dynamic_shapes_cpp_wrapper_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_fft_real_input_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_fft_real_input_real_output_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_foreach_cpp_wrapper_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_index_put_deterministic_fallback_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_index_tensor_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_inductor_layout_optimization_input_mutations_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_insignificant_strides_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_layer_norm_cuda_dynamic_shapes_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_linear1_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_linear2_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_linear_relu_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_mm_plus_mm2_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_mm_plus_mm3_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_mm_views_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_multi_device_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_multi_threading_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_pointwise_hermite_polynomial_h_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_pointwise_hermite_polynomial_he_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_pow3_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_profiler_mark_wrapper_call_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_randint_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_reduction1_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_relu_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_repeat_interleave_2_cuda_dynamic_shapes_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_roi_align_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_scalar_input_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_scaled_dot_product_attention_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_scaled_dot_product_efficient_attention_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_silu_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_sort_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_sum_dtype_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_sum_int_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_transpose_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_unspec_inputs_float16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_unspec_inputs_float32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_unspec_inputs_float64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_unspec_inputs_int16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_unspec_inputs_int32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_unspec_inputs_int64_cuda_dynamic_shapes_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_unspec_inputs_int8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_unspec_inputs_uint8_cuda_dynamic_shapes_gpu_wrapper
2025-12-04T15:53:29.6598714Z 
2025-12-04T15:53:29.6599096Z Finished inductor/test_gpu_cpp_wrapper 1/1 ... [2025-12-04 15:53:29.630721][24038.013615316], took 6.57min
2025-12-04T15:53:29.6600462Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_gpu_cpp_wrapper/inductor.test_gpu_cpp_wrapper-c206afd337165094.xml
2025-12-04T15:53:29.7560339Z Running inductor/test_cutedsl_template 1/1 ... [2025-12-04 15:53:29.755698][24038.138590972]
2025-12-04T15:53:29.7560939Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:53:29.7563866Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_cutedsl_template.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:53:29.756138]
2025-12-04T15:53:40.1861426Z 
2025-12-04T15:53:40.1862554Z inductor/test_cutedsl_template 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_cutedsl_template_1.1_431b05ccc7f3aa92_.log
2025-12-04T15:53:40.1869224Z Running 13 items in this shard: test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_cse_integration, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_cutedsl_add_e2e, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_cutedsl_add_e2e_autotune, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_cutedsl_op_overrides, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_gen_defines, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_gen_imports, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_get_output_hook, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_indented_buffer_usage, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_modification_subgraph, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_multiple_templates_unique_names, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_render_includes_imports, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_template_aliasing, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_template_env_contains_hooks
2025-12-04T15:53:40.1876502Z 
2025-12-04T15:53:40.1876896Z Finished inductor/test_cutedsl_template 1/1 ... [2025-12-04 15:53:40.185929][24048.568825606], took 0.17min
2025-12-04T15:53:40.2126259Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cutedsl_template/inductor.test_cutedsl_template-1780c0291e7a0397.xml
2025-12-04T15:53:40.2929666Z Running inductor/test_benchmark_fusion 1/1 ... [2025-12-04 15:53:40.292619][24048.675513853]
2025-12-04T15:53:40.2930269Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:53:40.2933490Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_benchmark_fusion.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:53:40.293060]
2025-12-04T15:54:10.6054931Z 
2025-12-04T15:54:10.6056442Z inductor/test_benchmark_fusion 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_benchmark_fusion_1.1_06ce66c290620934_.log
2025-12-04T15:54:10.6065063Z Running 16 items in this shard: test/inductor/test_benchmark_fusion.py::BenchmarkFusionGpuTest::test_avoid_register_spilling_cuda, test/inductor/test_benchmark_fusion.py::BenchmarkFusionGpuTest::test_foreach_kernel_cuda, test/inductor/test_benchmark_fusion.py::BenchmarkFusionGpuTest::test_register_spills_cuda, test/inductor/test_benchmark_fusion.py::BenchmarkFusionGpuTest::test_resnet18_cuda, test/inductor/test_benchmark_fusion.py::BenchmarkFusionGpuTest::test_softmax_cuda, test/inductor/test_benchmark_fusion.py::BenchmarkFusionGpuTest::test_tield_kernel_fusion_cuda, test/inductor/test_benchmark_fusion.py::BenchmarkingTest::test_benchmark_on_non_zero_device, test/inductor/test_benchmark_fusion.py::BenchmarkMultiTemplateFusionGpuTest::test_changed_layout, test/inductor/test_benchmark_fusion.py::BenchmarkMultiTemplateFusionGpuTest::test_equivalent_extern_code, test/inductor/test_benchmark_fusion.py::BenchmarkMultiTemplateFusionGpuTest::test_equivalent_template_code, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCpuTest::test_avoid_register_spilling_cpu, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCpuTest::test_foreach_kernel_cpu, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCpuTest::test_register_spills_cpu, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCpuTest::test_resnet18_cpu, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCpuTest::test_softmax_cpu, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCpuTest::test_tield_kernel_fusion_cpu
2025-12-04T15:54:10.6073412Z 
2025-12-04T15:54:10.6073816Z Finished inductor/test_benchmark_fusion 1/1 ... [2025-12-04 15:54:10.605282][24078.988178676], took 0.51min
2025-12-04T15:54:10.6326029Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_benchmark_fusion/inductor.test_benchmark_fusion-33e3c50f2f02127c.xml
2025-12-04T15:54:10.7148716Z Running dynamo/test_modules 1/1 ... [2025-12-04 15:54:10.714562][24079.097456204]
2025-12-04T15:54:10.7149283Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:54:10.7152621Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_modules.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:54:10.715015]
2025-12-04T15:54:47.9360740Z 
2025-12-04T15:54:47.9361838Z dynamo/test_modules 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_modules_1.1_8a3e7afe44c0508c_.log
2025-12-04T15:54:47.9410386Z Running 135 items in this shard: test/dynamo/test_modules.py::NNModuleTests::test_access_by_keys, test/dynamo/test_modules.py::NNModuleTests::test_basicmodule1, test/dynamo/test_modules.py::NNModuleTests::test_basicmodule2, test/dynamo/test_modules.py::NNModuleTests::test_call_fn_with_non_const_inputs_safe, test/dynamo/test_modules.py::NNModuleTests::test_cfgmod, test/dynamo/test_modules.py::NNModuleTests::test_children, test/dynamo/test_modules.py::NNModuleTests::test_constloop, test/dynamo/test_modules.py::NNModuleTests::test_conv_call_forward_directly, test/dynamo/test_modules.py::NNModuleTests::test_conv_call_super_forward_directly, test/dynamo/test_modules.py::NNModuleTests::test_conv_transpose_call_forward_directly, test/dynamo/test_modules.py::NNModuleTests::test_conv_transpose_call_super_forward_directly, test/dynamo/test_modules.py::NNModuleTests::test_densenet, test/dynamo/test_modules.py::NNModuleTests::test_enumvalues, test/dynamo/test_modules.py::NNModuleTests::test_fnmember, test/dynamo/test_modules.py::NNModuleTests::test_fnmembercmp1, test/dynamo/test_modules.py::NNModuleTests::test_fnmembercmp2, test/dynamo/test_modules.py::NNModuleTests::test_forward_directly, test/dynamo/test_modules.py::NNModuleTests::test_generation_tag, test/dynamo/test_modules.py::NNModuleTests::test_hasattr, test/dynamo/test_modules.py::NNModuleTests::test_inject_module_parameters, test/dynamo/test_modules.py::NNModuleTests::test_intarg, test/dynamo/test_modules.py::NNModuleTests::test_iseval1, test/dynamo/test_modules.py::NNModuleTests::test_iseval2, test/dynamo/test_modules.py::NNModuleTests::test_isnonelayer, test/dynamo/test_modules.py::NNModuleTests::test_istraining1, test/dynamo/test_modules.py::NNModuleTests::test_istraining2, test/dynamo/test_modules.py::NNModuleTests::test_layerlist, test/dynamo/test_modules.py::NNModuleTests::test_lazy_module1, test/dynamo/test_modules.py::NNModuleTests::test_lazy_module2, 
test/dynamo/test_modules.py::NNModuleTests::test_lazy_module4, test/dynamo/test_modules.py::NNModuleTests::test_lazy_module5, test/dynamo/test_modules.py::NNModuleTests::test_lazy_module6, test/dynamo/test_modules.py::NNModuleTests::test_lazy_module7, test/dynamo/test_modules.py::NNModuleTests::test_lazy_module_bad_params, test/dynamo/test_modules.py::NNModuleTests::test_lazy_module_bad_params_call_function, test/dynamo/test_modules.py::NNModuleTests::test_lazy_module_kwargs, test/dynamo/test_modules.py::NNModuleTests::test_lazy_module_no_cls_to_become, test/dynamo/test_modules.py::NNModuleTests::test_lazy_module_speculation_log_divergence, test/dynamo/test_modules.py::NNModuleTests::test_module_attribute_precedence, test/dynamo/test_modules.py::NNModuleTests::test_module_call_module_with_static_forward, test/dynamo/test_modules.py::NNModuleTests::test_module_class_method, test/dynamo/test_modules.py::NNModuleTests::test_module_comparison, test/dynamo/test_modules.py::NNModuleTests::test_module_forward_has_graph_break, test/dynamo/test_modules.py::NNModuleTests::test_module_guard_name_is_valid, test/dynamo/test_modules.py::NNModuleTests::test_module_name_string, test/dynamo/test_modules.py::NNModuleTests::test_module_property, test/dynamo/test_modules.py::NNModuleTests::test_module_static_method, test/dynamo/test_modules.py::NNModuleTests::test_moduledict, test/dynamo/test_modules.py::NNModuleTests::test_moduledict_custom, test/dynamo/test_modules.py::NNModuleTests::test_modulelist, test/dynamo/test_modules.py::NNModuleTests::test_modulelist_custom, test/dynamo/test_modules.py::NNModuleTests::test_modulelist_nested, test/dynamo/test_modules.py::NNModuleTests::test_modulemethod1, test/dynamo/test_modules.py::NNModuleTests::test_modulemethod2, test/dynamo/test_modules.py::NNModuleTests::test_named_children, test/dynamo/test_modules.py::NNModuleTests::test_nn_module_setattr, test/dynamo/test_modules.py::NNModuleTests::test_nn_module_unspec_int_attr, 
test/dynamo/test_modules.py::NNModuleTests::test_nn_moduledict_contains, test/dynamo/test_modules.py::NNModuleTests::test_parameterdict, test/dynamo/test_modules.py::NNModuleTests::test_parameterdict_custom, test/dynamo/test_modules.py::NNModuleTests::test_parameters1, test/dynamo/test_modules.py::NNModuleTests::test_parameters2, test/dynamo/test_modules.py::NNModuleTests::test_parameters3, test/dynamo/test_modules.py::NNModuleTests::test_parameters4, test/dynamo/test_modules.py::NNModuleTests::test_parameters5, test/dynamo/test_modules.py::NNModuleTests::test_self_mutating1, test/dynamo/test_modules.py::NNModuleTests::test_seq, test/dynamo/test_modules.py::NNModuleTests::test_sequential_with_duplicated_module, test/dynamo/test_modules.py::NNModuleTests::test_sequential_with_duplicated_module2, test/dynamo/test_modules.py::NNModuleTests::test_simple_torch_function, test/dynamo/test_modules.py::NNModuleTests::test_stringmember, test/dynamo/test_modules.py::NNModuleTests::test_submodules1, test/dynamo/test_modules.py::NNModuleTests::test_submodules2, test/dynamo/test_modules.py::NNModuleTests::test_super1, test/dynamo/test_modules.py::NNModuleTests::test_super2, test/dynamo/test_modules.py::NNModuleTests::test_super_class_method, test/dynamo/test_modules.py::NNModuleTests::test_tensorlist, test/dynamo/test_modules.py::NNModuleTests::test_torch_function_with_closure, test/dynamo/test_modules.py::NNModuleTests::test_torch_mangled_class_name, test/dynamo/test_modules.py::NNModuleTests::test_unsupportedmethod, test/dynamo/test_modules.py::NNModuleTests::test_unsupportedmodule, test/dynamo/test_modules.py::NNModuleTests::test_viamodulecall, test/dynamo/test_modules.py::OptimizedModuleTest::test_assign_does_not_exist, test/dynamo/test_modules.py::OptimizedModuleTest::test_attr, test/dynamo/test_modules.py::OptimizedModuleTest::test_attr_precedence, test/dynamo/test_modules.py::OptimizedModuleTest::test_backward_hooks, 
test/dynamo/test_modules.py::OptimizedModuleTest::test_branch_on_nn_module_custom_bool, test/dynamo/test_modules.py::OptimizedModuleTest::test_branch_on_nn_module_custom_len, test/dynamo/test_modules.py::OptimizedModuleTest::test_buffer_order, test/dynamo/test_modules.py::OptimizedModuleTest::test_composition, test/dynamo/test_modules.py::OptimizedModuleTest::test_composition_with_opt_mod, test/dynamo/test_modules.py::OptimizedModuleTest::test_delattr_on_compiled_module, test/dynamo/test_modules.py::OptimizedModuleTest::test_dir, test/dynamo/test_modules.py::OptimizedModuleTest::test_dunder_call_explicitly, test/dynamo/test_modules.py::OptimizedModuleTest::test_globals_change_in_other_file, test/dynamo/test_modules.py::OptimizedModuleTest::test_guard_on_torch_nn_modules, test/dynamo/test_modules.py::OptimizedModuleTest::test_hooks_allowed_modules, test/dynamo/test_modules.py::OptimizedModuleTest::test_hooks_allowed_modules_compiles, test/dynamo/test_modules.py::OptimizedModuleTest::test_hooks_allowed_modules_compiles_self_contained, test/dynamo/test_modules.py::OptimizedModuleTest::test_hooks_inner, test/dynamo/test_modules.py::OptimizedModuleTest::test_hooks_outer, test/dynamo/test_modules.py::OptimizedModuleTest::test_hooks_skip_guards, test/dynamo/test_modules.py::OptimizedModuleTest::test_inline_inbuilt_nn_modules, test/dynamo/test_modules.py::OptimizedModuleTest::test_mark_static_nn_module_tensor, test/dynamo/test_modules.py::OptimizedModuleTest::test_mark_static_previously_seen_tensor, test/dynamo/test_modules.py::OptimizedModuleTest::test_mark_static_with_freezing, test/dynamo/test_modules.py::OptimizedModuleTest::test_module_dict_iter_keys, test/dynamo/test_modules.py::OptimizedModuleTest::test_module_dict_iter_name, test/dynamo/test_modules.py::OptimizedModuleTest::test_module_dict_iter_values, test/dynamo/test_modules.py::OptimizedModuleTest::test_module_order, test/dynamo/test_modules.py::OptimizedModuleTest::test_module_patch, 
test/dynamo/test_modules.py::OptimizedModuleTest::test_module_setattr, test/dynamo/test_modules.py::OptimizedModuleTest::test_monkeypatching_forward, test/dynamo/test_modules.py::OptimizedModuleTest::test_nn_module, test/dynamo/test_modules.py::OptimizedModuleTest::test_no_op_assignment, test/dynamo/test_modules.py::OptimizedModuleTest::test_no_recompile_on_nn_guarded_modules, test/dynamo/test_modules.py::OptimizedModuleTest::test_overridden_call, test/dynamo/test_modules.py::OptimizedModuleTest::test_param_order, test/dynamo/test_modules.py::OptimizedModuleTest::test_param_requires_grad, test/dynamo/test_modules.py::OptimizedModuleTest::test_patch_module, test/dynamo/test_modules.py::OptimizedModuleTest::test_recompile_limit_on_freed_module, test/dynamo/test_modules.py::OptimizedModuleTest::test_recompile_limit_on_guarded_nn_modules, test/dynamo/test_modules.py::OptimizedModuleTest::test_recursion, test/dynamo/test_modules.py::OptimizedModuleTest::test_save_and_load_all_backends, test/dynamo/test_modules.py::OptimizedModuleTest::test_save_and_load_inductor, test/dynamo/test_modules.py::OptimizedModuleTest::test_setattr_on_compiled_module, test/dynamo/test_modules.py::OptimizedModuleTest::test_specialized_module___iter__, test/dynamo/test_modules.py::OptimizedModuleTest::test_to, test/dynamo/test_modules.py::OptimizedModuleTest::test_trace_delattr, test/dynamo/test_modules.py::OptimizedModuleTest::test_udo_instance_method_as_hook, test/dynamo/test_modules.py::OptimizedModuleTest::test_unhashable_nn_submodule, test/dynamo/test_modules.py::OptimizedModuleTest::test_unspec_non_inlinable_module, test/dynamo/test_modules.py::OptimizedModuleTest::test_unspecialized_seq, test/dynamo/test_modules.py::OptimizedModuleTest::test_user_defined_nn_module_dynamic, test/dynamo/test_modules.py::NNModuleTestsDeviceCUDA::test_lazy_module3_cuda
2025-12-04T15:54:47.9457809Z 
2025-12-04T15:54:47.9458174Z Finished dynamo/test_modules 1/1 ... [2025-12-04 15:54:47.936073][24116.318967515], took 0.62min
2025-12-04T15:54:47.9636166Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_modules/dynamo.test_modules-f3674dc870090d50.xml
2025-12-04T15:54:48.0598055Z Running dynamo/test_recompiles 1/1 ... [2025-12-04 15:54:48.059482][24116.442376281]
2025-12-04T15:54:48.0598620Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:54:48.0602424Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_recompiles.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:54:48.059950]
2025-12-04T15:54:59.4413261Z 
2025-12-04T15:54:59.4414827Z dynamo/test_recompiles 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_recompiles_1.1_781d5b3da7b99916_.log
2025-12-04T15:54:59.4425082Z Running 18 items in this shard: test/dynamo/test_recompiles.py::RecompileTests::test_aliasing_guard_failures, test/dynamo/test_recompiles.py::RecompileTests::test_aliasing_guard_failures_with_globals, test/dynamo/test_recompiles.py::RecompileTests::test_ambient_autocast_recompile, test/dynamo/test_recompiles.py::RecompileTests::test_autocast_constant_fold, test/dynamo/test_recompiles.py::RecompileTests::test_automatic_dynamic_on_closed_ints, test/dynamo/test_recompiles.py::RecompileTests::test_automatic_dynamic_reduce_recompiles, test/dynamo/test_recompiles.py::RecompileTests::test_automatic_dynamic_shapes_mark_as_oblivious, test/dynamo/test_recompiles.py::RecompileTests::test_automatic_dynamic_shapes_mark_as_oblivious_fail_counterfactual, test/dynamo/test_recompiles.py::RecompileTests::test_automatic_dynamic_shapes_mark_as_unbacked, test/dynamo/test_recompiles.py::RecompileTests::test_automatic_dynamic_tensor_scalar_change, test/dynamo/test_recompiles.py::RecompileTests::test_dunder_call_recompile, test/dynamo/test_recompiles.py::RecompileTests::test_dynamic_shape_parameter_recompile, test/dynamo/test_recompiles.py::RecompileTests::test_inline_inbuilt_nn_modules_candidate, test/dynamo/test_recompiles.py::RecompileTests::test_no_recompile_over_unused_objects, test/dynamo/test_recompiles.py::RecompileTests::test_no_recursive_compile_after_cache_limit_hit, test/dynamo/test_recompiles.py::RecompileTests::test_recompiles_true_false_flop, test/dynamo/test_recompiles.py::RecompileTests::test_run_mode_after_cache_limit_hit, test/dynamo/test_recompiles.py::RecompileTests::test_simple_module_recompile
2025-12-04T15:54:59.4433593Z 
2025-12-04T15:54:59.4433940Z Finished dynamo/test_recompiles 1/1 ... [2025-12-04 15:54:59.441102][24127.823998208], took 0.19min
2025-12-04T15:54:59.4688190Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_recompiles/dynamo.test_recompiles-755ec9793479e2dd.xml
2025-12-04T15:54:59.5494716Z Running export/test_tree_utils 1/1 ... [2025-12-04 15:54:59.549127][24127.932021217]
2025-12-04T15:54:59.5495311Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:54:59.5498593Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_tree_utils.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:54:59.549578]
2025-12-04T15:55:05.0223104Z 
2025-12-04T15:55:05.0224098Z export/test_tree_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_tree_utils_1.1_01fdd9412c3dc291_.log
2025-12-04T15:55:05.0225690Z Running 2 items in this shard: test/export/test_tree_utils.py::TestTreeUtils::test_equivalence_check, test/export/test_tree_utils.py::TestTreeUtils::test_reorder_kwargs
2025-12-04T15:55:05.0226558Z 
2025-12-04T15:55:05.0227114Z Finished export/test_tree_utils 1/1 ... [2025-12-04 15:55:05.022113][24133.405008916], took 0.09min
2025-12-04T15:55:05.0496619Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_tree_utils/export.test_tree_utils-4b33de82582b2e92.xml
2025-12-04T15:55:05.0829389Z Running inductor/test_triton_wrapper 1/1 ... [2025-12-04 15:55:05.082620][24133.465515595]
2025-12-04T15:55:05.0830338Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:55:05.0834467Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_triton_wrapper.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:55:05.083077]
2025-12-04T15:55:34.5922201Z 
2025-12-04T15:55:34.5923309Z inductor/test_triton_wrapper 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_triton_wrapper_1.1_aad0f3987661a0f9_.log
2025-12-04T15:55:34.5924719Z Running 1 items in this shard: test/inductor/test_triton_wrapper.py::TestTritonWrapper::test_wrapper_using_gpu_seed
2025-12-04T15:55:34.5925333Z 
2025-12-04T15:55:34.5925734Z Finished inductor/test_triton_wrapper 1/1 ... [2025-12-04 15:55:34.591979][24162.974875934], took 0.49min
2025-12-04T15:55:34.6201234Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_triton_wrapper/inductor.test_triton_wrapper-7697274370716365.xml
2025-12-04T15:55:34.6939152Z Running inductor/test_static_cuda_launcher 1/1 ... [2025-12-04 15:55:34.693602][24163.076496954]
2025-12-04T15:55:34.6939801Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:55:34.6943429Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_static_cuda_launcher.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:55:34.694088]
2025-12-04T15:55:59.9475061Z 
2025-12-04T15:55:59.9476225Z inductor/test_static_cuda_launcher 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_static_cuda_launcher_1.1_aa705837cbb50573_.log
2025-12-04T15:55:59.9485229Z Running 17 items in this shard: test/inductor/test_static_cuda_launcher.py::TestStaticCudaLauncher::test_basic, test/inductor/test_static_cuda_launcher.py::TestStaticCudaLauncher::test_basic_1arg, test/inductor/test_static_cuda_launcher.py::TestStaticCudaLauncher::test_constexpr, test/inductor/test_static_cuda_launcher.py::TestStaticCudaLauncher::test_high_shared_mem, test/inductor/test_static_cuda_launcher.py::TestStaticCudaLauncher::test_implied_constant, test/inductor/test_static_cuda_launcher.py::TestStaticCudaLauncher::test_kernel_empty_tensor, test/inductor/test_static_cuda_launcher.py::TestStaticCudaLauncher::test_kernel_many_args, test/inductor/test_static_cuda_launcher.py::TestStaticCudaLauncher::test_kernel_no_args, test/inductor/test_static_cuda_launcher.py::TestStaticCudaLauncher::test_signed_integers, test/inductor/test_static_cuda_launcher.py::TestStaticCudaLauncher::test_too_high_shared_mem, test/inductor/test_static_cuda_launcher.py::TestStaticCudaLauncher::test_unsigned_integers, test/inductor/test_static_cuda_launcher.py::TestStaticTritonCompileResult::test_any, test/inductor/test_static_cuda_launcher.py::TestStaticTritonCompileResult::test_basic_compile, test/inductor/test_static_cuda_launcher.py::TestStaticTritonCompileResult::test_disable_static_cuda_launcher, test/inductor/test_static_cuda_launcher.py::TestStaticTritonCompileResult::test_empty_tensor, test/inductor/test_static_cuda_launcher.py::TestStaticTritonCompileResult::test_incompatible_code, test/inductor/test_static_cuda_launcher.py::TestStaticTritonCompileResult::test_static_launch_user_defined_triton_kernels
2025-12-04T15:55:59.9493892Z 
2025-12-04T15:55:59.9494299Z Finished inductor/test_static_cuda_launcher 1/1 ... [2025-12-04 15:55:59.947292][24188.330187662], took 0.42min
2025-12-04T15:55:59.9752049Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_static_cuda_launcher/inductor.test_static_cuda_launcher-96effba66b878950.xml
2025-12-04T15:56:00.0525918Z Running export/test_dynamic_shapes 1/1 ... [2025-12-04 15:56:00.052211][24188.435104223]
2025-12-04T15:56:00.0526570Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:56:00.0529653Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_dynamic_shapes.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:56:00.052682]
2025-12-04T15:56:05.5251923Z 
2025-12-04T15:56:05.5253001Z export/test_dynamic_shapes 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_dynamic_shapes_1.1_fa1beed2f0eed81a_.log
2025-12-04T15:56:05.5254640Z Running 2 items in this shard: test/export/test_dynamic_shapes.py::TestDimHint::test_dimhint_factory, test/export/test_dynamic_shapes.py::TestDimHint::test_dimhint_repr
2025-12-04T15:56:05.5255595Z 
2025-12-04T15:56:05.5256058Z Finished export/test_dynamic_shapes 1/1 ... [2025-12-04 15:56:05.525009][24193.90790566], took 0.09min
2025-12-04T15:56:05.5531056Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_dynamic_shapes/export.test_dynamic_shapes-6f817f896f94c83c.xml
2025-12-04T15:56:05.5842852Z Running dynamo/test_sdpa 1/1 ... [2025-12-04 15:56:05.584037][24193.966932613]
2025-12-04T15:56:05.5843477Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:56:05.5847018Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_sdpa.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:56:05.584436]
2025-12-04T15:56:14.9630348Z 
2025-12-04T15:56:14.9631426Z dynamo/test_sdpa 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_sdpa_1.1_5570cc8ef25d14ab_.log
2025-12-04T15:56:14.9634631Z Running 6 items in this shard: test/dynamo/test_sdpa.py::TestSDPA::test_graph_break_SDPAParams, test/dynamo/test_sdpa.py::TestSDPA::test_input_SDPAParams, test/dynamo/test_sdpa.py::TestSDPA::test_intermediate_attr_access_SDPAParams, test/dynamo/test_sdpa.py::TestSDPA::test_returns_SDPAParams, test/dynamo/test_sdpa.py::TestSDPA::test_sdpa_c_functions_no_graph_break, test/dynamo/test_sdpa.py::TestSDPA::test_sdpa_kernel_decorator_with_compile
2025-12-04T15:56:14.9636873Z 
2025-12-04T15:56:14.9637452Z Finished dynamo/test_sdpa 1/1 ... [2025-12-04 15:56:14.962803][24203.345699621], took 0.16min
2025-12-04T15:56:14.9920710Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_sdpa/dynamo.test_sdpa-3e0149796a415876.xml
2025-12-04T15:56:15.0747078Z Running dynamo/test_utils 1/1 ... [2025-12-04 15:56:15.074372][24203.457266368]
2025-12-04T15:56:15.0747672Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:56:15.0750394Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_utils.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:56:15.074784]
2025-12-04T15:56:57.5034992Z 
2025-12-04T15:56:57.5036046Z dynamo/test_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_utils_1.1_31a21332cf86ab83_.log
2025-12-04T15:56:57.5043172Z Running 17 items in this shard: test/dynamo/test_utils.py::TestUtils::test_graph_break_counting, test/dynamo/test_utils.py::TestUtils::test_larger_multiplier_for_even_smaller_tensor, test/dynamo/test_utils.py::TestUtils::test_larger_multiplier_for_smaller_tensor, test/dynamo/test_utils.py::TestUtils::test_nan, test/dynamo/test_utils.py::TestUtils::test_traced_code_query, test/dynamo/test_utils.py::TestDynamoTimed::test_compiler_config, test/dynamo/test_utils.py::TestDynamoTimed::test_dynamic_shape_feature_use, test/dynamo/test_utils.py::TestDynamoTimed::test_dynamo_timed, test/dynamo/test_utils.py::TestDynamoTimed::test_exception_stack_trace, test/dynamo/test_utils.py::TestDynamoTimed::test_graph_node_shapes, test/dynamo/test_utils.py::TestDynamoTimed::test_inductor_provenance, test/dynamo/test_utils.py::TestDynamoTimed::test_ir_count, test/dynamo/test_utils.py::TestDynamoTimed::test_log_dynamo_start, test/dynamo/test_utils.py::TestDynamoTimed::test_num_params, test/dynamo/test_utils.py::TestDynamoTimed::test_stack_trace, test/dynamo/test_utils.py::TestInductorConfigParsingForLogging::test_inductor_config_jsonify, test/dynamo/test_utils.py::TestInductorConfigParsingForLogging::test_inductor_config_parsing_non_conforming_items
2025-12-04T15:56:57.5049552Z 
2025-12-04T15:56:57.5049876Z Finished dynamo/test_utils 1/1 ... [2025-12-04 15:56:57.503322][24245.886219167], took 0.71min
2025-12-04T15:56:57.5314974Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_utils/dynamo.test_utils-e6d94f5c34c685f8.xml
2025-12-04T15:56:57.6151387Z Running inductor/test_codegen_triton 1/1 ... [2025-12-04 15:56:57.614788][24245.997682791]
2025-12-04T15:56:57.6151998Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:56:57.6154614Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_codegen_triton.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:56:57.615208]
2025-12-04T15:57:08.0948490Z 
2025-12-04T15:57:08.0949644Z inductor/test_codegen_triton 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_codegen_triton_1.1_8e8a3c1b0bc12db7_.log
2025-12-04T15:57:08.0951097Z Running 1 items in this shard: test/inductor/test_codegen_triton.py::TestCodegenTriton::test_config_of_sizearg
2025-12-04T15:57:08.0951689Z 
2025-12-04T15:57:08.0952099Z Finished inductor/test_codegen_triton 1/1 ... [2025-12-04 15:57:08.094655][24256.477548352], took 0.17min
2025-12-04T15:57:08.1233956Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_codegen_triton/inductor.test_codegen_triton-f741c3b21cf28e3b.xml
2025-12-04T15:57:08.2150497Z Running dynamo/test_frame_init 1/1 ... [2025-12-04 15:57:08.214716][24256.597610845]
2025-12-04T15:57:08.2151075Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:57:08.2154505Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_frame_init.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:57:08.215180]
2025-12-04T15:57:13.7876024Z 
2025-12-04T15:57:13.7877013Z dynamo/test_frame_init 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_frame_init_1.1_2f60459938295159_.log
2025-12-04T15:57:13.7878224Z Running 1 items in this shard: test/dynamo/test_frame_init.py::FrameInitTests::test_frame_init
2025-12-04T15:57:13.7878744Z 
2025-12-04T15:57:13.7879089Z Finished dynamo/test_frame_init 1/1 ... [2025-12-04 15:57:13.787404][24262.170299591], took 0.09min
2025-12-04T15:57:13.8158733Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_frame_init/dynamo.test_frame_init-c2e1024fb8a07387.xml
2025-12-04T15:57:13.8434943Z Running inductor/test_device_assert 1/1 ... [2025-12-04 15:57:13.843190][24262.22608477]
2025-12-04T15:57:13.8435543Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:57:13.8438897Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_device_assert.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:57:13.843630]
2025-12-04T15:57:35.9420709Z 
2025-12-04T15:57:35.9422055Z inductor/test_device_assert 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_device_assert_1.1_d916ba60ad9d20e5_.log
2025-12-04T15:57:35.9427395Z Running 8 items in this shard: test/inductor/test_device_assert.py::TestTorchDeviceAssertTrigger::test_assert_fusion, test/inductor/test_device_assert.py::TestTorchDeviceAssertTrigger::test_assert_should_not_throw_backend_aot_eager, test/inductor/test_device_assert.py::TestTorchDeviceAssertTrigger::test_assert_should_not_throw_backend_eager, test/inductor/test_device_assert.py::TestTorchDeviceAssertTrigger::test_assert_should_not_throw_backend_inductor, test/inductor/test_device_assert.py::TestTorchDeviceAssertTrigger::test_assert_should_throw_backend_aot_eager, test/inductor/test_device_assert.py::TestTorchDeviceAssertTrigger::test_assert_should_throw_backend_eager, test/inductor/test_device_assert.py::TestTorchDeviceAssertTrigger::test_assert_should_throw_backend_inductor, test/inductor/test_device_assert.py::TestTorchDeviceAssertTrigger::test_run_assert_triton
2025-12-04T15:57:35.9431907Z 
2025-12-04T15:57:35.9432279Z Finished inductor/test_device_assert 1/1 ... [2025-12-04 15:57:35.941838][24284.324735187], took 0.37min
2025-12-04T15:57:35.9706978Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_device_assert/inductor.test_device_assert-451c7142fcd9d62b.xml
2025-12-04T15:57:36.0458623Z Running dynamo/test_skip_non_tensor 1/1 ... [2025-12-04 15:57:36.045543][24284.428437593]
2025-12-04T15:57:36.0459220Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:57:36.0462378Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_skip_non_tensor.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:57:36.045990]
2025-12-04T15:57:44.6731091Z 
2025-12-04T15:57:44.6732791Z dynamo/test_skip_non_tensor 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_skip_non_tensor_1.1_5109354b2e4bf091_.log
2025-12-04T15:57:44.6740447Z Running 8 items in this shard: test/dynamo/test_skip_non_tensor.py::SkipNonTensorTests::test_add_skip, test/dynamo/test_skip_non_tensor.py::SkipNonTensorTests::test_add_tensor1, test/dynamo/test_skip_non_tensor.py::SkipNonTensorTests::test_add_tensor2, test/dynamo/test_skip_non_tensor.py::SkipNonTensorTests::test_add_tensor_dict, test/dynamo/test_skip_non_tensor.py::SkipNonTensorTests::test_add_tensor_list, test/dynamo/test_skip_non_tensor.py::SkipNonTensorTests::test_custom_list, test/dynamo/test_skip_non_tensor.py::SkipNonTensorTests::test_do_not_skip_side_effects, test/dynamo/test_skip_non_tensor.py::SkipNonTensorTests::test_recursive_list
2025-12-04T15:57:44.6746971Z 
2025-12-04T15:57:44.6747698Z Finished dynamo/test_skip_non_tensor 1/1 ... [2025-12-04 15:57:44.672850][24293.055746303], took 0.14min
2025-12-04T15:57:44.7028054Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_skip_non_tensor/dynamo.test_skip_non_tensor-f190ace25428cb94.xml
2025-12-04T15:57:44.7834040Z Running dynamo/test_skip_guard_eval_unsafe 1/1 ... [2025-12-04 15:57:44.782979][24293.165873571]
2025-12-04T15:57:44.7835059Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:57:44.7838228Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_skip_guard_eval_unsafe.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:57:44.783471]
2025-12-04T15:58:01.9734193Z 
2025-12-04T15:58:01.9735316Z dynamo/test_skip_guard_eval_unsafe 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_skip_guard_eval_unsafe_1.1_b141b115e14ff53c_.log
2025-12-04T15:58:01.9738797Z Running 5 items in this shard: test/dynamo/test_skip_guard_eval_unsafe.py::RunDiffGuardTests::test_bool_recompile, test/dynamo/test_skip_guard_eval_unsafe.py::RunDiffGuardTests::test_cache_line_pickup, test/dynamo/test_skip_guard_eval_unsafe.py::RunDiffGuardTests::test_fail_on_tensor_shape_change, test/dynamo/test_skip_guard_eval_unsafe.py::RunDiffGuardTests::test_post_recompile, test/dynamo/test_skip_guard_eval_unsafe.py::RunDiffGuardTests::test_tensor_recompile
2025-12-04T15:58:01.9741200Z 
2025-12-04T15:58:01.9741598Z Finished dynamo/test_skip_guard_eval_unsafe 1/1 ... [2025-12-04 15:58:01.973235][24310.356131087], took 0.29min
2025-12-04T15:58:02.0023158Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_skip_guard_eval_unsafe/dynamo.test_skip_guard_eval_unsafe-aa1ded9d0a4e400e.xml
2025-12-04T15:58:02.0771120Z Running inductor/test_control_deps 1/1 ... [2025-12-04 15:58:02.076802][24310.459696197]
2025-12-04T15:58:02.0771730Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:58:02.0775257Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_control_deps.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:58:02.077262]
2025-12-04T15:58:21.4708829Z 
2025-12-04T15:58:21.4710024Z inductor/test_control_deps 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_control_deps_1.1_e3804afa5ea10bb1_.log
2025-12-04T15:58:21.4711382Z Running 1 items in this shard: test/inductor/test_control_deps.py::TestControlDeps::test_control_deps_prevents_fusion
2025-12-04T15:58:21.4711991Z 
2025-12-04T15:58:21.4712385Z Finished inductor/test_control_deps 1/1 ... [2025-12-04 15:58:21.470610][24329.853502291], took 0.32min
2025-12-04T15:58:21.5001476Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_control_deps/inductor.test_control_deps-23047419ffe03376.xml
2025-12-04T15:58:21.5727269Z Running inductor/test_benchmarking 1/1 ... [2025-12-04 15:58:21.572380][24329.955275117]
2025-12-04T15:58:21.5727927Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:58:21.5731550Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_benchmarking.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:58:21.572862]
2025-12-04T15:58:34.9074345Z 
2025-12-04T15:58:34.9075429Z inductor/test_benchmarking 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_benchmarking_1.1_f947c0362e7ea45b_.log
2025-12-04T15:58:34.9082991Z Running 12 items in this shard: test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_cpu_smoke_benchmarker_cls0, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_cpu_smoke_benchmarker_cls1, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_gpu_smoke_benchmarker_cls0, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_gpu_smoke_benchmarker_cls1, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_safely_infers_device_many_devices_benchmarker_cls0, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_safely_infers_device_many_devices_benchmarker_cls1, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_safely_infers_device_no_devices_benchmarker_cls0, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_safely_infers_device_no_devices_benchmarker_cls1, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_smoke_benchmarker_cls0_device_cpu, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_smoke_benchmarker_cls0_device_cuda, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_smoke_benchmarker_cls1_device_cpu, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_smoke_benchmarker_cls1_device_cuda
2025-12-04T15:58:34.9089740Z 
2025-12-04T15:58:34.9090203Z Finished inductor/test_benchmarking 1/1 ... [2025-12-04 15:58:34.907244][24343.290137105], took 0.22min
2025-12-04T15:58:34.9368879Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_benchmarking/inductor.test_benchmarking-53f04a03954d2058.xml
2025-12-04T15:58:35.0052182Z Running inductor/test_helion_kernels 1/1 ... [2025-12-04 15:58:35.004844][24343.387738827]
2025-12-04T15:58:35.0052806Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:58:35.0055339Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_helion_kernels.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:58:35.005279]
2025-12-04T15:58:45.3354409Z 
2025-12-04T15:58:45.3355641Z inductor/test_helion_kernels 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_helion_kernels_1.1_7576dd76567d0db5_.log
2025-12-04T15:58:45.3357314Z Running 2 items in this shard: test/inductor/test_helion_kernels.py::HelionTests::test_add_kernel, test/inductor/test_helion_kernels.py::HelionTests::test_softmax_view_reshape
2025-12-04T15:58:45.3358218Z 
2025-12-04T15:58:45.3358592Z Finished inductor/test_helion_kernels 1/1 ... [2025-12-04 15:58:45.335202][24353.718099338], took 0.17min
2025-12-04T15:58:45.3650334Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_helion_kernels/inductor.test_helion_kernels-0df86f8cd24ea26a.xml
2025-12-04T15:58:45.4421994Z Running inductor/test_quantization 1/1 ... [2025-12-04 15:58:45.441847][24353.824740991]
2025-12-04T15:58:45.4422645Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:58:45.4425122Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_quantization.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:58:45.442262]
2025-12-04T15:59:05.8376479Z 
2025-12-04T15:59:05.8377617Z inductor/test_quantization 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_quantization_1.1_84a522d95ca6c1ae_.log
2025-12-04T15:59:05.8379876Z Running 2 items in this shard: test/inductor/test_quantization.py::TestQuantization::test_activation_quantization_aten_with_scaling, test/inductor/test_quantization.py::TestQuantization::test_activation_quantization_aten_without_scaling
2025-12-04T15:59:05.8381216Z 
2025-12-04T15:59:05.8381660Z Finished inductor/test_quantization 1/1 ... [2025-12-04 15:59:05.837412][24374.220307718], took 0.34min
2025-12-04T15:59:05.8675446Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_quantization/inductor.test_quantization-951156711359c867.xml
2025-12-04T15:59:05.9464568Z Running export/test_tools 1/1 ... [2025-12-04 15:59:05.946121][24374.329015529]
2025-12-04T15:59:05.9465174Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:59:05.9467969Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_tools.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:59:05.946544]
2025-12-04T15:59:14.1730817Z 
2025-12-04T15:59:14.1731754Z export/test_tools 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_tools_1.1_7b301d5abd4a995c_.log
2025-12-04T15:59:14.1733371Z Running 2 items in this shard: test/export/test_tools.py::TestExportTools::test_report_exportability_basic, test/export/test_tools.py::TestExportTools::test_report_exportability_with_issues
2025-12-04T15:59:14.1734377Z 
2025-12-04T15:59:14.1734695Z Finished export/test_tools 1/1 ... [2025-12-04 15:59:14.172847][24382.555743637], took 0.14min
2025-12-04T15:59:14.2031072Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_tools/export.test_tools-c033e9415dabe65c.xml
2025-12-04T15:59:14.2826336Z Running inductor/test_compiled_optimizers 1/3 ... [2025-12-04 15:59:14.282326][24382.665219411]
2025-12-04T15:59:14.2827089Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:59:14.2830136Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_compiled_optimizers.py', '--shard-id=1', '--num-shards=3', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:59:14.282743]
2025-12-04T16:09:22.3028958Z 
2025-12-04T16:09:22.3030432Z inductor/test_compiled_optimizers 1/3 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_compiled_optimizers_1.3_8b95325a31b7233d_.log
2025-12-04T16:09:22.3188328Z Running 248 items in this shard: test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_maximize_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_cuda_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_cuda_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_foreach_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_foreach_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_foreach_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_foreach_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_foreach_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_foreach_cuda_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_weight_decay_capturable_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_weight_decay_cuda, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_initial_accumulator_value_weight_decay_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_lr_decay_weight_decay_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cpu_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cpu_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cpu_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cpu_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cpu_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cpu_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cpu_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cpu_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cuda_onecyclelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_foreach_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_foreach_cuda_exponentiallr, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_foreach_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_foreach_cuda_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_weight_decay_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_capturable_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_cuda_onecyclelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_cuda_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_foreach_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_foreach_cuda_cosineannealingwarmrestarts, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_foreach_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_foreach_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_foreach_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_foreach_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_foreach_cuda_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_multisteplr, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_onecyclelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_capturable_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_capturable_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_capturable_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_capturable_cuda_onecyclelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_capturable_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_capturable_foreach_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_capturable_foreach_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_capturable_foreach_cuda_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_weight_decay_amsgrad_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_weight_decay_amsgrad_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_weight_decay_maximize_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_maximize_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_cuda_cosineannealingwarmrestarts, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_cuda_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_cuda_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_foreach_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_foreach_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_foreach_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_foreach_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_foreach_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_foreach_cuda_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_weight_decay_capturable_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_weight_decay_maximize_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_capturable_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_cuda_exponentiallr, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_foreach_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_foreach_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_foreach_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_foreach_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_amsgrad_capturable_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_amsgrad_capturable_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_amsgrad_capturable_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_onecyclelr, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_foreach_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_foreach_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_foreach_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_foreach_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_foreach_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_foreach_cuda_onecyclelr, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_weight_decay_amsgrad_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_weight_decay_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_weight_decay_maximize_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_capturable_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_maximize_capturable_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_maximize_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_maximize_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_t0_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_cuda_steplr, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_foreach_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_foreach_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_foreach_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_foreach_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_foreach_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_foreach_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_foreach_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_foreach_cuda_onecyclelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_foreach_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_weight_decay_capturable_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_weight_decay_maximize_capturable_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_closure_graph_break, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_foreach_map_adam, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_momentum_decay_foreach_cuda, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_tensor_lr_weight_decay_momentum_decay_decoupled_weight_decay_capturable_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_tensor_lr_weight_decay_momentum_decay_decoupled_weight_decay_capturable_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_tensor_lr_weight_decay_momentum_decay_decoupled_weight_decay_capturable_foreach_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_tensor_lr_weight_decay_momentum_decay_decoupled_weight_decay_capturable_foreach_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_tensor_lr_weight_decay_momentum_decay_decoupled_weight_decay_capturable_foreach_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_weight_decay_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_weight_decay_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_weight_decay_maximize_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_weight_decay_momentum_decay_decoupled_weight_decay_capturable_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_weight_decay_momentum_decay_decoupled_weight_decay_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_weight_decay_momentum_decay_decoupled_weight_decay_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_capturable_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_capturable_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_capturable_weight_decay_cuda, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_capturable_weight_decay_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_tensor_lr_capturable_weight_decay_decoupled_weight_decay_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_tensor_lr_capturable_weight_decay_decoupled_weight_decay_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_tensor_lr_capturable_weight_decay_decoupled_weight_decay_foreach_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_tensor_lr_capturable_weight_decay_decoupled_weight_decay_foreach_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_tensor_lr_capturable_weight_decay_decoupled_weight_decay_foreach_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_tensor_lr_capturable_weight_decay_decoupled_weight_decay_foreach_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_weight_decay_decoupled_weight_decay_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_weight_decay_decoupled_weight_decay_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_weight_decay_maximize_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_maximize_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_maximize_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_maximize_weight_decay_cuda, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_tensor_lr_capturable_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_tensor_lr_capturable_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_tensor_lr_capturable_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_tensor_lr_capturable_foreach_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_tensor_lr_capturable_foreach_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_tensor_lr_capturable_foreach_cuda_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_tensor_lr_capturable_foreach_cuda_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_weight_decay_centered_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_weight_decay_maximize_capturable_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_weight_decay_maximize_capturable_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_capturable_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_maximize_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_step_sizes_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_cuda_linearlr, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_cuda_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_foreach_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_foreach_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_foreach_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_foreach_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_foreach_cuda_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_foreach_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_foreach_cuda_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_momentum_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_momentum_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_momentum_weight_decay_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_momentum_weight_decay_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_momentum_weight_decay_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_recompile_single, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cpu_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cpu_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cpu_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cpu_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cpu_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cpu_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cpu_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cuda_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_foreach_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_foreach_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_foreach_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_foreach_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_foreach_cuda_onecyclelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_foreach_cuda_steplr, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_weight_decay_maximize_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_ASGD_use_closure_True_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_Adafactor_use_closure_False_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_Adafactor_use_closure_True_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_Adagrad_use_closure_True_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_Adam_use_closure_True_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_Adamax_use_closure_False_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_LBFGS_use_closure_False_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_Muon_use_closure_False_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_NAdam_use_closure_False_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_RAdam_use_closure_True_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_RMSprop_use_closure_True_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_Rprop_use_closure_False_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_SparseAdam_use_closure_False_cuda_float32
2025-12-04T16:09:22.3343532Z 
2025-12-04T16:09:22.3343955Z Finished inductor/test_compiled_optimizers 1/3 ... [2025-12-04 16:09:22.303207][24990.686101481], took 10.13min
2025-12-04T16:09:22.3345406Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_compiled_optimizers/inductor.test_compiled_optimizers-1745c9b9e5fc7ed3.xml
2025-12-04T16:09:23.7534319Z Uploading artifacts took 1.34 seconds
2025-12-04T16:09:23.7538675Z Running inductor/test_aot_inductor_utils 1/1 ... [2025-12-04 16:09:23.753659][24992.13655299]
2025-12-04T16:09:23.7539273Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:09:23.7543902Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_aot_inductor_utils.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:09:23.754148]
2025-12-04T16:09:34.1690039Z 
2025-12-04T16:09:34.1691279Z inductor/test_aot_inductor_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_aot_inductor_utils_1.1_6e3c972b94953db6_.log
2025-12-04T16:09:34.1692302Z Running 0 items in this shard:
2025-12-04T16:09:34.1692541Z 
2025-12-04T16:09:34.1693178Z Finished inductor/test_aot_inductor_utils 1/1 ... [2025-12-04 16:09:34.168797][25002.551692764], took 0.17min
2025-12-04T16:09:34.1994633Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor_utils/inductor.test_aot_inductor_utils-e7355f16ccb52d23.xml
2025-12-04T16:09:34.2735811Z Running inductor/test_control_flow 3/4 ... [2025-12-04 16:09:34.273251][25002.656145406]
2025-12-04T16:09:34.2736473Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:09:34.2739591Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_control_flow.py', '--shard-id=3', '--num-shards=4', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:09:34.273714]
2025-12-04T16:24:52.4248050Z 
2025-12-04T16:24:52.4249183Z inductor/test_control_flow 3/4 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_control_flow_3.4_41808f1ad591b77f_.log
2025-12-04T16:24:52.4467092Z Running 183 items in this shard: test/inductor/test_control_flow.py::CondTests::test_cond_advanced_dynamic_shapes_device_cuda, test/inductor/test_control_flow.py::CondTests::test_cond_aliasing_outputs, test/inductor/test_control_flow.py::CondTests::test_cond_decompose_ops_in_subgraph_device_cpu, test/inductor/test_control_flow.py::CondTests::test_cond_functional_call_device_cpu_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_inductor_fx_passes_recursively_applied, test/inductor/test_control_flow.py::CondTests::test_cond_mismatched_branch_output_size_device_cuda_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_outer_code_before_after_device_cpu_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_outer_code_before_after_device_cuda_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_select_with_input_idx_device_cpu_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_subgraphs_with_parameters_device_cpu_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_unbacked_symint_closure_device_cuda_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_unbacked_symint_inner_to_outer_device_cpu, test/inductor/test_control_flow.py::CondTests::test_cond_unbacked_symint_inner_to_outer_device_cuda, test/inductor/test_control_flow.py::CondTests::test_cond_unbacked_symint_outer_to_inner_device_cuda, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_nested_control_flow_device_cpu_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_nested_control_flow_device_cpu_dynamic_True_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_nested_control_flow_device_cuda_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_simple_control_flow_device_cpu_dynamic_True_autograd_True, 
test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_simple_control_flow_device_cuda_dynamic_True_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_stack_output_simple_device_cpu_dynamic_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_stack_output_simple_device_cuda_dynamic_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cpu_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cpu_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cuda_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_in_out_device_cpu_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_ops_device_cpu_dynamic_True_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_ops_device_cpu_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_ops_device_cuda_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_ops_device_cuda_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cpu_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cpu_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cuda_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cuda_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cpu_dynamic_False_autograd_True, 
test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cpu_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cuda_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cuda_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cuda_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cuda_dynamic_True_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_zero_loop_device_cpu_dynamic_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_False_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_False_dim_0_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_False_dim_1_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_False_dim_1_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_False_dim_3_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_True_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_True_dim_0_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_True_dim_1_scan_length_1_autograd_True, 
test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_True_dim_1_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_False_dim_0_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_False_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_False_dim_1_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_False_dim_1_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_False_dim_3_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_False_dim_3_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_False_dim_3_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_True_dim_3_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_False_dim_1_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_False_dim_1_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_False_dim_3_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_False_dim_3_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_True_dim_0_scan_length_1_autograd_True, 
test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_True_dim_0_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_True_dim_3_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_True_dim_3_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_False_dim_0_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_False_dim_0_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_False_dim_3_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_chunked_ce_device_cuda_dynamic_False_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_False_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_False_dim_0_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_False_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_False_dim_3_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_True_dim_0_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_True_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_True_dim_1_scan_length_1_autograd_True, 
test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_False_dim_0_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_False_dim_0_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_False_dim_3_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_True_dim_3_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_True_dim_3_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_True_dim_3_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_True_dim_3_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_False_dim_1_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_False_dim_1_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_False_dim_3_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_False_dim_3_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_True_dim_0_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_True_dim_1_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_False_dim_0_scan_length_5_autograd_True, 
test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_False_dim_1_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_False_dim_3_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_False_dim_3_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_True_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_True_dim_0_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_True_dim_0_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_True_dim_3_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_0_pred_False_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_0_pred_False_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_0_pred_False_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_0_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_0_pred_True_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_1_pred_False_scan_length_1_autograd_False, 
test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_1_pred_False_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_3_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_3_pred_True_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_0_pred_False_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_0_pred_False_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_0_pred_True_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_1_pred_True_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_1_pred_True_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_3_pred_False_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_3_pred_True_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_3_pred_True_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_3_pred_True_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_0_pred_False_scan_length_5_autograd_False, 
test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_0_pred_True_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_0_pred_True_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_1_pred_False_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_3_pred_False_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_3_pred_True_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_3_pred_True_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_0_pred_False_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_0_pred_True_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_1_pred_False_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_1_pred_False_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_3_pred_False_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_3_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_3_pred_True_scan_length_5_autograd_False, 
test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_0_pred_False_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_0_pred_False_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_1_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_1_pred_True_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_3_pred_False_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_0_pred_False_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_0_pred_True_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_3_pred_True_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_0_pred_False_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_0_pred_True_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_3_pred_True_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_True_dim_0_pred_False_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_True_dim_0_pred_False_scan_length_5_autograd_True, 
test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_True_dim_1_pred_False_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_True_dim_1_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_True_dim_3_pred_False_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_True_dim_3_pred_True_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_False_dim_0_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_False_dim_3_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_False_dim_3_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_True_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_True_dim_1_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_True_dim_3_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_False_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_False_dim_0_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_False_dim_3_scan_length_5_autograd_True, 
test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_True_dim_0_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_True_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_True_dim_1_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_True_dim_1_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_True_dim_3_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_False_dim_0_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_False_dim_1_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_False_dim_3_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_True_dim_3_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_False_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_False_dim_3_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_True_dim_3_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_False_reverse_True_dim_0_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_True_reverse_False_dim_2_autograd_True, 
test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_True_reverse_True_dim_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_False_reverse_False_dim_0_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_False_reverse_True_dim_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_True_reverse_False_dim_2_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_True_reverse_True_dim_0_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_with_clamp_device_cpu_dynamic_False_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_with_clamp_device_cpu_dynamic_True_autograd_True, test/inductor/test_control_flow.py::MapTests::test_map_nested_with_cond_device_cpu_dynamic_False_autograd_False, test/inductor/test_control_flow.py::MapTests::test_map_nested_with_cond_device_cpu_dynamic_False_autograd_True, test/inductor/test_control_flow.py::MapTests::test_map_pytree_in_out_device_cpu_dynamic_False_autograd_False, test/inductor/test_control_flow.py::MapTests::test_map_pytree_in_out_device_cpu_dynamic_True_autograd_True, test/inductor/test_control_flow.py::MapTests::test_map_simple_device_cpu_dynamic_False_autograd_True, test/inductor/test_control_flow.py::MapTests::test_map_simple_linear_with_view_device_cpu_dynamic_False_autograd_True, test/inductor/test_control_flow.py::MapTests::test_map_simple_linear_with_view_device_cuda_dynamic_True_autograd_True
2025-12-04T16:24:52.4641975Z 
2025-12-04T16:24:52.4642418Z Finished inductor/test_control_flow 3/4 ... [2025-12-04 16:24:52.464020][25920.846908759], took 15.30min
2025-12-04T16:24:52.4947299Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-0b2081966a192cef.xml
2025-12-04T16:24:52.5944887Z Running inductor/test_minifier_isolate 1/1 ... [2025-12-04 16:24:52.594149][25920.977041386]
2025-12-04T16:24:52.5945516Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:24:52.5949306Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_minifier_isolate.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:24:52.594636]
2025-12-04T16:26:53.5379150Z 
2025-12-04T16:26:53.5380289Z inductor/test_minifier_isolate 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_minifier_isolate_1.1_057329f0cdaf132f_.log
2025-12-04T16:26:53.5382182Z Running 2 items in this shard: test/inductor/test_minifier_isolate.py::MinifierIsolateTests::test_after_aot_cpu_runtime_error, test/inductor/test_minifier_isolate.py::MinifierIsolateTests::test_after_aot_gpu_runtime_error
2025-12-04T16:26:53.5383329Z 
2025-12-04T16:26:53.5383966Z Finished inductor/test_minifier_isolate 1/1 ... [2025-12-04 16:26:53.537671][26041.92056799], took 2.02min
2025-12-04T16:26:53.5688849Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_minifier_isolate/inductor.test_minifier_isolate-f50615d1a1981661.xml
2025-12-04T16:26:53.6489372Z Running dynamo/test_error_messages 1/1 ... [2025-12-04 16:26:53.648631][26042.031524968]
2025-12-04T16:26:53.6489988Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:26:53.6493694Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_error_messages.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:26:53.649100]
2025-12-04T16:27:13.9945573Z 
2025-12-04T16:27:13.9946952Z dynamo/test_error_messages 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_error_messages_1.1_69ccbdbb7b8c4f0d_.log
2025-12-04T16:27:13.9974031Z Running 51 items in this shard: test/dynamo/test_error_messages.py::ErrorMessagesTest::test_assert_failure_in_generic_ctx_mgr, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_backend_fake_tensor_exc, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_class_property, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_cpp_extension_recommends_custom_ops, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_data_dependent_branching_fullgraph, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_data_dependent_branching_gb, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_data_dependent_operator2, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_dict_items_input, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_disable_message, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_dynamic_shape_operator_no_meta_kernel, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_dynamo_graph_break_fn, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_dynamo_graph_break_fn_with_msg, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_faketensor_nyi, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_generic_ctx_mgr_graph_break, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_graph_break_in_buggy_resume_prologue, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_graph_break_traceback_above_dynamo_shows_user_code, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_graph_break_traceback_collapsed_resume_frames, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_internal_compiler_stacktrace_verbose, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_latest_bytecode_to_graph_break, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_latest_bytecode_to_graph_break_fullgraph, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_latest_bytecode_to_graph_break_python_versioning, 
test/dynamo/test_error_messages.py::ErrorMessagesTest::test_load_build_class, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_lru_cache_warning_logs_nested_call, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_lru_cache_warning_logs_user_stack_trace, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_nested_compile_user_frames, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_no_internal_compiler_stacktrace, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_observed_exception, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_optree_graph_break_message, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_reconstruction_failure, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_reconstruction_failure_gb, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_skip_frame_empty_function_message, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_skip_frame_in_loop_message, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_skipfile_call, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_skipfile_dynamo_call, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_skipfile_inline, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_skipped_frame_with_verbose_traceback, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_sort_with_nonconstant_keys, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_step_graph_break, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_store_attr_graph_break, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_super_call_function, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_super_call_method, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_uninitialized_module, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_unsupported_builtin, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_unsupported_bytecode, 
test/dynamo/test_error_messages.py::ErrorMessagesTest::test_unsupported_context, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_variable_tracker_source_attribution, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_warnings, test/dynamo/test_error_messages.py::NestedGraphBreakLoggingTests::test_nested_graph_break_different_call_sites_not_suppressed, test/dynamo/test_error_messages.py::NestedGraphBreakLoggingTests::test_skip_frame_in_loop_message_nested, test/dynamo/test_error_messages.py::NestedGraphBreakLoggingTests::test_skipped_frame_with_verbose_traceback_nested, test/dynamo/test_error_messages.py::NestedGraphBreakLoggingTests::test_try_block_with_graph_break_suppression
2025-12-04T16:27:13.9998687Z 
2025-12-04T16:27:13.9999051Z Finished dynamo/test_error_messages 1/1 ... [2025-12-04 16:27:13.994343][26062.377238637], took 0.34min
2025-12-04T16:27:14.0254593Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_error_messages/dynamo.test_error_messages-36d8e363c2770c16.xml
2025-12-04T16:27:14.1594359Z Running dynamo/test_fake_distributed 1/1 ... [2025-12-04 16:27:14.159064][26062.541957658]
2025-12-04T16:27:14.1594995Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:27:14.1598229Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_fake_distributed.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:27:14.159543]
2025-12-04T16:27:23.4880384Z 
2025-12-04T16:27:23.4881535Z dynamo/test_fake_distributed 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_fake_distributed_1.1_14aa9693a6d04f2f_.log
2025-12-04T16:27:23.4884036Z Running 3 items in this shard: test/dynamo/test_fake_distributed.py::TestFakeDistributed::test_all_to_all_single_autograd, test/dynamo/test_fake_distributed.py::TestFakeDistributed::test_device_mesh_flatten, test/dynamo/test_fake_distributed.py::TestFakeDistributed::test_device_mesh_get_local_rank
2025-12-04T16:27:23.4885667Z 
2025-12-04T16:27:23.4886073Z Finished dynamo/test_fake_distributed 1/1 ... [2025-12-04 16:27:23.487798][26071.870694527], took 0.16min
2025-12-04T16:27:23.5188454Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_fake_distributed/dynamo.test_fake_distributed-b0f5d6fe6c345e8f.xml
2025-12-04T16:27:23.6234583Z Running dynamo/test_tree_map 1/1 ... [2025-12-04 16:27:23.623106][26072.006000616]
2025-12-04T16:27:23.6235255Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:27:23.6238404Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_tree_map.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:27:23.623563]
2025-12-04T16:27:33.1521205Z 
2025-12-04T16:27:33.1522297Z dynamo/test_tree_map 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_tree_map_1.1_63649f1aa127b381_.log
2025-12-04T16:27:33.1544254Z Running 31 items in this shard: test/dynamo/test_tree_map.py::TreeMapCompileTests::test_constantvariable_handles_none_is_leaf_kwarg, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_constantvariable_handles_python_and_dtype_leaves, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_is_leaf_handles_tensor_nodes, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_is_leaf_non_constant_fallback, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_none_nodes_default_behavior_tree_map_name_optree_tree_map_impl0, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_none_nodes_default_behavior_tree_map_name_pytree_cxx_tree_map_impl2, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_none_nodes_default_behavior_tree_map_name_pytree_python_tree_map_impl1, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_none_nodes_reject_mismatched_siblings, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_only_applies_to_tensor_nodes, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_only_handles_multiple_types, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_only_multiple_trees_falls_back, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_only_predicate_selector_skips_fastpath, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_rejects_mismatched_container_types, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_variants_tree_map_name_optree_tree_map_impl0_kwargs_name_default_kwargs0_allowed_impls0, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_variants_tree_map_name_optree_tree_map_impl0_kwargs_name_is_leaf_kwargs2_allowed_impls2, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_variants_tree_map_name_optree_tree_map_impl0_kwargs_name_namespace_and_none_is_leaf_kwargs4_allowed_impls4, 
test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_variants_tree_map_name_optree_tree_map_impl0_kwargs_name_namespace_kwargs3_allowed_impls3, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_variants_tree_map_name_optree_tree_map_impl0_kwargs_name_namespace_none_is_leaf_predicate_kwargs5_allowed_impls5, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_variants_tree_map_name_optree_tree_map_impl0_kwargs_name_none_is_leaf_kwargs1_allowed_impls1, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_variants_tree_map_name_pytree_cxx_tree_map_impl2_kwargs_name_default_kwargs0_allowed_impls0, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_variants_tree_map_name_pytree_cxx_tree_map_impl2_kwargs_name_is_leaf_kwargs2_allowed_impls2, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_variants_tree_map_name_pytree_cxx_tree_map_impl2_kwargs_name_namespace_and_none_is_leaf_kwargs4_allowed_impls4, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_variants_tree_map_name_pytree_cxx_tree_map_impl2_kwargs_name_namespace_kwargs3_allowed_impls3, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_variants_tree_map_name_pytree_cxx_tree_map_impl2_kwargs_name_namespace_none_is_leaf_predicate_kwargs5_allowed_impls5, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_variants_tree_map_name_pytree_cxx_tree_map_impl2_kwargs_name_none_is_leaf_kwargs1_allowed_impls1, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_variants_tree_map_name_pytree_python_tree_map_impl1_kwargs_name_default_kwargs0_allowed_impls0, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_variants_tree_map_name_pytree_python_tree_map_impl1_kwargs_name_is_leaf_kwargs2_allowed_impls2, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_variants_tree_map_name_pytree_python_tree_map_impl1_kwargs_name_namespace_and_none_is_leaf_kwargs4_allowed_impls4, 
test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_variants_tree_map_name_pytree_python_tree_map_impl1_kwargs_name_namespace_kwargs3_allowed_impls3, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_variants_tree_map_name_pytree_python_tree_map_impl1_kwargs_name_namespace_none_is_leaf_predicate_kwargs5_allowed_impls5, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_variants_tree_map_name_pytree_python_tree_map_impl1_kwargs_name_none_is_leaf_kwargs1_allowed_impls1
2025-12-04T16:27:33.1565118Z 
2025-12-04T16:27:33.1565444Z Finished dynamo/test_tree_map 1/1 ... [2025-12-04 16:27:33.151927][26081.534824081], took 0.16min
2025-12-04T16:27:33.1831687Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_tree_map/dynamo.test_tree_map-39d9c68e899fe910.xml
2025-12-04T16:27:33.2621561Z Running dynamo/test_minifier 1/1 ... [2025-12-04 16:27:33.261825][26081.644719054]
2025-12-04T16:27:33.2622128Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:27:33.2625415Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_minifier.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:27:33.262279]
2025-12-04T16:27:44.4432712Z 
2025-12-04T16:27:44.4433748Z dynamo/test_minifier 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_minifier_1.1_70592d9088ca13b1_.log
2025-12-04T16:27:44.4441803Z Running 15 items in this shard: test/dynamo/test_minifier.py::MinifierTestsCUDA::test_after_dynamo_cpu_accuracy_backend_passes_cuda, test/dynamo/test_minifier.py::MinifierTestsCUDA::test_after_dynamo_cpu_accuracy_error_cuda, test/dynamo/test_minifier.py::MinifierTestsCUDA::test_after_dynamo_cpu_compile_backend_passes_cuda, test/dynamo/test_minifier.py::MinifierTestsCUDA::test_after_dynamo_cpu_compile_error_cuda, test/dynamo/test_minifier.py::MinifierTestsCUDA::test_after_dynamo_cpu_runtime_backend_passes_cuda, test/dynamo/test_minifier.py::MinifierTestsCUDA::test_after_dynamo_cpu_runtime_error_cuda, test/dynamo/test_minifier.py::MinifierTestsCUDA::test_after_dynamo_cuda_accuracy_backend_passes_cuda, test/dynamo/test_minifier.py::MinifierTestsCUDA::test_after_dynamo_cuda_accuracy_error_cuda, test/dynamo/test_minifier.py::MinifierTestsCUDA::test_after_dynamo_cuda_compile_backend_passes_cuda, test/dynamo/test_minifier.py::MinifierTestsCUDA::test_after_dynamo_cuda_compile_error_cuda, test/dynamo/test_minifier.py::MinifierTestsCUDA::test_after_dynamo_cuda_runtime_backend_passes_cuda, test/dynamo/test_minifier.py::MinifierTestsCUDA::test_after_dynamo_cuda_runtime_error_cuda, test/dynamo/test_minifier.py::MinifierTestsCUDA::test_after_dynamo_non_leaf_compile_error_cuda, test/dynamo/test_minifier.py::MinifierTestsCUDA::test_cpu_cuda_module_after_dynamo_cuda, test/dynamo/test_minifier.py::MinifierTestsCUDA::test_if_graph_minified_cuda
2025-12-04T16:27:44.4449220Z 
2025-12-04T16:27:44.4449546Z Finished dynamo/test_minifier 1/1 ... [2025-12-04 16:27:44.443259][26092.82615348], took 0.19min
2025-12-04T16:27:44.4753744Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_minifier/dynamo.test_minifier-9124cc51e1c5e7b6.xml
2025-12-04T16:27:44.5607593Z Running dynamo/test_guard_manager 1/1 ... [2025-12-04 16:27:44.560410][26092.943303797]
2025-12-04T16:27:44.5608180Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:27:44.5611560Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_guard_manager.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:27:44.560883]
2025-12-04T16:27:53.4890072Z 
2025-12-04T16:27:53.4891392Z dynamo/test_guard_manager 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_guard_manager_1.1_bfbfec93ec272b46_.log
2025-12-04T16:27:53.4907260Z Running 38 items in this shard: test/dynamo/test_guard_manager.py::GuardManagerTests::test_attr_guard_manager, test/dynamo/test_guard_manager.py::GuardManagerTests::test_call_function_no_args_guard, test/dynamo/test_guard_manager.py::GuardManagerTests::test_clone, test/dynamo/test_guard_manager.py::GuardManagerTests::test_default_device_guard, test/dynamo/test_guard_manager.py::GuardManagerTests::test_dict_contains_guard, test/dynamo/test_guard_manager.py::GuardManagerTests::test_dict_getitem_accessor, test/dynamo/test_guard_manager.py::GuardManagerTests::test_dict_guard_manager, test/dynamo/test_guard_manager.py::GuardManagerTests::test_dict_version_guard, test/dynamo/test_guard_manager.py::GuardManagerTests::test_diff_guard_manager, test/dynamo/test_guard_manager.py::GuardManagerTests::test_dynamic_indices_guard, test/dynamo/test_guard_manager.py::GuardManagerTests::test_equals_guard, test/dynamo/test_guard_manager.py::GuardManagerTests::test_framelocals_accessor, test/dynamo/test_guard_manager.py::GuardManagerTests::test_framelocals_guard_e2e, test/dynamo/test_guard_manager.py::GuardManagerTests::test_global_state_guard, test/dynamo/test_guard_manager.py::GuardManagerTests::test_global_state_reason, test/dynamo/test_guard_manager.py::GuardManagerTests::test_global_weakref, test/dynamo/test_guard_manager.py::GuardManagerTests::test_globals, test/dynamo/test_guard_manager.py::GuardManagerTests::test_guard_manager_leaf_guard, test/dynamo/test_guard_manager.py::GuardManagerTests::test_id_guard, test/dynamo/test_guard_manager.py::GuardManagerTests::test_item_guard_manager, test/dynamo/test_guard_manager.py::GuardManagerTests::test_lambda_manager, test/dynamo/test_guard_manager.py::GuardManagerTests::test_length_check_guard, test/dynamo/test_guard_manager.py::GuardManagerTests::test_no_hasattr_guard, test/dynamo/test_guard_manager.py::GuardManagerTests::test_no_tensor_aliasing_guard, 
test/dynamo/test_guard_manager.py::GuardManagerTests::test_python_lambda_leaf_guard, test/dynamo/test_guard_manager.py::GuardManagerTests::test_tensor_aliasing_guard, test/dynamo/test_guard_manager.py::GuardManagerTests::test_tensor_match_guard, test/dynamo/test_guard_manager.py::GuardManagerTests::test_tuple_iterator_getitem, test/dynamo/test_guard_manager.py::GuardManagerTests::test_type_guard, test/dynamo/test_guard_manager.py::GuardManagerTests::test_type_manager, test/dynamo/test_guard_manager.py::GuardManagerTests::test_weakref_alive_guard, test/dynamo/test_guard_manager.py::TypePropagationTests::test_basic_types, test/dynamo/test_guard_manager.py::DuplicateGuardTest::test_duplicate_guard, test/dynamo/test_guard_manager.py::TagSafetyChecks::test_dict_tag_safe, test/dynamo/test_guard_manager.py::TagSafetyChecks::test_immutable_tag_safe, test/dynamo/test_guard_manager.py::TagSafetyChecks::test_nn_module_tag_overridden_getattr_safe, test/dynamo/test_guard_manager.py::TagSafetyChecks::test_nn_module_tag_safe, test/dynamo/test_guard_manager.py::RecursiveDictGuardTests::test_disabling
2025-12-04T16:27:53.4922411Z 
2025-12-04T16:27:53.4922782Z Finished dynamo/test_guard_manager 1/1 ... [2025-12-04 16:27:53.488809][26101.871705616], took 0.15min
2025-12-04T16:27:53.5206664Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_guard_manager/dynamo.test_guard_manager-f0dd8a549f18516b.xml
2025-12-04T16:27:53.6119885Z Running export/test_schema 1/1 ... [2025-12-04 16:27:53.611657][26101.994550249]
2025-12-04T16:27:53.6120494Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:27:53.6123428Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_schema.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:27:53.612092]
2025-12-04T16:27:59.8361692Z 
2025-12-04T16:27:59.8362700Z export/test_schema 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_schema_1.1_81eb22b4e3e11516_.log
2025-12-04T16:27:59.8365051Z Running 5 items in this shard: test/export/test_schema.py::TestSchema::test_schema_check, test/export/test_schema.py::TestSchema::test_schema_comparison, test/export/test_schema.py::TestSchema::test_schema_compatibility, test/export/test_schema.py::TestSchema::test_schema_diff, test/export/test_schema.py::TestSchema::test_thrift_schema_unchanged
2025-12-04T16:27:59.8367021Z 
2025-12-04T16:27:59.8367355Z Finished export/test_schema 1/1 ... [2025-12-04 16:27:59.835954][26108.218850992], took 0.10min
2025-12-04T16:27:59.8677661Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_schema/export.test_schema-98e7fce7714746ab.xml
2025-12-04T16:27:59.9620452Z Running export/test_pass_infra 1/1 ... [2025-12-04 16:27:59.961725][26108.3446195]
2025-12-04T16:27:59.9621015Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:27:59.9624082Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_pass_infra.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:27:59.962165]
2025-12-04T16:28:08.7896350Z 
2025-12-04T16:28:08.7897411Z export/test_pass_infra 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_pass_infra_1.1_d5838225a9a8bb31_.log
2025-12-04T16:28:08.7900395Z Running 5 items in this shard: test/export/test_pass_infra.py::TestPassInfra::test_cond, test/export/test_pass_infra.py::TestPassInfra::test_export_pass_base, test/export/test_pass_infra.py::TestPassInfra::test_graph_signature_updated_after_transformation, test/export/test_pass_infra.py::TestPassInfra::test_node_name_stability, test/export/test_pass_infra.py::TestPassInfra::test_replace_hook_basic
2025-12-04T16:28:08.7902455Z 
2025-12-04T16:28:08.7902796Z Finished export/test_pass_infra 1/1 ... [2025-12-04 16:28:08.789420][26117.172317106], took 0.15min
2025-12-04T16:28:08.8212369Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_pass_infra/export.test_pass_infra-0489f34d1d482c78.xml
2025-12-04T16:28:08.9201997Z Running dynamo/test_recompile_ux 1/1 ... [2025-12-04 16:28:08.919844][26117.302737117]
2025-12-04T16:28:08.9202633Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:28:08.9205246Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_recompile_ux.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:28:08.920280]
2025-12-04T16:28:18.5494126Z 
2025-12-04T16:28:18.5495775Z dynamo/test_recompile_ux 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_recompile_ux_1.1_ac1d0051161f3db2_.log
2025-12-04T16:28:18.5505054Z Running 10 items in this shard: test/dynamo/test_recompile_ux.py::RecompileUxTests::test_drop_cache_on_skip, test/dynamo/test_recompile_ux.py::RecompileUxTests::test_dynamic_input, test/dynamo/test_recompile_ux.py::RecompileUxTests::test_fail_on_recompile_limit_hit, test/dynamo/test_recompile_ux.py::RecompileUxTests::test_loop_torture, test/dynamo/test_recompile_ux.py::RecompileUxTests::test_mismatched_type, test/dynamo/test_recompile_ux.py::RecompileUxTests::test_multiple_guard_fails, test/dynamo/test_recompile_ux.py::RecompileUxTests::test_multiple_guard_fails_report_all, test/dynamo/test_recompile_ux.py::RecompileUxTests::test_nvfuser_guards, test/dynamo/test_recompile_ux.py::RecompileUxTests::test_recompile_child_run_only, test/dynamo/test_recompile_ux.py::RecompileUxTests::test_verbose_tensor_check
2025-12-04T16:28:18.5513001Z 
2025-12-04T16:28:18.5513693Z Finished dynamo/test_recompile_ux 1/1 ... [2025-12-04 16:28:18.549134][26126.932030107], took 0.16min
2025-12-04T16:28:18.5822713Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_recompile_ux/dynamo.test_recompile_ux-5436245cbc75fddd.xml
2025-12-04T16:28:18.6545742Z Running export/test_experimental 1/1 ... [2025-12-04 16:28:18.654172][26127.037066215]
2025-12-04T16:28:18.6546675Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:28:18.6550144Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_experimental.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:28:18.654673]
2025-12-04T16:28:29.8360553Z 
2025-12-04T16:28:29.8361601Z export/test_experimental 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_experimental_1.1_01776c650d6c59b4_.log
2025-12-04T16:28:29.8372074Z Running 22 items in this shard: test/export/test_experimental.py::TestExperiment::test_dynamo_graph_capture, test/export/test_experimental.py::TestExperiment::test_dynamo_graph_capture_closure, test/export/test_experimental.py::TestExperiment::test_dynamo_graph_capture_ctx_return, test/export/test_experimental.py::TestExperiment::test_dynamo_graph_capture_custom_pytree_type, test/export/test_experimental.py::TestExperiment::test_dynamo_graph_capture_default_args, test/export/test_experimental.py::TestExperiment::test_dynamo_graph_capture_dict_keys_getitem, test/export/test_experimental.py::TestExperiment::test_dynamo_graph_capture_full_tracing_context, test/export/test_experimental.py::TestExperiment::test_dynamo_graph_capture_fx_graph_annotate_overlap_pass, test/export/test_experimental.py::TestExperiment::test_dynamo_graph_capture_side_effects, test/export/test_experimental.py::TestExperiment::test_dynamo_graph_capture_with_call_override, test/export/test_experimental.py::TestExperiment::test_dynamo_graph_capture_with_tensor_constant, test/export/test_experimental.py::TestExperiment::test_export_add_in_out_info, test/export/test_experimental.py::TestExperiment::test_export_leaf, test/export/test_experimental.py::TestExperiment::test_joint_basic, test/export/test_experimental.py::TestExperiment::test_joint_buffer_input_mutations, test/export/test_experimental.py::TestExperiment::test_joint_cifar10_backwards, test/export/test_experimental.py::TestExperiment::test_joint_dynamic, test/export/test_experimental.py::TestExperiment::test_joint_loss_index, test/export/test_experimental.py::TestExperiment::test_side_effect, test/export/test_experimental.py::TestExperiment::test_sticky_export, test/export/test_experimental.py::TestExperiment::test_sticky_export_dynamic, test/export/test_experimental.py::TestExperiment::test_sticky_export_nested_inp
2025-12-04T16:28:29.8381720Z 
2025-12-04T16:28:29.8382082Z Finished export/test_experimental 1/1 ... [2025-12-04 16:28:29.835842][26138.2187343], took 0.19min
2025-12-04T16:28:29.8675886Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_experimental/export.test_experimental-4743e9a7200af635.xml
2025-12-04T16:28:29.9421936Z Running export/test_converter 1/1 ... [2025-12-04 16:28:29.941832][26138.324726122]
2025-12-04T16:28:29.9422528Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:28:29.9425567Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_converter.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:28:29.942291]
2025-12-04T16:28:55.2947619Z 
2025-12-04T16:28:55.2950505Z export/test_converter 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_converter_1.1_96408107873dd104_.log
2025-12-04T16:28:55.2969041Z Running 48 items in this shard: test/export/test_converter.py::TestConverter::test_aten___getitem___dict, test/export/test_converter.py::TestConverter::test_aten___getitem___list, test/export/test_converter.py::TestConverter::test_aten___is__, test/export/test_converter.py::TestConverter::test_aten___isnot__, test/export/test_converter.py::TestConverter::test_aten___not__, test/export/test_converter.py::TestConverter::test_aten_add_t, test/export/test_converter.py::TestConverter::test_aten_append_t, test/export/test_converter.py::TestConverter::test_aten_dim, test/export/test_converter.py::TestConverter::test_aten_floordiv, test/export/test_converter.py::TestConverter::test_aten_len, test/export/test_converter.py::TestConverter::test_aten_tensor_dtype_int, test/export/test_converter.py::TestConverter::test_aten_tensor_dynamic, test/export/test_converter.py::TestConverter::test_aten_tensor_prim_dtype, test/export/test_converter.py::TestConverter::test_aten_to_dtype_with_mutating_storage, test/export/test_converter.py::TestConverter::test_context_manager, test/export/test_converter.py::TestConverter::test_convert_func_without_param, test/export/test_converter.py::TestConverter::test_convert_if_basic, test/export/test_converter.py::TestConverter::test_convert_if_duplicate_attr_names, test/export/test_converter.py::TestConverter::test_convert_if_multiple_out, test/export/test_converter.py::TestConverter::test_convert_if_tuple_out, test/export/test_converter.py::TestConverter::test_convert_nn_module_with_nested_buffer, test/export/test_converter.py::TestConverter::test_convert_nn_module_with_nested_if_and_buffer, test/export/test_converter.py::TestConverter::test_convert_nn_module_with_nested_if_and_param, test/export/test_converter.py::TestConverter::test_convert_nn_module_with_nested_param, test/export/test_converter.py::TestConverter::test_convert_retrace_nested_scripted_modules, 
test/export/test_converter.py::TestConverter::test_convert_script_object, test/export/test_converter.py::TestConverter::test_get_tensor_constants, test/export/test_converter.py::TestConverter::test_hidden_input_name, test/export/test_converter.py::TestConverter::test_implicit_constant_to_tensor_handling, test/export/test_converter.py::TestConverter::test_prim_SetAttr, test/export/test_converter.py::TestConverter::test_prim_device, test/export/test_converter.py::TestConverter::test_prim_device_cuda, test/export/test_converter.py::TestConverter::test_prim_dtype, test/export/test_converter.py::TestConverter::test_prim_max, test/export/test_converter.py::TestConverter::test_prim_min, test/export/test_converter.py::TestConverter::test_prim_tolist, test/export/test_converter.py::TestConverter::test_profiler__record_function, test/export/test_converter.py::TestConverter::test_raise_exception, test/export/test_converter.py::TestConverter::test_ts2ep_convert_quantized_model1, test/export/test_converter.py::TestConverter::test_ts2ep_convert_quantized_model_with_opcontext, test/export/test_converter.py::TestConverter::test_ts2ep_convert_quantized_model_with_opcontext_and_constant, test/export/test_converter.py::TestConverter::test_ts2ep_converter_basic, test/export/test_converter.py::TestConverter::test_ts2ep_converter_container_output, test/export/test_converter.py::TestConverter::test_ts2ep_converter_contains, test/export/test_converter.py::TestConverter::test_ts2ep_converter_custom_op, test/export/test_converter.py::TestConverter::test_ts2ep_converter_unpack, test/export/test_converter.py::TestConverter::test_ts2ep_multi_outputs_on_call_ops, test/export/test_converter.py::TestConverter::test_ts2ep_with_loop
2025-12-04T16:28:55.2987173Z 
2025-12-04T16:28:55.2987539Z Finished export/test_converter 1/1 ... [2025-12-04 16:28:55.294603][26163.677498699], took 0.42min
2025-12-04T16:28:55.3274191Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_converter/export.test_converter-a6e4e9ebcfaea6df.xml
2025-12-04T16:28:55.3990035Z Running dynamo/test_reorder_logs 1/1 ... [2025-12-04 16:28:55.398682][26163.78157649]
2025-12-04T16:28:55.3990647Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:28:55.3993777Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_reorder_logs.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:28:55.399135]
2025-12-04T16:29:04.7776431Z 
2025-12-04T16:29:04.7777455Z dynamo/test_reorder_logs 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_reorder_logs_1.1_c9bc43c050335e8d_.log
2025-12-04T16:29:04.7785263Z Running 14 items in this shard: test/dynamo/test_reorder_logs.py::IgnoreLogsTests::test_ignore_logger_ignore_method0_fn0_should_ignore_logger_False, test/dynamo/test_reorder_logs.py::IgnoreLogsTests::test_ignore_logger_ignore_method1_fn1_should_ignore_logger_False, test/dynamo/test_reorder_logs.py::IgnoreLogsTests::test_ignore_logger_ignore_method2_fn2_should_ignore_logger_False, test/dynamo/test_reorder_logs.py::IgnoreLogsTests::test_ignore_logger_ignore_method3_fn3_should_ignore_logger_False, test/dynamo/test_reorder_logs.py::IgnoreLogsTests::test_ignore_logger_ignore_method4_fn4_should_ignore_logger_True, test/dynamo/test_reorder_logs.py::IgnoreLogsTests::test_ignore_logger_ignore_method5_fn5_should_ignore_logger_True, test/dynamo/test_reorder_logs.py::IgnoreLogsTests::test_ignore_logger_ignore_method6_fn6_should_ignore_logger_True, test/dynamo/test_reorder_logs.py::IgnoreLogsTests::test_ignore_logger_ignore_method7_fn7_should_ignore_logger_True, test/dynamo/test_reorder_logs.py::ReorderLogsTests::test_constant_mutation, test/dynamo/test_reorder_logs.py::ReorderLogsTests::test_dont_reorder_print, test/dynamo/test_reorder_logs.py::ReorderLogsTests::test_reorder_custom_log_fn, test/dynamo/test_reorder_logs.py::ReorderLogsTests::test_reorder_print, test/dynamo/test_reorder_logs.py::ReorderLogsTests::test_reorder_print_graph_break, test/dynamo/test_reorder_logs.py::ReorderLogsTests::test_reorder_warnings
2025-12-04T16:29:04.7792498Z 
2025-12-04T16:29:04.7792928Z Finished dynamo/test_reorder_logs 1/1 ... [2025-12-04 16:29:04.777447][26173.16034271], took 0.16min
2025-12-04T16:29:04.8093836Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_reorder_logs/dynamo.test_reorder_logs-d530254831fe0a21.xml
2025-12-04T16:29:04.8895297Z Running dynamo/test_subclasses 1/1 ... [2025-12-04 16:29:04.889193][26173.272087508]
2025-12-04T16:29:04.8895897Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:29:04.8898943Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_subclasses.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:29:04.889648]
2025-12-04T16:29:55.0805756Z 
2025-12-04T16:29:55.0809604Z dynamo/test_subclasses 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_subclasses_1.1_2bde93c2c59c5c84_.log
2025-12-04T16:29:55.0923001Z Running 126 items in this shard: test/dynamo/test_subclasses.py::SubclassTests::test_as_subclass_attr_mutation, test/dynamo/test_subclasses.py::SubclassTests::test_base_torch_function_tracing, test/dynamo/test_subclasses.py::SubclassTests::test_compile_higher_order_with_functionalization, test/dynamo/test_subclasses.py::SubclassTests::test_compile_with_fake_tensor_automatic_dynamic, test/dynamo/test_subclasses.py::SubclassTests::test_compile_with_fake_tensor_dynamic_dim, test/dynamo/test_subclasses.py::SubclassTests::test_compile_with_functionalization, test/dynamo/test_subclasses.py::SubclassTests::test_disable_all_torch_function, test/dynamo/test_subclasses.py::SubclassTests::test_disable_all_torch_function_restore_values, test/dynamo/test_subclasses.py::SubclassTests::test_disable_all_torch_function_restore_values_graph_break, test/dynamo/test_subclasses.py::SubclassTests::test_has_torch_function, test/dynamo/test_subclasses.py::SubclassTests::test_make_subclass, test/dynamo/test_subclasses.py::SubclassTests::test_mark_static_with_subclass_desugaring_dynamic_False, test/dynamo/test_subclasses.py::SubclassTests::test_mark_static_with_subclass_desugaring_dynamic_True, test/dynamo/test_subclasses.py::SubclassTests::test_newly_constructed_tensor_subclass_attr_mutation, test/dynamo/test_subclasses.py::SubclassTests::test_njt_subclass_from_buffer, test/dynamo/test_subclasses.py::SubclassTests::test_njt_subclass_from_cat, test/dynamo/test_subclasses.py::SubclassTests::test_njt_subclass_simple, test/dynamo/test_subclasses.py::SubclassTests::test_no_call_to_new, test/dynamo/test_subclasses.py::SubclassTests::test_no_torch_function_on_size_bytecode, test/dynamo/test_subclasses.py::SubclassTests::test_no_torch_function_recompiles, test/dynamo/test_subclasses.py::SubclassTests::test_nontraceable_tensor_subclass, test/dynamo/test_subclasses.py::SubclassTests::test_overridden_method_guarding, 
test/dynamo/test_subclasses.py::SubclassTests::test_parameter_subclass_custom_torch_func_and_dynamic_attr, test/dynamo/test_subclasses.py::SubclassTests::test_parameter_subclass_with_old_torch_function, test/dynamo/test_subclasses.py::SubclassTests::test_recompile_with_symbool_inputs, test/dynamo/test_subclasses.py::SubclassTests::test_recompiles_with_optional_inner_tensor, test/dynamo/test_subclasses.py::SubclassTests::test_return_as_subclass, test/dynamo/test_subclasses.py::SubclassTests::test_return_local_subclass, test/dynamo/test_subclasses.py::SubclassTests::test_return_subclass, test/dynamo/test_subclasses.py::SubclassTests::test_subclass_TwoTensor_TwoTensor_TwoTensor, test/dynamo/test_subclasses.py::SubclassTests::test_subclass_TwoTensor_nested_diff_sizes, test/dynamo/test_subclasses.py::SubclassTests::test_subclass_constructor_proxying, test/dynamo/test_subclasses.py::SubclassTests::test_subclass_dont_invoke_torch_function_on_overridden_attr, test/dynamo/test_subclasses.py::SubclassTests::test_subclass_dont_invoke_torch_function_on_overridden_method, test/dynamo/test_subclasses.py::SubclassTests::test_subclass_override_shape_and_to, test/dynamo/test_subclasses.py::SubclassTests::test_subclass_parameters_are_static_under_training, test/dynamo/test_subclasses.py::SubclassTests::test_subclass_views_dynamic_False, test/dynamo/test_subclasses.py::SubclassTests::test_subclass_views_dynamic_True, test/dynamo/test_subclasses.py::SubclassTests::test_subclass_with_disabled_torch_function, test/dynamo/test_subclasses.py::SubclassTests::test_support_bases, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_TwoTensor_automatic_dynamic_shapes, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_TwoTensor_clone_view, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_TwoTensor_different_shape, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_TwoTensor_mark_dynamic_shapes, 
test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_TwoTensor_mul, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_TwoTensor_nested, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_TwoTensor_return_multiple, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_TwoTensor_return_shape, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_TwoTensor_return_tensor_and_subclass, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_TwoTensor_simple, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_TwoTensor_view, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_TwoTensor_view_mul, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_attr_codegen_tos, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_ctx_custom_guards_error_arg_num, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_ctx_custom_guards_error_not_classmethod, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_ctx_custom_guards_override, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_ctx_guards, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_ctx_recursive_guards, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_custom_attr, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_with_non_classmethod_torch_function, test/dynamo/test_subclasses.py::SubclassTests::test_torch_dispatch_subclass_guard_recompile, test/dynamo/test_subclasses.py::SubclassTests::test_torch_function_call_on_attr, test/dynamo/test_subclasses.py::SubclassTests::test_torch_function_call_on_method, test/dynamo/test_subclasses.py::SubclassTests::test_torch_function_call_on_method_arg, test/dynamo/test_subclasses.py::SubclassTests::test_torch_function_list_args, test/dynamo/test_subclasses.py::SubclassTests::test_torch_function_state_graph_break, 
test/dynamo/test_subclasses.py::SubclassTests::test_torch_function_state_guards, test/dynamo/test_subclasses.py::SubclassTests::test_torch_function_state_nested, test/dynamo/test_subclasses.py::SubclassTests::test_torch_function_state_tracing, test/dynamo/test_subclasses.py::SubclassTests::test_torch_function_subclass_survives_into_aot_autograd, test/dynamo/test_subclasses.py::SubclassTests::test_torch_function_wrapper_class, test/dynamo/test_subclasses.py::SubclassTests::test_torch_function_wrapper_class_with_kwargs, test/dynamo/test_subclasses.py::SubclassTests::test_type_check_equality_subclass, test/dynamo/test_subclasses.py::SubclassTests::test_type_check_equality_tensor, test/dynamo/test_subclasses.py::SubclassTests::test_type_check_identity_subclass, test/dynamo/test_subclasses.py::SubclassTests::test_type_check_identity_tensor, test/dynamo/test_subclasses.py::SubclassTests::test_type_check_isinstance_subclass, test/dynamo/test_subclasses.py::SubclassTests::test_type_check_isinstance_tensor, test/dynamo/test_subclasses.py::SubclassTests::test_user_overridden_attr_unsupported, test/dynamo/test_subclasses.py::SubclassTests::test_user_overridden_method_unsupported, test/dynamo/test_subclasses.py::SubclassTests::test_user_overridden_property_unsupported, test/dynamo/test_subclasses.py::SubclassTests::test_wrapper_subclass_dynamo_attribute_access_on_intermediate, test/dynamo/test_subclasses.py::SubclassTests::test_wrapper_subclass_guards_on_inner_tensor, test/dynamo/test_subclasses.py::SubclassTests::test_wrapper_subclass_with_differently_sized_inner_tensor, test/dynamo/test_subclasses.py::SubclassTests::test_wrapper_subclass_with_same_sized_inner_tensor, test/dynamo/test_subclasses.py::TestNestedTensor::test_basic_autograd, test/dynamo/test_subclasses.py::TestNestedTensor::test_basic_autograd_inductor, test/dynamo/test_subclasses.py::TestNestedTensor::test_binary_does_not_recompile, test/dynamo/test_subclasses.py::TestNestedTensor::test_binary_recompiles, 
test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_from_input, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_from_input_2, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_from_input_4, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_from_input_5, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_from_input_6, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_from_intermediate, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_from_intermediate_2, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_from_intermediate_3, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_from_intermediate_4, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_from_intermediate_5, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_mixed, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_mixed_2, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_mixed_3, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_is_nested_call, test/dynamo/test_subclasses.py::TestNestedTensor::test_inference_tensor, test/dynamo/test_subclasses.py::TestNestedTensor::test_inline_nested_tensor_from_jagged, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_base_is_nt_False_basic, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_base_is_nt_False_leaf_False_False, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_base_is_nt_False_leaf_False_True, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_base_is_nt_False_leaf_True_False, 
test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_base_is_nt_False_leaf_True_True, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_base_is_nt_False_obscure, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_base_is_nt_True_basic, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_base_is_nt_True_leaf_False_False, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_base_is_nt_True_leaf_False_True, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_base_is_nt_True_leaf_True_False, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_base_is_nt_True_leaf_True_True, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_base_is_nt_True_obscure, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_dense_subclass_dense_subclass, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_subclass_dense, test/dynamo/test_subclasses.py::TestNestedTensor::test_param_subclass_isinstance_input, test/dynamo/test_subclasses.py::TestNestedTensor::test_return_shape, test/dynamo/test_subclasses.py::TestNestedTensor::test_subclass_dense_subclass_dense_view, test/dynamo/test_subclasses.py::TestNestedTensor::test_subclass_gives_static_shapes_when_dynamic_false, test/dynamo/test_subclasses.py::TestNestedTensor::test_subclass_with_mutation_in_graph, test/dynamo/test_subclasses.py::TestNestedTensor::test_unary_does_not_recompile, test/dynamo/test_subclasses.py::TestNestedTensor::test_unbind
2025-12-04T16:29:55.1033744Z 
2025-12-04T16:29:55.1034448Z Finished dynamo/test_subclasses 1/1 ... [2025-12-04 16:29:55.080632][26223.463523815], took 0.84min
2025-12-04T16:29:55.1214437Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_subclasses/dynamo.test_subclasses-90ae20717b7fd572.xml
2025-12-04T16:29:56.6525350Z Uploading artifacts took 1.44 seconds
2025-12-04T16:29:56.6530668Z Running dynamo/test_python_autograd 1/1 ... [2025-12-04 16:29:56.652819][26225.035712755]
2025-12-04T16:29:56.6531588Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:29:56.6537773Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_python_autograd.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:29:56.653380]
2025-12-04T16:30:04.8300583Z 
2025-12-04T16:30:04.8301654Z dynamo/test_python_autograd 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_python_autograd_1.1_3d66bfb1c1737055_.log
2025-12-04T16:30:04.8304434Z Running 5 items in this shard: test/dynamo/test_python_autograd.py::TestPythonAutograd::test_backwards1, test/dynamo/test_python_autograd.py::TestPythonAutograd::test_backwards2, test/dynamo/test_python_autograd.py::TestPythonAutograd::test_forwards1, test/dynamo/test_python_autograd.py::TestPythonAutograd::test_forwards2, test/dynamo/test_python_autograd.py::TestPythonAutograd::test_split
2025-12-04T16:30:04.8306465Z 
2025-12-04T16:30:04.8306835Z Finished dynamo/test_python_autograd 1/1 ... [2025-12-04 16:30:04.829851][26233.212747616], took 0.14min
2025-12-04T16:30:04.8624242Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_python_autograd/dynamo.test_python_autograd-b76a60537c2ba691.xml
2025-12-04T16:30:04.9559978Z Running export/test_draft_export 1/1 ... [2025-12-04 16:30:04.955636][26233.338531574]
2025-12-04T16:30:04.9560568Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:30:04.9563496Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_draft_export.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:30:04.956109]
2025-12-04T16:30:25.0034861Z 
2025-12-04T16:30:25.0035916Z export/test_draft_export 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_draft_export_1.1_dc9e6c5dfafe9a68_.log
2025-12-04T16:30:25.0046132Z Running 21 items in this shard: test/export/test_draft_export.py::TestDraftExport::test_complex_data_dependent_expr, test/export/test_draft_export.py::TestDraftExport::test_constantify_unbacked_symbol, test/export/test_draft_export.py::TestDraftExport::test_cuda_memory_usage, test/export/test_draft_export.py::TestDraftExport::test_data_dependent_failure, test/export/test_draft_export.py::TestDraftExport::test_dedup_data_dependent_failure, test/export/test_draft_export.py::TestDraftExport::test_fake_infer_dense_in_memory_check, test/export/test_draft_export.py::TestDraftExport::test_masked_linear, test/export/test_draft_export.py::TestDraftExport::test_missing_meta_kernel_custom_op_basic, test/export/test_draft_export.py::TestDraftExport::test_missing_meta_kernel_custom_op_multiple_profiles, test/export/test_draft_export.py::TestDraftExport::test_missing_meta_kernel_custom_op_update_profile, test/export/test_draft_export.py::TestDraftExport::test_missing_meta_kernel_guard, test/export/test_draft_export.py::TestDraftExport::test_missing_meta_kernel_impl, test/export/test_draft_export.py::TestDraftExport::test_offsets, test/export/test_draft_export.py::TestDraftExport::test_override_incorrectly_aliasing_kernel, test/export/test_draft_export.py::TestDraftExport::test_override_mismatched_fake_kernel_with_unbacked_symbols, test/export/test_draft_export.py::TestDraftExport::test_override_size_and_dtype_mismatched_fake_kernels, test/export/test_draft_export.py::TestDraftExport::test_shape_failure, test/export/test_draft_export.py::TestDraftExport::test_side_effect1, test/export/test_draft_export.py::TestDraftExport::test_side_effect_inps, test/export/test_draft_export.py::TestDraftExport::test_torchbind, test/export/test_draft_export.py::TestDraftExport::test_unbacked_div_mod_replacement
2025-12-04T16:30:25.0055266Z 
2025-12-04T16:30:25.0055634Z Finished export/test_draft_export 1/1 ... [2025-12-04 16:30:25.003540][26253.386434637], took 0.33min
2025-12-04T16:30:25.0358503Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_draft_export/export.test_draft_export-0c8a812115433a7d.xml
2025-12-04T16:30:25.1197111Z Running test_package 1/1 ... [2025-12-04 16:30:25.119382][26253.50227625]
2025-12-04T16:30:25.1197670Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:30:25.1200610Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_package.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:30:25.119822]
2025-12-04T16:30:33.0964061Z 
2025-12-04T16:30:33.0965415Z test_package 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_package_1.1_34eeddca63aecf34_.log
2025-12-04T16:30:33.1008535Z Running 137 items in this shard: test/test_package.py::TestAnalyze::test_trace_dependencies, test/test_package.py::TestDependencyAPI::test_allow_empty_with_error, test/test_package.py::TestDependencyAPI::test_broken_dependency, test/test_package.py::TestDependencyAPI::test_deny, test/test_package.py::TestDependencyAPI::test_deny_glob, test/test_package.py::TestDependencyAPI::test_extern, test/test_package.py::TestDependencyAPI::test_extern_glob, test/test_package.py::TestDependencyAPI::test_extern_glob_allow_empty, test/test_package.py::TestDependencyAPI::test_externing_c_extension, test/test_package.py::TestDependencyAPI::test_implicit_intern, test/test_package.py::TestDependencyAPI::test_intern_error, test/test_package.py::TestDependencyAPI::test_invalid_import, test/test_package.py::TestDependencyAPI::test_mock, test/test_package.py::TestDependencyAPI::test_mock_glob, test/test_package.py::TestDependencyAPI::test_mock_glob_allow_empty, test/test_package.py::TestDependencyAPI::test_pickle_mocked, test/test_package.py::TestDependencyAPI::test_pickle_mocked_all, test/test_package.py::TestDependencyAPI::test_repackage_mocked_module, test/test_package.py::TestDependencyHooks::test_extern_and_mock_hook, test/test_package.py::TestDependencyHooks::test_multiple_extern_hooks, test/test_package.py::TestDependencyHooks::test_multiple_mock_hooks, test/test_package.py::TestDependencyHooks::test_remove_hooks, test/test_package.py::TestDependencyHooks::test_single_hook, test/test_package.py::TestDiGraph::test_all_paths, test/test_package.py::TestDiGraph::test_contains, test/test_package.py::TestDiGraph::test_contains_non_hashable, test/test_package.py::TestDiGraph::test_edges, test/test_package.py::TestDiGraph::test_forward_closure, test/test_package.py::TestDiGraph::test_iter, test/test_package.py::TestDiGraph::test_node_attr_update, test/test_package.py::TestDiGraph::test_node_attrs, 
test/test_package.py::TestDiGraph::test_predecessor_not_in_graph, test/test_package.py::TestDiGraph::test_predecessors, test/test_package.py::TestDiGraph::test_successor_not_in_graph, test/test_package.py::TestDiGraph::test_successors, test/test_package.py::DirectoryReaderTest::test_importer_access, test/test_package.py::DirectoryReaderTest::test_loading_has_record, test/test_package.py::DirectoryReaderTest::test_loading_module, test/test_package.py::DirectoryReaderTest::test_loading_pickle, test/test_package.py::DirectoryReaderTest::test_package_resource_access, test/test_package.py::DirectoryReaderTest::test_resource_access_by_path, test/test_package.py::DirectoryReaderTest::test_resource_reader, test/test_package.py::DirectoryReaderTest::test_scriptobject_failure_message, test/test_package.py::TestGlobGroup::test_exclude, test/test_package.py::TestGlobGroup::test_exclude_from_all, test/test_package.py::TestGlobGroup::test_invalid_raw, test/test_package.py::TestGlobGroup::test_list_include_exclude, test/test_package.py::TestGlobGroup::test_one_star, test/test_package.py::TestGlobGroup::test_one_star_middle, test/test_package.py::TestGlobGroup::test_one_star_multiple_in_component, test/test_package.py::TestGlobGroup::test_one_star_partial, test/test_package.py::TestGlobGroup::test_one_star_partial_extension, test/test_package.py::TestGlobGroup::test_raw_two_star, test/test_package.py::TestGlobGroup::test_two_star, test/test_package.py::TestGlobGroup::test_two_star_end, test/test_package.py::TestGlobGroup::test_two_star_middle, test/test_package.py::TestGlobGroup::test_two_star_multiple, test/test_package.py::TestImporter::test_ordered_importer_basic, test/test_package.py::TestImporter::test_ordered_importer_whichmodule, test/test_package.py::TestImporter::test_package_importer_whichmodule_no_dunder_module, test/test_package.py::TestImporter::test_single_ordered_importer, test/test_package.py::TestImporter::test_sys_importer, 
test/test_package.py::TestImporter::test_sys_importer_roundtrip, test/test_package.py::TestLoadBCPackages::test_load_bc_packages_fx_module, test/test_package.py::TestLoadBCPackages::test_load_bc_packages_nn_module, test/test_package.py::TestLoadBCPackages::test_load_bc_packages_torchscript_module, test/test_package.py::TestMangling::test_demangle_base, test/test_package.py::TestMangling::test_demangler_multiple_manglers, test/test_package.py::TestMangling::test_is_mangled, test/test_package.py::TestMangling::test_mangle_empty_errors, test/test_package.py::TestMangling::test_mangle_prefix, test/test_package.py::TestMangling::test_mangler_is_consistent, test/test_package.py::TestMangling::test_package_mangler, test/test_package.py::TestMangling::test_roundtrip_mangling, test/test_package.py::TestMangling::test_unique_manglers, test/test_package.py::TestMangling::test_unique_module_names, test/test_package.py::TestMisc::test_dunder_package_present, test/test_package.py::TestMisc::test_dunder_package_works_from_package, test/test_package.py::TestMisc::test_exporter_content_lists, test/test_package.py::TestMisc::test_file_structure, test/test_package.py::TestMisc::test_file_structure_has_file, test/test_package.py::TestMisc::test_inspect_class, test/test_package.py::TestMisc::test_is_from_package, test/test_package.py::TestMisc::test_load_python_version_from_package, test/test_package.py::TestMisc::test_loaders_that_remap_files_work_ok, test/test_package.py::TestMisc::test_python_version, test/test_package.py::TestMisc::test_std_lib_sys_hackery_checks, test/test_package.py::ModelTest::test_model_save, test/test_package.py::ModelTest::test_resnet, test/test_package.py::ModelTest::test_script_resnet, test/test_package.py::TestPackageFX::test_package_fx_custom_tracer, test/test_package.py::TestPackageFX::test_package_fx_package, test/test_package.py::TestPackageFX::test_package_fx_simple, test/test_package.py::TestPackageFX::test_package_fx_with_imports, 
test/test_package.py::TestPackageFX::test_package_fx_wrap, test/test_package.py::TestPackageFX::test_package_gm_preserve_stack_trace, test/test_package.py::TestPackageFX::test_package_then_fx, test/test_package.py::TestPackageScript::test_different_package_interface, test/test_package.py::TestPackageScript::test_different_package_script_class, test/test_package.py::TestPackageScript::test_load_shared_scriptmodules, test/test_package.py::TestPackageScript::test_load_shared_tensors, test/test_package.py::TestPackageScript::test_load_shared_tensors_repackaged, test/test_package.py::TestPackageScript::test_mixing_packaged_and_inline_modules, test/test_package.py::TestPackageScript::test_mixing_packaged_and_inline_modules_shared_code, test/test_package.py::TestPackageScript::test_package_interface, test/test_package.py::TestPackageScript::test_package_script_class, test/test_package.py::TestPackageScript::test_package_script_class_referencing_self, test/test_package.py::TestPackageScript::test_save_eager_mods_sharing_scriptmodule, test/test_package.py::TestPackageScript::test_save_independent_scriptmodules, test/test_package.py::TestPackageScript::test_save_repeat_scriptmodules, test/test_package.py::TestPackageScript::test_save_scriptmodule, test/test_package.py::TestPackageScript::test_save_scriptmodule_file, test/test_package.py::TestPackageScript::test_save_scriptmodule_only_necessary_code, test/test_package.py::TestPackageScript::test_save_scriptmodule_with_submods, test/test_package.py::TestPackageScript::test_save_scriptmodules_in_container, test/test_package.py::TestPackageScript::test_save_scriptmodules_submod_redefinition, test/test_package.py::TestPackageScript::test_save_shared_tensors, test/test_package.py::TestPackageScript::test_saving_and_scripting_packaged_mod, test/test_package.py::TestPackageScript::test_scriptmodules_repeat_save, test/test_package.py::TestPackageScript::test_tensor_sharing_pickle, 
test/test_package.py::TestRepackage::test_repackage_import_indirectly_via_parent_module, test/test_package.py::TestResources::test_importer_access, test/test_package.py::TestResources::test_package_resource_access, test/test_package.py::TestResources::test_resource_access_by_path, test/test_package.py::TestResources::test_resource_reader, test/test_package.py::TestSaveLoad::test_bad_dunder_imports, test/test_package.py::TestSaveLoad::test_dunder_imports, test/test_package.py::TestSaveLoad::test_exporting_mismatched_code, test/test_package.py::TestSaveLoad::test_pickle, test/test_package.py::TestSaveLoad::test_pickle_long_name_with_protocol_4, test/test_package.py::TestSaveLoad::test_save_imported_module, test/test_package.py::TestSaveLoad::test_save_imported_module_using_package_importer, test/test_package.py::TestSaveLoad::test_save_load_fp8, test/test_package.py::TestSaveLoad::test_save_module, test/test_package.py::TestSaveLoad::test_save_module_binary, test/test_package.py::TestSaveLoad::test_saving_source, test/test_package.py::TestSaveLoad::test_saving_string
2025-12-04T16:30:33.1050504Z 
2025-12-04T16:30:33.1050792Z Finished test_package 1/1 ... [2025-12-04 16:30:33.096408][26261.479301358], took 0.13min
2025-12-04T16:30:33.1291835Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_package/test_package-523d81f0792170f1.xml
2025-12-04T16:30:33.2170806Z Running test_mkl_verbose 1/1 ... [2025-12-04 16:30:33.216794][26261.599688784]
2025-12-04T16:30:33.2171628Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:30:33.2175333Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_mkl_verbose.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:30:33.217300]
2025-12-04T16:30:43.2487760Z 
2025-12-04T16:30:43.2488707Z test_mkl_verbose 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_mkl_verbose_1.1_8df5a0c4f0a0ed8d_.log
2025-12-04T16:30:43.2490081Z Running 2 items in this shard: test/test_mkl_verbose.py::TestMKLVerbose::test_verbose_off, test/test_mkl_verbose.py::TestMKLVerbose::test_verbose_on
2025-12-04T16:30:43.2490843Z 
2025-12-04T16:30:43.2491160Z Finished test_mkl_verbose 1/1 ... [2025-12-04 16:30:43.248501][26271.631396535], took 0.17min
2025-12-04T16:30:43.2818624Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_mkl_verbose/test_mkl_verbose-874cbf06946f8b3e.xml
2025-12-04T16:30:43.3444016Z Running test_comparison_utils 1/1 ... [2025-12-04 16:30:43.344052][26271.726946881]
2025-12-04T16:30:43.3444719Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:30:43.3447498Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_comparison_utils.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:30:43.344494]
2025-12-04T16:30:48.8169404Z 
2025-12-04T16:30:48.8170375Z test_comparison_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_comparison_utils_1.1_bef8586b0834f006_.log
2025-12-04T16:30:48.8173979Z Running 7 items in this shard: test/test_comparison_utils.py::TestComparisonUtils::test_all_equal_no_assert, test/test_comparison_utils.py::TestComparisonUtils::test_all_equal_no_assert_nones, test/test_comparison_utils.py::TestComparisonUtils::test_assert_device, test/test_comparison_utils.py::TestComparisonUtils::test_assert_dtype, test/test_comparison_utils.py::TestComparisonUtils::test_assert_layout, test/test_comparison_utils.py::TestComparisonUtils::test_assert_sizes, test/test_comparison_utils.py::TestComparisonUtils::test_assert_strides
2025-12-04T16:30:48.8176743Z 
2025-12-04T16:30:48.8177084Z Finished test_comparison_utils 1/1 ... [2025-12-04 16:30:48.816752][26277.199649922], took 0.09min
2025-12-04T16:30:48.8502194Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_comparison_utils/test_comparison_utils-ce770324779d51b3.xml
2025-12-04T16:30:48.8839286Z Running functorch/test_ac_logging 1/1 ... [2025-12-04 16:30:48.883684][26277.266578608]
2025-12-04T16:30:48.8839864Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:30:48.8843425Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_ac_logging.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:30:48.884101]
2025-12-04T16:30:54.4066782Z 
2025-12-04T16:30:54.4068054Z functorch/test_ac_logging 1/1 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_ac_logging_1.1_7064fc1f81d9dc21_.log
2025-12-04T16:30:54.4070835Z Running 4 items in this shard: test/functorch/test_ac_logging.py::TestAcLogging::test_create_activation_checkpointing_logging_structure_payload, test/functorch/test_ac_logging.py::TestAcLogging::test_create_joint_graph_edges, test/functorch/test_ac_logging.py::TestAcLogging::test_create_joint_graph_node_information, test/functorch/test_ac_logging.py::TestAcLogging::test_create_structured_trace_for_min_cut_info
2025-12-04T16:30:54.4073321Z 
2025-12-04T16:30:54.4073678Z Finished functorch/test_ac_logging 1/1 ... [2025-12-04 16:30:54.406457][26282.789353096], took 0.09min
2025-12-04T16:30:54.4399043Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/functorch.test_ac_logging/functorch.test_ac_logging-f1c79a1c8c74be66.xml
2025-12-04T16:30:54.4758980Z Running test_mkldnn_verbose 1/1 ... [2025-12-04 16:30:54.475629][26282.858522976]
2025-12-04T16:30:54.4759534Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:30:54.4763053Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_mkldnn_verbose.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:30:54.476075]
2025-12-04T16:31:03.5038173Z 
2025-12-04T16:31:03.5039119Z test_mkldnn_verbose 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_mkldnn_verbose_1.1_7178d5eae573783e_.log
2025-12-04T16:31:03.5040629Z Running 2 items in this shard: test/test_mkldnn_verbose.py::TestMKLDNNVerbose::test_verbose_off, test/test_mkldnn_verbose.py::TestMKLDNNVerbose::test_verbose_on
2025-12-04T16:31:03.5041451Z 
2025-12-04T16:31:03.5042003Z Finished test_mkldnn_verbose 1/1 ... [2025-12-04 16:31:03.503614][26291.886505987], took 0.15min
2025-12-04T16:31:03.5369219Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_mkldnn_verbose/test_mkldnn_verbose-e983273d29ed8e1e.xml
2025-12-04T16:31:03.6170601Z Running test_cpp_api_parity 1/1 ... [2025-12-04 16:31:03.616768][26291.999662952]
2025-12-04T16:31:03.6171312Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:31:03.6174719Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_cpp_api_parity.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:31:03.617246]
2025-12-04T16:31:30.8251011Z 
2025-12-04T16:31:30.8254323Z test_cpp_api_parity 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_cpp_api_parity_1.1_286b24be771dc4b7_.log
2025-12-04T16:31:30.8476728Z Running 488 items in this shard: test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCELoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCELoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCELoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCELoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCELoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCELoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCEWithLogitsLoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCEWithLogitsLoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCEWithLogitsLoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCEWithLogitsLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCEWithLogitsLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCEWithLogitsLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_circular_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_circular_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_groups, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_groups_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad1, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad1_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad1size1, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad1size1_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad2size1, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad2size1_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad_same, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad_same2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad_same2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad_same_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad_same_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad_same_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad_valid, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad_valid_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_reflect_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_reflect_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_replicate_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_replicate_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_stride, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_stride_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_zero_batch, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_zero_batch_cuda, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_zeros_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_zeros_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_circular_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_circular_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_padded, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_padded_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_strided, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_strided_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_with_multiplier, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_with_multiplier_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_groups, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_groups_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_groups_thnn, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_groups_thnn_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_no_bias, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_no_bias_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_pad_same, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_pad_same_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_pad_same_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_pad_same_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_pad_valid, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_pad_valid_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_padding, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_padding_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_reflect_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_reflect_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_replicate_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_replicate_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_strided, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_strided_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_zero_batch, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_zero_batch_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_zeros_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_zeros_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_1x1x1_no_bias, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_1x1x1_no_bias_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_circular_stride2_pad2, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_circular_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_dilated_strided, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_dilated_strided_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_groups, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_groups_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_no_bias, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_no_bias_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_pad_same, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_pad_same_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_pad_same_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_pad_same_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_pad_valid, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_pad_valid_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_replicate_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_replicate_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_stride, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_stride_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_stride_padding, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_stride_padding_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_zero_batch, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_zero_batch_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_zeros_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_zeros_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose1d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose1d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose1d_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose1d_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose1d_groups, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose1d_groups_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose1d_no_bias, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose1d_no_bias_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose2d_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose2d_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose2d_groups, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose2d_groups_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose2d_no_bias, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose2d_no_bias_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose3d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose3d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose3d_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose3d_dilated_cuda, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_CosineEmbeddingLoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_CosineEmbeddingLoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_CosineEmbeddingLoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_CosineEmbeddingLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_CosineEmbeddingLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_CosineEmbeddingLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_CrossMapLRN2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_CrossMapLRN2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Embedding, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_discontiguous, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_discontiguous_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_max, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_max_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_max_padding_idx, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_max_padding_idx_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_mean_padding_idx, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_mean_padding_idx_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_sparse, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_sparse_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_sum, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_sum_padding_idx, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_sum_padding_idx_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Embedding_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Embedding_discontiguous, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Embedding_discontiguous_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Embedding_sparse, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Embedding_sparse_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Flatten, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Flatten_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Flatten_no_batch_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Flatten_no_batch_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Fold, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Fold_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Fold_int_input, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Fold_int_input_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Fold_no_batch_dim_input, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Fold_no_batch_dim_input_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Fold_no_batch_dim_int_input, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Fold_no_batch_dim_int_input_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_HingeEmbeddingLoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_HingeEmbeddingLoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_HingeEmbeddingLoss_no_batch_dim_none, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_HingeEmbeddingLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_HingeEmbeddingLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_HingeEmbeddingLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_LayerNorm_3d_no_affine_large_feature, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_LayerNorm_3d_no_affine_large_feature_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Linear, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Linear_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Linear_no_batch_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Linear_no_batch_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Linear_no_bias, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Linear_no_bias_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MarginRankingLoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MarginRankingLoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MarginRankingLoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MarginRankingLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MarginRankingLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MarginRankingLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelMarginLoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelMarginLoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelMarginLoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelMarginLoss_no_batch_dim_none_cuda, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelMarginLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelMarginLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelSoftMarginLoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelSoftMarginLoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelSoftMarginLoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelSoftMarginLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelSoftMarginLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelSoftMarginLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_NLLLoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_NLLLoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_NLLLoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_NLLLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_NLLLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_NLLLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_broadcast_lhs, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_broadcast_lhs_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_broadcast_rhs, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_broadcast_rhs_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_cuda, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_no_batch_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_no_batch_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_with_non_default_args, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_with_non_default_args_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PixelShuffle, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PixelShuffle_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PixelUnshuffle, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PixelUnshuffle_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_RReLU, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_RReLU_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_RReLU_with_up_down, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_RReLU_with_up_down_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_RReLU_with_up_down_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_RReLU_with_up_down_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ReplicationPad3d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ReplicationPad3d_complex, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ReplicationPad3d_complex_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ReplicationPad3d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ReplicationPad3d_no_batch_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ReplicationPad3d_no_batch_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SampleModule_has_parity, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SampleModule_has_parity_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SampleModule_no_parity, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SampleModule_no_parity_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SoftMarginLoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SoftMarginLoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SoftMarginLoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SoftMarginLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SoftMarginLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SoftMarginLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TransformerDecoderLayer_gelu_activation, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TransformerDecoderLayer_gelu_activation_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TransformerDecoderLayer_relu_activation, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TransformerDecoderLayer_relu_activation_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TransformerEncoderLayer_gelu_activation, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TransformerEncoderLayer_gelu_activation_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TransformerEncoderLayer_relu_activation, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TransformerEncoderLayer_relu_activation_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Transformer_multilayer_coder, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Transformer_multilayer_coder_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TripletMarginLoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TripletMarginLoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TripletMarginLoss_no_batch_dim_none, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TripletMarginLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TripletMarginLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TripletMarginLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Unflatten_no_batch_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Unflatten_no_batch_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Unfold, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Unfold_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Unfold_int_input, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Unfold_int_input_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCELoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCELoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCELoss_no_reduce_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCELoss_no_reduce_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCELoss_weights_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCELoss_weights_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCELoss_weights_no_reduce_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCELoss_weights_no_reduce_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCEWithLogitsLoss_legacy_enum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCEWithLogitsLoss_legacy_enum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCEWithLogitsLoss_no_reduce, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCEWithLogitsLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCEWithLogitsLoss_no_reduce_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCEWithLogitsLoss_no_reduce_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_HingeEmbeddingLoss_margin_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_HingeEmbeddingLoss_margin_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_HingeEmbeddingLoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_HingeEmbeddingLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_HuberLoss_delta, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_HuberLoss_delta_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_no_reduce_log_target, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_no_reduce_log_target_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_no_reduce_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_no_reduce_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_no_reduce_scalar_log_target, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_no_reduce_scalar_log_target_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_with_log_target_no_reduce, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_with_log_target_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_with_target_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_with_target_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_L1Loss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_L1Loss_no_reduce_complex, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_L1Loss_no_reduce_complex_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_L1Loss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_L1Loss_no_reduce_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_L1Loss_no_reduce_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MSELoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MSELoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MSELoss_no_reduce_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MSELoss_no_reduce_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelMarginLoss_0d_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelMarginLoss_0d_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelMarginLoss_1d_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelMarginLoss_1d_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelMarginLoss_index_neg, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelMarginLoss_index_neg_cuda, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelMarginLoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelMarginLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelSoftMarginLoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelSoftMarginLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelSoftMarginLoss_weights_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelSoftMarginLoss_weights_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_1d_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_1d_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_margin_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_margin_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_p_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_p_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_weights_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_weights_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss2d_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss2d_no_reduce_cuda, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss2d_no_reduce_ignore_index, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss2d_no_reduce_ignore_index_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss2d_no_reduce_weights, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss2d_no_reduce_weights_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLossNd_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLossNd_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLossNd_no_reduce_ignore_index, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLossNd_no_reduce_ignore_index_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLossNd_no_reduce_weights, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLossNd_no_reduce_weights_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_ignore_index, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_ignore_index_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_weights, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_weights_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_weights_ignore_index, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_weights_ignore_index_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_weights_ignore_index_neg, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_weights_ignore_index_neg_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_PoissonNLLLoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_PoissonNLLLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SmoothL1Loss_beta, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SmoothL1Loss_beta_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SmoothL1Loss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SmoothL1Loss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SmoothL1Loss_no_reduce_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SmoothL1Loss_no_reduce_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SmoothL1Loss_zero_beta, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SmoothL1Loss_zero_beta_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SoftMarginLoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SoftMarginLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_2d_zero_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_2d_zero_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_scale_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_scale_2d_cuda, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_scale_tuple_shared_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_scale_tuple_shared_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_scale_tuple_skewed_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_scale_tuple_skewed_2d_align_corners, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_scale_tuple_skewed_2d_align_corners_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_scale_tuple_skewed_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_tuple_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_tuple_2d_align_corners, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_tuple_2d_align_corners_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_tuple_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_2d_zero_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_2d_zero_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_scale_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_scale_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_scale_tuple_shared_2d, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_scale_tuple_shared_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_scale_tuple_skewed_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_scale_tuple_skewed_2d_align_corners, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_scale_tuple_skewed_2d_align_corners_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_scale_tuple_skewed_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_tuple_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_tuple_2d_align_corners, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_tuple_2d_align_corners_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_tuple_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_1d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_1d_align_corners, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_1d_align_corners_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_1d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_1d_zero_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_1d_zero_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_scale_1d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_scale_1d_align_corners, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_scale_1d_align_corners_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_scale_1d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_tuple_1d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_tuple_1d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_1d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_1d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_1d_zero_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_1d_zero_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_2d_launch_configs, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_2d_launch_configs_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_2d_zero_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_2d_zero_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_3d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_3d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_3d_zero_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_3d_zero_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_scale_1d, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_scale_1d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_scale_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_scale_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_scale_3d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_scale_3d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_tuple_1d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_tuple_1d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_tuple_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_tuple_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_tuple_3d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_tuple_3d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_3d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_3d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_3d_zero_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_3d_zero_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_scale_3d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_scale_3d_align_corners, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_scale_3d_align_corners_cuda, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_scale_3d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_tuple_3d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_tuple_3d_align_corners, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_tuple_3d_align_corners_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_tuple_3d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_dim0, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_dim0_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_dim3, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_dim3_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_lastdim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_lastdim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_spatial, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_spatial_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_spatial_special, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_spatial_special_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_multimarginloss_1d_input_0d_target_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_multimarginloss_1d_input_0d_target_no_reduce_cuda, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_sample_functional_has_parity, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_sample_functional_has_parity_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_sample_functional_no_parity, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_sample_functional_no_parity_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_functional_dim0, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_functional_dim0_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_functional_dim3, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_functional_dim3_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_functional_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_functional_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_lastdim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_lastdim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_lastdim_dtype, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_lastdim_dtype_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_spatial, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_spatial_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_spatial_dtype, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_spatial_dtype_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_spatial_special, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_spatial_special_cuda
2025-12-04T16:31:30.8697310Z 
2025-12-04T16:31:30.8697643Z Finished test_cpp_api_parity 1/1 ... [2025-12-04 16:31:30.825704][26319.208598972], took 0.45min
2025-12-04T16:31:30.8698842Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cpp_api_parity/test_cpp_api_parity-c6b7300fef8db168.xml
2025-12-04T16:31:30.9679641Z Running test_autoload 1/1 ... [2025-12-04 16:31:30.967659][26319.350552084]
2025-12-04T16:31:30.9680191Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:31:30.9683187Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_autoload.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:31:30.968091]
2025-12-04T16:31:36.4909482Z 
2025-12-04T16:31:36.4910417Z test_autoload 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_autoload_1.1_4b58ab9cd8e50318_.log
2025-12-04T16:31:36.4911549Z Running 1 items in this shard: test/test_autoload.py::TestDeviceBackendAutoload::test_autoload
2025-12-04T16:31:36.4912050Z 
2025-12-04T16:31:36.4912340Z Finished test_autoload 1/1 ... [2025-12-04 16:31:36.490709][26324.873605686], took 0.09min
2025-12-04T16:31:36.5248479Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_autoload/test_autoload-21f1eacf8f4a4d28.xml
2025-12-04T16:31:36.5513658Z Running nn/attention/test_open_registry 1/1 ... [2025-12-04 16:31:36.551028][26324.933922166]
2025-12-04T16:31:36.5514454Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:31:36.5516884Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'nn/attention/test_open_registry.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:31:36.551442]
2025-12-04T16:31:42.1242598Z 
2025-12-04T16:31:42.1243687Z nn/attention/test_open_registry 1/1 was successful, full logs can be found in artifacts with path test/test-reports/nn.attention.test_open_registry_1.1_52b8c107579dfb04_.log
2025-12-04T16:31:42.1245742Z Running 2 items in this shard: test/nn/attention/test_open_registry.py::TestFlashAttentionRegistry::test_activate_unknown_impl_errors, test/nn/attention/test_open_registry.py::TestFlashAttentionRegistry::test_register_and_activate_impl
2025-12-04T16:31:42.1246992Z 
2025-12-04T16:31:42.1247399Z Finished nn/attention/test_open_registry 1/1 ... [2025-12-04 16:31:42.124046][26330.506940553], took 0.09min
2025-12-04T16:31:42.1588792Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/nn.attention.test_open_registry/nn.attention.test_open_registry-bacfee0084c93992.xml
2025-12-04T16:31:42.1910976Z Running test_as_strided 1/1 ... [2025-12-04 16:31:42.190809][26330.573703645]
2025-12-04T16:31:42.1911530Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:31:42.1914558Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_as_strided.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:31:42.191218]
2025-12-04T16:31:47.8140116Z 
2025-12-04T16:31:47.8140962Z test_as_strided 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_as_strided_1.1_915ecc12abd3e105_.log
2025-12-04T16:31:47.8142378Z Running 2 items in this shard: test/test_as_strided.py::TestAsStrided::test_size_10_exhaustive, test/test_as_strided.py::TestAsStrided::test_subset_property
2025-12-04T16:31:47.8143171Z 
2025-12-04T16:31:47.8143462Z Finished test_as_strided 1/1 ... [2025-12-04 16:31:47.813815][26336.196711787], took 0.09min
2025-12-04T16:31:47.8480051Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_as_strided/test_as_strided-4555079064233d7d.xml
2025-12-04T16:31:47.8814938Z Running test_foreach 1/1 ... [2025-12-04 16:31:47.881198][26336.264093037]
2025-12-04T16:31:47.8815720Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:31:47.8818541Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_foreach.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:31:47.881612]
2025-12-04T16:42:05.6629315Z 
2025-12-04T16:42:05.6630313Z test_foreach 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_foreach_1.1_754d93a1205d9df5_.log
2025-12-04T16:42:05.8367177Z Running 3577 items in this shard: test/test_foreach.py::TestForeachCUDA::test_0dim_tensor_overload_cpu_ok_cuda, test/test_foreach.py::TestForeachCUDA::test_0dim_tensor_overload_exception_cuda, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_abs_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_acos_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_add_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_addcdiv_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_addcmul_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_asin_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_atan_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_ceil_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_clamp_max_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_clamp_min_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_copy_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_cos_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_cosh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_div_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_erf_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_erfc_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_exp_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_expm1_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_floor_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_frac_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_lerp_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_lgamma_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_log10_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_log1p_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_log2_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_log_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_max_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_maximum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_minimum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_mul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_neg_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_norm_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_pow_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_reciprocal_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_round_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_rsqrt_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_sigmoid_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_sign_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_sin_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_sinh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_sqrt_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_sub_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_tan_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_tanh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_trunc_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_zero_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_abs_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_abs_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_abs_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_abs_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_acos_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_acos_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_acos_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_acos_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_add_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_add_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_add_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_add_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_addcdiv_inplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_addcdiv_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_addcdiv_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_addcdiv_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_addcmul_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_addcmul_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_addcmul_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_addcmul_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_asin_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_asin_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_asin_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_asin_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_atan_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_atan_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_atan_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_atan_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_ceil_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_ceil_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_ceil_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_ceil_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_clamp_max_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_clamp_max_inplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_clamp_max_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_clamp_max_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_clamp_min_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_clamp_min_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_clamp_min_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_clamp_min_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_copy_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_copy_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_copy_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_copy_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_cos_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_cos_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_cos_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_cos_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_cosh_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_cosh_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_cosh_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_cosh_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_div_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_div_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_div_outplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_div_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_erf_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_erf_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_erf_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_erf_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_erfc_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_erfc_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_erfc_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_erfc_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_exp_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_exp_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_exp_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_exp_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_expm1_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_expm1_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_expm1_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_expm1_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_floor_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_floor_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_floor_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_floor_outplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_frac_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_frac_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_frac_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_frac_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_lerp_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_lerp_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_lerp_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_lerp_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_lgamma_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_lgamma_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_lgamma_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_lgamma_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log10_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log10_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log10_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log10_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log1p_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log1p_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log1p_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log1p_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log2_inplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log2_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log2_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log2_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_max_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_max_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_max_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_max_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_maximum_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_maximum_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_maximum_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_maximum_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_minimum_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_minimum_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_minimum_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_minimum_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_mul_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_mul_inplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_mul_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_mul_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_neg_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_neg_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_neg_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_neg_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_norm_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_norm_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_norm_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_norm_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_pow_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_pow_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_pow_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_pow_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_reciprocal_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_reciprocal_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_reciprocal_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_reciprocal_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_round_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_round_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_round_outplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_round_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_rsqrt_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_rsqrt_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_rsqrt_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_rsqrt_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sigmoid_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sigmoid_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sigmoid_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sigmoid_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sign_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sign_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sign_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sign_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sin_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sin_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sin_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sin_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sinh_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sinh_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sinh_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sinh_outplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sqrt_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sqrt_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sqrt_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sqrt_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sub_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sub_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sub_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sub_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_tan_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_tan_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_tan_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_tan_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_tanh_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_tanh_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_tanh_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_tanh_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_trunc_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_trunc_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_trunc_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_trunc_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_zero_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_zero_inplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_zero_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_zero_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_max_use_cuda_graph_False_w_empty_False_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_max_use_cuda_graph_False_w_empty_False_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_max_use_cuda_graph_False_w_empty_True_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_max_use_cuda_graph_False_w_empty_True_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_max_use_cuda_graph_True_w_empty_False_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_max_use_cuda_graph_True_w_empty_False_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_max_use_cuda_graph_True_w_empty_True_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_max_use_cuda_graph_True_w_empty_True_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_norm_use_cuda_graph_False_w_empty_False_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_norm_use_cuda_graph_False_w_empty_False_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_norm_use_cuda_graph_False_w_empty_True_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_norm_use_cuda_graph_False_w_empty_True_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_norm_use_cuda_graph_True_w_empty_False_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_norm_use_cuda_graph_True_w_empty_False_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_norm_use_cuda_graph_True_w_empty_True_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_norm_use_cuda_graph_True_w_empty_True_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_add_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_add_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_add_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_add_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_clamp_max_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_clamp_max_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_clamp_max_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_clamp_max_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_clamp_min_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_clamp_min_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_clamp_min_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_clamp_min_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_div_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_div_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_div_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_div_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_maximum_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_maximum_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_maximum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_maximum_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_minimum_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_minimum_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_minimum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_minimum_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_mul_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_mul_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_mul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_mul_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_pow_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_pow_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_pow_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_pow_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_sub_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_sub_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_sub_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_sub_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_int16, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_bool, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_int64, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_int32, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_different_tensor_dtypes__foreach_add_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_different_tensor_dtypes__foreach_clamp_max_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_different_tensor_dtypes__foreach_clamp_min_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_different_tensor_dtypes__foreach_div_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_different_tensor_dtypes__foreach_maximum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_different_tensor_dtypes__foreach_minimum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_different_tensor_dtypes__foreach_mul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_different_tensor_dtypes__foreach_pow_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_different_tensor_dtypes__foreach_sub_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_int32, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_bool, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_int64, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_int16, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_div_reciprocal_cuda, test/test_foreach.py::TestForeachCUDA::test_foreach_check_stride_ignore_dims_of_one_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes_large_input_cuda, test/test_foreach.py::TestForeachCUDA::test_foreach_l2_large_value_input__foreach_norm_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_foreach_l2_large_value_input__foreach_norm_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_int64, 
test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_abs_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_acos_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_add_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_addcdiv_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_addcmul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_asin_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_atan_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_ceil_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_clamp_max_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_clamp_min_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_copy_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_cos_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_cosh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_div_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_erf_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_erfc_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_exp_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_expm1_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_floor_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_frac_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_lerp_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_lgamma_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_log10_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_log1p_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_log2_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_log_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_maximum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_minimum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_mul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_neg_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_pow_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_reciprocal_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_round_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_rsqrt_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_sigmoid_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_sign_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_sin_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_sinh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_sqrt_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_sub_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_tan_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_tanh_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_trunc_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_zero_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_lifetime_of_grad_fn_when_result_is_saved__foreach_exp_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_lifetime_of_grad_fn_when_result_is_saved__foreach_expm1_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_lifetime_of_grad_fn_when_result_is_saved__foreach_pow_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_lifetime_of_grad_fn_when_result_is_saved__foreach_reciprocal_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_lifetime_of_grad_fn_when_result_is_saved__foreach_rsqrt_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_lifetime_of_grad_fn_when_result_is_saved__foreach_sigmoid_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_lifetime_of_grad_fn_when_result_is_saved__foreach_sqrt_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_lifetime_of_grad_fn_when_result_is_saved__foreach_tan_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_lifetime_of_grad_fn_when_result_is_saved__foreach_tanh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_abs_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_acos_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_add_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_addcdiv_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_addcmul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_asin_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_atan_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_ceil_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_clamp_max_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_clamp_min_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_cos_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_cosh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_div_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_erf_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_erfc_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_exp_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_expm1_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_floor_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_frac_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_lerp_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_lgamma_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_log10_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_log1p_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_log2_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_log_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_maximum_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_minimum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_mul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_neg_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_pow_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_reciprocal_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_round_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_rsqrt_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_sigmoid_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_sign_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_sin_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_sinh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_sqrt_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_sub_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_tan_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_tanh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_trunc_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_int64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_bool, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_int64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_int64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_int32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_int64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_int32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_int64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_int32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_int16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_int16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_int64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_int32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_bool, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_int32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_int32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_int32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_int32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_int16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_int16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_bool, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_int16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_int32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_int32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_tensors_on_different_devices__foreach_addcdiv_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_pointwise_op_tensors_on_different_devices__foreach_addcdiv_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_tensors_on_different_devices__foreach_addcmul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_tensors_on_different_devices__foreach_addcmul_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_tensors_grouping_cuda, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_int64, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_int64, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_int64, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_uint8
2025-12-04T16:42:06.0692010Z 
2025-12-04T16:42:06.0692365Z Finished test_foreach 1/1 ... [2025-12-04 16:42:05.668435][26954.051326727], took 10.30min
2025-12-04T16:42:06.0693476Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_foreach/test_foreach-aa4419a4e7b6d381.xml
2025-12-04T16:42:06.0694487Z Running xpu/test_gemm 1/1 ... [2025-12-04 16:42:05.855809][26954.23870292]
2025-12-04T16:42:06.0695285Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:42:06.0696550Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'xpu/test_gemm.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:42:05.856264]
2025-12-04T16:42:11.7427547Z 
2025-12-04T16:42:11.7428756Z xpu/test_gemm 1/1 was successful, full logs can be found in artifacts with path test/test-reports/xpu.test_gemm_1.1_f9c98ad78a8f930f_.log
2025-12-04T16:42:11.7429612Z Running 0 items in this shard:
2025-12-04T16:42:11.7429834Z 
2025-12-04T16:42:11.7430122Z Finished xpu/test_gemm 1/1 ... [2025-12-04 16:42:11.742540][26960.125437229], took 0.10min
2025-12-04T16:42:11.7774086Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/xpu.test_gemm/xpu.test_gemm-2cb9cf39de6aa2cf.xml
2025-12-04T16:42:11.8063767Z Running test_numpy_interop 1/1 ... [2025-12-04 16:42:11.806095][26960.188989215]
2025-12-04T16:42:11.8064621Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:42:11.8068158Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_numpy_interop.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:42:11.806521]
2025-12-04T16:42:19.8327170Z 
2025-12-04T16:42:19.8328468Z test_numpy_interop 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_numpy_interop_1.1_0cfaaa8b9ef10506_.log
2025-12-04T16:42:19.8347483Z Running 46 items in this shard: test/test_numpy_interop.py::TestNumPyInteropCUDA::test___eq___cuda_bool, test/test_numpy_interop.py::TestNumPyInteropCUDA::test___eq___cuda_complex128, test/test_numpy_interop.py::TestNumPyInteropCUDA::test___eq___cuda_complex64, test/test_numpy_interop.py::TestNumPyInteropCUDA::test___eq___cuda_float16, test/test_numpy_interop.py::TestNumPyInteropCUDA::test___eq___cuda_float32, test/test_numpy_interop.py::TestNumPyInteropCUDA::test___eq___cuda_float64, test/test_numpy_interop.py::TestNumPyInteropCUDA::test___eq___cuda_int16, test/test_numpy_interop.py::TestNumPyInteropCUDA::test___eq___cuda_int32, test/test_numpy_interop.py::TestNumPyInteropCUDA::test___eq___cuda_int64, test/test_numpy_interop.py::TestNumPyInteropCUDA::test___eq___cuda_int8, test/test_numpy_interop.py::TestNumPyInteropCUDA::test___eq___cuda_uint8, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_copy_mode_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_ctor_with_invalid_numpy_array_sequence_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_ctor_with_numpy_scalar_ctor_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_empty_tensors_interop_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_from_list_of_ndarray_warning_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_from_numpy_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_from_numpy_no_leak_on_invalid_dtype_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_from_numpy_zero_element_type_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_has_storage_numpy_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_multiplication_numpy_scalar_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_ndarray_astype_object_graph_break_2_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_ndarray_astype_object_graph_break_cuda, 
test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_array_interface_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_index_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_index_multi_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_non_writeable_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_scalar_cmp_cuda_bfloat16, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_scalar_cmp_cuda_bool, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_scalar_cmp_cuda_complex128, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_scalar_cmp_cuda_complex64, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_scalar_cmp_cuda_float16, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_scalar_cmp_cuda_float32, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_scalar_cmp_cuda_float64, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_scalar_cmp_cuda_int16, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_scalar_cmp_cuda_int32, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_scalar_cmp_cuda_int64, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_scalar_cmp_cuda_int8, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_scalar_cmp_cuda_uint8, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_unresizable_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_parse_numpy_int_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_parse_numpy_int_overflow_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_to_numpy_bool_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_to_numpy_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_to_numpy_force_argument_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_to_numpy_zero_tensor_cuda
2025-12-04T16:42:19.8365969Z 
2025-12-04T16:42:19.8366347Z Finished test_numpy_interop 1/1 ... [2025-12-04 16:42:19.832534][26968.215430871], took 0.13min
2025-12-04T16:42:19.8670086Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_numpy_interop/test_numpy_interop-660870d95235d56d.xml
2025-12-04T16:42:19.9367942Z Running profiler/test_cpp_thread 1/1 ... [2025-12-04 16:42:19.936509][26968.319403393]
2025-12-04T16:42:19.9368509Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:42:19.9372173Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'profiler/test_cpp_thread.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:42:19.936951]
2025-12-04T16:42:30.7174079Z 
2025-12-04T16:42:30.7175359Z profiler/test_cpp_thread 1/1 was successful, full logs can be found in artifacts with path test/test-reports/profiler.test_cpp_thread_1.1_6bc17e34ef07b5a0_.log
2025-12-04T16:42:30.7179166Z Running 6 items in this shard: test/profiler/test_cpp_thread.py::CppThreadTestCUDA::test_profile_memory_cuda, test/profiler/test_cpp_thread.py::CppThreadTestCUDA::test_with_enable_profiler_in_child_thread_cuda, test/profiler/test_cpp_thread.py::CppThreadTestCUDA::test_without_enable_profiler_in_child_thread_cuda, test/profiler/test_cpp_thread.py::CppThreadTestXPU::test_profile_memory_xpu, test/profiler/test_cpp_thread.py::CppThreadTestXPU::test_with_enable_profiler_in_child_thread_xpu, test/profiler/test_cpp_thread.py::CppThreadTestXPU::test_without_enable_profiler_in_child_thread_xpu
2025-12-04T16:42:30.7182112Z 
2025-12-04T16:42:30.7182485Z Finished profiler/test_cpp_thread 1/1 ... [2025-12-04 16:42:30.717263][26979.100158034], took 0.18min
2025-12-04T16:42:30.7555963Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/profiler.test_cpp_thread/profiler.test_cpp_thread-31559e2ba96f64a3.xml
2025-12-04T16:42:30.8345064Z Running test_hub 1/1 ... [2025-12-04 16:42:30.834201][26979.21709463]
2025-12-04T16:42:30.8345601Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:42:30.8348712Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_hub.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:42:30.834637]
2025-12-04T16:42:46.2721822Z 
2025-12-04T16:42:46.2722703Z test_hub 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_hub_1.1_af317e8677316cdb_.log
2025-12-04T16:42:46.2728139Z Running 20 items in this shard: test/test_hub.py::TestHub::test_download_url_to_file, test/test_hub.py::TestHub::test_get_set_dir, test/test_hub.py::TestHub::test_hub_parse_repo_info, test/test_hub.py::TestHub::test_list_entrypoints, test/test_hub.py::TestHub::test_load_commit_from_forked_repo, test/test_hub.py::TestHub::test_load_from_branch, test/test_hub.py::TestHub::test_load_from_github, test/test_hub.py::TestHub::test_load_from_local_dir, test/test_hub.py::TestHub::test_load_legacy_zip_checkpoint, test/test_hub.py::TestHub::test_load_state_dict_from_url, test/test_hub.py::TestHub::test_load_zip_1_6_checkpoint, test/test_hub.py::TestHub::test_trust_repo_builtin_trusted_owners, test/test_hub.py::TestHub::test_trust_repo_check_no, test/test_hub.py::TestHub::test_trust_repo_check_yes, test/test_hub.py::TestHub::test_trust_repo_false_emptystring, test/test_hub.py::TestHub::test_trust_repo_false_no, test/test_hub.py::TestHub::test_trust_repo_legacy, test/test_hub.py::TestHub::test_trust_repo_none, test/test_hub.py::TestHub::test_trust_repo_true, test/test_hub.py::TestHub::test_trusted_repo_false_yes
2025-12-04T16:42:46.2733574Z 
2025-12-04T16:42:46.2733837Z Finished test_hub 1/1 ... [2025-12-04 16:42:46.271922][26994.654819168], took 0.26min
2025-12-04T16:42:46.3066338Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_hub/test_hub-33a47573ff45c77e.xml
2025-12-04T16:42:46.3867393Z Running test_segment_reductions 1/1 ... [2025-12-04 16:42:46.386407][26994.769301074]
2025-12-04T16:42:46.3868057Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:42:46.3871273Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_segment_reductions.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:42:46.386854]
2025-12-04T16:42:56.7164090Z 
2025-12-04T16:42:56.7165402Z test_segment_reductions 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_segment_reductions_1.1_c6d7e787931576c3_.log
2025-12-04T16:42:56.7207285Z Running 74 items in this shard: test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_multi_d_cuda_bfloat16_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_multi_d_cuda_bfloat16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_multi_d_cuda_float16_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_multi_d_cuda_float16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_multi_d_cuda_float32_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_multi_d_cuda_float32_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_multi_d_cuda_float64_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_multi_d_cuda_float64_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_multi_d_simple_cuda_bfloat16_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_multi_d_simple_cuda_bfloat16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_multi_d_simple_cuda_float16_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_multi_d_simple_cuda_float16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_multi_d_simple_cuda_float32_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_multi_d_simple_cuda_float32_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_multi_d_simple_cuda_float64_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_multi_d_simple_cuda_float64_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_max_cuda_bfloat16_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_max_cuda_bfloat16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_max_cuda_float16_int32, 
test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_max_cuda_float16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_max_cuda_float32_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_max_cuda_float32_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_max_cuda_float64_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_max_cuda_float64_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_mean_cuda_bfloat16_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_mean_cuda_bfloat16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_mean_cuda_float16_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_mean_cuda_float16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_mean_cuda_float32_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_mean_cuda_float32_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_mean_cuda_float64_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_mean_cuda_float64_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_min_cuda_bfloat16_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_min_cuda_bfloat16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_min_cuda_float16_int32, 
test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_min_cuda_float16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_min_cuda_float32_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_min_cuda_float32_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_min_cuda_float64_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_min_cuda_float64_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_prod_cuda_bfloat16_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_prod_cuda_bfloat16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_prod_cuda_float16_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_prod_cuda_float16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_prod_cuda_float32_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_prod_cuda_float32_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_prod_cuda_float64_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_prod_cuda_float64_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_sum_cuda_bfloat16_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_sum_cuda_bfloat16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_sum_cuda_float16_int32, 
test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_sum_cuda_float16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_sum_cuda_float32_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_sum_cuda_float32_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_sum_cuda_float64_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_sum_cuda_float64_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_simple_1d_cuda_bfloat16_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_simple_1d_cuda_bfloat16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_simple_1d_cuda_float16_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_simple_1d_cuda_float16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_simple_1d_cuda_float32_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_simple_1d_cuda_float32_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_simple_1d_cuda_float64_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_simple_1d_cuda_float64_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_simple_zero_length_cuda_bfloat16_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_simple_zero_length_cuda_bfloat16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_simple_zero_length_cuda_float16_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_simple_zero_length_cuda_float16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_simple_zero_length_cuda_float32_int32, 
test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_simple_zero_length_cuda_float32_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_simple_zero_length_cuda_float64_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_simple_zero_length_cuda_float64_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_unsafe_flag_cuda_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_unsafe_flag_cuda_int64
2025-12-04T16:42:56.7248030Z 
2025-12-04T16:42:56.7248377Z Finished test_segment_reductions 1/1 ... [2025-12-04 16:42:56.716270][27005.099164802], took 0.17min
2025-12-04T16:42:56.7518242Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_segment_reductions/test_segment_reductions-ad616dd6940e0de0.xml
2025-12-04T16:42:56.8307107Z Running test_autograd_fallback 1/1 ... [2025-12-04 16:42:56.830382][27005.213274943]
2025-12-04T16:42:56.8307694Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:42:56.8310564Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_autograd_fallback.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:42:56.830825]
2025-12-04T16:43:02.6538942Z 
2025-12-04T16:43:02.6539920Z test_autograd_fallback 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_autograd_fallback_1.1_60e7b253f9787096_.log
2025-12-04T16:43:02.6554505Z Running 28 items in this shard: test/test_autograd_fallback.py::TestAutogradFallback::test_autograd_function_registered_to_cpu_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_autograd_function_registered_to_cpu_mode_warn, test/test_autograd_fallback.py::TestAutogradFallback::test_base_does_not_require_grad_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_base_does_not_require_grad_mode_warn, test/test_autograd_fallback.py::TestAutogradFallback::test_composite_registered_to_cpu_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_composite_registered_to_cpu_mode_warn, test/test_autograd_fallback.py::TestAutogradFallback::test_cpu_return_self_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_cpu_return_self_mode_warn, test/test_autograd_fallback.py::TestAutogradFallback::test_inplace_autograd_function_registered_to_cpu_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_inplace_autograd_function_registered_to_cpu_mode_warn, test/test_autograd_fallback.py::TestAutogradFallback::test_inplace_on_tensor_that_does_not_require_grad_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_inplace_on_tensor_that_does_not_require_grad_mode_warn, test/test_autograd_fallback.py::TestAutogradFallback::test_no_autograd_kernel_inplace_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_no_autograd_kernel_inplace_mode_warn, test/test_autograd_fallback.py::TestAutogradFallback::test_no_autograd_kernel_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_no_autograd_kernel_mode_warn, test/test_autograd_fallback.py::TestAutogradFallback::test_no_grad_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_no_grad_mode_warn, test/test_autograd_fallback.py::TestAutogradFallback::test_post_autograd_returns_leaf_mode_nothing, 
test/test_autograd_fallback.py::TestAutogradFallback::test_post_autograd_returns_leaf_mode_warn, test/test_autograd_fallback.py::TestAutogradFallback::test_post_autograd_returns_mix_of_requires_grad_tensors_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_post_autograd_returns_mix_of_requires_grad_tensors_mode_warn, test/test_autograd_fallback.py::TestAutogradFallback::test_supports_tensor_lists_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_supports_tensor_lists_mode_warn, test/test_autograd_fallback.py::TestAutogradFallback::test_undefined_grads_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_undefined_grads_mode_warn, test/test_autograd_fallback.py::TestAutogradFallback::test_undefined_inputs_outputs_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_undefined_inputs_outputs_mode_warn
2025-12-04T16:43:02.6568482Z 
2025-12-04T16:43:02.6568831Z Finished test_autograd_fallback 1/1 ... [2025-12-04 16:43:02.653714][27011.03661036], took 0.10min
2025-12-04T16:43:02.6889678Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_autograd_fallback/test_autograd_fallback-e1a7bbd98afc63dc.xml
2025-12-04T16:43:02.7247780Z Running test_type_hints 1/1 ... [2025-12-04 16:43:02.724484][27011.107378218]
2025-12-04T16:43:02.7248347Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:43:02.7251603Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_type_hints.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:43:02.724908]
2025-12-04T16:43:08.3477918Z 
2025-12-04T16:43:08.3478835Z test_type_hints 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_type_hints_1.1_d9336b501fe8992b_.log
2025-12-04T16:43:08.3479934Z Running 1 items in this shard: test/test_type_hints.py::TestTypeHints::test_doc_examples
2025-12-04T16:43:08.3480416Z 
2025-12-04T16:43:08.3480708Z Finished test_type_hints 1/1 ... [2025-12-04 16:43:08.347558][27016.730455185], took 0.09min
2025-12-04T16:43:08.3824812Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_type_hints/test_type_hints-d14fd0906e097d86.xml
2025-12-04T16:43:08.4278776Z Running functorch/test_aot_joint_with_descriptors 1/1 ... [2025-12-04 16:43:08.427573][27016.810466783]
2025-12-04T16:43:08.4279453Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:43:08.4282853Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_aot_joint_with_descriptors.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:43:08.428014]
2025-12-04T16:43:26.5693116Z 
2025-12-04T16:43:26.5694284Z functorch/test_aot_joint_with_descriptors 1/1 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_aot_joint_with_descriptors_1.1_948ec5a85f7c1f8f_.log
2025-12-04T16:43:26.5705337Z Running 17 items in this shard: test/functorch/test_aot_joint_with_descriptors.py::TestAOTJointWithDescriptors::test_conv_bn_module, test/functorch/test_aot_joint_with_descriptors.py::TestAOTJointWithDescriptors::test_custom_op_stack_trace, test/functorch/test_aot_joint_with_descriptors.py::TestAOTJointWithDescriptors::test_export_and_compile, test/functorch/test_aot_joint_with_descriptors.py::TestAOTJointWithDescriptors::test_fx_utils_conv_bn_module, test/functorch/test_aot_joint_with_descriptors.py::TestAOTJointWithDescriptors::test_fx_utils_multiple_outputs, test/functorch/test_aot_joint_with_descriptors.py::TestAOTJointWithDescriptors::test_fx_utils_node_consistency, test/functorch/test_aot_joint_with_descriptors.py::TestAOTJointWithDescriptors::test_fx_utils_simple_linear, test/functorch/test_aot_joint_with_descriptors.py::TestAOTJointWithDescriptors::test_in_out_specs, test/functorch/test_aot_joint_with_descriptors.py::TestAOTJointWithDescriptors::test_module_with_kwargs, test/functorch/test_aot_joint_with_descriptors.py::TestAOTJointWithDescriptors::test_multiple_outputs_module, test/functorch/test_aot_joint_with_descriptors.py::TestAOTJointWithDescriptors::test_no_annotation_on_gradient_acc_nodes, test/functorch/test_aot_joint_with_descriptors.py::TestAOTJointWithDescriptors::test_preserve_annotate_flex_attention, test/functorch/test_aot_joint_with_descriptors.py::TestAOTJointWithDescriptors::test_preserve_annotate_function, test/functorch/test_aot_joint_with_descriptors.py::TestAOTJointWithDescriptors::test_preserve_annotate_replay_view, test/functorch/test_aot_joint_with_descriptors.py::TestAOTJointWithDescriptors::test_preserve_annotate_simple, test/functorch/test_aot_joint_with_descriptors.py::TestAOTJointWithDescriptors::test_simple_linear_module, test/functorch/test_aot_joint_with_descriptors.py::TestAOTJointWithDescriptors::test_static_input_indices
2025-12-04T16:43:26.5715274Z 
2025-12-04T16:43:26.5715822Z Finished functorch/test_aot_joint_with_descriptors 1/1 ... [2025-12-04 16:43:26.569070][27034.951967153], took 0.30min
2025-12-04T16:43:26.6048932Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/functorch.test_aot_joint_with_descriptors/functorch.test_aot_joint_with_descriptors-79fd9b229bc0c00b.xml
2025-12-04T16:43:26.6787286Z Running test_fx_reinplace_pass 1/1 ... [2025-12-04 16:43:26.678362][27035.061255298]
2025-12-04T16:43:26.6787896Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:43:26.6790297Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_fx_reinplace_pass.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:43:26.678792]
2025-12-04T16:43:32.8580596Z 
2025-12-04T16:43:32.8581590Z test_fx_reinplace_pass 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_fx_reinplace_pass_1.1_8f7033a49b0aaa2e_.log
2025-12-04T16:43:32.8587523Z Running 12 items in this shard: test/test_fx_reinplace_pass.py::TestReinplacePass::test_out_node_updated, test/test_fx_reinplace_pass.py::TestReinplacePass::test_reinplace_basic, test/test_fx_reinplace_pass.py::TestReinplacePass::test_reinplace_different_metadata, test/test_fx_reinplace_pass.py::TestReinplacePass::test_reinplace_index_mutation, test/test_fx_reinplace_pass.py::TestReinplacePass::test_reinplace_overlapping_memory, test/test_fx_reinplace_pass.py::TestReinplacePass::test_reinplace_scatter_op, test/test_fx_reinplace_pass.py::TestReinplacePass::test_reinplace_scatter_twice, test/test_fx_reinplace_pass.py::TestReinplacePass::test_reinplace_scatter_twice_with_different_view_op_invalid, test/test_fx_reinplace_pass.py::TestReinplacePass::test_reinplace_scatter_twice_with_different_view_op_invalid2, test/test_fx_reinplace_pass.py::TestReinplacePass::test_reinplace_scatter_twice_with_different_view_op_valid, test/test_fx_reinplace_pass.py::TestReinplacePass::test_reinplace_sym_input, test/test_fx_reinplace_pass.py::TestReinplacePass::test_reinplace_with_view
2025-12-04T16:43:32.8593191Z 
2025-12-04T16:43:32.8593524Z Finished test_fx_reinplace_pass 1/1 ... [2025-12-04 16:43:32.857851][27041.240747659], took 0.10min
2025-12-04T16:43:32.8930572Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_fx_reinplace_pass/test_fx_reinplace_pass-047146b9ff22e4f6.xml
2025-12-04T16:43:32.9781035Z Running functorch/test_control_flow 2/2 ... [2025-12-04 16:43:32.977781][27041.360674385]
2025-12-04T16:43:32.9781638Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:43:32.9784654Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_control_flow.py', '--shard-id=2', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:43:32.978208]
2025-12-04T17:04:34.4615330Z 
2025-12-04T17:04:34.4616379Z functorch/test_control_flow 2/2 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_control_flow_2.2_2e5432104edc7835_.log
2025-12-04T17:04:34.5414256Z Running 950 items in this shard: test/functorch/test_control_flow.py::TestControlFlow::test_cond_autograd_different_pytree_output, test/functorch/test_control_flow.py::TestControlFlow::test_cond_autograd_grad_through_cond, test/functorch/test_control_flow.py::TestControlFlow::test_cond_autograd_nested, test/functorch/test_control_flow.py::TestControlFlow::test_cond_autograd_pytree_not_all_inputs_used, test/functorch/test_control_flow.py::TestControlFlow::test_cond_autograd_simple, test/functorch/test_control_flow.py::TestControlFlow::test_cond_gpu, test/functorch/test_control_flow.py::TestControlFlow::test_cond_in_forloop, test/functorch/test_control_flow.py::TestControlFlow::test_cond_no_trace, test/functorch/test_control_flow.py::TestControlFlow::test_map_autograd_no_grad_output, test/functorch/test_control_flow.py::TestControlFlow::test_map_autograd_simple_partial_grad, test/functorch/test_control_flow.py::TestControlFlow::test_map_gpu, test/functorch/test_control_flow.py::TestControlFlow::test_map_list_in_out, test/functorch/test_control_flow.py::TestControlFlow::test_scan_binary_operator_reverse_False_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_binary_operator_reverse_False_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_binary_operator_reverse_False_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_binary_operator_reverse_True_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_carry_output_alias, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_compile_mode_eager_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_compile_mode_none_autograd_True, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_False_compile_mode_eager_partial_grad_additional_inputs_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_False_compile_mode_eager_partial_grad_complex_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_False_compile_mode_eager_partial_grad_random_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_False_compile_mode_eager_partial_grad_xs_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_False_compile_mode_none_partial_grad_additional_inputs_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_False_compile_mode_none_partial_grad_complex_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_False_compile_mode_none_partial_grad_random_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_False_compile_mode_none_partial_grad_xs_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_True_compile_mode_eager_partial_grad_complex_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_True_compile_mode_eager_partial_grad_init_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_True_compile_mode_eager_partial_grad_xs_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_True_compile_mode_none_partial_grad_additional_inputs_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_True_compile_mode_none_partial_grad_complex_cpu, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_True_compile_mode_none_partial_grad_complex_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_True_compile_mode_none_partial_grad_random_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_carries_ys_same_grad_reverse_False_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_carries_ys_same_grad_reverse_False_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_carries_ys_same_grad_reverse_False_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_carries_ys_same_grad_reverse_True_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_carries_ys_same_grad_reverse_True_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_carries_ys_same_grad_reverse_True_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_all_reverse_False_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_all_reverse_False_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_all_reverse_False_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_all_reverse_False_compile_mode_none_cuda_autograd_False, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_all_reverse_False_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_all_reverse_True_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_all_reverse_True_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_all_reverse_True_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_all_reverse_True_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_partial_reverse_False_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_partial_reverse_False_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_partial_reverse_False_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_partial_reverse_True_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_partial_reverse_True_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_partial_reverse_True_compile_mode_eager_cuda_autograd_False, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_partial_reverse_True_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_partial_reverse_True_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_partial_reverse_True_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_for_out_reverse_False_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_for_out_reverse_True_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_for_out_reverse_True_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_for_out_reverse_True_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_equal_grad_reverse_False_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_equal_grad_reverse_False_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_equal_grad_reverse_False_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_equal_grad_reverse_False_compile_mode_none_cuda_autograd_False, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_equal_grad_reverse_True_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_equal_grad_reverse_True_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_equal_grad_reverse_True_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_equal_grad_reverse_True_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_equal_grad_reverse_True_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_equal_grad_reverse_True_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_equal_grad_reverse_True_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_unequal_grad_reverse_False_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_unequal_grad_reverse_False_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_unequal_grad_reverse_False_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_unequal_grad_reverse_False_compile_mode_none_cpu_autograd_False, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_unequal_grad_reverse_True_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_unequal_grad_reverse_True_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_nested_reverse_False_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_nested_reverse_False_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_nested_reverse_False_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_nested_reverse_False_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_nested_reverse_True_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_nested_reverse_True_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_nested_reverse_True_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_nested_reverse_True_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_nested_reverse_True_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_nested_reverse_True_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_cnt_reverse_False_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_cnt_reverse_True_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_cnt_reverse_True_cuda, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_reverse_False_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_reverse_False_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_reverse_False_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_reverse_False_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_reverse_False_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_reverse_True_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_reverse_True_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_reverse_True_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_complex_pytree_reverse_False_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_complex_pytree_reverse_False_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_complex_pytree_reverse_False_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_complex_pytree_reverse_False_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_complex_pytree_reverse_True_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_complex_pytree_reverse_True_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_complex_pytree_reverse_True_compile_mode_none_cuda_autograd_False, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_dim_reverse_False_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dim_reverse_False_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dim_reverse_False_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dim_reverse_True_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dim_reverse_True_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_matmul_compile_mode_eager_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_matmul_compile_mode_eager_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_matmul_compile_mode_eager_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_matmul_compile_mode_eager_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_matmul_compile_mode_eager_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_matmul_compile_mode_none_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_matmul_compile_mode_none_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_matmul_compile_mode_none_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_scan_dim_compile_mode_eager_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_scan_dim_compile_mode_eager_reverse_True_cuda_autograd_True, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_scan_dim_compile_mode_none_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_scan_dim_compile_mode_none_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_scan_dim_compile_mode_none_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_scan_dim_compile_mode_none_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_scan_dim_compile_mode_none_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_False_compile_mode_eager_cpu_complex64, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_False_compile_mode_eager_cpu_int64, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_False_compile_mode_eager_cuda_float16, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_False_compile_mode_eager_cuda_int64, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_False_compile_mode_none_cpu_float16, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_False_compile_mode_none_cpu_float32, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_False_compile_mode_none_cpu_int32, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_False_compile_mode_none_cuda_int32, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_False_compile_mode_none_cuda_int64, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_True_compile_mode_eager_cpu_complex64, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_True_compile_mode_eager_cpu_float16, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_True_compile_mode_eager_cpu_int64, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_True_compile_mode_eager_cuda_float32, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_True_compile_mode_eager_cuda_int64, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_True_compile_mode_none_cpu_float16, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_True_compile_mode_none_cpu_float32, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_True_compile_mode_none_cpu_int32, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_True_compile_mode_none_cpu_int64, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_True_compile_mode_none_cuda_complex64, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_True_compile_mode_none_cuda_int64, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_pytree_complex_reverse_False_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_pytree_complex_reverse_False_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_pytree_complex_reverse_False_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_pytree_complex_reverse_True_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_pytree_complex_reverse_True_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_pytree_complex_reverse_True_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_pytree_complex_reverse_True_compile_mode_none_cuda_autograd_False, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_pytree_complex_reverse_True_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_reverse_False_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_reverse_False_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_reverse_False_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_reverse_False_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_reverse_False_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_reverse_False_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_reverse_True_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_reverse_True_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_reverse_True_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_reverse_True_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_scanned_0, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_wrong_pytree_carry_shape, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_wrong_pytree_complex_reverse_False_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_wrong_pytree_complex_reverse_True_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_wrong_pytree_init_shorter_carry, test/functorch/test_control_flow.py::TestControlFlow::test_scan_input_carry_alias, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_input_mutation, test/functorch/test_control_flow.py::TestControlFlow::test_scan_input_output_alias, test/functorch/test_control_flow.py::TestControlFlow::test_scan_multiple_layers_gradient_layers_1_device_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_multiple_layers_gradient_layers_2_device_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_multiple_layers_gradient_layers_3_device_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_False_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_False_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_False_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_False_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_False_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_False_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_False_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_True_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_True_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_True_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_True_compile_mode_none_cpu_autograd_True, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_one_return, test/functorch/test_control_flow.py::TestControlFlow::test_scan_simple_graph, test/functorch/test_control_flow.py::TestControlFlow::test_scan_tuple_reverse_False_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_tuple_reverse_False_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_tuple_reverse_False_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_tuple_reverse_False_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_tuple_reverse_True_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_tuple_reverse_True_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_tuple_reverse_True_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_tuple_reverse_True_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_tuple_reverse_True_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_tuple_reverse_True_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_while_loop_gpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_combine_mode_generic_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_combine_mode_generic_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_combine_mode_generic_reverse_True_cpu_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_combine_mode_pointwise_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_combine_mode_pointwise_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_combine_mode_pointwise_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_combine_mode_pointwise_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_combine_mode_pointwise_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_combine_mode_pointwise_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cuda_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_eager_combine_mode_pointwise_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_eager_combine_mode_pointwise_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_eager_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_none_combine_mode_generic_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_none_combine_mode_generic_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_none_combine_mode_generic_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_none_combine_mode_generic_reverse_True_cuda_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_none_combine_mode_pointwise_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_none_combine_mode_pointwise_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_none_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_none_combine_mode_pointwise_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_compile_combine_mode_generic_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_compile_dynamic_shape_combine_mode_generic_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_compile_dynamic_shape_combine_mode_generic_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_compile_dynamic_shape_combine_mode_generic_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_compile_dynamic_shape_combine_mode_pointwise_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_eager_combine_mode_generic_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_eager_combine_mode_generic_cpu_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_eager_combine_mode_generic_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_eager_combine_mode_generic_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_eager_combine_mode_pointwise_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_eager_combine_mode_pointwise_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_none_combine_mode_generic_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_none_combine_mode_generic_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_none_combine_mode_pointwise_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_none_combine_mode_pointwise_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_none_combine_mode_pointwise_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_compile_combine_mode_generic_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_compile_combine_mode_generic_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_compile_combine_mode_pointwise_cpu_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_compile_combine_mode_pointwise_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_compile_combine_mode_pointwise_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_compile_dynamic_shape_combine_mode_generic_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_eager_combine_mode_generic_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_eager_combine_mode_pointwise_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_eager_combine_mode_pointwise_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_eager_combine_mode_pointwise_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_none_combine_mode_generic_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_none_combine_mode_generic_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_none_combine_mode_generic_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_none_combine_mode_pointwise_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_none_combine_mode_pointwise_cuda_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_none_combine_mode_pointwise_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_combine_mode_generic_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_combine_mode_generic_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_combine_mode_generic_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_combine_mode_pointwise_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_combine_mode_pointwise_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_combine_mode_pointwise_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_combine_mode_pointwise_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cuda_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_eager_combine_mode_generic_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_eager_combine_mode_generic_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_eager_combine_mode_generic_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_eager_combine_mode_pointwise_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_eager_combine_mode_pointwise_reverse_False_cuda_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_eager_combine_mode_pointwise_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_eager_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_eager_combine_mode_pointwise_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_eager_combine_mode_pointwise_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_eager_combine_mode_pointwise_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_none_combine_mode_generic_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_none_combine_mode_generic_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_none_combine_mode_generic_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_none_combine_mode_generic_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_none_combine_mode_generic_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_none_combine_mode_pointwise_reverse_False_cuda_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_none_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_none_combine_mode_pointwise_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_cond_in_combine_fn_compile_mode_compile_dynamic_shape_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_cond_in_combine_fn_compile_mode_compile_dynamic_shape_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_cond_in_combine_fn_compile_mode_compile_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_cond_in_combine_fn_compile_mode_compile_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_cond_in_combine_fn_compile_mode_compile_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_cond_in_combine_fn_compile_mode_compile_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_cond_in_combine_fn_compile_mode_eager_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_cond_in_combine_fn_compile_mode_eager_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_cond_in_combine_fn_compile_mode_eager_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_cond_in_combine_fn_compile_mode_none_reverse_False_cpu_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_cond_in_combine_fn_compile_mode_none_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_cond_in_combine_fn_compile_mode_none_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_cond_in_combine_fn_compile_mode_none_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_different_input_size_compile_mode_compile_dynamic_shape_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_different_input_size_compile_mode_compile_dynamic_shape_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_different_input_size_compile_mode_compile_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_different_input_size_compile_mode_eager_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_different_input_size_compile_mode_eager_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_different_input_size_compile_mode_none_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_different_input_size_compile_mode_none_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_False_compile_mode_compile_combine_mode_generic_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_False_compile_mode_compile_combine_mode_pointwise_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_False_compile_mode_compile_combine_mode_pointwise_cuda_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_False_compile_mode_compile_dynamic_shape_combine_mode_generic_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_False_compile_mode_compile_dynamic_shape_combine_mode_generic_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_False_compile_mode_compile_dynamic_shape_combine_mode_pointwise_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_False_compile_mode_compile_dynamic_shape_combine_mode_pointwise_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_False_compile_mode_compile_dynamic_shape_combine_mode_pointwise_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_False_compile_mode_eager_combine_mode_generic_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_False_compile_mode_eager_combine_mode_generic_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_False_compile_mode_eager_combine_mode_generic_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_False_compile_mode_eager_combine_mode_pointwise_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_False_compile_mode_none_combine_mode_pointwise_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_False_compile_mode_none_combine_mode_pointwise_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_True_compile_mode_compile_combine_mode_generic_cpu_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_True_compile_mode_compile_combine_mode_generic_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_True_compile_mode_compile_combine_mode_generic_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_True_compile_mode_compile_combine_mode_pointwise_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_True_compile_mode_compile_combine_mode_pointwise_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_True_compile_mode_compile_dynamic_shape_combine_mode_generic_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_True_compile_mode_eager_combine_mode_generic_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_True_compile_mode_eager_combine_mode_pointwise_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_True_compile_mode_eager_combine_mode_pointwise_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_True_compile_mode_eager_combine_mode_pointwise_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_True_compile_mode_none_combine_mode_pointwise_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_False_cuda_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_compile_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_compile_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_compile_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_compile_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_compile_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_compile_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_eager_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_eager_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_eager_reverse_True_cpu_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_none_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_none_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_none_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_none_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_none_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_compile_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_compile_reverse_False_cuda_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_compile_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_compile_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_compile_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_eager_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_eager_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_eager_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_eager_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_none_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_none_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_False_cuda_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_compile_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_compile_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_compile_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_compile_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_compile_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_eager_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_eager_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_eager_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_eager_reverse_True_cpu_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_eager_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_none_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_none_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_compile_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_compile_reverse_True_cpu_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_compile_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_compile_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_eager_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_none_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_none_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_first_False_same_direction_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_first_False_same_direction_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_first_False_same_direction_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_first_False_same_direction_True_cuda_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_first_True_same_direction_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_first_True_same_direction_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_first_True_same_direction_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_first_True_same_direction_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_reverse_first_False_same_direction_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_reverse_first_False_same_direction_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_reverse_first_False_same_direction_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_reverse_first_False_same_direction_True_cuda_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_reverse_first_True_same_direction_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_reverse_first_True_same_direction_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_eager_reverse_first_False_same_direction_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_eager_reverse_first_False_same_direction_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_eager_reverse_first_True_same_direction_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_eager_reverse_first_True_same_direction_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_none_reverse_first_False_same_direction_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_none_reverse_first_False_same_direction_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_none_reverse_first_True_same_direction_False_cuda_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_none_reverse_first_True_same_direction_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_first_False_same_direction_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_first_False_same_direction_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_first_False_same_direction_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_first_False_same_direction_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_first_False_same_direction_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_first_True_same_direction_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_first_True_same_direction_True_cpu_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_first_True_same_direction_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_first_True_same_direction_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_reverse_first_False_same_direction_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_reverse_first_False_same_direction_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_reverse_first_False_same_direction_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_reverse_first_False_same_direction_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_reverse_first_False_same_direction_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_reverse_first_True_same_direction_False_cpu_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_reverse_first_True_same_direction_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_reverse_first_True_same_direction_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_reverse_first_True_same_direction_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_eager_reverse_first_False_same_direction_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_eager_reverse_first_False_same_direction_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_eager_reverse_first_False_same_direction_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_eager_reverse_first_True_same_direction_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_none_reverse_first_False_same_direction_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_none_reverse_first_False_same_direction_False_cuda_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_none_reverse_first_False_same_direction_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_none_reverse_first_False_same_direction_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_none_reverse_first_True_same_direction_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_none_reverse_first_True_same_direction_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_compile_dynamic_shape_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_compile_dynamic_shape_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_compile_dynamic_shape_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_compile_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_compile_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_eager_reverse_False_cpu_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_eager_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_eager_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_eager_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_none_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_none_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_none_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_combine_mode_generic_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_combine_mode_generic_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_combine_mode_generic_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_combine_mode_pointwise_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_combine_mode_pointwise_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_combine_mode_pointwise_reverse_False_cuda_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_combine_mode_pointwise_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_eager_combine_mode_generic_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_eager_combine_mode_generic_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_eager_combine_mode_generic_reverse_True_cuda_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_eager_combine_mode_pointwise_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_eager_combine_mode_pointwise_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_eager_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_eager_combine_mode_pointwise_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_eager_combine_mode_pointwise_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_none_combine_mode_generic_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_none_combine_mode_generic_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_none_combine_mode_generic_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_none_combine_mode_pointwise_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_none_combine_mode_pointwise_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_none_combine_mode_pointwise_reverse_False_cuda_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_none_combine_mode_pointwise_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_none_combine_mode_pointwise_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_none_combine_mode_pointwise_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_compile_dynamic_shape_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_compile_dynamic_shape_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_compile_dynamic_shape_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_compile_dynamic_shape_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_compile_dynamic_shape_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_compile_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_compile_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_compile_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_compile_reverse_True_cuda_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_eager_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_eager_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_eager_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_eager_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_none_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_none_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_none_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_none_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_combine_mode_generic_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_combine_mode_generic_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_combine_mode_generic_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_combine_mode_generic_reverse_True_cpu_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_combine_mode_pointwise_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_combine_mode_pointwise_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_combine_mode_pointwise_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_combine_mode_pointwise_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cuda_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_eager_combine_mode_generic_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_eager_combine_mode_generic_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_eager_combine_mode_pointwise_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_eager_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_eager_combine_mode_pointwise_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_none_combine_mode_generic_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_none_combine_mode_generic_reverse_False_cuda_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_none_combine_mode_generic_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_none_combine_mode_generic_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_none_combine_mode_generic_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_none_combine_mode_pointwise_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_none_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_none_combine_mode_pointwise_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_dynamic_shape_reverse_False_cpu_combine_mode_generic_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_dynamic_shape_reverse_False_cpu_combine_mode_pointwise_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_dynamic_shape_reverse_False_cuda_combine_mode_generic_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_dynamic_shape_reverse_False_cuda_combine_mode_pointwise_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_dynamic_shape_reverse_True_cpu_combine_mode_generic_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_dynamic_shape_reverse_True_cpu_combine_mode_pointwise_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_dynamic_shape_reverse_True_cuda_combine_mode_generic_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_dynamic_shape_reverse_True_cuda_combine_mode_generic_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_dynamic_shape_reverse_True_cuda_combine_mode_pointwise_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_reverse_False_cpu_combine_mode_generic_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_reverse_False_cuda_combine_mode_generic_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_reverse_False_cuda_combine_mode_generic_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_reverse_True_cpu_combine_mode_generic_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_reverse_True_cpu_combine_mode_generic_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_reverse_True_cpu_combine_mode_pointwise_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_reverse_True_cuda_combine_mode_generic_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_reverse_True_cuda_combine_mode_pointwise_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_reverse_True_cuda_combine_mode_pointwise_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_eager_reverse_False_cpu_combine_mode_generic_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_eager_reverse_False_cpu_combine_mode_pointwise_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_eager_reverse_False_cpu_combine_mode_pointwise_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_eager_reverse_False_cuda_combine_mode_generic_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_eager_reverse_False_cuda_combine_mode_generic_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_eager_reverse_False_cuda_combine_mode_pointwise_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_eager_reverse_True_cpu_combine_mode_generic_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_none_reverse_False_cpu_combine_mode_generic_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_none_reverse_False_cuda_combine_mode_generic_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_none_reverse_False_cuda_combine_mode_pointwise_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_none_reverse_True_cpu_combine_mode_generic_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_none_reverse_True_cpu_combine_mode_generic_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_none_reverse_True_cpu_combine_mode_pointwise_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_none_reverse_True_cuda_combine_mode_generic_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_none_reverse_True_cuda_combine_mode_pointwise_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_none_reverse_True_cuda_combine_mode_pointwise_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_combine_mode_generic_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_combine_mode_generic_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_combine_mode_generic_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_combine_mode_generic_reverse_True_cuda_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_combine_mode_generic_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_combine_mode_pointwise_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_combine_mode_pointwise_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_combine_mode_pointwise_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cuda_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_eager_combine_mode_generic_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_eager_combine_mode_generic_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_eager_combine_mode_generic_reverse_False_cuda_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_eager_combine_mode_generic_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_eager_combine_mode_generic_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_eager_combine_mode_pointwise_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_eager_combine_mode_pointwise_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_eager_combine_mode_pointwise_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_eager_combine_mode_pointwise_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_eager_combine_mode_pointwise_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_none_combine_mode_generic_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_none_combine_mode_generic_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_none_combine_mode_generic_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_none_combine_mode_generic_reverse_True_cpu_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_none_combine_mode_generic_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_none_combine_mode_generic_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_none_combine_mode_generic_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_none_combine_mode_pointwise_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_none_combine_mode_pointwise_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_none_combine_mode_pointwise_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_none_combine_mode_pointwise_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_combine_mode_generic_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_combine_mode_pointwise_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_combine_mode_pointwise_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_combine_mode_pointwise_reverse_True_cpu_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_combine_mode_pointwise_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cpu_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_eager_combine_mode_generic_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_eager_combine_mode_generic_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_eager_combine_mode_generic_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_eager_combine_mode_generic_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_eager_combine_mode_pointwise_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_eager_combine_mode_pointwise_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_eager_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_eager_combine_mode_pointwise_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_eager_combine_mode_pointwise_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_eager_combine_mode_pointwise_reverse_True_cuda_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_none_combine_mode_generic_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_none_combine_mode_generic_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_none_combine_mode_generic_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_none_combine_mode_pointwise_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_none_combine_mode_pointwise_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_none_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_none_combine_mode_pointwise_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_none_combine_mode_pointwise_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_compile_dynamic_shape_loop_type_for_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_compile_dynamic_shape_loop_type_for_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_compile_dynamic_shape_loop_type_for_reverse_True_cpu_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_compile_dynamic_shape_loop_type_for_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_compile_loop_type_for_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_compile_loop_type_for_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_compile_loop_type_for_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_compile_loop_type_for_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_compile_loop_type_for_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_eager_loop_type_for_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_eager_loop_type_for_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_eager_loop_type_for_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_eager_loop_type_for_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_none_loop_type_for_reverse_False_cpu_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_none_loop_type_for_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_none_loop_type_for_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_none_loop_type_for_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_none_loop_type_for_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_map_in_combine_fn, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_compile_dynamic_shape_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_compile_dynamic_shape_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_compile_dynamic_shape_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_compile_dynamic_shape_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_compile_dynamic_shape_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_compile_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_compile_reverse_False_cpu_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_compile_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_compile_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_compile_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_eager_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_eager_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_eager_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_eager_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_eager_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_none_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_none_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_none_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_none_reverse_False_cuda_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_none_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise_generic_reverse_False_compile_mode_compile_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise_generic_reverse_False_compile_mode_compile_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise_generic_reverse_False_compile_mode_compile_dynamic_shape_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise_generic_reverse_False_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise_generic_reverse_False_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise_generic_reverse_False_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise_generic_reverse_False_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise_generic_reverse_True_compile_mode_compile_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise_generic_reverse_True_compile_mode_compile_dynamic_shape_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise_generic_reverse_True_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise_generic_reverse_True_compile_mode_eager_cuda_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise_generic_reverse_True_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise_generic_reverse_True_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_output_output_alias, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_combine_mode_generic_compile_mode_compile_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_combine_mode_generic_compile_mode_compile_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_combine_mode_generic_compile_mode_compile_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_combine_mode_generic_compile_mode_eager_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_combine_mode_generic_compile_mode_eager_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_combine_mode_generic_compile_mode_none_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_combine_mode_generic_compile_mode_none_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_combine_mode_generic_compile_mode_none_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_False_cpu, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_combine_mode_pointwise_compile_mode_compile_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_combine_mode_pointwise_compile_mode_compile_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_combine_mode_pointwise_compile_mode_eager_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_combine_mode_pointwise_compile_mode_eager_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_combine_mode_pointwise_compile_mode_none_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_no_grad_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_no_grad_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_no_grad_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_no_grad_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_no_grad_combine_mode_generic_compile_mode_compile_reverse_False_cpu, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_no_grad_combine_mode_generic_compile_mode_eager_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_no_grad_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_no_grad_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_no_grad_combine_mode_pointwise_compile_mode_compile_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_no_grad_combine_mode_pointwise_compile_mode_compile_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_no_grad_combine_mode_pointwise_compile_mode_compile_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_no_grad_combine_mode_pointwise_compile_mode_eager_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_no_grad_combine_mode_pointwise_compile_mode_eager_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_no_grad_combine_mode_pointwise_compile_mode_eager_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_no_grad_combine_mode_pointwise_compile_mode_none_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_combine_mode_generic_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_combine_mode_generic_reverse_True_cpu_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_combine_mode_generic_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_combine_mode_generic_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_combine_mode_generic_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_combine_mode_pointwise_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_combine_mode_pointwise_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_combine_mode_pointwise_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cuda_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_eager_combine_mode_generic_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_eager_combine_mode_generic_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_eager_combine_mode_generic_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_eager_combine_mode_pointwise_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_eager_combine_mode_pointwise_reverse_False_cuda_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_eager_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_eager_combine_mode_pointwise_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_eager_combine_mode_pointwise_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_none_combine_mode_generic_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_none_combine_mode_generic_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_none_combine_mode_pointwise_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_none_combine_mode_pointwise_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_none_combine_mode_pointwise_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_none_combine_mode_pointwise_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_compile_dynamic_shape_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_compile_dynamic_shape_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_compile_dynamic_shape_reverse_False_cuda_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_compile_dynamic_shape_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_compile_dynamic_shape_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_compile_dynamic_shape_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_compile_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_compile_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_compile_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_eager_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_eager_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_eager_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_none_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_none_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_none_reverse_True_cpu_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_none_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_none_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_none_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_wrong_pytree, test/functorch/test_control_flow.py::TestControlFlowTraced::test_compile_while_loop_stack_output_dynamic_False_backend_aot_eager, test/functorch/test_control_flow.py::TestControlFlowTraced::test_compile_while_loop_stack_output_dynamic_False_backend_eager, test/functorch/test_control_flow.py::TestControlFlowTraced::test_compile_while_loop_stack_output_dynamic_True_backend_aot_eager, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_autograd_backward, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_eager_run_with_item, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_functionalized, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_functionalized_data_dependent_pred, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_functionalized_input_aliasing_with_aot_func, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_functionalized_input_mutation_on_false_branch, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_functionalized_input_mutation_on_true_branch, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_functionalized_nested, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_functionalized_nested_input_mutation_with_aot_func, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_functionalized_output_alias_input, 
test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_mismatched_branch_output_dynamic_False_backend_aot_eager, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_mismatched_branch_output_dynamic_False_backend_eager, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_mismatched_branch_strided_output_dynamic_True_backend_eager, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_nested_traced_fake_tensor, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_nested_traced_other_inputs, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_nested_traced_other_inputs_fake_tensor, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_nested_with_closure, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_simple_with_linear_compile_check_graph, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_trace_set__and_mutate_intermediate, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_traced_not_nested_fake_tensor, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_function_nOperands_0_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_function_nOperands_0_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_function_nOperands_1_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_function_nOperands_1_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_module_nOperands_0_nClosure_0_nesting_0, 
test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_module_nOperands_0_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_module_nOperands_0_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_module_nOperands_0_nClosure_1_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_module_nOperands_1_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_module_nOperands_1_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_object_nOperands_0_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_object_nOperands_0_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_object_nOperands_0_nClosure_1_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_function_nOperands_0_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_function_nOperands_0_nClosure_1_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_function_nOperands_1_nClosure_1_nesting_0, 
test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_module_nOperands_0_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_module_nOperands_0_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_module_nOperands_1_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_module_nOperands_1_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_module_nOperands_1_nClosure_1_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_object_nOperands_0_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_object_nOperands_0_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_object_nOperands_0_nClosure_1_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_object_nOperands_1_nClosure_1_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_function_nOperands_0_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_function_nOperands_0_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_function_nOperands_1_nClosure_0_nesting_0, 
test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_function_nOperands_1_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_function_nOperands_1_nClosure_1_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_module_nOperands_0_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_module_nOperands_0_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_module_nOperands_1_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_object_nOperands_0_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_object_nOperands_1_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_object_nOperands_1_nClosure_1_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_function_nOperands_0_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_function_nOperands_0_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_function_nOperands_0_nClosure_1_nesting_2, 
test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_function_nOperands_1_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_function_nOperands_1_nClosure_1_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_module_nOperands_0_nClosure_1_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_module_nOperands_1_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_module_nOperands_1_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_object_nOperands_0_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_object_nOperands_0_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_object_nOperands_0_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_object_nOperands_1_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_object_nOperands_1_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_object_nOperands_1_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_multiple_args_with_closure, 
test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_multiple_outputs_nClosure_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_multiple_outputs_nClosure_1, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_predType_boolTensor_innerFnType_function_nOperands_1_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_predType_boolTensor_innerFnType_function_nOperands_1_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_predType_boolTensor_innerFnType_module_nOperands_2_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_predType_boolTensor_innerFnType_module_nOperands_2_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_predType_boolTensor_innerFnType_object_nOperands_1_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_predType_boolTensor_innerFnType_object_nOperands_2_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_predType_boolTensor_innerFnType_object_nOperands_2_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_simple, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_single_input_with_closure, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_with_consecutive_make_fx_symbolic, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_with_module_param_closure, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_with_tensor_closure, test/functorch/test_control_flow.py::TestControlFlowTraced::test_input_input_alias, test/functorch/test_control_flow.py::TestControlFlowTraced::test_input_mutation_inference_mode_True, 
test/functorch/test_control_flow.py::TestControlFlowTraced::test_map_functionalized_aot_func, test/functorch/test_control_flow.py::TestControlFlowTraced::test_map_functionalized_elem_mutation, test/functorch/test_control_flow.py::TestControlFlowTraced::test_merge_output, test/functorch/test_control_flow.py::TestControlFlowTraced::test_raise_error_on_mismatch_type_size, test/functorch/test_control_flow.py::TestControlFlowTraced::test_scan_in_vmap_mixed_batch_dims, test/functorch/test_control_flow.py::TestControlFlowTraced::test_scan_in_vmap_simple, test/functorch/test_control_flow.py::TestControlFlowTraced::test_scan_in_vmap_unbatched_init_error, test/functorch/test_control_flow.py::TestControlFlowTraced::test_scan_in_vmap_unbatched_x, test/functorch/test_control_flow.py::TestControlFlowTraced::test_scan_vmap_scan_nested, test/functorch/test_control_flow.py::TestControlFlowTraced::test_tracing_map_autograd_aot_functionalized, test/functorch/test_control_flow.py::TestControlFlowTraced::test_tracing_map_real, test/functorch/test_control_flow.py::TestControlFlowTraced::test_tracing_map_symbolic_list, test/functorch/test_control_flow.py::TestControlFlowTraced::test_tracing_map_symbolic_simple, test/functorch/test_control_flow.py::TestControlFlowTraced::test_vmap_vmap_boolcond_False, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_autograd_simple, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_compile_backend_aot_eager_while_loop_test_const_and_symint_output, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_compile_backend_aot_eager_while_loop_test_int_carry, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_compile_backend_aot_eager_while_loop_test_pytree_int_carry, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_compile_backend_aot_eager_while_loop_test_simple, 
test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_compile_backend_eager_while_loop_test_const_and_symint_output, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_compile_backend_eager_while_loop_test_int_carry, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_compile_backend_eager_while_loop_test_nested2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_compile_backend_eager_while_loop_test_nested_with_linear, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_compile_backend_eager_while_loop_test_simple_with_mutation, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_compile_backend_eager_while_loop_test_simple_with_pytree_carry, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_cpp_while_loop_test_nested, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_cpp_while_loop_test_nested2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_cpp_while_loop_test_simple, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_cpp_while_loop_test_simple_with_mutation, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_functorch_while_loop_test_nested, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_functorch_while_loop_test_simple, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_functorch_while_loop_test_simple_with_pytree_carry, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_no_while_loop_test_simple, 
test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_no_while_loop_test_simple_with_pytree_carry, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_python_while_loop_test_nested, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_python_while_loop_test_nested2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_python_while_loop_test_simple_with_pytree_carry, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_nested2_traced, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_nested_traced, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_constant_and_symint_output_export_strict_False_dynamic_False, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_constant_and_symint_output_export_strict_False_dynamic_True, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_constant_and_symint_output_export_strict_True_dynamic_False, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_constant_and_symint_output_export_strict_True_dynamic_True, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_int_carry_export_strict_False_dynamic_False, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_int_carry_export_strict_True_dynamic_False, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_pytree_int_carry_compile_dynamic_False_backend_aot_eager, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_pytree_int_carry_compile_dynamic_False_backend_eager, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_pytree_int_carry_compile_dynamic_True_backend_aot_eager, 
test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_pytree_int_carry_compile_dynamic_True_backend_eager, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_pytree_int_carry_export_strict_False_dynamic_True, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_pytree_int_carry_export_strict_True_dynamic_False, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_pytree_int_carry_export_strict_True_dynamic_True, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_simple_functionalize_check_graph_func_type_cpp, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_simple_functionalize_check_graph_func_type_functorch, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_tracing_while_loop_test_nested, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_tracing_while_loop_test_nested_with_linear, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_tracing_while_loop_test_simple, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_tracing_while_loop_test_simple_with_linear, test/functorch/test_control_flow.py::TestHopSchema::test_associative_scan_gen_schema_multiple_inputs, test/functorch/test_control_flow.py::TestHopSchema::test_associative_scan_gen_schema_with_additional_inputs, test/functorch/test_control_flow.py::TestHopSchema::test_cond_gen_schema_symbool_inputs, test/functorch/test_control_flow.py::TestHopSchema::test_cond_gen_schema_tensor_inputs, test/functorch/test_control_flow.py::TestHopSchema::test_list_gen_schema_type_ScriptObj, test/functorch/test_control_flow.py::TestHopSchema::test_list_gen_schema_type_float, test/functorch/test_control_flow.py::TestHopSchema::test_list_gen_schema_type_int, test/functorch/test_control_flow.py::TestHopSchema::test_schema_tree_spec, 
test/functorch/test_control_flow.py::TestHopSchema::test_type_gen_schema_type_GraphModule, test/functorch/test_control_flow.py::TestHopSchema::test_type_gen_schema_type_ScriptObj, test/functorch/test_control_flow.py::TestHopSchema::test_type_gen_schema_type_SymBool, test/functorch/test_control_flow.py::TestHopSchema::test_type_gen_schema_type_SymInt, test/functorch/test_control_flow.py::TestHopSchema::test_type_gen_schema_type_Tensor, test/functorch/test_control_flow.py::TestHopSchema::test_type_gen_schema_type_float, test/functorch/test_control_flow.py::TestHopSchema::test_while_loop_gen_schema_with_additional_inputs, test/functorch/test_control_flow.py::TestHopSchema::test_while_loop_gen_schema_with_int_carries
2025-12-04T17:04:34.6142547Z 
2025-12-04T17:04:34.6142968Z Finished functorch/test_control_flow 2/2 ... [2025-12-04 17:04:34.484435][28302.86732389], took 21.03min
2025-12-04T17:04:34.6144317Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/functorch.test_control_flow/functorch.test_control_flow-922a9914156e0312.xml
2025-12-04T17:04:36.0860652Z Uploading artifacts took 1.47 seconds
2025-12-04T17:04:36.0864625Z Running test_subclass 1/1 ... [2025-12-04 17:04:36.086263][28304.469158108]
2025-12-04T17:04:36.0865223Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T17:04:36.0869673Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_subclass.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:04:36.086734]
2025-12-04T17:04:42.0600278Z 
2025-12-04T17:04:42.0601322Z test_subclass 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_subclass_1.1_b65d4f741f14f053_.log
2025-12-04T17:04:42.0645441Z Running 100 items in this shard: test/test_subclass.py::TestSubclass::test_deepcopy_base_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_deepcopy_base_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_deepcopy_diag_tensor_below_as_param_False, test/test_subclass.py::TestSubclass::test_deepcopy_diag_tensor_below_as_param_True, test/test_subclass.py::TestSubclass::test_deepcopy_logging_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_deepcopy_logging_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_deepcopy_non_wrapper_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_deepcopy_non_wrapper_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_deepcopy_sparse_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_deepcopy_sparse_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_deepcopy_wrapper_with_custom_sizes_as_param_False, test/test_subclass.py::TestSubclass::test_deepcopy_wrapper_with_custom_sizes_as_param_True, test/test_subclass.py::TestSubclass::test_deepcopy_wrapper_with_custom_strides_as_param_False, test/test_subclass.py::TestSubclass::test_deepcopy_wrapper_with_custom_strides_as_param_True, test/test_subclass.py::TestSubclass::test_lazy_module_base_tensor, test/test_subclass.py::TestSubclass::test_lazy_module_diag_tensor_below, test/test_subclass.py::TestSubclass::test_lazy_module_logging_tensor, test/test_subclass.py::TestSubclass::test_lazy_module_non_wrapper_tensor, test/test_subclass.py::TestSubclass::test_lazy_module_sparse_tensor, test/test_subclass.py::TestSubclass::test_lazy_module_wrapper_with_custom_sizes, test/test_subclass.py::TestSubclass::test_lazy_module_wrapper_with_custom_strides, test/test_subclass.py::TestSubclass::test_module_optimization_base_tensor, test/test_subclass.py::TestSubclass::test_module_optimization_diag_tensor_below, 
test/test_subclass.py::TestSubclass::test_module_optimization_logging_tensor, test/test_subclass.py::TestSubclass::test_module_optimization_non_wrapper_tensor, test/test_subclass.py::TestSubclass::test_module_optimization_sparse_tensor, test/test_subclass.py::TestSubclass::test_module_optimization_wrapper_with_custom_sizes, test/test_subclass.py::TestSubclass::test_module_optimization_wrapper_with_custom_strides, test/test_subclass.py::TestSubclass::test_non_rewrapping_torch_dispatch_subclass_as_parameter_throws_for_detach, test/test_subclass.py::TestSubclass::test_param_invariants_base_tensor_tensor_requires_grad_False, test/test_subclass.py::TestSubclass::test_param_invariants_base_tensor_tensor_requires_grad_True, test/test_subclass.py::TestSubclass::test_param_invariants_diag_tensor_below_tensor_requires_grad_False, test/test_subclass.py::TestSubclass::test_param_invariants_diag_tensor_below_tensor_requires_grad_True, test/test_subclass.py::TestSubclass::test_param_invariants_logging_tensor_tensor_requires_grad_False, test/test_subclass.py::TestSubclass::test_param_invariants_logging_tensor_tensor_requires_grad_True, test/test_subclass.py::TestSubclass::test_param_invariants_non_wrapper_tensor_tensor_requires_grad_False, test/test_subclass.py::TestSubclass::test_param_invariants_non_wrapper_tensor_tensor_requires_grad_True, test/test_subclass.py::TestSubclass::test_param_invariants_sparse_tensor_tensor_requires_grad_False, test/test_subclass.py::TestSubclass::test_param_invariants_sparse_tensor_tensor_requires_grad_True, test/test_subclass.py::TestSubclass::test_param_invariants_wrapper_with_custom_sizes_tensor_requires_grad_False, test/test_subclass.py::TestSubclass::test_param_invariants_wrapper_with_custom_sizes_tensor_requires_grad_True, test/test_subclass.py::TestSubclass::test_param_invariants_wrapper_with_custom_strides_tensor_requires_grad_False, 
test/test_subclass.py::TestSubclass::test_param_invariants_wrapper_with_custom_strides_tensor_requires_grad_True, test/test_subclass.py::TestSubclass::test_parametrization_base_tensor_leave_parametrized_False, test/test_subclass.py::TestSubclass::test_parametrization_base_tensor_leave_parametrized_True, test/test_subclass.py::TestSubclass::test_parametrization_diag_tensor_below_leave_parametrized_False, test/test_subclass.py::TestSubclass::test_parametrization_diag_tensor_below_leave_parametrized_True, test/test_subclass.py::TestSubclass::test_parametrization_logging_tensor_leave_parametrized_False, test/test_subclass.py::TestSubclass::test_parametrization_logging_tensor_leave_parametrized_True, test/test_subclass.py::TestSubclass::test_parametrization_non_wrapper_tensor_leave_parametrized_False, test/test_subclass.py::TestSubclass::test_parametrization_non_wrapper_tensor_leave_parametrized_True, test/test_subclass.py::TestSubclass::test_parametrization_sparse_tensor_leave_parametrized_False, test/test_subclass.py::TestSubclass::test_parametrization_sparse_tensor_leave_parametrized_True, test/test_subclass.py::TestSubclass::test_parametrization_wrapper_with_custom_sizes_leave_parametrized_False, test/test_subclass.py::TestSubclass::test_parametrization_wrapper_with_custom_sizes_leave_parametrized_True, test/test_subclass.py::TestSubclass::test_parametrization_wrapper_with_custom_strides_leave_parametrized_False, test/test_subclass.py::TestSubclass::test_parametrization_wrapper_with_custom_strides_leave_parametrized_True, test/test_subclass.py::TestSubclass::test_repr_base_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_repr_base_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_repr_diag_tensor_below_as_param_False, test/test_subclass.py::TestSubclass::test_repr_diag_tensor_below_as_param_True, test/test_subclass.py::TestSubclass::test_repr_logging_tensor_as_param_False, 
test/test_subclass.py::TestSubclass::test_repr_logging_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_repr_non_wrapper_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_repr_non_wrapper_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_repr_sparse_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_repr_sparse_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_repr_wrapper_with_custom_sizes_as_param_False, test/test_subclass.py::TestSubclass::test_repr_wrapper_with_custom_sizes_as_param_True, test/test_subclass.py::TestSubclass::test_repr_wrapper_with_custom_strides_as_param_False, test/test_subclass.py::TestSubclass::test_repr_wrapper_with_custom_strides_as_param_True, test/test_subclass.py::TestSubclass::test_serialization_base_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_serialization_base_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_serialization_diag_tensor_below_as_param_False, test/test_subclass.py::TestSubclass::test_serialization_diag_tensor_below_as_param_True, test/test_subclass.py::TestSubclass::test_serialization_logging_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_serialization_logging_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_serialization_non_wrapper_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_serialization_non_wrapper_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_serialization_sparse_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_serialization_sparse_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_serialization_wrapper_with_custom_sizes_as_param_False, test/test_subclass.py::TestSubclass::test_serialization_wrapper_with_custom_sizes_as_param_True, test/test_subclass.py::TestSubclass::test_serialization_wrapper_with_custom_strides_as_param_False, 
test/test_subclass.py::TestSubclass::test_serialization_wrapper_with_custom_strides_as_param_True, test/test_subclass.py::TestSubclass::test_tensor_subclass_storage_data_accesses_throw, test/test_subclass.py::TestSubclass::test_type_propagation_base_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_type_propagation_base_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_type_propagation_diag_tensor_below_as_param_False, test/test_subclass.py::TestSubclass::test_type_propagation_diag_tensor_below_as_param_True, test/test_subclass.py::TestSubclass::test_type_propagation_logging_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_type_propagation_logging_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_type_propagation_non_wrapper_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_type_propagation_non_wrapper_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_type_propagation_sparse_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_type_propagation_sparse_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_type_propagation_wrapper_with_custom_sizes_as_param_False, test/test_subclass.py::TestSubclass::test_type_propagation_wrapper_with_custom_sizes_as_param_True, test/test_subclass.py::TestSubclass::test_type_propagation_wrapper_with_custom_strides_as_param_False, test/test_subclass.py::TestSubclass::test_type_propagation_wrapper_with_custom_strides_as_param_True
2025-12-04T17:04:42.0689007Z 
2025-12-04T17:04:42.0689299Z Finished test_subclass 1/1 ... [2025-12-04 17:04:42.059996][28310.442891746], took 0.10min
2025-12-04T17:04:42.0959595Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_subclass/test_subclass-68565895e4fc66ea.xml
2025-12-04T17:04:42.1234970Z Running functorch/test_vmap_registrations 1/1 ... [2025-12-04 17:04:42.123242][28310.506136608]
2025-12-04T17:04:42.1235601Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T17:04:42.1239024Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_vmap_registrations.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:04:42.123655]
2025-12-04T17:04:52.2530499Z 
2025-12-04T17:04:52.2531627Z functorch/test_vmap_registrations 1/1 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_vmap_registrations_1.1_8a0424ce5b3ca65e_.log
2025-12-04T17:04:52.3812424Z Running 1723 items in this shard: test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[_test::cat], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[_test::get_first], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[_test::leaky_relu], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::__and__.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::__and__.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::__iand__.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::__iand__.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::__ior__.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::__ior__.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::__ixor__.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::__ixor__.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::__or__.Scalar], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::__or__.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::__xor__.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::__xor__.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_add_batch_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_autocast_to_full_precision], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_autocast_to_reduced_precision], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_batch_norm_impl_index], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_batch_norm_impl_index_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_cast_Byte], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_cast_Char], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_cast_Double], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_cast_Float], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_cast_Half], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_cast_Int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_cast_Long], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_cast_Short], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_choose_qparams_per_tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_convolution.deprecated], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_convolution_double_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_convolution_mode], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_cufft_clear_plan_cache], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_cufft_get_plan_cache_max_size], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_cufft_get_plan_cache_size], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_cufft_set_plan_cache_max_size], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_debug_has_internal_overlap], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_dim_arange], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_embedding_bag_sparse_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_fused_rms_norm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_gather_sparse_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_grid_sampler_2d_cpu_fallback_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_has_compatible_shallow_copy_type], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_is_zerotensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_lu_with_info], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_nnpack_available], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_pack_padded_sequence_backward], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_pad_circular], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_pad_enum], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_pad_packed_sequence], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_propagate_xla_data], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_remove_batch_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_reshape_from_tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_rowwise_prune], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_saturate_weight_to_fp16], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_scaled_dot_product_attention_math], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_shape_as_tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_sobol_engine_draw], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_sobol_engine_ff_], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_sobol_engine_initialize_state_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_sobol_engine_scramble_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_sparse_bsc_tensor_unsafe], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_sparse_bsr_tensor_unsafe], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_sparse_compressed_tensor_unsafe], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_sparse_coo_tensor_unsafe], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_sparse_csc_tensor_unsafe], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_sparse_csr_tensor_unsafe], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_sparse_log_softmax.Dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_sparse_log_softmax.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_sparse_mm.reduce], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_sparse_mm], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_sparse_softmax.Dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_sparse_softmax.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_sparse_sum.dim_dtype], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_sparse_sum.dtype], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_sparse_sum], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_test_ambiguous_defaults.a], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_test_ambiguous_defaults.b], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_test_autograd_multiple_dispatch.ntonly], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_test_check_tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_test_serialization_subcmul], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_test_string_default], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_thnn_differentiable_gru_cell_backward], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_thnn_differentiable_lstm_cell_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_thnn_fused_lstm_cell_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_to_cpu], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_unpack_dual], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_upsample_bicubic2d_aa.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_upsample_bilinear2d_aa.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_upsample_nearest_exact1d.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_upsample_nearest_exact2d.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_upsample_nearest_exact3d.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_use_cudnn_rnn_flatten_weight], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_validate_sparse_bsc_tensor_args], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_validate_sparse_bsr_tensor_args], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_validate_sparse_compressed_tensor_args], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_validate_sparse_coo_tensor_args], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_validate_sparse_csc_tensor_args], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_validate_sparse_csr_tensor_args], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_version], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_weight_norm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_weight_norm_differentiable_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_wrapped_linear_prepack], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_wrapped_quantized_linear_prepacked], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::absolute.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::absolute], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::absolute_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::adaptive_avg_pool1d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::adaptive_avg_pool2d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::adaptive_avg_pool3d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::adaptive_max_pool1d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::adjoint], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::affine_grid_generator_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::align_as], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::align_tensors], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::align_to.ellipsis_idx], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::align_to], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::all.dimname], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::all.dimname_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::alpha_dropout], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::alpha_dropout_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::any.dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::any.dimname_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::arccos.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::arccos], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::arccos_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::arccosh.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::arccosh], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::arccosh_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::arcsin.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::arcsin], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::arcsin_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::arcsinh.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::arcsinh], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::arcsinh_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::arctan.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::arctan2.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::arctan2], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::arctan2_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::arctan], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::arctan_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::arctanh.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::arctanh], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::arctanh_], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::argsort.dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::argsort.stable], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::argsort.stable_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::argsort], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::argwhere], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::atleast_1d.Sequence], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::atleast_1d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::atleast_2d.Sequence], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::atleast_2d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::atleast_3d.Sequence], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::atleast_3d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::avg_pool1d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::batch_norm], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::bilinear], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::broadcast_tensors], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::broadcast_to], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::can_cast], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::cartesian_prod], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::cat.names], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::cat.names_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::cdist], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::chain_matmul.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::chain_matmul], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::chalf], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::choose_qparams_optimized], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::chunk], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::clip.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::clip.Tensor_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::clip.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::clip], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::clip_.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::clip_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::coalesce], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::column_stack.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::column_stack], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::combinations], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::concat.names], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::concat.names_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::concat.out], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::concat], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::concatenate.names], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::concatenate.names_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::concatenate.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::concatenate], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::conj], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::conj_physical], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::contiguous], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::conv1d.padding], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::conv1d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::conv2d.padding], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::conv2d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::conv3d.padding], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::conv3d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::conv_tbc_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::conv_transpose1d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::conv_transpose2d.input], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::conv_transpose3d.input], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::corrcoef], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::cosine_embedding_loss], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::cosine_similarity], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::cov], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::cross.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::cross], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::cross_entropy_loss], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::ctc_loss.IntList], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::ctc_loss.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::cudnn_is_acceptable], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::cummax.dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::cummax.dimname_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::cummaxmin_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::cummin.dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::cummin.dimname_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::cumprod.dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::cumprod.dimname_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::cumprod_.dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::cumprod_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::cumsum.dimname], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::cumsum.dimname_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::cumsum_.dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::cumulative_trapezoid.dx], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::cumulative_trapezoid.x], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::data], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::det], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::diag.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::diag], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::diagflat], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::diagonal.Dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::diff.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::diff], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::divide.Scalar], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::divide.Scalar_mode], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::divide.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::divide.Tensor_mode], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::divide.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::divide.out_mode], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::divide_.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::divide_.Scalar_mode], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::divide_.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::divide_.Tensor_mode], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::dropout], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::dropout_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::dsplit.array], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::dsplit.int], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::dstack.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::dstack], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::einsum], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::embedding_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::embedding_bag.padding_idx], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::embedding_bag], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::embedding_sparse_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::empty.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::expand_as], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fake_quantize_per_channel_affine], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fake_quantize_per_channel_affine_cachemask_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fake_quantize_per_tensor_affine.tensor_qparams], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fake_quantize_per_tensor_affine], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fake_quantize_per_tensor_affine_cachemask_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fbgemm_linear_fp16_weight.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fbgemm_linear_fp16_weight], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fbgemm_linear_fp16_weight_fp32_activation.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fbgemm_linear_fp16_weight_fp32_activation], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fbgemm_linear_int8_weight], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fbgemm_linear_int8_weight_fp32_activation], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fbgemm_linear_quantize_weight], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fbgemm_pack_gemm_matrix_fp16], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fbgemm_pack_quantized_matrix.KN], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fbgemm_pack_quantized_matrix], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::feature_alpha_dropout], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::feature_alpha_dropout_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::feature_dropout], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::feature_dropout_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_fft.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_fft2.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_fft2], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_fft], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_fftn.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_fftn], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_fftshift], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_hfft.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_hfft2.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_hfft2], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_hfft], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_hfftn.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_hfftn], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_ifft.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_ifft2.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_ifft2], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_ifft], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_ifftn.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_ifftn], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_ifftshift], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_ihfft.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_ihfft2.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_ihfft2], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_ihfft], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_ihfftn.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_ihfftn], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_irfft.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_irfft2.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_irfft2], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_irfft], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_irfftn.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_irfftn], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_rfft.out], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_rfft2.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_rfft2], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_rfft], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_rfftn.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_rfftn], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fill_diagonal_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fix.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fix], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fix_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::flatten.DimnameList], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::flatten.named_out_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::flatten.using_ints], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::flatten.using_names], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::flatten_dense_tensors], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fliplr], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::flipud], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::float_power.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::float_power.Scalar_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::float_power.Tensor_Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::float_power.Tensor_Scalar_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::float_power.Tensor_Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::float_power.Tensor_Tensor_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::float_power_.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::float_power_.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::frobenius_norm.dim], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::frobenius_norm.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fused_moving_avg_obs_fake_quant], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::gather.dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::gather.dimname_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::gather_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::ger.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::ger], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::get_gradients], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::gradient.array], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::gradient.scalararray], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::gradient.scalarint], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::gradient.scalarrayarray], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::gradient.scalarrayint], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::gradient.tensorarray], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::gradient.tensorarrayint], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::greater.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::greater.Scalar_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::greater.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::greater.Tensor_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::greater_.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::greater_.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::greater_equal.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::greater_equal.Scalar_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::greater_equal.Tensor], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::greater_equal.Tensor_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::greater_equal_.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::greater_equal_.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::grid_sampler], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::group_norm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::gru.data], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::gru.input], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::gru_cell], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::hinge_embedding_loss], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::histogramdd.TensorList_bins], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::histogramdd.int_bins], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::histogramdd], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::hsplit.array], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::hsplit.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::hstack.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::hstack], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::imag], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::index_add.dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::index_copy.dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::index_copy_.dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::index_fill.Dimname_Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::index_fill.Dimname_Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::index_fill_.Dimname_Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::index_fill_.Dimname_Tensor], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::index_select.dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::index_select.dimname_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::index_select_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::infinitely_differentiable_gelu_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::inner.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::inner], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::instance_norm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::inverse.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::inverse], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::is_complex], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::is_conj], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::is_distributed], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::is_floating_point], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::is_inference], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::is_leaf], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::is_neg], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::is_nonzero], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::is_signed], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::is_vulkan_available], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::isclose], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::isfinite], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::isreal], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::istft], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::item], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::kl_div], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::kron.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::kron], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::kthvalue.dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::kthvalue.dimname_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::l1_loss], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::layer_norm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::ldexp.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::ldexp.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::ldexp_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::less.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::less.Scalar_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::less.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::less.Tensor_out], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::less_.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::less_.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::less_equal.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::less_equal.Scalar_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::less_equal.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::less_equal.Tensor_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::less_equal_.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::less_equal_.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_cholesky.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_cholesky], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_cond.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_cond.p_str], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_cond.p_str_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_cond], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_det.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_det], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_diagonal], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_eigh.eigvals], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_eigh], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_eigvals], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_eigvalsh.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_eigvalsh], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_inv.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_inv], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_ldl_factor.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_ldl_factor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_lu_factor.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_lu_factor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_matmul.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_matmul], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_matrix_norm.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_matrix_norm.str_ord], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_matrix_norm.str_ord_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_matrix_norm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_matrix_power.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_matrix_power], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_matrix_rank.atol_rtol_float], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_matrix_rank.atol_rtol_float_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_matrix_rank.atol_rtol_tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_matrix_rank.atol_rtol_tensor_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_matrix_rank.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_matrix_rank.out_tol_tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_matrix_rank.tol_tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_matrix_rank], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_multi_dot.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_multi_dot], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_norm.ord_str], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_norm.ord_str_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_norm.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_norm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_pinv.atol_rtol_float], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_pinv.atol_rtol_float_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_pinv.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_pinv.out_rcond_tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_pinv.rcond_tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_pinv], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_slogdet.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_slogdet], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_solve.out], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_solve], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_solve_ex.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_solve_ex], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_svd.U], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_svd], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_svdvals.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_svdvals], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_tensorinv.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_tensorinv], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_tensorsolve.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_tensorsolve], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_vander], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_vecdot.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_vecdot], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linear], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::log_sigmoid.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::log_sigmoid], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::log_softmax.Dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::log_softmax.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::logcumsumexp.dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::logcumsumexp.dimname_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::logdet], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::logsumexp.names], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::logsumexp.names_out], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::lstm.data], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::lstm.input], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::lstm_cell], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::lu_solve.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::lu_solve], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::mH], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::mT], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::margin_ranking_loss], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::masked_select_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::matmul.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::matmul], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::matrix_H], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::matrix_exp], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::matrix_exp_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::matrix_power.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::matrix_power], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::max.names_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::max.names_dim_max], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::max.other], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::max.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::max_pool1d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::max_pool1d_with_indices], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::max_pool2d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::max_pool3d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::mean.names_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::mean.names_out], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::median.names_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::median.names_dim_values], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::meshgrid.indexing], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::meshgrid], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::min.names_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::min.names_dim_min], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::min.other], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::min.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::mish_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::mode.dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::mode.dimname_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::moveaxis.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::moveaxis.intlist], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::movedim.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::movedim.intlist], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::msort.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::msort], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::multilabel_margin_loss.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::multilabel_margin_loss], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::multiply.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::multiply.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::multiply.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::multiply_.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::multiply_.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::nanmean.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::nanmean], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::nanmedian.names_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::nanmedian.names_dim_values], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::nanquantile.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::nanquantile.scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::nanquantile.scalar_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::nanquantile], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::narrow.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::narrow], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::native_channel_shuffle], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::negative.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::negative], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::negative_], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::nested_to_padded_tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::nll_loss.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::nll_loss2d.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::nll_loss2d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::nll_loss], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::nll_loss_nd], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::nonzero_numpy], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::norm.names_ScalarOpt_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::norm.names_ScalarOpt_dim_dtype], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::norm.names_dtype_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::norm.names_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::norm_except_dim], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::not_equal.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::not_equal.Scalar_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::not_equal.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::not_equal.Tensor_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::not_equal_.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::not_equal_.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::nuclear_norm.dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::nuclear_norm.dim_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::nuclear_norm.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::nuclear_norm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::numpy_T], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::one_hot], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::orgqr.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::orgqr], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::outer.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::outer], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::output_nr], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::pad], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::pad_sequence], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::pairwise_distance], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::pdist], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::pin_memory], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::pinverse], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::poisson_nll_loss], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::positive], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::prelu], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::prod.Dimname_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::prod.dim_Dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::promote_types], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::qr.Q], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::qr], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::quantile.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::quantile.scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::quantile.scalar_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::quantile], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::quantized_gru_cell], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::quantized_lstm_cell], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::quantized_rnn_relu_cell], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::quantized_rnn_tanh_cell], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::rand.generator_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::randn.generator_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::randn.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::ravel], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::real], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::refine_names], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::relu6], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::relu6_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::rename], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::rename_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::repeat_interleave.self_Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::repeat_interleave.self_int], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::requires_grad_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::reshape], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::reshape_as], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::resolve_conj], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::resolve_neg], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::result_type.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::result_type.Scalar_Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::result_type.Scalar_Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::result_type.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::retain_grad], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::retains_grad], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::rms_norm], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::rnn_relu.data], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::rnn_relu.input], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::rnn_relu_cell], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::rnn_tanh.data], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::rnn_tanh.input], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::rnn_tanh_cell], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::row_stack.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::row_stack], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::rrelu], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::rrelu_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::scaled_dot_product_attention], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::scatter.dimname_src], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::scatter.dimname_value], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::scatter_add.dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::select.Dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::selu], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::selu_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::set_.source_Tensor_storage_offset], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::set_data], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::silu_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::size.Dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::size.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::slogdet.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::slogdet], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::slow_conv3d.out], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::slow_conv3d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::smm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::softmax.Dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::softmax.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::sort.dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::sort.dimname_stable], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::sort.dimname_values], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::sort.dimname_values_stable], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::sparse_bsc_tensor.ccol_row_value], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::sparse_bsc_tensor.ccol_row_value_size], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::sparse_bsr_tensor.crow_col_value], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::sparse_bsr_tensor.crow_col_value_size], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::sparse_coo_tensor.indices], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::sparse_coo_tensor.indices_size], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::sparse_csc_tensor.ccol_row_value], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::sparse_csc_tensor.ccol_row_value_size], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::sparse_csr_tensor.crow_col_value], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::sparse_csr_tensor.crow_col_value_size], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_digamma.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_digamma], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_erf.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_erf], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_erfc.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_erfc], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_erfinv.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_erfinv], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_exp2.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_exp2], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_expit.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_expit], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_expm1.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_expm1], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_gammainc.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_gammainc], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_gammaincc.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_gammaincc], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_gammaln.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_gammaln], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_i0.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_i0], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_log1p.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_log1p], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_log_softmax], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_logit.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_logit], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_logsumexp.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_logsumexp], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_multigammaln.out], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_multigammaln], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_ndtr.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_ndtr], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_polygamma.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_polygamma], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_psi.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_psi], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_round.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_round], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_sinc.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_sinc], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_softmax], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_xlogy.other_scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_xlogy.other_scalar_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_xlogy.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_xlogy.self_scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_xlogy.self_scalar_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_xlogy], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::split.sizes], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::square.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::square], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::square_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::squeeze.dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::squeeze_.dimname], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::sspaddmm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::std.correction_names], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::std.correction_names_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::std.dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::std.names_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::std.names_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::std.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::std], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::std_mean.correction_names], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::std_mean.dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::std_mean.names_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::std_mean], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::stft.center], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::stft], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::stride.Dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::stride.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::subtract.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::subtract.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::subtract.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::subtract_.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::subtract_.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::sum.DimnameList_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::sum.dim_DimnameList], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::sum_to_size], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::svd.U], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::svd], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::swapaxes], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::swapaxes_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::swapdims], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::swapdims_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::sym_is_contiguous], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::sym_numel], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::sym_size.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::sym_storage_offset], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::sym_stride.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::take_along_dim.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::take_along_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::tensor_split.indices], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::tensor_split.sections], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::tensor_split.tensor_indices_or_sections], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::tensordot.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::tensordot], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::thnn_conv2d.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::thnn_conv2d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::tile], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::to.device], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::to.dtype], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::to.dtype_layout], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::to.other], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::to_dense], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::to_dense_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::to_mkldnn_backward], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::to_sparse.sparse_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::to_sparse], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::to_sparse_bsc], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::to_sparse_bsr], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::to_sparse_csc], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::to_sparse_csr], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::trace_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::transpose.Dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::trapezoid.dx], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::trapezoid.x], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::trapz.dx], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::trapz.x], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::triplet_margin_loss], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::true_divide.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::true_divide.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::true_divide.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::true_divide_.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::true_divide_.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::type_as], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::unbind.Dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::unflatten.Dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::unflatten.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::unflatten_dense_tensors], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::unsafe_chunk], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::upsample_bicubic2d.vec], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::upsample_bilinear2d.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::upsample_linear1d.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::upsample_nearest1d.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::upsample_nearest2d.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::upsample_nearest3d.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::upsample_trilinear3d.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::value_selecting_reduction_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::vander], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::var.correction_names], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::var.correction_names_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::var.dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::var.names_dim], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::var.names_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::var.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::var], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::var_mean.correction_names], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::var_mean.dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::var_mean.names_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::var_mean], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::view_as], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::vsplit.array], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::vsplit.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::vstack.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::vstack], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::where.ScalarOther], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::where.ScalarSelf], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::where.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::where], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[c10d_functional::all_gather_into_tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[c10d_functional::all_gather_into_tensor_coalesced], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[c10d_functional::all_reduce], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[c10d_functional::all_reduce_coalesced], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[c10d_functional::all_to_all_single], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[c10d_functional::broadcast], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[c10d_functional::reduce_scatter_tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[c10d_functional::reduce_scatter_tensor_coalesced], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[c10d_functional::wait_tensor], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[inductor::_alloc_from_pool], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[mkldnn::_is_mkldnn_acl_supported], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[mkldnn::_is_mkldnn_bf16_supported], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[mkldnn::_is_mkldnn_fp16_supported], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[prepacked::unpack_prepacked_sizes_conv2d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[prepacked::unpack_prepacked_sizes_linear], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[profiler::_record_function_enter], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[profiler::_record_function_enter_new], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[profiler::_record_function_exit._RecordFunction], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[profiler::_record_function_exit], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv1d_unpack], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv2d_dilation], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv2d_groups], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv2d_output_padding], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv2d_padding], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv2d_stride], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv2d_transpose], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv2d_unpack], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv2d_unpack_sizes], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv3d_dilation], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv3d_groups], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv3d_output_padding], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv3d_padding], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv3d_stride], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv3d_transpose], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv3d_unpack], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv_transpose1d_unpack], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv_transpose2d_dilation], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv_transpose2d_groups], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv_transpose2d_output_padding], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv_transpose2d_padding], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv_transpose2d_stride], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv_transpose2d_transpose], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv_transpose2d_unpack], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv_transpose3d_dilation], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv_transpose3d_groups], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv_transpose3d_output_padding], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv_transpose3d_padding], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv_transpose3d_stride], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv_transpose3d_transpose], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv_transpose3d_unpack], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv_unpack], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::embedding_bag_unpack], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::linear_unpack], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::linear_unpack_fp16], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::make_quantized_cell_params_fp16], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[sparse::qlinear_unpack], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::__and__.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::__and__.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::__iand__.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::__iand__.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::__ior__.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::__ior__.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::__ixor__.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::__ixor__.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::__or__.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::__or__.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::__xor__.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::__xor__.Tensor], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::_batch_norm_impl_index], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::_convolution_double_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::_convolution_mode], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::_fused_rms_norm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::_has_compatible_shallow_copy_type], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::_lu_with_info], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::_pad_circular], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::_scaled_dot_product_attention_math], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::_test_check_tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::_upsample_bicubic2d_aa.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::_upsample_bilinear2d_aa.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::absolute], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::absolute_], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::adaptive_avg_pool1d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::adaptive_avg_pool2d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::adaptive_avg_pool3d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::adaptive_max_pool1d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::adjoint], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::alias_copy], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::arccos], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::arccos_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::arccosh], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::arccosh_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::arcsin], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::arcsin_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::arcsinh], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::arcsinh_], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::arctan2], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::arctan2_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::arctan], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::arctan_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::arctanh], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::arctanh_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::argsort.stable], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::argsort], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::as_strided_copy], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::atleast_1d.Sequence], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::atleast_1d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::atleast_2d.Sequence], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::atleast_2d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::atleast_3d.Sequence], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::atleast_3d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::avg_pool1d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::batch_norm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::broadcast_tensors], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::broadcast_to], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::cartesian_prod], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::cdist], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::chunk], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::clip.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::clip], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::combinations], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::concat], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::concatenate], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::conj_physical], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::contiguous], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::conv1d.padding], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::conv1d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::conv2d.padding], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::conv2d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::conv3d.padding], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::conv3d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::conv_transpose1d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::conv_transpose2d.input], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::conv_transpose3d.input], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::corrcoef], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::cosine_embedding_loss], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::cosine_similarity], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::cov], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::cross], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::cross_entropy_loss], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::cumprod_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::cumulative_trapezoid.dx], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::cumulative_trapezoid.x], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::det], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::diag], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::diagonal_copy], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::diff], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::divide.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::divide.Scalar_mode], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::divide.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::divide.Tensor_mode], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::divide_.Scalar_mode], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::divide_.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::divide_.Tensor_mode], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::dropout], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::dsplit.array], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::dsplit.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::dstack], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::einsum], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::embedding_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::expand_as], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::fft_fft2], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::fft_fft], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::fft_fftn], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::fft_fftshift], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::fft_hfft2], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::fft_hfft], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::fft_hfftn], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::fft_ifft2], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::fft_ifft], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::fft_ifftn], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::fft_ifftshift], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::fft_ihfft], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::fft_irfft2], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::fft_irfft], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::fft_irfftn], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::fft_rfft2], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::fft_rfft], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::fft_rfftn], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::fix], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::flatten.using_ints], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::fliplr], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::flipud], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::float_power.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::float_power.Tensor_Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::float_power.Tensor_Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::frobenius_norm.dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::gather_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::ger], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::gradient.array], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::gradient.scalararray], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::gradient.scalarint], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::gradient.scalarrayarray], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::gradient.scalarrayint], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::gradient.tensorarray], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::gradient.tensorarrayint], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::greater.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::greater.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::greater_equal.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::greater_equal.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::grid_sampler], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::group_norm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::hinge_embedding_loss], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::hsplit.array], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::hsplit.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::hstack], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::imag], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::index_select_backward], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::inner], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::instance_norm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::inverse], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::is_complex], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::is_same_size], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::isfinite], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::isreal], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::kron], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::l1_loss], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::layer_norm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::ldexp.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::less.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::less.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::less_equal.Scalar], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::less_equal.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_cholesky], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_cond], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_det], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_diagonal], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_eigh], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_eigvals], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_eigvalsh], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_inv], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_ldl_factor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_lu_factor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_matmul], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_matrix_norm.str_ord], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_matrix_norm], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_matrix_power], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_matrix_rank.atol_rtol_float], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_matrix_rank.atol_rtol_tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_multi_dot], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_norm.ord_str], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_norm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_pinv.atol_rtol_float], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_pinv], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_solve], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_solve_ex], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_svd], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_svdvals], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_tensorinv], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_vander], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_vecdot], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linear], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::log_sigmoid], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::log_softmax.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::logdet], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::mH], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::mT], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::matmul], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::matrix_H], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::matrix_exp], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::matrix_power], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::max.other], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::max_pool1d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::max_pool1d_with_indices], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::max_pool2d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::max_pool3d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::meshgrid.indexing], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::meshgrid], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::min.other], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::moveaxis.intlist], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::movedim.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::movedim.intlist], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::msort], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::multiply.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::multiply.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::multiply_.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::multiply_.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::nanmean], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::narrow], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::negative], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::nll_loss2d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::nll_loss], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::nll_loss_nd], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::not_equal.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::not_equal.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::nuclear_norm.dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::nuclear_norm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::numpy_T], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::orgqr], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::outer], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::pad], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::pairwise_distance], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::pinverse], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::poisson_nll_loss], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::positive], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::prelu], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::qr], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::ravel], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::real], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::relu6], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::relu6_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::repeat_interleave.self_Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::repeat_interleave.self_int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::reshape], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::reshape_as], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::resolve_conj], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::resolve_neg], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::result_type.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::result_type.Scalar_Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::result_type.Scalar_Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::result_type.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::rms_norm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::row_stack], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::rrelu], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::rrelu_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::scaled_dot_product_attention], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::selu], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::selu_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::size.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::slogdet], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::softmax.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::special_digamma], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::special_erf], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::special_erfc], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::special_erfinv], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::special_exp2], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::special_expit], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::special_expm1], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::special_gammainc], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::special_gammaincc], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::special_gammaln], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::special_i0], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::special_log1p], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::special_log_softmax], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::special_logit], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::special_logsumexp], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::special_multigammaln], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::special_ndtr], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::special_polygamma], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::special_psi], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::special_round], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::special_sinc], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::special_softmax], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::special_xlogy.other_scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::special_xlogy.self_scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::special_xlogy], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::split.sizes], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::square], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::std.dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::std], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::std_mean.dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::std_mean], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::subtract.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::sum_to_size], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::svd], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::swapaxes], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::swapaxes_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::swapdims], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::swapdims_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::take_along_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::tensor_split.indices], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::tensor_split.sections], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::tensordot], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::tile], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::to.device], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::to.dtype], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::to.dtype_layout], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::to.other], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::trapezoid.dx], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::trapezoid.x], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::trapz.dx], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::trapz.x], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::true_divide.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::true_divide.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::true_divide_.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::true_divide_.Tensor], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::type_as], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::unflatten.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::unfold_copy], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::unsafe_chunk], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::upsample_bicubic2d.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::upsample_bilinear2d.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::upsample_linear1d.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::upsample_nearest1d.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::upsample_nearest2d.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::upsample_nearest3d.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::upsample_trilinear3d.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::value_selecting_reduction_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::var.dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::var], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::var_mean.dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::var_mean], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::view_as], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::vsplit.array], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::vsplit.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::vstack], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::where.ScalarOther], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::where.ScalarSelf], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::where.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::absolute], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::absolute_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::adaptive_avg_pool1d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::adaptive_avg_pool2d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::adaptive_avg_pool3d], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::adaptive_max_pool1d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::adjoint], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::affine_grid_generator_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::align_as], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::align_tensors], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::align_to.ellipsis_idx], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::align_to], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::alpha_dropout], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::alpha_dropout_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::arccos], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::arccos_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::arccosh], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::arccosh_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::arcsin], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::arcsin_], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::arcsinh], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::arcsinh_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::arctan2], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::arctan2_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::arctan], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::arctan_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::arctanh], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::arctanh_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::argsort.stable], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::argsort], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::argwhere], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::atleast_1d.Sequence], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::atleast_1d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::atleast_2d.Sequence], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::atleast_2d], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::atleast_3d.Sequence], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::atleast_3d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::avg_pool1d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::batch_norm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::bilinear], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::broadcast_tensors], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::broadcast_to], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::can_cast], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::cartesian_prod], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::cat.names], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::cdist], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::chain_matmul], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::chalf], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::choose_qparams_optimized], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::chunk], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::clip.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::clip], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::clip_.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::clip_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::coalesce], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::column_stack], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::combinations], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::concat.names], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::concat], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::concatenate.names], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::concatenate], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::conj], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::conj_physical], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::contiguous], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::conv1d.padding], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::conv1d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::conv2d.padding], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::conv2d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::conv3d.padding], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::conv3d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::conv_tbc_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::conv_transpose1d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::conv_transpose2d.input], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::conv_transpose3d.input], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::corrcoef], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::cosine_embedding_loss], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::cosine_similarity], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::cov], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::cross], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::cross_entropy_loss], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::ctc_loss.IntList], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::ctc_loss.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::cudnn_is_acceptable], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::cummaxmin_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::cumprod_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::cumulative_trapezoid.dx], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::cumulative_trapezoid.x], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::data], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::det], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::diag], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::diagflat], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::diff], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::divide.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::divide.Scalar_mode], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::divide.Tensor], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::divide.Tensor_mode], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::divide.out_mode], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::divide_.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::divide_.Scalar_mode], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::divide_.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::divide_.Tensor_mode], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::dropout], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::dropout_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::dsplit.array], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::dsplit.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::dstack], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::einsum], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::embedding_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::embedding_bag.padding_idx], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::embedding_bag], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::expand_as], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::feature_alpha_dropout], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::feature_alpha_dropout_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::feature_dropout], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::feature_dropout_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::fft_fft2], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::fft_fft], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::fft_fftn], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::fft_fftshift], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::fft_hfft2], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::fft_hfft], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::fft_hfftn], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::fft_ifft2], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::fft_ifft], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::fft_ifftn], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::fft_ifftshift], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::fft_ihfft2], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::fft_ihfft], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::fft_ihfftn], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::fft_irfft2], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::fft_irfft], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::fft_irfftn], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::fft_rfft2], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::fft_rfft], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::fft_rfftn], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::fill_diagonal_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::fix], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::fix_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::flatten.named_out_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::flatten.using_ints], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::flatten.using_names], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::flatten_dense_tensors], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::fliplr], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::flipud], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::float_power.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::float_power.Tensor_Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::float_power.Tensor_Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::float_power_.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::float_power_.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::frobenius_norm.dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::fused_moving_avg_obs_fake_quant], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::gather_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::ger], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::get_gradients], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::gradient.array], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::gradient.scalararray], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::gradient.scalarint], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::gradient.scalarrayarray], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::gradient.scalarrayint], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::gradient.tensorarray], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::gradient.tensorarrayint], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::greater.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::greater.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::greater_.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::greater_.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::greater_equal.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::greater_equal.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::greater_equal_.Scalar], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::greater_equal_.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::grid_sampler], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::group_norm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::gru.data], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::gru.input], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::gru_cell], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::hinge_embedding_loss], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::histogramdd.TensorList_bins], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::histogramdd.int_bins], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::histogramdd], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::hsplit.array], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::hsplit.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::hstack], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::imag], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::index_select_backward], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::infinitely_differentiable_gelu_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::inner], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::instance_norm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::inverse], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::isclose], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::isfinite], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::isreal], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::istft], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::item], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::kl_div], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::kron], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::l1_loss], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::layer_norm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::ldexp.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::ldexp_], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::less.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::less.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::less_.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::less_.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::less_equal.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::less_equal.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::less_equal_.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::less_equal_.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_cholesky], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_cond.p_str], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_cond], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_det], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_diagonal], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_eigh.eigvals], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_eigh], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_eigvals], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_eigvalsh], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_inv], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_ldl_factor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_lu_factor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_matmul], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_matrix_norm.str_ord], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_matrix_norm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_matrix_power], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_matrix_rank.atol_rtol_float], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_matrix_rank.atol_rtol_tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_matrix_rank.out_tol_tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_matrix_rank.tol_tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_matrix_rank], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_multi_dot], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_norm.ord_str], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_norm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_pinv.atol_rtol_float], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_pinv.out_rcond_tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_pinv.rcond_tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_pinv], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_slogdet], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_solve], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_solve_ex], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_svd.U], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_svd], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_svdvals], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_tensorinv], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_tensorsolve], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_vander], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_vecdot], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linear], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::log_sigmoid], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::log_softmax.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::logdet], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::logsumexp.names], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::lstm.data], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::lstm.input], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::lstm_cell], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::lu_solve], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::mH], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::mT], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::margin_ranking_loss], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::masked_select_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::matmul], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::matrix_H], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::matrix_exp], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::matrix_exp_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::matrix_power], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::max.names_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::max.names_dim_max], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::max.other], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::max_pool1d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::max_pool1d_with_indices], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::max_pool2d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::max_pool3d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::mean.names_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::median.names_dim], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::median.names_dim_values], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::meshgrid.indexing], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::meshgrid], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::min.names_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::min.names_dim_min], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::min.other], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::mish_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::moveaxis.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::moveaxis.intlist], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::movedim.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::movedim.intlist], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::msort], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::multilabel_margin_loss], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::multiply.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::multiply.Tensor], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::multiply_.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::multiply_.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::nanmean], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::nanmedian.names_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::nanmedian.names_dim_values], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::nanquantile.scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::nanquantile], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::narrow.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::narrow], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::native_channel_shuffle], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::negative], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::negative_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::nested_to_padded_tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::nll_loss2d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::nll_loss], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::nll_loss_nd], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::nonzero_numpy], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::norm.names_ScalarOpt_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::norm.names_ScalarOpt_dim_dtype], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::norm_except_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::not_equal.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::not_equal.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::not_equal_.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::not_equal_.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::nuclear_norm.dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::nuclear_norm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::numpy_T], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::one_hot], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::orgqr], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::outer], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::output_nr], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::pad], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::pad_sequence], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::pairwise_distance], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::pdist], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::pin_memory], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::pinverse], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::poisson_nll_loss], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::positive], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::prelu], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::promote_types], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::qr.Q], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::qr], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::quantile.scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::quantile], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::ravel], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::real], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::refine_names], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::relu6], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::relu6_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::rename], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::rename_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::repeat_interleave.self_Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::repeat_interleave.self_int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::requires_grad_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::reshape], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::reshape_as], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::resolve_conj], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::resolve_neg], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::result_type.Scalar], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::result_type.Scalar_Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::result_type.Scalar_Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::result_type.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::retain_grad], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::retains_grad], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::rms_norm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::rnn_relu.data], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::rnn_relu.input], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::rnn_relu_cell], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::rnn_tanh.data], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::rnn_tanh.input], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::rnn_tanh_cell], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::row_stack], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::rrelu], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::rrelu_], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::scaled_dot_product_attention], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::selu], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::selu_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::set_.source_Tensor_storage_offset], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::set_data], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::silu_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::size.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::slogdet], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::slow_conv3d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::smm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::softmax.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::special_digamma], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::special_erf], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::special_erfc], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::special_erfinv], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::special_exp2], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::special_expit], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::special_expm1], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::special_gammainc], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::special_gammaincc], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::special_gammaln], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::special_i0], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::special_log1p], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::special_log_softmax], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::special_logit], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::special_logsumexp], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::special_multigammaln], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::special_ndtr], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::special_polygamma], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::special_psi], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::special_round], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::special_sinc], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::special_softmax], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::special_xlogy.other_scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::special_xlogy.self_scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::special_xlogy], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::split.sizes], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::square], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::square_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::sspaddmm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::std.correction_names], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::std.dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::std.names_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::std], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::std_mean.correction_names], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::std_mean.dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::std_mean.names_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::std_mean], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::stft.center], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::stft], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::stride.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::subtract.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::subtract.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::subtract_.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::subtract_.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::sum_to_size], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::svd.U], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::svd], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::swapaxes], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::swapaxes_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::swapdims], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::swapdims_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::sym_is_contiguous], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::sym_numel], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::sym_size.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::sym_storage_offset], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::sym_stride.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::take_along_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::tensor_split.indices], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::tensor_split.sections], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::tensor_split.tensor_indices_or_sections], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::tensordot], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::thnn_conv2d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::tile], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::to.device], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::to.dtype], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::to.dtype_layout], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::to.other], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::to_dense], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::to_dense_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::to_mkldnn_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::trace_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::trapezoid.dx], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::trapezoid.x], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::trapz.dx], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::trapz.x], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::triplet_margin_loss], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::true_divide.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::true_divide.Tensor], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::true_divide_.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::true_divide_.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::type_as], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::unflatten.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::unflatten_dense_tensors], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::unsafe_chunk], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::upsample_bicubic2d.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::upsample_bilinear2d.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::upsample_linear1d.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::upsample_nearest1d.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::upsample_nearest2d.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::upsample_nearest3d.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::upsample_trilinear3d.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::value_selecting_reduction_backward], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::vander], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::var.correction_names], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::var.dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::var.names_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::var], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::var_mean.correction_names], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::var_mean.dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::var_mean.names_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::var_mean], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::view_as], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::vsplit.array], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::vsplit.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::vstack], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::where.ScalarOther], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::where.ScalarSelf], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::where.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::where]
2025-12-04T17:04:52.5062064Z 
2025-12-04T17:04:52.5062564Z Finished functorch/test_vmap_registrations 1/1 ... [2025-12-04 17:04:52.256357][28320.639245214], took 0.17min
2025-12-04T17:04:52.5064050Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/functorch.test_vmap_registrations/functorch.test_vmap_registrations-40d5b566ee6986dc.xml
2025-12-04T17:04:52.5065358Z Running nn/test_parametrization 1/1 ... [2025-12-04 17:04:52.405089][28320.787982798]
2025-12-04T17:04:52.5065929Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T17:04:52.5067181Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'nn/test_parametrization.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:04:52.405560]
2025-12-04T17:05:01.8838286Z 
2025-12-04T17:05:01.8839296Z nn/test_parametrization 1/1 was successful, full logs can be found in artifacts with path test/test-reports/nn.test_parametrization_1.1_0b836fe205c49662_.log
2025-12-04T17:05:01.8871946Z Running 58 items in this shard: test/nn/test_parametrization.py::TestNNParametrization::test_caching_parametrization_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_caching_parametrization_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_caching_parametrization_with_transfer_parametrizations_and_params_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_caching_parametrization_with_transfer_parametrizations_and_params_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_deepcopy_after_parametrization_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_deepcopy_after_parametrization_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_errors_parametrized_tensor_parametrization_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_errors_parametrized_tensor_parametrization_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_errors_unparametrized_tensor_parametrization_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_errors_unparametrized_tensor_parametrization_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_initialization_parametrization_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_initialization_parametrization_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_multiple_inputs_parametrization_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_multiple_inputs_parametrization_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_new_spectral_norm_dim_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_new_spectral_norm_dim_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_new_spectral_norm_forward_swap_False, 
test/nn/test_parametrization.py::TestNNParametrization::test_new_spectral_norm_forward_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_new_spectral_norm_load_state_dict_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_new_spectral_norm_load_state_dict_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_new_spectral_norm_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_new_spectral_norm_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_new_spectral_norm_value_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_new_spectral_norm_value_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_orthogonal_errors_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_orthogonal_errors_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_orthogonal_parametrization_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_orthogonal_parametrization_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_parametrization_same_training_mode_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_parametrization_same_training_mode_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_register_and_remove_buffer_parametrization_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_register_and_remove_buffer_parametrization_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_register_and_remove_nested_parametrization_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_register_and_remove_nested_parametrization_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_register_and_remove_parametrization_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_register_and_remove_parametrization_swap_True, 
test/nn/test_parametrization.py::TestNNParametrization::test_register_parametrization_no_grad, test/nn/test_parametrization.py::TestNNParametrization::test_serialization_parametrization_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_serialization_parametrization_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_transfer_parametrizations_and_params_many_to_one_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_transfer_parametrizations_and_params_many_to_one_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_transfer_parametrizations_and_params_right_inverse_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_transfer_parametrizations_and_params_right_inverse_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_transfer_parametrizations_and_params_single_param_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_transfer_parametrizations_and_params_single_param_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_transfer_parametrizations_and_params_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_transfer_parametrizations_and_params_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_type_before_parametrizations_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_type_before_parametrizations_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_weight_norm_deepcopy_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_weight_norm_deepcopy_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_weight_norm_pickle_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_weight_norm_pickle_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_weight_norm_state_dict_compat_swap_False, 
test/nn/test_parametrization.py::TestNNParametrization::test_weight_norm_state_dict_compat_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_wrapper_subclass_parametrization_swap_True, test/nn/test_parametrization.py::TestNNParametrizationDeviceCUDA::test_weight_norm_parametrization_swap_False_cuda, test/nn/test_parametrization.py::TestNNParametrizationDeviceCUDA::test_weight_norm_parametrization_swap_True_cuda
2025-12-04T17:05:01.8903480Z 
2025-12-04T17:05:01.8903839Z Finished nn/test_parametrization 1/1 ... [2025-12-04 17:05:01.883696][28330.266591922], took 0.16min
2025-12-04T17:05:01.9191101Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/nn.test_parametrization/nn.test_parametrization-ed4e97080833ff92.xml
2025-12-04T17:05:01.9944797Z Running test_dynamic_shapes 1/1 ... [2025-12-04 17:05:01.994162][28330.377055699]
2025-12-04T17:05:01.9945352Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T17:05:01.9948527Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_dynamic_shapes.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:05:01.994612]
2025-12-04T17:06:11.8184055Z 
2025-12-04T17:06:11.8185234Z test_dynamic_shapes 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_dynamic_shapes_1.1_f2bbcf4caeac0628_.log
2025-12-04T17:06:11.8363079Z Running 378 items in this shard: test/test_dynamic_shapes.py::TestPySymInt::test_arith_ops, test/test_dynamic_shapes.py::TestPySymInt::test_aten_ops, test/test_dynamic_shapes.py::TestPySymInt::test_avoid_unbacked_substitution, test/test_dynamic_shapes.py::TestPySymInt::test_backed_size_oblivious_01_spec, test/test_dynamic_shapes.py::TestPySymInt::test_baddbmm_symint, test/test_dynamic_shapes.py::TestPySymInt::test_binary, test/test_dynamic_shapes.py::TestPySymInt::test_data_dependent_guard, test/test_dynamic_shapes.py::TestPySymInt::test_data_dependent_guard_propagate_real_tensors, test/test_dynamic_shapes.py::TestPySymInt::test_debug_has_internal_overlap_unbacked, test/test_dynamic_shapes.py::TestPySymInt::test_deepcopy, test/test_dynamic_shapes.py::TestPySymInt::test_duck_shape, test/test_dynamic_shapes.py::TestPySymInt::test_ephemeral_source_simplification, test/test_dynamic_shapes.py::TestPySymInt::test_ephemeral_source_unified_with_non_ephemeral_source, test/test_dynamic_shapes.py::TestPySymInt::test_expect_true_basic, test/test_dynamic_shapes.py::TestPySymInt::test_expect_true_double_digits, test/test_dynamic_shapes.py::TestPySymInt::test_expect_true_prefer_later, test/test_dynamic_shapes.py::TestPySymInt::test_expect_true_refine_range, test/test_dynamic_shapes.py::TestPySymInt::test_expect_true_with_s0, test/test_dynamic_shapes.py::TestPySymInt::test_floor_clean_div_axioms, test/test_dynamic_shapes.py::TestPySymInt::test_floordiv_static, test/test_dynamic_shapes.py::TestPySymInt::test_fx_trace_intlist, test/test_dynamic_shapes.py::TestPySymInt::test_guard_int, test/test_dynamic_shapes.py::TestPySymInt::test_guard_refine_range, test/test_dynamic_shapes.py::TestPySymInt::test_hash_size, test/test_dynamic_shapes.py::TestPySymInt::test_int_bool, test/test_dynamic_shapes.py::TestPySymInt::test_int_conversion, test/test_dynamic_shapes.py::TestPySymInt::test_int_to_float, 
test/test_dynamic_shapes.py::TestPySymInt::test_max_of_unique_summation_opt, test/test_dynamic_shapes.py::TestPySymInt::test_meta_symint, test/test_dynamic_shapes.py::TestPySymInt::test_mul_int_oo_nan, test/test_dynamic_shapes.py::TestPySymInt::test_non_overlapping_and_dense_backed, test/test_dynamic_shapes.py::TestPySymInt::test_non_overlapping_and_dense_unbacked, test/test_dynamic_shapes.py::TestPySymInt::test_numel, test/test_dynamic_shapes.py::TestPySymInt::test_numpy_sym_max, test/test_dynamic_shapes.py::TestPySymInt::test_numpy_sym_min, test/test_dynamic_shapes.py::TestPySymInt::test_prefer_deferred_runtime_assertions_over_guards, test/test_dynamic_shapes.py::TestPySymInt::test_prims_non_overlapping_and_dense, test/test_dynamic_shapes.py::TestPySymInt::test_print_readable_with_symints, test/test_dynamic_shapes.py::TestPySymInt::test_reverse_arith_ops, test/test_dynamic_shapes.py::TestPySymInt::test_roundtrip, test/test_dynamic_shapes.py::TestPySymInt::test_size_expressions, test/test_dynamic_shapes.py::TestPySymInt::test_slice_backed_size_oblivious, test/test_dynamic_shapes.py::TestPySymInt::test_specialize_zero_one, test/test_dynamic_shapes.py::TestPySymInt::test_statically_known_false, test/test_dynamic_shapes.py::TestPySymInt::test_statically_known_true, test/test_dynamic_shapes.py::TestPySymInt::test_stride, test/test_dynamic_shapes.py::TestPySymInt::test_sym_ceil, test/test_dynamic_shapes.py::TestPySymInt::test_sym_floor, test/test_dynamic_shapes.py::TestPySymInt::test_sym_int, test/test_dynamic_shapes.py::TestPySymInt::test_sym_ite, test/test_dynamic_shapes.py::TestPySymInt::test_sym_log2, test/test_dynamic_shapes.py::TestPySymInt::test_sym_max_multi_max_simplify, test/test_dynamic_shapes.py::TestPySymInt::test_sym_sqrt, test/test_dynamic_shapes.py::TestPySymInt::test_sym_sum, test/test_dynamic_shapes.py::TestPySymInt::test_sym_trunc, test/test_dynamic_shapes.py::TestPySymInt::test_symint_args, 
test/test_dynamic_shapes.py::TestPySymInt::test_symint_as_scalar, test/test_dynamic_shapes.py::TestPySymInt::test_symint_bitwise_and, test/test_dynamic_shapes.py::TestPySymInt::test_symint_bitwise_or, test/test_dynamic_shapes.py::TestPySymInt::test_symint_bitwise_xor, test/test_dynamic_shapes.py::TestPySymInt::test_symint_vargs, test/test_dynamic_shapes.py::TestPySymInt::test_sympify_symint, test/test_dynamic_shapes.py::TestPySymInt::test_sympy_optimized_add, test/test_dynamic_shapes.py::TestPySymInt::test_sympy_optimized_add_binary_search, test/test_dynamic_shapes.py::TestPySymInt::test_tensor_factory_with_symint, test/test_dynamic_shapes.py::TestPySymInt::test_tracing_sym_ite, test/test_dynamic_shapes.py::TestPySymInt::test_unbacked_substitution, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_abs, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_add, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_and, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_bitwise_and, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_bitwise_or, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_bitwise_xor, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_ceil, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_eq, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_float_pow, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_float_truediv, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_floor, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_ge, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_gt, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_int_floordiv, 
test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_int_truediv, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_is_integer, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_le, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_lshift, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_lt, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_mod, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_mul, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_ne, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_neg, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_or, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_pos, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_pow_by_natural, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_round, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_rshift, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_sub, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_sym_acos, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_sym_asin, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_sym_atan, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_sym_cos, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_sym_cosh, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_sym_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_sym_ite, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_sym_log2, 
test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_sym_max, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_sym_min, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_sym_not, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_sym_sin, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_sym_sinh, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_sym_sqrt, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_sym_tan, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_sym_tanh, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_trunc, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_dynamic_int_basic_compile_backend_eager, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_dynamic_int_basic_compile_backend_inductor, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_dynamic_int_eager_usage, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_abs_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_abs_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_abs_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_abs_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_add_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_add_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_add_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_add_first_type_int_second_type_int, 
test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_and_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_and_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_and_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_and_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_bitwise_and_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_bitwise_and_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_bitwise_and_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_bitwise_and_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_bitwise_or_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_bitwise_or_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_bitwise_or_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_bitwise_or_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_bitwise_xor_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_bitwise_xor_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_bitwise_xor_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_bitwise_xor_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_ceil_first_type_float_second_type_float, 
test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_ceil_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_ceil_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_ceil_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_eq_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_eq_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_eq_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_eq_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_float_pow_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_float_pow_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_float_pow_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_float_pow_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_float_truediv_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_float_truediv_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_float_truediv_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_float_truediv_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_floor_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_floor_first_type_float_second_type_int, 
test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_floor_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_floor_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_ge_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_ge_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_ge_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_ge_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_gt_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_gt_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_gt_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_gt_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_int_floordiv_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_int_floordiv_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_int_floordiv_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_int_floordiv_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_int_truediv_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_int_truediv_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_int_truediv_first_type_int_second_type_float, 
test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_int_truediv_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_is_integer_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_is_integer_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_is_integer_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_is_integer_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_le_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_le_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_le_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_le_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_lshift_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_lshift_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_lshift_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_lshift_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_lt_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_lt_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_lt_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_lt_first_type_int_second_type_int, 
test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_mod_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_mod_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_mod_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_mod_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_mul_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_mul_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_mul_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_mul_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_ne_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_ne_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_ne_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_ne_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_neg_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_neg_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_neg_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_neg_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_or_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_or_first_type_float_second_type_int, 
test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_or_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_or_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_pos_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_pos_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_pos_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_pos_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_pow_by_natural_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_pow_by_natural_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_pow_by_natural_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_pow_by_natural_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_round_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_round_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_round_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_round_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_rshift_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_rshift_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_rshift_first_type_int_second_type_float, 
test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_rshift_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sub_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sub_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sub_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sub_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_acos_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_acos_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_acos_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_acos_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_asin_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_asin_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_asin_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_asin_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_atan_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_atan_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_atan_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_atan_first_type_int_second_type_int, 
test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_cos_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_cos_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_cos_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_cos_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_cosh_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_cosh_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_cosh_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_cosh_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_float_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_float_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_float_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_float_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_ite_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_ite_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_ite_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_ite_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_log2_first_type_float_second_type_float, 
test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_log2_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_log2_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_log2_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_max_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_max_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_max_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_max_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_min_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_min_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_min_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_min_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_not_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_not_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_not_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_not_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_sin_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_sin_first_type_float_second_type_int, 
test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_sin_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_sin_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_sinh_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_sinh_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_sinh_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_sinh_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_sqrt_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_sqrt_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_sqrt_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_sqrt_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_tan_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_tan_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_tan_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_tan_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_tanh_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_tanh_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_tanh_first_type_int_second_type_float, 
test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_tanh_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_trunc_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_trunc_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_trunc_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_trunc_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_non_symbolic_symnode, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_stride_symnode, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_symint_deepcopy, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_symint_hashing, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_symnode_hashing, test/test_dynamic_shapes.py::TestFloorDiv::test_floordiv_assumptions, test/test_dynamic_shapes.py::TestFloorDiv::test_floordiv_div_by_one, test/test_dynamic_shapes.py::TestFloorDiv::test_floordiv_div_does_not_generate_non_int_rational, test/test_dynamic_shapes.py::TestFloorDiv::test_floordiv_float_int, test/test_dynamic_shapes.py::TestFloorDiv::test_floordiv_simplify, test/test_dynamic_shapes.py::TestDimConstraints::test_dim_constraints_reduce_congruences_simple, test/test_dynamic_shapes.py::TestDimConstraints::test_dim_constraints_reduce_inequalities_error, test/test_dynamic_shapes.py::TestDimConstraints::test_dim_constraints_reduce_inequalities_simple, test/test_dynamic_shapes.py::TestDimConstraints::test_dim_constraints_solve_full, test/test_dynamic_shapes.py::TestDimConstraints::test_simplify_max_1_0, test/test_dynamic_shapes.py::TestGuardsExpressions::test_guard_or_false, test/test_dynamic_shapes.py::TestGuardsExpressions::test_guard_or_true, test/test_dynamic_shapes.py::TestGuardsExpressions::test_guards_float_div, 
test/test_dynamic_shapes.py::TestGuardsExpressions::test_guards_float_print, test/test_dynamic_shapes.py::TestGuardsExpressions::test_guards_gt_lt, test/test_dynamic_shapes.py::TestGuardsExpressions::test_remove_symbols_without_guarding, test/test_dynamic_shapes.py::TestGuardsExpressions::test_size_comparison_no_recompile, test/test_dynamic_shapes.py::TestUnbacked::test_deferred_neq_assert_backend_eager, test/test_dynamic_shapes.py::TestUnbacked::test_deferred_neq_assert_backend_inductor, test/test_dynamic_shapes.py::TestUnbacked::test_deferred_sym_eq_assert_backend_eager, test/test_dynamic_shapes.py::TestUnbacked::test_deferred_sym_eq_assert_backend_inductor, test/test_dynamic_shapes.py::TestUnbacked::test_deferred_sym_or_assert_backend_eager, test/test_dynamic_shapes.py::TestUnbacked::test_deferred_sym_or_assert_backend_inductor, test/test_dynamic_shapes.py::TestUnbacked::test_deferred_with_unbacked_input_backend_eager, test/test_dynamic_shapes.py::TestUnbacked::test_deferred_with_unbacked_input_backend_inductor, test/test_dynamic_shapes.py::TestUnbacked::test_div_unbacked_eq_globals, test/test_dynamic_shapes.py::TestUnbacked::test_div_unbacked_eq_input_ints, test/test_dynamic_shapes.py::TestUnbacked::test_div_unbacked_eq_input_tensors, test/test_dynamic_shapes.py::TestUnbacked::test_div_unbacked_eq_item, test/test_dynamic_shapes.py::TestUnbacked::test_do_not_guard_unbacked_inputs, test/test_dynamic_shapes.py::TestUnbacked::test_has_free_symbols, test/test_dynamic_shapes.py::TestUnbacked::test_post_specialize_runtime_assert1_backend_eager, test/test_dynamic_shapes.py::TestUnbacked::test_post_specialize_runtime_assert1_backend_inductor, test/test_dynamic_shapes.py::TestUnbacked::test_post_specialize_runtime_assert2_backend_eager, test/test_dynamic_shapes.py::TestUnbacked::test_post_specialize_runtime_assert2_backend_inductor, test/test_dynamic_shapes.py::TestUbackedOps::test_backed_size_oblivious_broadcast, 
test/test_dynamic_shapes.py::TestUbackedOps::test_backed_size_oblivious_expand, test/test_dynamic_shapes.py::TestUbackedOps::test_invalid_view_unbacked_view, test/test_dynamic_shapes.py::TestUbackedOps::test_narrow_unbacked_start, test/test_dynamic_shapes.py::TestUbackedOps::test_narrow_unbacked_start_cpp_wrapper, test/test_dynamic_shapes.py::TestUbackedOps::test_narrow_with_tensor_start, test/test_dynamic_shapes.py::TestUbackedOps::test_nonzero_select, test/test_dynamic_shapes.py::TestUbackedOps::test_nonzero_select_cpp_wrapper, test/test_dynamic_shapes.py::TestUbackedOps::test_nonzero_slice, test/test_dynamic_shapes.py::TestUbackedOps::test_nonzero_slice_cpp_wrapper, test/test_dynamic_shapes.py::TestUbackedOps::test_padnd, test/test_dynamic_shapes.py::TestUbackedOps::test_select_scatter_unbacked_index, test/test_dynamic_shapes.py::TestUbackedOps::test_slice_with_tensor_indices, test/test_dynamic_shapes.py::TestUbackedOps::test_slice_with_tensor_indices_cpp_wrapper, test/test_dynamic_shapes.py::TestUbackedOps::test_tensor_split, test/test_dynamic_shapes.py::TestUbackedOps::test_tensor_split_cpp_wrapper, test/test_dynamic_shapes.py::TestUbackedOps::test_trunc_int_div_true, test/test_dynamic_shapes.py::TestUbackedOps::test_unbacked_contiguous, test/test_dynamic_shapes.py::TestUbackedOps::test_unbacked_item, test/test_dynamic_shapes.py::TestUbackedOps::test_unbacked_item_set_item, test/test_dynamic_shapes.py::TestUbackedOps::test_unbacked_item_set_item2, test/test_dynamic_shapes.py::TestUbackedOps::test_unbacked_item_set_item3, test/test_dynamic_shapes.py::TestUbackedOps::test_unbacked_non_contigious_reshape_failing, test/test_dynamic_shapes.py::TestUbackedOps::test_unbacked_reshape1, test/test_dynamic_shapes.py::TestUbackedOps::test_unbacked_reshape2, test/test_dynamic_shapes.py::TestUbackedOps::test_unbacked_reshape3, test/test_dynamic_shapes.py::TestUbackedOps::test_unbacked_reshape_copy, test/test_dynamic_shapes.py::TestUbackedOps::test_unbacked_select2, 
test/test_dynamic_shapes.py::TestUbackedOps::test_unbacked_select_2, test/test_dynamic_shapes.py::TestUbackedOps::test_unbacked_select_index, test/test_dynamic_shapes.py::TestUbackedOps::test_unbacked_select_index_cpp_wrapper, test/test_dynamic_shapes.py::TestUbackedOps::test_unbacked_select_index_with_check, test/test_dynamic_shapes.py::TestUbackedOps::test_unbacked_slice, test/test_dynamic_shapes.py::TestUbackedOps::test_unbacked_slice_cpp_wrapper, test/test_dynamic_shapes.py::TestUbackedOps::test_unbacked_slice_with_step, test/test_dynamic_shapes.py::TestUbackedOps::test_unbacked_slice_with_step_cpp_wrapper, test/test_dynamic_shapes.py::TestUbackedOps::test_unbacked_view_extra, test/test_dynamic_shapes.py::TestUbackedOps::test_unbind_not_dynamic
2025-12-04T17:06:11.8533297Z 
2025-12-04T17:06:11.8533652Z Finished test_dynamic_shapes 1/1 ... [2025-12-04 17:06:11.818874][28400.201766344], took 1.16min
2025-12-04T17:06:11.8556534Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_dynamic_shapes/test_dynamic_shapes-07075f000d166d21.xml
2025-12-04T17:06:11.9585570Z Running test_dispatch 1/1 ... [2025-12-04 17:06:11.958266][28400.341160247]
2025-12-04T17:06:11.9586100Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T17:06:11.9589593Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_dispatch.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:06:11.958714]
2025-12-04T17:06:43.9213019Z 
2025-12-04T17:06:43.9213935Z test_dispatch 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_dispatch_1.1_a7d630610c114c46_.log
2025-12-04T17:06:43.9226100Z Running 32 items in this shard: test/test_dispatch.py::TestDispatch::test_all_invariants, test/test_dispatch.py::TestDispatch::test_computed_table, test/test_dispatch.py::TestDispatch::test_computed_table_with_ambiguous_autogradother, test/test_dispatch.py::TestDispatch::test_computed_table_with_autograd, test/test_dispatch.py::TestDispatch::test_computed_table_with_cpu_autograd_defaultbackend, test/test_dispatch.py::TestDispatch::test_computed_table_with_cpu_autograd_math, test/test_dispatch.py::TestDispatch::test_computed_table_with_cpu_autograd_math_defaultbackend, test/test_dispatch.py::TestDispatch::test_computed_table_with_cpu_defaultbackend, test/test_dispatch.py::TestDispatch::test_computed_table_with_cpu_math, test/test_dispatch.py::TestDispatch::test_computed_table_with_cpu_math_autogradcpu_fallthrough, test/test_dispatch.py::TestDispatch::test_computed_table_with_math, test/test_dispatch.py::TestDispatch::test_def, test/test_dispatch.py::TestDispatch::test_def_impl_schema_mismatch, test/test_dispatch.py::TestDispatch::test_def_only, test/test_dispatch.py::TestDispatch::test_def_with_explicit_alias, test/test_dispatch.py::TestDispatch::test_def_with_inference, test/test_dispatch.py::TestDispatch::test_dispatch_print_registrations_for_dispatch_key_invalid, test/test_dispatch.py::TestDispatch::test_find_dangling_impls, test/test_dispatch.py::TestDispatch::test_find_dangling_impls_ext, test/test_dispatch.py::TestDispatch::test_impl_only, test/test_dispatch.py::TestDispatch::test_multiple_def_alias_defaulting, test/test_dispatch.py::TestDispatch::test_multiple_def_alias_mismatch, test/test_dispatch.py::TestDispatch::test_multiple_def_error, test/test_dispatch.py::TestDispatch::test_multiple_fallback, test/test_dispatch.py::TestDispatch::test_overwrite_math, test/test_dispatch.py::TestPythonDispatcher::test_autogradother, test/test_dispatch.py::TestPythonDispatcher::test_basic, 
test/test_dispatch.py::TestPythonDispatcher::test_defaultbackend_autogradcpu, test/test_dispatch.py::TestPythonDispatcher::test_defaultbackend_math, test/test_dispatch.py::TestPythonDispatcher::test_duplicate_registrations, test/test_dispatch.py::TestPythonDispatcher::test_math_autogradcpu, test/test_dispatch.py::TestPythonDispatcher::test_quantized_structured_not_implemented
2025-12-04T17:06:43.9237546Z 
2025-12-04T17:06:43.9237844Z Finished test_dispatch 1/1 ... [2025-12-04 17:06:43.921094][28432.303986641], took 0.53min
2025-12-04T17:06:43.9572419Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_dispatch/test_dispatch-bf1fd68f7abb7228.xml
2025-12-04T17:06:44.0382376Z Running test_numba_integration 1/1 ... [2025-12-04 17:06:44.037893][28432.420786236]
2025-12-04T17:06:44.0382966Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T17:06:44.0386061Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_numba_integration.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:06:44.038345]
2025-12-04T17:06:51.4639057Z 
2025-12-04T17:06:51.4640056Z test_numba_integration 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_numba_integration_1.1_4248037d4c172e88_.log
2025-12-04T17:06:51.4644305Z Running 8 items in this shard: test/test_numba_integration.py::TestNumbaIntegration::test_active_device, test/test_numba_integration.py::TestNumbaIntegration::test_array_adaptor, test/test_numba_integration.py::TestNumbaIntegration::test_conversion_errors, test/test_numba_integration.py::TestNumbaIntegration::test_cuda_array_interface, test/test_numba_integration.py::TestNumbaIntegration::test_from_cuda_array_interface, test/test_numba_integration.py::TestNumbaIntegration::test_from_cuda_array_interface_active_device, test/test_numba_integration.py::TestNumbaIntegration::test_from_cuda_array_interface_inferred_strides, test/test_numba_integration.py::TestNumbaIntegration::test_from_cuda_array_interface_lifetime
2025-12-04T17:06:51.4648169Z 
2025-12-04T17:06:51.4648527Z Finished test_numba_integration 1/1 ... [2025-12-04 17:06:51.463709][28439.846601623], took 0.12min
2025-12-04T17:06:51.5000410Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_numba_integration/test_numba_integration-edcc49db775b9990.xml
2025-12-04T17:06:51.5795394Z Running test_functional_optim 1/1 ... [2025-12-04 17:06:51.579175][28439.962069932]
2025-12-04T17:06:51.5795971Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T17:06:51.5798700Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_functional_optim.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:06:51.579625]
2025-12-04T17:06:57.3524435Z 
2025-12-04T17:06:57.3525437Z test_functional_optim 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_functional_optim_1.1_82fdba90420e8f47_.log
2025-12-04T17:06:57.3528382Z Running 4 items in this shard: test/test_functional_optim.py::TestFunctionalOptimParity::test_functional_optim_parity_adam, test/test_functional_optim.py::TestFunctionalOptimParity::test_functional_optim_parity_adam_w, test/test_functional_optim.py::TestFunctionalOptimParity::test_functional_optim_parity_sgd, test/test_functional_optim.py::TestFunctionalOptimParity::test_functional_optim_registration
2025-12-04T17:06:57.3530572Z 
2025-12-04T17:06:57.3530924Z Finished test_functional_optim 1/1 ... [2025-12-04 17:06:57.352257][28445.735151173], took 0.10min
2025-12-04T17:06:57.3886612Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_functional_optim/test_functional_optim-389cbc1bb3d61470.xml
2025-12-04T17:06:57.4235801Z Running test_maskedtensor 1/1 ... [2025-12-04 17:06:57.423281][28445.806174687]
2025-12-04T17:06:57.4236357Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T17:06:57.4239632Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_maskedtensor.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:06:57.423729]
2025-12-04T17:09:12.8903389Z 
2025-12-04T17:09:12.8904305Z test_maskedtensor 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_maskedtensor_1.1_4e0623e742dfe084_.log
2025-12-04T17:09:12.9298748Z Running 958 items in this shard: test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn0, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn1, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn10, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn11, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn12, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn13, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn14, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn15, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn16, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn17, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn18, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn19, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn2, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn20, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn21, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn22, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn23, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn24, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn25, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn26, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn27, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn28, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn29, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn3, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn30, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn31, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn32, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn33, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn34, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn35, 
test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn36, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn37, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn38, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn39, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn4, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn40, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn41, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn42, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn43, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn44, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn45, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn46, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn47, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn48, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn49, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn5, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn50, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn51, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn52, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn53, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn54, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn55, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn56, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn57, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn6, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn7, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn8, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn9, test/test_maskedtensor.py::TestUnary::test_unary_fn0, test/test_maskedtensor.py::TestUnary::test_unary_fn1, test/test_maskedtensor.py::TestUnary::test_unary_fn10, test/test_maskedtensor.py::TestUnary::test_unary_fn11, 
test/test_maskedtensor.py::TestUnary::test_unary_fn12, test/test_maskedtensor.py::TestUnary::test_unary_fn13, test/test_maskedtensor.py::TestUnary::test_unary_fn14, test/test_maskedtensor.py::TestUnary::test_unary_fn15, test/test_maskedtensor.py::TestUnary::test_unary_fn16, test/test_maskedtensor.py::TestUnary::test_unary_fn17, test/test_maskedtensor.py::TestUnary::test_unary_fn18, test/test_maskedtensor.py::TestUnary::test_unary_fn19, test/test_maskedtensor.py::TestUnary::test_unary_fn2, test/test_maskedtensor.py::TestUnary::test_unary_fn20, test/test_maskedtensor.py::TestUnary::test_unary_fn21, test/test_maskedtensor.py::TestUnary::test_unary_fn22, test/test_maskedtensor.py::TestUnary::test_unary_fn23, test/test_maskedtensor.py::TestUnary::test_unary_fn24, test/test_maskedtensor.py::TestUnary::test_unary_fn25, test/test_maskedtensor.py::TestUnary::test_unary_fn26, test/test_maskedtensor.py::TestUnary::test_unary_fn27, test/test_maskedtensor.py::TestUnary::test_unary_fn28, test/test_maskedtensor.py::TestUnary::test_unary_fn29, test/test_maskedtensor.py::TestUnary::test_unary_fn3, test/test_maskedtensor.py::TestUnary::test_unary_fn30, test/test_maskedtensor.py::TestUnary::test_unary_fn31, test/test_maskedtensor.py::TestUnary::test_unary_fn32, test/test_maskedtensor.py::TestUnary::test_unary_fn33, test/test_maskedtensor.py::TestUnary::test_unary_fn34, test/test_maskedtensor.py::TestUnary::test_unary_fn35, test/test_maskedtensor.py::TestUnary::test_unary_fn36, test/test_maskedtensor.py::TestUnary::test_unary_fn37, test/test_maskedtensor.py::TestUnary::test_unary_fn38, test/test_maskedtensor.py::TestUnary::test_unary_fn39, test/test_maskedtensor.py::TestUnary::test_unary_fn4, test/test_maskedtensor.py::TestUnary::test_unary_fn40, test/test_maskedtensor.py::TestUnary::test_unary_fn41, test/test_maskedtensor.py::TestUnary::test_unary_fn42, test/test_maskedtensor.py::TestUnary::test_unary_fn43, test/test_maskedtensor.py::TestUnary::test_unary_fn44, 
test/test_maskedtensor.py::TestUnary::test_unary_fn45, test/test_maskedtensor.py::TestUnary::test_unary_fn46, test/test_maskedtensor.py::TestUnary::test_unary_fn47, test/test_maskedtensor.py::TestUnary::test_unary_fn48, test/test_maskedtensor.py::TestUnary::test_unary_fn49, test/test_maskedtensor.py::TestUnary::test_unary_fn5, test/test_maskedtensor.py::TestUnary::test_unary_fn50, test/test_maskedtensor.py::TestUnary::test_unary_fn51, test/test_maskedtensor.py::TestUnary::test_unary_fn52, test/test_maskedtensor.py::TestUnary::test_unary_fn53, test/test_maskedtensor.py::TestUnary::test_unary_fn54, test/test_maskedtensor.py::TestUnary::test_unary_fn55, test/test_maskedtensor.py::TestUnary::test_unary_fn56, test/test_maskedtensor.py::TestUnary::test_unary_fn57, test/test_maskedtensor.py::TestUnary::test_unary_fn58, test/test_maskedtensor.py::TestUnary::test_unary_fn59, test/test_maskedtensor.py::TestUnary::test_unary_fn6, test/test_maskedtensor.py::TestUnary::test_unary_fn60, test/test_maskedtensor.py::TestUnary::test_unary_fn61, test/test_maskedtensor.py::TestUnary::test_unary_fn7, test/test_maskedtensor.py::TestUnary::test_unary_fn8, test/test_maskedtensor.py::TestUnary::test_unary_fn9, test/test_maskedtensor.py::TestBinary::test_binary_fn0, test/test_maskedtensor.py::TestBinary::test_binary_fn1, test/test_maskedtensor.py::TestBinary::test_binary_fn10, test/test_maskedtensor.py::TestBinary::test_binary_fn11, test/test_maskedtensor.py::TestBinary::test_binary_fn12, test/test_maskedtensor.py::TestBinary::test_binary_fn13, test/test_maskedtensor.py::TestBinary::test_binary_fn14, test/test_maskedtensor.py::TestBinary::test_binary_fn15, test/test_maskedtensor.py::TestBinary::test_binary_fn16, test/test_maskedtensor.py::TestBinary::test_binary_fn17, test/test_maskedtensor.py::TestBinary::test_binary_fn18, test/test_maskedtensor.py::TestBinary::test_binary_fn19, test/test_maskedtensor.py::TestBinary::test_binary_fn2, test/test_maskedtensor.py::TestBinary::test_binary_fn20, 
test/test_maskedtensor.py::TestBinary::test_binary_fn21, test/test_maskedtensor.py::TestBinary::test_binary_fn22, test/test_maskedtensor.py::TestBinary::test_binary_fn23, test/test_maskedtensor.py::TestBinary::test_binary_fn24, test/test_maskedtensor.py::TestBinary::test_binary_fn25, test/test_maskedtensor.py::TestBinary::test_binary_fn26, test/test_maskedtensor.py::TestBinary::test_binary_fn27, test/test_maskedtensor.py::TestBinary::test_binary_fn28, test/test_maskedtensor.py::TestBinary::test_binary_fn29, test/test_maskedtensor.py::TestBinary::test_binary_fn3, test/test_maskedtensor.py::TestBinary::test_binary_fn30, test/test_maskedtensor.py::TestBinary::test_binary_fn31, test/test_maskedtensor.py::TestBinary::test_binary_fn32, test/test_maskedtensor.py::TestBinary::test_binary_fn33, test/test_maskedtensor.py::TestBinary::test_binary_fn34, test/test_maskedtensor.py::TestBinary::test_binary_fn35, test/test_maskedtensor.py::TestBinary::test_binary_fn4, test/test_maskedtensor.py::TestBinary::test_binary_fn5, test/test_maskedtensor.py::TestBinary::test_binary_fn6, test/test_maskedtensor.py::TestBinary::test_binary_fn7, test/test_maskedtensor.py::TestBinary::test_binary_fn8, test/test_maskedtensor.py::TestBinary::test_binary_fn9, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn0, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn1, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn10, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn11, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn12, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn13, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn14, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn15, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn16, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn17, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn18, 
test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn19, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn2, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn20, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn21, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn22, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn23, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn24, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn25, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn26, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn27, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn28, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn29, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn3, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn4, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn5, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn6, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn7, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn8, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn9, test/test_maskedtensor.py::TestBinary::test_masks_match_fn_name_add, test/test_maskedtensor.py::TestBinary::test_masks_match_fn_name_add_, test/test_maskedtensor.py::TestReductions::test__is_any_true, test/test_maskedtensor.py::TestReductions::test__is_any_true_false, test/test_maskedtensor.py::TestReductions::test_all, test/test_maskedtensor.py::TestReductions::test_amax, test/test_maskedtensor.py::TestReductions::test_amax_grad, test/test_maskedtensor.py::TestReductions::test_amin, test/test_maskedtensor.py::TestReductions::test_amin_grad, test/test_maskedtensor.py::TestReductions::test_any_true_dtype, test/test_maskedtensor.py::TestReductions::test_backward, test/test_maskedtensor.py::TestReductions::test_grad_dtype, 
test/test_maskedtensor.py::TestReductions::test_max_not_implemented, test/test_maskedtensor.py::TestReductions::test_mean, test/test_maskedtensor.py::TestReductions::test_mean_dim_grad, test/test_maskedtensor.py::TestReductions::test_mean_grad_case_1a, test/test_maskedtensor.py::TestReductions::test_mean_grad_case_1b, test/test_maskedtensor.py::TestReductions::test_mean_grad_case_1c, test/test_maskedtensor.py::TestReductions::test_mean_grad_case_1d, test/test_maskedtensor.py::TestReductions::test_mean_grad_case_1e, test/test_maskedtensor.py::TestReductions::test_mean_grad_case_1f, test/test_maskedtensor.py::TestReductions::test_prod, test/test_maskedtensor.py::TestReductions::test_prod_grad, test/test_maskedtensor.py::TestReductions::test_sum, test/test_maskedtensor.py::TestReductions::test_sum_grad, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_add_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_add_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_add_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_add_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_add_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_add_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_add_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_add_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_add_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_atan2_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_atan2_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_atan2_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_atan2_layout1_cuda_float16, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_atan2_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_atan2_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_atan2_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_atan2_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_atan2_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_floor_rounding_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_floor_rounding_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_floor_rounding_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_floor_rounding_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_floor_rounding_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_floor_rounding_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_floor_rounding_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_floor_rounding_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_floor_rounding_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_no_rounding_mode_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_no_rounding_mode_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_no_rounding_mode_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_no_rounding_mode_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_no_rounding_mode_layout1_cuda_float32, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_no_rounding_mode_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_no_rounding_mode_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_no_rounding_mode_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_no_rounding_mode_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_trunc_rounding_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_trunc_rounding_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_trunc_rounding_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_trunc_rounding_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_trunc_rounding_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_trunc_rounding_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_trunc_rounding_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_trunc_rounding_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_trunc_rounding_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_eq_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_eq_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_eq_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_eq_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_eq_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_eq_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_eq_layout2_cuda_float16, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_eq_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_eq_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_floor_divide_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_floor_divide_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_floor_divide_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_floor_divide_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_floor_divide_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_floor_divide_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_floor_divide_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_floor_divide_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_floor_divide_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmax_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmax_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmax_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmax_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmax_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmax_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmax_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmax_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmax_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmin_layout0_cuda_float16, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmin_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmin_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmin_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmin_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmin_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmin_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmin_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmin_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmod_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmod_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmod_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmod_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmod_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmod_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmod_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmod_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmod_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ge_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ge_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ge_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ge_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ge_layout1_cuda_float32, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ge_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ge_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ge_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ge_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_gt_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_gt_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_gt_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_gt_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_gt_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_gt_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_gt_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_gt_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_gt_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_le_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_le_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_le_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_le_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_le_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_le_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_le_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_le_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_le_layout2_cuda_float64, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_logaddexp_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_logaddexp_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_logaddexp_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_logaddexp_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_logaddexp_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_logaddexp_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_logaddexp_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_logaddexp_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_logaddexp_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_lt_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_lt_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_lt_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_lt_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_lt_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_lt_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_lt_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_lt_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_lt_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_maximum_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_maximum_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_maximum_layout0_cuda_float64, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_maximum_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_maximum_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_maximum_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_maximum_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_maximum_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_maximum_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_minimum_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_minimum_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_minimum_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_minimum_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_minimum_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_minimum_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_minimum_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_minimum_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_minimum_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_mul_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_mul_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_mul_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_mul_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_mul_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_mul_layout1_cuda_float64, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_mul_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_mul_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_mul_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ne_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ne_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ne_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ne_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ne_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ne_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ne_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ne_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ne_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_nextafter_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_nextafter_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_nextafter_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_nextafter_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_nextafter_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_nextafter_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_nextafter_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_nextafter_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_nextafter_layout2_cuda_float64, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_remainder_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_remainder_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_remainder_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_remainder_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_remainder_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_remainder_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_remainder_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_remainder_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_remainder_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_sub_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_sub_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_sub_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_sub_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_sub_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_sub_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_sub_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_sub_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_sub_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_true_divide_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_true_divide_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_true_divide_layout0_cuda_float64, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_true_divide_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_true_divide_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_true_divide_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_true_divide_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_true_divide_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_true_divide_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amax_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amax_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amax_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amax_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amax_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amax_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amax_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amax_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amax_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amin_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amin_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amin_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amin_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amin_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amin_layout1_cuda_float64, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amin_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amin_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amin_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmax_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmax_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmax_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmax_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmax_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmax_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmax_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmax_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmax_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmin_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmin_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmin_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmin_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmin_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmin_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmin_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmin_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmin_layout2_cuda_float64, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_prod_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_prod_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_prod_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_prod_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_prod_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_prod_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_prod_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_prod_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_prod_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_sum_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_sum_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_sum_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_sum_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_sum_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_sum_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_sum_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_sum_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_sum_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_abs_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_abs_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_abs_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_abs_layout1_cuda_float16, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_abs_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_abs_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_abs_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_abs_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_abs_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acos_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acos_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acos_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acos_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acos_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acos_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acos_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acos_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acos_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acosh_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acosh_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acosh_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acosh_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acosh_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acosh_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acosh_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acosh_layout2_cuda_float32, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acosh_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_angle_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_angle_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_angle_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_angle_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_angle_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_angle_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asin_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asin_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asin_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asin_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asin_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asin_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asin_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asin_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asin_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asinh_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asinh_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asinh_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asinh_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asinh_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asinh_layout1_cuda_float64, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asinh_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asinh_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asinh_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atan_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atan_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atan_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atan_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atan_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atan_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atan_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atan_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atan_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atanh_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atanh_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atanh_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atanh_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atanh_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atanh_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atanh_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atanh_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atanh_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_ceil_layout0_cuda_float16, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_ceil_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_ceil_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_ceil_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_ceil_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_ceil_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_ceil_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_ceil_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_ceil_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_conj_physical_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_conj_physical_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_conj_physical_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_conj_physical_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_conj_physical_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_conj_physical_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_conj_physical_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_conj_physical_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_conj_physical_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cos_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cos_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cos_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cos_layout1_cuda_float16, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cos_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cos_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cos_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cos_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cos_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cosh_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cosh_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cosh_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cosh_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cosh_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cosh_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cosh_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cosh_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cosh_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_deg2rad_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_deg2rad_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_deg2rad_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_deg2rad_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_deg2rad_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_deg2rad_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_deg2rad_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_deg2rad_layout2_cuda_float32, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_deg2rad_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_digamma_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_digamma_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_digamma_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_digamma_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_digamma_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_digamma_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_digamma_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_digamma_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_digamma_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erf_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erf_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erf_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erf_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erf_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erf_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erf_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erf_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erf_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfc_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfc_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfc_layout0_cuda_float64, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfc_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfc_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfc_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfc_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfc_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfc_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfinv_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfinv_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfinv_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfinv_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfinv_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfinv_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfinv_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfinv_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfinv_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp2_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp2_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp2_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp2_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp2_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp2_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp2_layout2_cuda_float16, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp2_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp2_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_expm1_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_expm1_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_expm1_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_expm1_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_expm1_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_expm1_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_expm1_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_expm1_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_expm1_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_floor_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_floor_layout0_cuda_float32, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_floor_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_floor_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_floor_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_floor_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_floor_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_floor_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_floor_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_frac_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_frac_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_frac_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_frac_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_frac_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_frac_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_frac_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_frac_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_frac_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_i0_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_i0_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_i0_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_i0_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_i0_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_i0_layout1_cuda_float64, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_i0_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_i0_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_i0_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_isnan_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_isnan_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_isnan_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_isnan_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_isnan_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_isnan_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_isnan_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_isnan_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_isnan_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_lgamma_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_lgamma_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_lgamma_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_lgamma_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_lgamma_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_lgamma_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_lgamma_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_lgamma_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_lgamma_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log10_layout0_cuda_float16, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log10_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log10_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log10_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log10_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log10_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log10_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log10_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log10_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log1p_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log1p_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log1p_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log1p_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log1p_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log1p_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log1p_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log1p_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log1p_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log2_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log2_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log2_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log2_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log2_layout1_cuda_float32, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log2_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log2_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log2_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log2_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_logit_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_logit_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_logit_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_logit_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_logit_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_logit_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_logit_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_logit_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_logit_layout2_cuda_float64, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_nan_to_num_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_nan_to_num_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_nan_to_num_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_nan_to_num_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_nan_to_num_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_nan_to_num_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_nan_to_num_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_nan_to_num_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_nan_to_num_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_neg_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_neg_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_neg_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_neg_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_neg_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_neg_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_neg_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_neg_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_neg_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_positive_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_positive_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_positive_layout0_cuda_float64, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_positive_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_positive_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_positive_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_positive_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_positive_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_positive_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rad2deg_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rad2deg_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rad2deg_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rad2deg_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rad2deg_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rad2deg_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rad2deg_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rad2deg_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rad2deg_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_reciprocal_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_reciprocal_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_reciprocal_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_reciprocal_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_reciprocal_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_reciprocal_layout1_cuda_float64, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_reciprocal_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_reciprocal_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_reciprocal_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_0_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_0_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_0_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_0_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_0_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_0_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_0_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_0_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_0_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_3_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_3_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_3_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_3_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_3_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_3_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_3_layout2_cuda_float16, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_3_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_3_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_neg_3_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_neg_3_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_neg_3_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_neg_3_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_neg_3_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_neg_3_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_neg_3_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_neg_3_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_neg_3_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_layout2_cuda_float64, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rsqrt_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rsqrt_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rsqrt_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rsqrt_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rsqrt_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rsqrt_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rsqrt_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rsqrt_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rsqrt_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sgn_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sgn_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sgn_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sgn_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sgn_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sgn_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sgn_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sgn_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sgn_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sigmoid_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sigmoid_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sigmoid_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sigmoid_layout1_cuda_float16, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sigmoid_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sigmoid_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sigmoid_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sigmoid_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sigmoid_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sign_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sign_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sign_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sign_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sign_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sign_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sign_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sign_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sign_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_signbit_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_signbit_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_signbit_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_signbit_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_signbit_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_signbit_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_signbit_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_signbit_layout2_cuda_float32, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_signbit_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sin_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sin_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sin_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sin_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sin_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sin_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sin_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sin_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sin_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinc_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinc_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinc_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinc_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinc_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinc_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinc_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinc_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinc_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinh_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinh_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinh_layout0_cuda_float64, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinh_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinh_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinh_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinh_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinh_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinh_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sqrt_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sqrt_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sqrt_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sqrt_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sqrt_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sqrt_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sqrt_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sqrt_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sqrt_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_square_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_square_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_square_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_square_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_square_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_square_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_square_layout2_cuda_float16, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_square_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_square_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tan_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tan_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tan_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tan_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tan_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tan_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tan_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tan_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tan_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tanh_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tanh_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tanh_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tanh_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tanh_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tanh_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tanh_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tanh_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tanh_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_trunc_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_trunc_layout0_cuda_float32, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_trunc_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_trunc_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_trunc_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_trunc_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_trunc_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_trunc_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_trunc_layout2_cuda_float64, test/test_maskedtensor.py::TestBasicsCUDA::test_add_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_contiguous_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_diff_dim_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_diff_layouts_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_diff_sizes_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_grad_warning_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_invalid_sparse_coo_values_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_invalid_sparse_csr_values_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_invalid_sparse_layout_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_invalid_tensor_inputs_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_nn_unfold_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_softmax_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_stack_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_to_dense_and_sparse_coo_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_to_dense_and_sparse_csr_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_to_dense_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_to_device_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_to_dtype_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_to_sparse_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_unfold_cuda, 
test/test_maskedtensor.py::TestBasicsCUDA::test_where_cuda
2025-12-04T17:09:12.9682864Z 
2025-12-04T17:09:12.9683237Z Finished test_maskedtensor 1/1 ... [2025-12-04 17:09:12.891560][28581.274453082], took 2.26min
2025-12-04T17:09:12.9684680Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_maskedtensor/test_maskedtensor-1089c4e953521eec.xml
2025-12-04T17:09:13.0411969Z Running benchmark_utils/test_benchmark_utils 1/1 ... [2025-12-04 17:09:13.040837][28581.423729815]
2025-12-04T17:09:13.0412621Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T17:09:13.0415623Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'benchmark_utils/test_benchmark_utils.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:09:13.041291]
2025-12-04T17:09:22.3696873Z 
2025-12-04T17:09:22.3698045Z benchmark_utils/test_benchmark_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/benchmark_utils.test_benchmark_utils_1.1_63175fb80c7f9ea7_.log
2025-12-04T17:09:22.3702957Z Running 9 items in this shard: test/benchmark_utils/test_benchmark_utils.py::TestBenchmarkUtils::test_adaptive_timer, test/benchmark_utils/test_benchmark_utils.py::TestBenchmarkUtils::test_collect_callgrind, test/benchmark_utils/test_benchmark_utils.py::TestBenchmarkUtils::test_collect_cpp_callgrind, test/benchmark_utils/test_benchmark_utils.py::TestBenchmarkUtils::test_compare, test/benchmark_utils/test_benchmark_utils.py::TestBenchmarkUtils::test_cpp_timer, test/benchmark_utils/test_benchmark_utils.py::TestBenchmarkUtils::test_fuzzer, test/benchmark_utils/test_benchmark_utils.py::TestBenchmarkUtils::test_manipulate_callgrind_stats, test/benchmark_utils/test_benchmark_utils.py::TestBenchmarkUtils::test_timer, test/benchmark_utils/test_benchmark_utils.py::TestBenchmarkUtils::test_timer_tiny_fast_snippet
2025-12-04T17:09:22.3707071Z 
2025-12-04T17:09:22.3707744Z Finished benchmark_utils/test_benchmark_utils 1/1 ... [2025-12-04 17:09:22.369471][28590.752366852], took 0.16min
2025-12-04T17:09:22.4059839Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/benchmark_utils.test_benchmark_utils/benchmark_utils.test_benchmark_utils-76c10c33afe299c4.xml
2025-12-04T17:09:22.4849813Z Running test_scaled_matmul_cuda 1/1 ... [2025-12-04 17:09:22.484682][28590.867576586]
2025-12-04T17:09:22.4850394Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T17:09:22.4853747Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_scaled_matmul_cuda.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:09:22.485110]
2025-12-04T17:09:31.7636167Z 
2025-12-04T17:09:31.7637381Z test_scaled_matmul_cuda 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_scaled_matmul_cuda_1.1_751f5e87909cbd5d_.log
2025-12-04T17:09:31.8337474Z Running 893 items in this shard: test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_compile_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_error_messages_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_error_messages_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_1023_64_48_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_1023_64_48_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_1023_64_48_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_1025_128_96_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_1025_128_96_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_1025_128_96_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_127_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_127_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_127_96_1024_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_128_128_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_128_128_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_128_128_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_128_256_512_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_128_256_512_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_128_256_512_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_197_224_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_197_224_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_197_224_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_197_240_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_197_240_272_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_197_240_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_256_256_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_256_256_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_256_256_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_256_512_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_256_512_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_256_512_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_2_1024_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_2_1024_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_2_1024_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_31_1024_64_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_31_1024_64_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_31_1024_64_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_45_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_45_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_45_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_512_128_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_512_128_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_512_128_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_65_96_112_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_65_96_112_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_65_96_112_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_1023_64_48_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_1023_64_48_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_1023_64_48_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_1025_128_96_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_1025_128_96_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_1025_128_96_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_127_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_127_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_127_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_128_128_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_128_128_128_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_128_128_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_128_256_512_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_128_256_512_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_128_256_512_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_197_224_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_197_224_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_197_224_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_197_240_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_197_240_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_197_240_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_256_256_256_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_256_256_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_256_256_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_256_512_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_256_512_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_256_512_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_2_1024_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_2_1024_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_2_1024_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_31_1024_64_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_31_1024_64_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_31_1024_64_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_45_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_45_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_45_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_512_128_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_512_128_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_512_128_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_65_96_112_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_65_96_112_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_65_96_112_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_1023_64_48_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_1023_64_48_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_1023_64_48_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_1025_128_96_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_1025_128_96_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_1025_128_96_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_127_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_127_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_127_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_128_128_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_128_128_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_128_128_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_128_256_512_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_128_256_512_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_128_256_512_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_197_224_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_197_224_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_197_224_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_197_240_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_197_240_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_197_240_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_256_256_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_256_256_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_256_256_256_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_256_512_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_256_512_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_256_512_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_2_1024_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_2_1024_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_2_1024_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_31_1024_64_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_31_1024_64_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_31_1024_64_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_45_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_45_96_1024_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_45_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_512_128_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_512_128_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_512_128_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_65_96_112_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_65_96_112_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_65_96_112_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_1023_64_48_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_1023_64_48_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_1023_64_48_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_1025_128_96_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_1025_128_96_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_1025_128_96_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_127_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_127_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_127_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_128_128_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_128_128_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_128_128_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_128_256_512_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_128_256_512_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_128_256_512_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_197_224_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_197_224_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_197_224_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_197_240_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_197_240_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_197_240_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_256_256_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_256_256_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_256_256_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_256_512_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_256_512_128_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_256_512_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_2_1024_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_2_1024_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_2_1024_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_31_1024_64_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_31_1024_64_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_31_1024_64_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_45_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_45_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_45_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_512_128_256_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_512_128_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_512_128_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_65_96_112_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_65_96_112_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_65_96_112_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_1023_64_48_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_1023_64_48_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_1023_64_48_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_1025_128_96_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_1025_128_96_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_1025_128_96_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_127_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_127_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_127_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_128_128_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_128_128_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_128_128_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_128_256_512_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_128_256_512_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_128_256_512_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_197_224_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_197_224_272_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_197_224_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_197_240_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_197_240_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_197_240_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_256_256_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_256_256_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_256_256_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_256_512_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_256_512_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_256_512_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_2_1024_128_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_2_1024_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_2_1024_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_31_1024_64_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_31_1024_64_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_31_1024_64_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_45_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_45_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_45_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_512_128_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_512_128_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_512_128_256_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_65_96_112_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_65_96_112_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_65_96_112_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_1023_64_48_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_1023_64_48_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_1023_64_48_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_1025_128_96_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_1025_128_96_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_1025_128_96_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_127_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_127_96_1024_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_127_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_128_128_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_128_128_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_128_128_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_128_256_512_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_128_256_512_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_128_256_512_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_197_224_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_197_224_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_197_224_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_197_240_272_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_197_240_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_197_240_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_256_256_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_256_256_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_256_256_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_256_512_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_256_512_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_256_512_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_2_1024_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_2_1024_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_2_1024_128_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_31_1024_64_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_31_1024_64_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_31_1024_64_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_45_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_45_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_45_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_512_128_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_512_128_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_512_128_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_65_96_112_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_65_96_112_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_65_96_112_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_1023_64_48_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_1023_64_48_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_1023_64_48_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_1025_128_96_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_1025_128_96_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_1025_128_96_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_127_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_127_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_127_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_128_128_128_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_128_128_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_128_128_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_128_256_512_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_128_256_512_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_128_256_512_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_197_224_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_197_224_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_197_224_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_197_240_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_197_240_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_197_240_272_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_256_256_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_256_256_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_256_256_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_256_512_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_256_512_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_256_512_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_2_1024_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_2_1024_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_2_1024_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_31_1024_64_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_31_1024_64_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_31_1024_64_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_45_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_45_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_45_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_512_128_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_512_128_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_512_128_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_65_96_112_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_65_96_112_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_65_96_112_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_1023_64_48_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_1023_64_48_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_1023_64_48_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_1025_128_96_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_1025_128_96_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_1025_128_96_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_127_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_127_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_127_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_128_128_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_128_128_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_128_128_128_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_128_256_512_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_128_256_512_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_128_256_512_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_197_224_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_197_224_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_197_224_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_197_240_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_197_240_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_197_240_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_256_256_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_256_256_256_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_256_256_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_256_512_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_256_512_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_256_512_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_2_1024_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_2_1024_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_2_1024_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_31_1024_64_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_31_1024_64_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_31_1024_64_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_45_96_1024_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_45_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_45_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_512_128_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_512_128_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_512_128_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_65_96_112_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_65_96_112_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_65_96_112_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_1023_64_48_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_1023_64_48_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_1023_64_48_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_1025_128_96_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_1025_128_96_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_1025_128_96_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_127_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_127_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_127_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_128_128_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_128_128_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_128_128_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_128_256_512_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_128_256_512_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_128_256_512_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_197_224_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_197_224_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_197_224_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_197_240_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_197_240_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_197_240_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_256_256_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_256_256_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_256_256_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_256_512_128_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_256_512_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_256_512_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_2_1024_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_2_1024_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_2_1024_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_31_1024_64_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_31_1024_64_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_31_1024_64_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_45_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_45_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_45_96_1024_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_512_128_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_512_128_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_512_128_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_65_96_112_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_65_96_112_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_65_96_112_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_1023_64_48_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_1023_64_48_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_1023_64_48_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_1025_128_96_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_1025_128_96_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_1025_128_96_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_127_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_127_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_127_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_128_128_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_128_128_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_128_128_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_128_256_512_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_128_256_512_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_128_256_512_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_197_224_272_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_197_224_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_197_224_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_197_240_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_197_240_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_197_240_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_256_256_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_256_256_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_256_256_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_256_512_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_256_512_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_256_512_128_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_2_1024_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_2_1024_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_2_1024_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_31_1024_64_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_31_1024_64_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_31_1024_64_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_45_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_45_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_45_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_512_128_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_512_128_256_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_512_128_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_65_96_112_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_65_96_112_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_65_96_112_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_1023_64_48_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_1023_64_48_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_1023_64_48_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_1025_128_96_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_1025_128_96_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_1025_128_96_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_127_96_1024_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_127_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_127_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_128_128_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_128_128_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_128_128_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_128_256_512_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_128_256_512_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_128_256_512_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_197_224_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_197_224_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_197_224_272_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_197_240_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_197_240_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_197_240_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_256_256_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_256_256_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_256_256_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_256_512_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_256_512_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_256_512_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_2_1024_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_2_1024_128_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_2_1024_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_31_1024_64_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_31_1024_64_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_31_1024_64_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_45_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_45_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_45_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_512_128_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_512_128_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_512_128_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_65_96_112_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_65_96_112_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_65_96_112_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_1023_64_48_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_1023_64_48_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_1023_64_48_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_1025_128_96_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_1025_128_96_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_1025_128_96_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_127_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_127_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_127_96_1024_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_128_128_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_128_128_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_128_128_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_128_256_512_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_128_256_512_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_128_256_512_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_197_224_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_197_224_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_197_224_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_197_240_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_197_240_272_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_197_240_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_256_256_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_256_256_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_256_256_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_256_512_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_256_512_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_256_512_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_2_1024_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_2_1024_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_2_1024_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_31_1024_64_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_31_1024_64_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_31_1024_64_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_45_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_45_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_45_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_512_128_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_512_128_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_512_128_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_65_96_112_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_65_96_112_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_65_96_112_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_1023_64_48_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_1023_64_48_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_1023_64_48_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_1025_128_96_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_1025_128_96_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_1025_128_96_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_127_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_127_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_127_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_128_128_128_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_128_128_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_128_128_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_128_256_512_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_128_256_512_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_128_256_512_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_197_224_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_197_224_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_197_224_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_197_240_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_197_240_272_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_197_240_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_256_256_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_256_256_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_256_256_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_256_512_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_256_512_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_256_512_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_2_1024_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_2_1024_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_2_1024_128_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_31_1024_64_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_31_1024_64_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_31_1024_64_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_45_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_45_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_45_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_512_128_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_512_128_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_512_128_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_65_96_112_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_65_96_112_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_65_96_112_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_1023_64_48_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_1023_64_48_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_1023_64_48_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_1025_128_96_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_1025_128_96_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_1025_128_96_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_127_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_127_96_1024_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_127_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_128_128_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_128_128_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_128_128_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_128_256_512_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_128_256_512_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_128_256_512_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_197_224_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_197_224_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_197_224_272_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_197_240_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_197_240_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_197_240_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_256_256_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_256_256_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_256_256_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_256_512_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_256_512_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_256_512_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_2_1024_128_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_2_1024_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_2_1024_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_31_1024_64_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_31_1024_64_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_31_1024_64_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_45_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_45_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_45_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_512_128_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_512_128_256_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_512_128_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_65_96_112_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_65_96_112_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_65_96_112_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_1023_64_48_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_1023_64_48_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_1023_64_48_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_1025_128_96_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_1025_128_96_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_1025_128_96_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_127_96_1024_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_127_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_127_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_128_128_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_128_128_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_128_128_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_128_256_512_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_128_256_512_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_128_256_512_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_197_224_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_197_224_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_197_224_272_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_197_240_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_197_240_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_197_240_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_256_256_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_256_256_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_256_256_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_256_512_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_256_512_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_256_512_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_2_1024_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_2_1024_128_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_2_1024_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_31_1024_64_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_31_1024_64_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_31_1024_64_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_45_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_45_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_45_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_512_128_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_512_128_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_512_128_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_65_96_112_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_65_96_112_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_65_96_112_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_1023_64_48_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_1023_64_48_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_1023_64_48_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_1025_128_96_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_1025_128_96_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_1025_128_96_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_127_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_127_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_127_96_1024_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_128_128_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_128_128_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_128_128_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_128_256_512_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_128_256_512_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_128_256_512_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_197_224_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_197_224_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_197_224_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_197_240_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_197_240_272_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_197_240_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_256_256_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_256_256_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_256_256_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_256_512_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_256_512_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_256_512_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_2_1024_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_2_1024_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_2_1024_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_31_1024_64_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_31_1024_64_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_31_1024_64_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_45_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_45_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_45_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_512_128_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_512_128_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_512_128_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_65_96_112_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_65_96_112_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_65_96_112_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_nvfp4_compile_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_nvfp4_with_global_scale_1023_64_48_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_nvfp4_with_global_scale_1025_128_96_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_nvfp4_with_global_scale_127_96_1024_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_nvfp4_with_global_scale_128_128_128_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_nvfp4_with_global_scale_128_256_512_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_nvfp4_with_global_scale_256_256_256_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_nvfp4_with_global_scale_256_512_128_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_nvfp4_with_global_scale_2_1024_128_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_nvfp4_with_global_scale_31_1024_64_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_nvfp4_with_global_scale_45_96_1024_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_nvfp4_with_global_scale_512_128_256_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_error_message_fp8_pre_sm89_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_float32_output_errors_with_bias_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_float8_basics_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_float8_bias_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_float8_bias_relu_edgecase_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_float8_error_messages_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_float8_rowwise_scaling_sanity_use_fast_accum_False_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_float8_rowwise_scaling_sanity_use_fast_accum_True_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_float8_scale_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_float8_scale_fast_accum_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_honor_sm_carveout_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_nvfp4_scaled_grouped_mm_2d_2d_G_16_M_2048_N_8192_K_16640_format_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_nvfp4_scaled_grouped_mm_2d_2d_G_16_M_2048_N_8192_K_16640_format_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_nvfp4_scaled_grouped_mm_2d_2d_G_16_M_2048_N_8192_K_16640_format_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_nvfp4_scaled_grouped_mm_2d_2d_G_16_M_2049_N_8192_K_16640_format_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_nvfp4_scaled_grouped_mm_2d_2d_G_16_M_2049_N_8192_K_16640_format_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_nvfp4_scaled_grouped_mm_2d_2d_G_16_M_2049_N_8192_K_16640_format_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_nvfp4_scaled_grouped_mm_2d_2d_G_1_M_2048_N_8192_K_16640_format_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_nvfp4_scaled_grouped_mm_2d_2d_G_1_M_2048_N_8192_K_16640_format_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_nvfp4_scaled_grouped_mm_2d_2d_G_1_M_2048_N_8192_K_16640_format_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_nvfp4_scaled_grouped_mm_2d_2d_G_1_M_2049_N_8192_K_16640_format_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_nvfp4_scaled_grouped_mm_2d_2d_G_1_M_2049_N_8192_K_16640_format_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_nvfp4_scaled_grouped_mm_2d_2d_G_1_M_2049_N_8192_K_16640_format_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_nvfp4_scaled_grouped_mm_2d_2d_G_4_M_2048_N_8192_K_16640_format_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_nvfp4_scaled_grouped_mm_2d_2d_G_4_M_2048_N_8192_K_16640_format_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_nvfp4_scaled_grouped_mm_2d_2d_G_4_M_2048_N_8192_K_16640_format_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_nvfp4_scaled_grouped_mm_2d_2d_G_4_M_2049_N_8192_K_16640_format_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_nvfp4_scaled_grouped_mm_2d_2d_G_4_M_2049_N_8192_K_16640_format_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_nvfp4_scaled_grouped_mm_2d_2d_G_4_M_2049_N_8192_K_16640_format_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_scaled_grouped_mm_2d_3d_G_16_M_16640_N_8192_K_4096_format_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_scaled_grouped_mm_2d_3d_G_16_M_16640_N_8192_K_4096_format_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_scaled_grouped_mm_2d_3d_G_16_M_16640_N_8192_K_4096_format_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_scaled_grouped_mm_2d_3d_G_1_M_16640_N_8192_K_4096_format_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_scaled_grouped_mm_2d_3d_G_1_M_16640_N_8192_K_4096_format_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_scaled_grouped_mm_2d_3d_G_1_M_16640_N_8192_K_4096_format_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_scaled_grouped_mm_2d_3d_G_4_M_16640_N_8192_K_4096_format_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_scaled_grouped_mm_2d_3d_G_4_M_16640_N_8192_K_4096_format_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_scaled_grouped_mm_2d_3d_G_4_M_16640_N_8192_K_4096_format_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_non_divisible_leading_dim_bias_False_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_non_divisible_leading_dim_bias_True_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_pack_uint4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_2d_2d_fast_accum_False_strided_False_wrap_v2_False_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_2d_2d_fast_accum_False_strided_False_wrap_v2_True_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_2d_2d_fast_accum_False_strided_True_wrap_v2_False_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_2d_2d_fast_accum_False_strided_True_wrap_v2_True_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_2d_2d_fast_accum_True_strided_False_wrap_v2_False_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_2d_2d_fast_accum_True_strided_False_wrap_v2_True_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_2d_2d_fast_accum_True_strided_True_wrap_v2_False_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_2d_2d_fast_accum_True_strided_True_wrap_v2_True_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_2d_3d_fast_accum_False_strided_False_wrap_v2_False_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_2d_3d_fast_accum_False_strided_False_wrap_v2_True_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_2d_3d_fast_accum_False_strided_True_wrap_v2_False_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_2d_3d_fast_accum_False_strided_True_wrap_v2_True_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_2d_3d_fast_accum_True_strided_False_wrap_v2_False_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_2d_3d_fast_accum_True_strided_False_wrap_v2_True_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_2d_3d_fast_accum_True_strided_True_wrap_v2_False_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_2d_3d_fast_accum_True_strided_True_wrap_v2_True_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_3d_2d_fast_accum_False_strided_False_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_3d_2d_fast_accum_False_strided_True_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_3d_2d_fast_accum_True_strided_False_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_3d_2d_fast_accum_True_strided_True_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_3d_3d_fast_accum_False_strided_False_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_3d_3d_fast_accum_False_strided_True_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_3d_3d_fast_accum_True_strided_False_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_3d_3d_fast_accum_True_strided_True_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_256_N_768_K_512_test_case_data_random_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_256_N_768_K_512_test_case_data_random_scales_one_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_256_N_768_K_512_test_case_x_eye_b_eye_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_256_N_768_K_512_test_case_x_ones_y_ones_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_256_N_768_K_512_test_case_x_ones_y_ones_modify_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_256_N_768_K_512_test_case_x_ones_y_ones_set_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_384_N_128_K_1280_test_case_data_random_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_384_N_128_K_1280_test_case_data_random_scales_one_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_384_N_128_K_1280_test_case_x_eye_b_eye_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_384_N_128_K_1280_test_case_x_ones_y_ones_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_384_N_128_K_1280_test_case_x_ones_y_ones_modify_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_384_N_128_K_1280_test_case_x_ones_y_ones_set_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_512_N_512_K_512_test_case_data_random_calc_scales_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_512_N_512_K_512_test_case_data_random_scales_one_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_512_N_512_K_512_test_case_x_eye_b_eye_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_512_N_512_K_512_test_case_x_ones_y_ones_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_512_N_512_K_512_test_case_x_ones_y_ones_modify_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_512_N_512_K_512_test_case_x_ones_y_ones_set_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_256_N_768_K_512_test_case_data_random_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_256_N_768_K_512_test_case_data_random_scales_one_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_256_N_768_K_512_test_case_x_eye_b_eye_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_256_N_768_K_512_test_case_x_ones_y_ones_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_256_N_768_K_512_test_case_x_ones_y_ones_modify_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_256_N_768_K_512_test_case_x_ones_y_ones_set_scales_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_384_N_128_K_1280_test_case_data_random_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_384_N_128_K_1280_test_case_data_random_scales_one_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_384_N_128_K_1280_test_case_x_eye_b_eye_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_384_N_128_K_1280_test_case_x_ones_y_ones_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_384_N_128_K_1280_test_case_x_ones_y_ones_modify_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_384_N_128_K_1280_test_case_x_ones_y_ones_set_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_512_N_512_K_512_test_case_data_random_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_512_N_512_K_512_test_case_data_random_scales_one_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_512_N_512_K_512_test_case_x_eye_b_eye_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_512_N_512_K_512_test_case_x_ones_y_ones_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_512_N_512_K_512_test_case_x_ones_y_ones_modify_scales_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_512_N_512_K_512_test_case_x_ones_y_ones_set_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_256_N_768_K_512_test_case_data_random_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_256_N_768_K_512_test_case_data_random_scales_one_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_256_N_768_K_512_test_case_x_eye_b_eye_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_256_N_768_K_512_test_case_x_ones_y_ones_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_256_N_768_K_512_test_case_x_ones_y_ones_modify_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_256_N_768_K_512_test_case_x_ones_y_ones_set_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_384_N_128_K_1280_test_case_data_random_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_384_N_128_K_1280_test_case_data_random_scales_one_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_384_N_128_K_1280_test_case_x_eye_b_eye_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_384_N_128_K_1280_test_case_x_ones_y_ones_calc_scales_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_384_N_128_K_1280_test_case_x_ones_y_ones_modify_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_384_N_128_K_1280_test_case_x_ones_y_ones_set_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_512_N_512_K_512_test_case_data_random_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_512_N_512_K_512_test_case_data_random_scales_one_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_512_N_512_K_512_test_case_x_eye_b_eye_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_512_N_512_K_512_test_case_x_ones_y_ones_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_512_N_512_K_512_test_case_x_ones_y_ones_modify_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_512_N_512_K_512_test_case_x_ones_y_ones_set_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_256_N_768_K_512_test_case_data_random_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_256_N_768_K_512_test_case_data_random_scales_one_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_256_N_768_K_512_test_case_x_eye_b_eye_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_256_N_768_K_512_test_case_x_ones_y_ones_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_256_N_768_K_512_test_case_x_ones_y_ones_modify_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_256_N_768_K_512_test_case_x_ones_y_ones_set_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_384_N_128_K_1280_test_case_data_random_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_384_N_128_K_1280_test_case_data_random_scales_one_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_384_N_128_K_1280_test_case_x_eye_b_eye_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_384_N_128_K_1280_test_case_x_ones_y_ones_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_384_N_128_K_1280_test_case_x_ones_y_ones_modify_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_384_N_128_K_1280_test_case_x_ones_y_ones_set_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_512_N_512_K_512_test_case_data_random_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_512_N_512_K_512_test_case_data_random_scales_one_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_512_N_512_K_512_test_case_x_eye_b_eye_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_512_N_512_K_512_test_case_x_ones_y_ones_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_512_N_512_K_512_test_case_x_ones_y_ones_modify_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_512_N_512_K_512_test_case_x_ones_y_ones_set_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_256_N_768_K_512_test_case_data_random_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_256_N_768_K_512_test_case_data_random_scales_one_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_256_N_768_K_512_test_case_x_eye_b_eye_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_256_N_768_K_512_test_case_x_ones_y_ones_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_256_N_768_K_512_test_case_x_ones_y_ones_modify_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_256_N_768_K_512_test_case_x_ones_y_ones_set_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_384_N_128_K_1280_test_case_data_random_calc_scales_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_384_N_128_K_1280_test_case_data_random_scales_one_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_384_N_128_K_1280_test_case_x_eye_b_eye_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_384_N_128_K_1280_test_case_x_ones_y_ones_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_384_N_128_K_1280_test_case_x_ones_y_ones_modify_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_384_N_128_K_1280_test_case_x_ones_y_ones_set_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_512_N_512_K_512_test_case_data_random_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_512_N_512_K_512_test_case_data_random_scales_one_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_512_N_512_K_512_test_case_x_eye_b_eye_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_512_N_512_K_512_test_case_x_ones_y_ones_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_512_N_512_K_512_test_case_x_ones_y_ones_modify_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_512_N_512_K_512_test_case_x_ones_y_ones_set_scales_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_256_N_768_K_512_test_case_data_random_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_256_N_768_K_512_test_case_data_random_scales_one_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_256_N_768_K_512_test_case_x_eye_b_eye_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_256_N_768_K_512_test_case_x_ones_y_ones_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_256_N_768_K_512_test_case_x_ones_y_ones_modify_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_256_N_768_K_512_test_case_x_ones_y_ones_set_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_384_N_128_K_1280_test_case_data_random_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_384_N_128_K_1280_test_case_data_random_scales_one_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_384_N_128_K_1280_test_case_x_eye_b_eye_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_384_N_128_K_1280_test_case_x_ones_y_ones_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_384_N_128_K_1280_test_case_x_ones_y_ones_modify_scales_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_384_N_128_K_1280_test_case_x_ones_y_ones_set_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_512_N_512_K_512_test_case_data_random_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_512_N_512_K_512_test_case_data_random_scales_one_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_512_N_512_K_512_test_case_x_eye_b_eye_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_512_N_512_K_512_test_case_x_ones_y_ones_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_512_N_512_K_512_test_case_x_ones_y_ones_modify_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_512_N_512_K_512_test_case_x_ones_y_ones_set_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_change_stride_bfloat16_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_change_stride_float16_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_change_stride_float32_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_deepseek_error_messages_bfloat16_lhs_block_128_rhs_block_1_M_256_N_256_K_256_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_deepseek_error_messages_bfloat16_lhs_block_128_rhs_block_1_M_256_N_256_K_512_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_deepseek_error_messages_bfloat16_lhs_block_1_rhs_block_128_M_256_N_256_K_256_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_deepseek_error_messages_bfloat16_lhs_block_1_rhs_block_128_M_256_N_256_K_512_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_deepseek_error_messages_bfloat16_lhs_block_1_rhs_block_1_M_256_N_256_K_256_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_deepseek_error_messages_bfloat16_lhs_block_1_rhs_block_1_M_256_N_256_K_512_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_bfloat16_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_block_wise_verify_small_shapes_bfloat16_lhs_block_128_rhs_block_1_M_256_N_128_K_256_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_block_wise_verify_small_shapes_bfloat16_lhs_block_128_rhs_block_1_M_256_N_256_K_128_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_block_wise_verify_small_shapes_bfloat16_lhs_block_1_rhs_block_128_M_256_N_128_K_256_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_block_wise_verify_small_shapes_bfloat16_lhs_block_1_rhs_block_128_M_256_N_256_K_128_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_block_wise_verify_small_shapes_bfloat16_lhs_block_1_rhs_block_1_M_256_N_128_K_256_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_block_wise_verify_small_shapes_bfloat16_lhs_block_1_rhs_block_1_M_256_N_256_K_128_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_block_wise_verify_small_shapes_float32_lhs_block_128_rhs_block_1_M_256_N_128_K_256_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_block_wise_verify_small_shapes_float32_lhs_block_128_rhs_block_1_M_256_N_256_K_128_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_block_wise_verify_small_shapes_float32_lhs_block_1_rhs_block_128_M_256_N_128_K_256_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_block_wise_verify_small_shapes_float32_lhs_block_1_rhs_block_128_M_256_N_256_K_128_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_block_wise_verify_small_shapes_float32_lhs_block_1_rhs_block_1_M_256_N_128_K_256_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_block_wise_verify_small_shapes_float32_lhs_block_1_rhs_block_1_M_256_N_256_K_128_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_float16_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_float32_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_row_wise_bfloat16_shapes0_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_row_wise_float16_shapes0_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_row_wise_float32_shapes0_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_zero_dim_tensorwise_which_dim_zero_0_use_torch_compile_False_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_zero_dim_tensorwise_which_dim_zero_0_use_torch_compile_True_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_zero_dim_tensorwise_which_dim_zero_1_use_torch_compile_False_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_zero_dim_tensorwise_which_dim_zero_1_use_torch_compile_True_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_zero_dim_tensorwise_which_dim_zero_2_use_torch_compile_False_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_zero_dim_tensorwise_which_dim_zero_2_use_torch_compile_True_cuda
2025-12-04T17:09:31.9023384Z 
2025-12-04T17:09:31.9024093Z Finished test_scaled_matmul_cuda 1/1 ... [2025-12-04 17:09:31.765677][28600.14856865], took 0.15min
2025-12-04T17:09:31.9025467Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_scaled_matmul_cuda/test_scaled_matmul_cuda-d1f8763e6c1869e6.xml
2025-12-04T17:09:31.9320250Z Running torch_np/numpy_tests/core/test_shape_base 1/1 ... [2025-12-04 17:09:31.931701][28600.314594588]
2025-12-04T17:09:31.9320899Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T17:09:31.9324151Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/core/test_shape_base.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:09:31.932154]
2025-12-04T17:09:38.1561817Z 
2025-12-04T17:09:38.1563009Z torch_np/numpy_tests/core/test_shape_base 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.core.test_shape_base_1.1_0a0e6d68a930787e_.log
2025-12-04T17:09:38.1621540Z Running 119 items in this shard: test/torch_np/numpy_tests/core/test_shape_base.py::TestAtleast1d::test_0D_array, test/torch_np/numpy_tests/core/test_shape_base.py::TestAtleast1d::test_1D_array, test/torch_np/numpy_tests/core/test_shape_base.py::TestAtleast1d::test_2D_array, test/torch_np/numpy_tests/core/test_shape_base.py::TestAtleast1d::test_3D_array, test/torch_np/numpy_tests/core/test_shape_base.py::TestAtleast1d::test_r1array, test/torch_np/numpy_tests/core/test_shape_base.py::TestAtleast2d::test_0D_array, test/torch_np/numpy_tests/core/test_shape_base.py::TestAtleast2d::test_1D_array, test/torch_np/numpy_tests/core/test_shape_base.py::TestAtleast2d::test_2D_array, test/torch_np/numpy_tests/core/test_shape_base.py::TestAtleast2d::test_3D_array, test/torch_np/numpy_tests/core/test_shape_base.py::TestAtleast2d::test_r2array, test/torch_np/numpy_tests/core/test_shape_base.py::TestAtleast3d::test_0D_array, test/torch_np/numpy_tests/core/test_shape_base.py::TestAtleast3d::test_1D_array, test/torch_np/numpy_tests/core/test_shape_base.py::TestAtleast3d::test_2D_array, test/torch_np/numpy_tests/core/test_shape_base.py::TestAtleast3d::test_3D_array, test/torch_np/numpy_tests/core/test_shape_base.py::TestHstack::test_0D_array, test/torch_np/numpy_tests/core/test_shape_base.py::TestHstack::test_1D_array, test/torch_np/numpy_tests/core/test_shape_base.py::TestHstack::test_2D_array, test/torch_np/numpy_tests/core/test_shape_base.py::TestHstack::test_casting_and_dtype, test/torch_np/numpy_tests/core/test_shape_base.py::TestHstack::test_casting_and_dtype_type_error, test/torch_np/numpy_tests/core/test_shape_base.py::TestHstack::test_empty_input, test/torch_np/numpy_tests/core/test_shape_base.py::TestHstack::test_generator, test/torch_np/numpy_tests/core/test_shape_base.py::TestHstack::test_non_iterable, test/torch_np/numpy_tests/core/test_shape_base.py::TestVstack::test_0D_array, 
test/torch_np/numpy_tests/core/test_shape_base.py::TestVstack::test_1D_array, test/torch_np/numpy_tests/core/test_shape_base.py::TestVstack::test_2D_array, test/torch_np/numpy_tests/core/test_shape_base.py::TestVstack::test_2D_array2, test/torch_np/numpy_tests/core/test_shape_base.py::TestVstack::test_casting_and_dtype, test/torch_np/numpy_tests/core/test_shape_base.py::TestVstack::test_casting_and_dtype_type_error, test/torch_np/numpy_tests/core/test_shape_base.py::TestVstack::test_empty_input, test/torch_np/numpy_tests/core/test_shape_base.py::TestVstack::test_generator, test/torch_np/numpy_tests/core/test_shape_base.py::TestVstack::test_non_iterable, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_bad_out_shape, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_concatenate, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_concatenate_axis_None, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_exceptions, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_large_concatenate_axis_None, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_operator_concat, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis0_out_dtype_c8_casting_equiv, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis0_out_dtype_c8_casting_no, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis0_out_dtype_c8_casting_safe, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis0_out_dtype_c8_casting_same_kind, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis0_out_dtype_c8_casting_unsafe, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis0_out_dtype_f4_casting_equiv, 
test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis0_out_dtype_f4_casting_no, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis0_out_dtype_f4_casting_safe, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis0_out_dtype_f4_casting_same_kind, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis0_out_dtype_f4_casting_unsafe, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis0_out_dtype_f8_casting_equiv, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis0_out_dtype_f8_casting_no, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis0_out_dtype_f8_casting_safe, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis0_out_dtype_f8_casting_same_kind, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis0_out_dtype_f8_casting_unsafe, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis0_out_dtype_i8_casting_equiv, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis0_out_dtype_i8_casting_no, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis0_out_dtype_i8_casting_safe, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis0_out_dtype_i8_casting_same_kind, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis0_out_dtype_i8_casting_unsafe, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis_0_out_dtype_c8_casting_equiv, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis_0_out_dtype_c8_casting_no, 
test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis_0_out_dtype_c8_casting_safe, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis_0_out_dtype_c8_casting_same_kind, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis_0_out_dtype_c8_casting_unsafe, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis_0_out_dtype_f4_casting_equiv, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis_0_out_dtype_f4_casting_no, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis_0_out_dtype_f4_casting_safe, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis_0_out_dtype_f4_casting_same_kind, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis_0_out_dtype_f4_casting_unsafe, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis_0_out_dtype_f8_casting_equiv, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis_0_out_dtype_f8_casting_no, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis_0_out_dtype_f8_casting_safe, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis_0_out_dtype_f8_casting_same_kind, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis_0_out_dtype_f8_casting_unsafe, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis_0_out_dtype_i8_casting_equiv, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis_0_out_dtype_i8_casting_no, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis_0_out_dtype_i8_casting_safe, 
test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis_0_out_dtype_i8_casting_same_kind, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis_0_out_dtype_i8_casting_unsafe, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_simple, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_returns_copy, test/torch_np/numpy_tests/core/test_shape_base.py::TestStackMisc::test_stack, test/torch_np/numpy_tests/core/test_shape_base.py::TestStackMisc::test_stack_out_and_dtype_axis_0_out_dtype_c8_casting_equiv, test/torch_np/numpy_tests/core/test_shape_base.py::TestStackMisc::test_stack_out_and_dtype_axis_0_out_dtype_c8_casting_no, test/torch_np/numpy_tests/core/test_shape_base.py::TestStackMisc::test_stack_out_and_dtype_axis_0_out_dtype_c8_casting_safe, test/torch_np/numpy_tests/core/test_shape_base.py::TestStackMisc::test_stack_out_and_dtype_axis_0_out_dtype_c8_casting_same_kind, test/torch_np/numpy_tests/core/test_shape_base.py::TestStackMisc::test_stack_out_and_dtype_axis_0_out_dtype_c8_casting_unsafe, test/torch_np/numpy_tests/core/test_shape_base.py::TestStackMisc::test_stack_out_and_dtype_axis_0_out_dtype_f4_casting_equiv, test/torch_np/numpy_tests/core/test_shape_base.py::TestStackMisc::test_stack_out_and_dtype_axis_0_out_dtype_f4_casting_no, test/torch_np/numpy_tests/core/test_shape_base.py::TestStackMisc::test_stack_out_and_dtype_axis_0_out_dtype_f4_casting_safe, test/torch_np/numpy_tests/core/test_shape_base.py::TestStackMisc::test_stack_out_and_dtype_axis_0_out_dtype_f4_casting_same_kind, test/torch_np/numpy_tests/core/test_shape_base.py::TestStackMisc::test_stack_out_and_dtype_axis_0_out_dtype_f4_casting_unsafe, test/torch_np/numpy_tests/core/test_shape_base.py::TestStackMisc::test_stack_out_and_dtype_axis_0_out_dtype_f8_casting_equiv, 
test/torch_np/numpy_tests/core/test_shape_base.py::TestStackMisc::test_stack_out_and_dtype_axis_0_out_dtype_f8_casting_no, test/torch_np/numpy_tests/core/test_shape_base.py::TestStackMisc::test_stack_out_and_dtype_axis_0_out_dtype_f8_casting_safe, test/torch_np/numpy_tests/core/test_shape_base.py::TestStackMisc::test_stack_out_and_dtype_axis_0_out_dtype_f8_casting_same_kind, test/torch_np/numpy_tests/core/test_shape_base.py::TestStackMisc::test_stack_out_and_dtype_axis_0_out_dtype_f8_casting_unsafe, test/torch_np/numpy_tests/core/test_shape_base.py::TestStackMisc::test_stack_out_and_dtype_axis_0_out_dtype_i8_casting_equiv, test/torch_np/numpy_tests/core/test_shape_base.py::TestStackMisc::test_stack_out_and_dtype_axis_0_out_dtype_i8_casting_no, test/torch_np/numpy_tests/core/test_shape_base.py::TestStackMisc::test_stack_out_and_dtype_axis_0_out_dtype_i8_casting_safe, test/torch_np/numpy_tests/core/test_shape_base.py::TestStackMisc::test_stack_out_and_dtype_axis_0_out_dtype_i8_casting_same_kind, test/torch_np/numpy_tests/core/test_shape_base.py::TestStackMisc::test_stack_out_and_dtype_axis_0_out_dtype_i8_casting_unsafe, test/torch_np/numpy_tests/core/test_shape_base.py::TestBlock::test_3d, test/torch_np/numpy_tests/core/test_shape_base.py::TestBlock::test_block_complicated, test/torch_np/numpy_tests/core/test_shape_base.py::TestBlock::test_block_memory_order, test/torch_np/numpy_tests/core/test_shape_base.py::TestBlock::test_block_mixed_1d_and_2d, test/torch_np/numpy_tests/core/test_shape_base.py::TestBlock::test_block_simple_column_wise, test/torch_np/numpy_tests/core/test_shape_base.py::TestBlock::test_block_simple_row_wise, test/torch_np/numpy_tests/core/test_shape_base.py::TestBlock::test_block_total_size_estimate, test/torch_np/numpy_tests/core/test_shape_base.py::TestBlock::test_block_with_1d_arrays_column_wise, test/torch_np/numpy_tests/core/test_shape_base.py::TestBlock::test_block_with_1d_arrays_multiple_rows, 
test/torch_np/numpy_tests/core/test_shape_base.py::TestBlock::test_block_with_1d_arrays_row_wise, test/torch_np/numpy_tests/core/test_shape_base.py::TestBlock::test_block_with_mismatched_shape, test/torch_np/numpy_tests/core/test_shape_base.py::TestBlock::test_different_ndims, test/torch_np/numpy_tests/core/test_shape_base.py::TestBlock::test_different_ndims_depths, test/torch_np/numpy_tests/core/test_shape_base.py::TestBlock::test_empty_lists, test/torch_np/numpy_tests/core/test_shape_base.py::TestBlock::test_invalid_nesting, test/torch_np/numpy_tests/core/test_shape_base.py::TestBlock::test_nested, test/torch_np/numpy_tests/core/test_shape_base.py::TestBlock::test_no_lists, test/torch_np/numpy_tests/core/test_shape_base.py::TestBlock::test_returns_copy, test/torch_np/numpy_tests/core/test_shape_base.py::TestBlock::test_tuple
2025-12-04T17:09:38.1678836Z 
2025-12-04T17:09:38.1679273Z Finished torch_np/numpy_tests/core/test_shape_base 1/1 ... [2025-12-04 17:09:38.156253][28606.539148309], took 0.10min
2025-12-04T17:09:38.1934191Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.numpy_tests.core.test_shape_base/torch_np.numpy_tests.core.test_shape_base-b9eed7c143bc9bc3.xml
2025-12-04T17:09:38.2640860Z Running test_vulkan 1/1 ... [2025-12-04 17:09:38.263784][28606.646679071]
2025-12-04T17:09:38.2641366Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T17:09:38.2644936Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_vulkan.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:09:38.264227]
2025-12-04T17:09:43.2861253Z 
2025-12-04T17:09:43.2862143Z test_vulkan 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_vulkan_1.1_2892328dc9a2ec74_.log
2025-12-04T17:09:43.2863169Z Running 1 items in this shard: test/test_vulkan.py::TestVulkanRewritePass::test_conv
2025-12-04T17:09:43.2863635Z 
2025-12-04T17:09:43.2863906Z Finished test_vulkan 1/1 ... [2025-12-04 17:09:43.285916][28611.668812892], took 0.08min
2025-12-04T17:09:43.3234076Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_vulkan/test_vulkan-b25d187bf3baa78a.xml
2025-12-04T17:09:43.3507281Z Running lazy/test_generator 1/1 ... [2025-12-04 17:09:43.350427][28611.733322111]
2025-12-04T17:09:43.3507861Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T17:09:43.3511337Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'lazy/test_generator.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:09:43.350878]
2025-12-04T17:09:49.0240164Z 
2025-12-04T17:09:49.0241729Z lazy/test_generator 1/1 was successful, full logs can be found in artifacts with path test/test-reports/lazy.test_generator_1.1_9fb7d5917fd83b83_.log
2025-12-04T17:09:49.0244756Z Running 2 items in this shard: test/lazy/test_generator.py::LazyGeneratorTest::test_generator, test/lazy/test_generator.py::LazyGeneratorTest::test_generator_causes_multiple_compiles
2025-12-04T17:09:49.0246602Z 
2025-12-04T17:09:49.0247237Z Finished lazy/test_generator 1/1 ... [2025-12-04 17:09:49.023775][28617.406671142], took 0.09min
2025-12-04T17:09:49.0621487Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/lazy.test_generator/lazy.test_generator-42072a3593c4e25d.xml
2025-12-04T17:09:49.0923040Z Running torch_np/numpy_tests/linalg/test_linalg 1/1 ... [2025-12-04 17:09:49.091883][28617.474777746]
2025-12-04T17:09:49.0924300Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T17:09:49.0927148Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/linalg/test_linalg.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:09:49.092364]
2025-12-04T17:10:03.5289562Z 
2025-12-04T17:10:03.5290992Z torch_np/numpy_tests/linalg/test_linalg 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.linalg.test_linalg_1.1_f8a6a4a0c07965ac_.log
2025-12-04T17:10:03.5405053Z Running 268 items in this shard: test/torch_np/numpy_tests/linalg/test_linalg.py::TestSolve::test_0_size, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSolve::test_0_size_k, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSolve::test_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSolve::test_generalized_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSolve::test_generalized_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSolve::test_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSolve::test_types_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSolve::test_types_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSolve::test_types_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSolve::test_types_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestInv::test_0_size, test/torch_np/numpy_tests/linalg/test_linalg.py::TestInv::test_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestInv::test_generalized_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestInv::test_generalized_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestInv::test_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestInv::test_types_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestInv::test_types_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestInv::test_types_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestInv::test_types_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvals::test_0_size, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvals::test_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvals::test_generalized_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvals::test_generalized_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvals::test_sq_cases, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvals::test_types_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvals::test_types_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvals::test_types_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvals::test_types_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEig::test_0_size, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEig::test_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEig::test_generalized_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEig::test_generalized_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEig::test_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEig::test_types_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEig::test_types_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEig::test_types_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEig::test_types_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVD::test_empty_identity, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVD::test_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVD::test_generalized_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVD::test_generalized_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVD::test_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVD::test_types_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVD::test_types_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVD::test_types_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVD::test_types_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVDHermitian::test_empty_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVDHermitian::test_generalized_empty_herm_cases, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVDHermitian::test_generalized_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVDHermitian::test_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVDHermitian::test_types_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVDHermitian::test_types_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVDHermitian::test_types_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVDHermitian::test_types_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCond::test_basic_nonsvd, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCond::test_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCond::test_generalized_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCond::test_generalized_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCond::test_nan, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCond::test_singular, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCond::test_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCond::test_stacked_singular, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinv::test_empty_nonsq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinv::test_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinv::test_generalized_empty_nonsq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinv::test_generalized_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinv::test_generalized_nonsq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinv::test_generalized_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinv::test_nonsq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinv::test_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinvHermitian::test_empty_herm_cases, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinvHermitian::test_generalized_empty_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinvHermitian::test_generalized_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinvHermitian::test_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestDet::test_0_size, test/torch_np/numpy_tests/linalg/test_linalg.py::TestDet::test_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestDet::test_generalized_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestDet::test_generalized_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestDet::test_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestDet::test_types_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestDet::test_types_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestDet::test_types_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestDet::test_types_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestDet::test_zero, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_empty_a_b_m_0_n_0_n_rhs_0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_empty_a_b_m_0_n_4_n_rhs_1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_empty_a_b_m_0_n_4_n_rhs_2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_empty_a_b_m_4_n_0_n_rhs_1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_empty_a_b_m_4_n_0_n_rhs_2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_empty_a_b_m_4_n_2_n_rhs_2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_empty_nonsq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_future_rcond, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_incompatible_dims, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_nonsq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalshCases::test_generalized_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalshCases::test_generalized_empty_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalshCases::test_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalshCases::test_empty_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalsh::test_0_size, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalsh::test_UPLO, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalsh::test_invalid, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalsh::test_types_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalsh::test_types_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalsh::test_types_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalsh::test_types_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEighCases::test_generalized_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEighCases::test_generalized_empty_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEighCases::test_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEighCases::test_empty_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigh::test_0_size, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigh::test_UPLO, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigh::test_invalid, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigh::test_types_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigh::test_types_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigh::test_types_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigh::test_types_dtype3, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestNorm_NonSystematic::test_intmin, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormDouble::test_axis, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormDouble::test_bad_args, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormDouble::test_empty, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormDouble::test_keepdims, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormDouble::test_matrix_2x2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormDouble::test_matrix_3x3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormDouble::test_matrix_empty, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormDouble::test_matrix_return_type, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormDouble::test_vector, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormDouble::test_vector_return_type, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormSingle::test_axis, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormSingle::test_bad_args, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormSingle::test_empty, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormSingle::test_keepdims, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormSingle::test_matrix_2x2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormSingle::test_matrix_3x3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormSingle::test_matrix_empty, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormSingle::test_matrix_return_type, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormSingle::test_vector, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormSingle::test_vector_return_type, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormInt64::test_axis, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormInt64::test_bad_args, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormInt64::test_empty, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormInt64::test_keepdims, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormInt64::test_matrix_2x2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormInt64::test_matrix_3x3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormInt64::test_matrix_empty, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormInt64::test_matrix_return_type, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormInt64::test_vector, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormInt64::test_vector_return_type, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMatrixRank::test_matrix_rank, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMatrixRank::test_reduced_rank, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMatrixRank::test_symmetric_rank, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_mode_all_but_economic, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_mode_raw, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_qr_empty_m_0_n_0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_qr_empty_m_0_n_3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_qr_empty_m_3_n_0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size0_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size0_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size0_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size0_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size1_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size1_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size1_dt2, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size1_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size2_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size2_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size2_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size2_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size0_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size0_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size0_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size0_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size1_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size1_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size1_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size1_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size2_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size2_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size2_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size2_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size0_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size0_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size0_dt2, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size0_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size1_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size1_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size1_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size1_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size2_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size2_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size2_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size2_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size0_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size0_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size0_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size0_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size1_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size1_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size1_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size1_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size2_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size2_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size2_dt2, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size2_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size0_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size0_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size0_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size0_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size1_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size1_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size1_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size1_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size2_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size2_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size2_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size2_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_0_size, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape0_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape0_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape0_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape0_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape1_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape1_dtype1, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape1_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape1_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape2_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape2_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape2_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape2_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape3_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape3_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape3_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape3_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape4_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape4_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape4_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape4_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMisc::test_byteorder_check, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMisc::test_generalized_raise_multiloop, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMisc::test_sdot_bug_8577, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMisc::test_xerbla_override, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_basic_function_with_dynamic_programming_optimization, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_basic_function_with_three_arguments, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_basic_function_with_two_arguments, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_dynamic_programming_logic, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_dynamic_programming_optimization_and_out, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_three_arguments_and_out, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_too_few_input_arrays, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_two_arguments_and_out, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_vector_as_first_and_last_argument, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_vector_as_first_argument, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_vector_as_last_argument, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorinv::test_non_square_handling_arr0_ind_2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorinv::test_non_square_handling_arr1_ind_1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorinv::test_tensorinv_ind_limit_ind_-2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorinv::test_tensorinv_ind_limit_ind_0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorinv::test_tensorinv_result, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorinv::test_tensorinv_shape_shape0_ind_2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorinv::test_tensorinv_shape_shape1_ind_1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorsolve::test_non_square_handling_a0_axes0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorsolve::test_non_square_handling_a1_axes1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorsolve::test_tensorsolve_result_shape0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorsolve::test_tensorsolve_result_shape1, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorsolve::test_tensorsolve_result_shape2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMisc2::test_blas64_dot, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMisc2::test_blas64_geqrf_lwork_smoketest, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMisc2::test_unsupported_commontype
2025-12-04T17:10:03.5518211Z 
2025-12-04T17:10:03.5518712Z Finished torch_np/numpy_tests/linalg/test_linalg 1/1 ... [2025-12-04 17:10:03.529160][28631.912055296], took 0.24min
2025-12-04T17:10:03.5663007Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.numpy_tests.linalg.test_linalg/torch_np.numpy_tests.linalg.test_linalg-2974f2048ff6a577.xml
2025-12-04T17:10:03.6565045Z Running torch_np/numpy_tests/core/test_dtype 1/1 ... [2025-12-04 17:10:03.656158][28632.039051388]
2025-12-04T17:10:03.6565667Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T17:10:03.6568616Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/core/test_dtype.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:10:03.656595]
2025-12-04T17:10:09.7800715Z 
2025-12-04T17:10:09.7802007Z torch_np/numpy_tests/core/test_dtype 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.core.test_dtype_1.1_7868c6a3dd1e371a_.log
2025-12-04T17:10:09.7852731Z Running 102 items in this shard: test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_equivalent_dtype_hashing, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_invalid_types, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Bool, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Bytes0, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Complex128, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Complex32, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Complex64, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Datetime64, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Float128, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Float16, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Float32, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Float64, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Int16, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Int32, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Int64, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Int8, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Object0, 
test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Str0, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Timedelta64, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_UInt16, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_UInt32, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_UInt64, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_UInt8, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Uint32, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Uint64, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Void0, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_richcompare_invalid_dtype_comparison_operation0, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_richcompare_invalid_dtype_comparison_operation1, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_richcompare_invalid_dtype_comparison_operation2, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_richcompare_invalid_dtype_comparison_operation3, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_richcompare_invalid_dtype_equality, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_run_t0, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_run_t1, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_run_t2, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_run_t3, test/torch_np/numpy_tests/core/test_dtype.py::TestDtypeAttributeDeletion::test_dtype_non_writable_attributes_deletion, 
test/torch_np/numpy_tests/core/test_dtype.py::TestDtypeAttributeDeletion::test_dtype_writable_attributes_deletion, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_builtin_t0, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_builtin_t1, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_builtin_t2, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_builtin_t3, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_builtin_t4, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_pickle_types_DType11, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_pickle_types_bool__10, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_pickle_types_complex128_4, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_pickle_types_complex64_3, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_pickle_types_float16_0, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_pickle_types_float32_1, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_pickle_types_float64_2, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_pickle_types_int16_7, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_pickle_types_int32_8, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_pickle_types_int64_9, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_pickle_types_int8_6, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_pickle_types_uint8_5, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_complex_other_value_based_complex64_complex64_None, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_complex_other_value_based_float16_complex64_None, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_complex_other_value_based_float32_complex64_None, 
test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_complex_other_value_based_other_4294967295_expected1_expected_weak1, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_complex_other_value_based_other_65535_expected0_expected_weak0, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_complex_scalar_value_based_other0_expected0, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_complex_scalar_value_based_other1_expected1, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_complex_scalar_value_based_other2_expected2, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_complex_scalar_value_based_other3_expected3, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_complex_scalar_value_based_other4_expected4, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_complex_scalar_value_based_other5_expected5, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_complex_scalar_value_based_other6_expected6, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_permutations_do_not_influence_result_dtypes0_expected0, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_permutations_do_not_influence_result_dtypes1_expected1, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_permutations_do_not_influence_result_dtypes2_expected2, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_permutations_do_not_influence_result_dtypes3_expected3, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_permutations_do_not_influence_result_dtypes4_expected4, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_permutations_do_not_influence_result_dtypes5_expected5, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_permutations_do_not_influence_result_dtypes6_expected6, 
test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_permutations_do_not_influence_result_dtypes7_expected7, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_permutations_do_not_influence_result_dtypes8_expected8, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_permutations_do_not_influence_result_dtypes9_expected9, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_python_integer_promotion_val_18446744073709551616, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_python_integer_promotion_val_2, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_python_integer_promotion_val_200, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_python_integer_promotion_val_4294967296, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_python_integer_promotion_val_9223372036854775808, test/torch_np/numpy_tests/core/test_dtype.py::TestMisc::test_dtypes_are_true, test/torch_np/numpy_tests/core/test_dtype.py::TestMisc::test_keyword_argument, test/torch_np/numpy_tests/core/test_dtype.py::TestFromDTypeAttribute::test_recursion, test/torch_np/numpy_tests/core/test_dtype.py::TestFromDTypeAttribute::test_simple, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_dtype, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_dtype_subclass_code_?, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_dtype_subclass_code_B, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_dtype_subclass_code_D, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_dtype_subclass_code_F, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_dtype_subclass_code_b, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_dtype_subclass_code_d, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_dtype_subclass_code_e, 
test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_dtype_subclass_code_f, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_dtype_subclass_code_h, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_dtype_subclass_code_i, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_dtype_subclass_code_l, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_subscript_scalar, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_subscript_tuple_arg_len_0, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_subscript_tuple_arg_len_1, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_subscript_tuple_arg_len_2, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_subscript_tuple_arg_len_3
2025-12-04T17:10:09.7902780Z 
2025-12-04T17:10:09.7903206Z Finished torch_np/numpy_tests/core/test_dtype 1/1 ... [2025-12-04 17:10:09.780023][28638.16291905], took 0.10min
2025-12-04T17:10:09.8179648Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.numpy_tests.core.test_dtype/torch_np.numpy_tests.core.test_dtype-4b8c4285965a7813.xml
2025-12-04T17:10:09.8941297Z Running lazy/test_debug_util 1/1 ... [2025-12-04 17:10:09.893777][28638.276671492]
2025-12-04T17:10:09.8941870Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T17:10:09.8945043Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'lazy/test_debug_util.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:10:09.894214]
2025-12-04T17:10:15.6171435Z 
2025-12-04T17:10:15.6172412Z lazy/test_debug_util 1/1 was successful, full logs can be found in artifacts with path test/test-reports/lazy.test_debug_util_1.1_1fda475a5f9a06f5_.log
2025-12-04T17:10:15.6173598Z Running 1 items in this shard: test/lazy/test_debug_util.py::DebugUtilTest::test_get_python_frames
2025-12-04T17:10:15.6174112Z 
2025-12-04T17:10:15.6174455Z Finished lazy/test_debug_util 1/1 ... [2025-12-04 17:10:15.616895][28643.999791016], took 0.10min
2025-12-04T17:10:15.6549329Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/lazy.test_debug_util/lazy.test_debug_util-7c02b1e3dfee61bd.xml
2025-12-04T17:10:15.6928455Z Running nn/test_load_state_dict 1/1 ... [2025-12-04 17:10:15.692523][28644.075416258]
2025-12-04T17:10:15.6929029Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T17:10:15.6932164Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'nn/test_load_state_dict.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:10:15.692955]
2025-12-04T17:10:21.9667334Z 
2025-12-04T17:10:21.9668499Z nn/test_load_state_dict 1/1 was successful, full logs can be found in artifacts with path test/test-reports/nn.test_load_state_dict_1.1_54a686ad2f48d7f9_.log
2025-12-04T17:10:21.9682925Z Running 29 items in this shard: test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_BC_swap_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_BC_swap_True, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_assign_meta_swap_False_keep_vars_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_assign_meta_swap_False_keep_vars_True, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_assign_meta_swap_True_keep_vars_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_assign_meta_swap_True_keep_vars_True, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_assign_shape_stride_swap_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_assign_shape_stride_swap_True, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_assign_with_optimizer_swap_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_assign_with_optimizer_swap_True, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_child_swap_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_child_swap_True, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_custom_swap_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_custom_swap_True, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_invalid_swap_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_invalid_swap_True, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_ref_cycle_swap_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_swap_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_swap_True, 
test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_type_swap_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_type_swap_True, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_warn_assign_swap_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_warn_assign_swap_True, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_with_unexpected_key_swap_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_with_unexpected_key_swap_True, test/nn/test_load_state_dict.py::TestLoadStateDict::test_scalar_param_1d_tensor_raises_swap_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_scalar_param_1d_tensor_raises_swap_True, test/nn/test_load_state_dict.py::TestLoadStateDictSwap::test_swap_subclass_swap_True_assign_False, test/nn/test_load_state_dict.py::TestLoadStateDictSwap::test_swap_subclass_swap_True_assign_True
2025-12-04T17:10:21.9696601Z 
2025-12-04T17:10:21.9696956Z Finished nn/test_load_state_dict 1/1 ... [2025-12-04 17:10:21.966504][28650.349401217], took 0.10min
2025-12-04T17:10:22.0047346Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/nn.test_load_state_dict/nn.test_load_state_dict-e81d6ed8d3f8789f.xml
2025-12-04T17:10:22.0757260Z Running test_shape_ops 1/1 ... [2025-12-04 17:10:22.075403][28650.458297383]
2025-12-04T17:10:22.0760045Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T17:10:22.0761285Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_shape_ops.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:10:22.075840]
2025-12-04T17:10:30.7533474Z 
2025-12-04T17:10:30.7534407Z test_shape_ops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_shape_ops_1.1_4cd0c635a81aa180_.log
2025-12-04T17:10:30.7568009Z Running 99 items in this shard: test/test_shape_ops.py::TestShapeOpsCUDA::test_clamp_cuda_float32, test/test_shape_ops.py::TestShapeOpsCUDA::test_clamp_cuda_int64, test/test_shape_ops.py::TestShapeOpsCUDA::test_clamp_propagates_nans_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_clamp_raises_arg_errors_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_complex_rot90_cuda_complex128, test/test_shape_ops.py::TestShapeOpsCUDA::test_complex_rot90_cuda_complex64, test/test_shape_ops.py::TestShapeOpsCUDA::test_diag_cuda_bool, test/test_shape_ops.py::TestShapeOpsCUDA::test_diag_cuda_float32, test/test_shape_ops.py::TestShapeOpsCUDA::test_diagonal_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_diagonal_multidim_cuda_float32, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_cuda_bfloat16, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_cuda_bool, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_cuda_complex128, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_cuda_complex64, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_cuda_float16, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_cuda_float32, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_cuda_float64, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_cuda_int16, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_cuda_int32, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_cuda_int64, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_cuda_int8, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_cuda_uint8, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_errors_cuda_bfloat16, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_errors_cuda_bool, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_errors_cuda_complex128, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_errors_cuda_complex64, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_errors_cuda_float16, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_errors_cuda_float32, 
test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_errors_cuda_float64, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_errors_cuda_int16, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_errors_cuda_int32, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_errors_cuda_int64, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_errors_cuda_int8, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_errors_cuda_uint8, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_large_tensor_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_numpy_cuda_bfloat16, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_numpy_cuda_bool, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_numpy_cuda_complex128, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_numpy_cuda_complex64, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_numpy_cuda_float16, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_numpy_cuda_float32, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_numpy_cuda_float64, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_numpy_cuda_int16, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_numpy_cuda_int32, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_numpy_cuda_int64, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_numpy_cuda_int8, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_numpy_cuda_uint8, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_unsupported_dtype_cuda_quint2x4, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_unsupported_dtype_cuda_quint4x2, test/test_shape_ops.py::TestShapeOpsCUDA::test_fliplr_cuda_complex128, test/test_shape_ops.py::TestShapeOpsCUDA::test_fliplr_cuda_float64, test/test_shape_ops.py::TestShapeOpsCUDA::test_fliplr_cuda_int64, test/test_shape_ops.py::TestShapeOpsCUDA::test_fliplr_invalid_cuda_complex128, test/test_shape_ops.py::TestShapeOpsCUDA::test_fliplr_invalid_cuda_float64, test/test_shape_ops.py::TestShapeOpsCUDA::test_fliplr_invalid_cuda_int64, 
test/test_shape_ops.py::TestShapeOpsCUDA::test_flipud_cuda_complex128, test/test_shape_ops.py::TestShapeOpsCUDA::test_flipud_cuda_float64, test/test_shape_ops.py::TestShapeOpsCUDA::test_flipud_cuda_int64, test/test_shape_ops.py::TestShapeOpsCUDA::test_flipud_invalid_cuda_complex128, test/test_shape_ops.py::TestShapeOpsCUDA::test_flipud_invalid_cuda_float64, test/test_shape_ops.py::TestShapeOpsCUDA::test_flipud_invalid_cuda_int64, test/test_shape_ops.py::TestShapeOpsCUDA::test_movedim_cuda_complex128, test/test_shape_ops.py::TestShapeOpsCUDA::test_movedim_cuda_float32, test/test_shape_ops.py::TestShapeOpsCUDA::test_movedim_cuda_int64, test/test_shape_ops.py::TestShapeOpsCUDA::test_movedim_invalid_cuda_complex128, test/test_shape_ops.py::TestShapeOpsCUDA::test_movedim_invalid_cuda_float32, test/test_shape_ops.py::TestShapeOpsCUDA::test_movedim_invalid_cuda_int64, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_astuple_out_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_cuda_bfloat16, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_cuda_bool, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_cuda_float16, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_cuda_float32, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_cuda_float64, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_cuda_int16, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_cuda_int32, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_cuda_int64, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_cuda_int8, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_cuda_uint8, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_discontiguous_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_no_warning_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_non_diff_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_rot90_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_sparse_dense_dim_cuda_complex128, 
test/test_shape_ops.py::TestShapeOpsCUDA::test_sparse_dense_dim_cuda_float32, test/test_shape_ops.py::TestShapeOpsCUDA::test_sparse_dense_dim_cuda_int64, test/test_shape_ops.py::TestShapeOpsCUDA::test_tolist_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_trace_cuda_float16, test/test_shape_ops.py::TestShapeOpsCUDA::test_trace_cuda_float32, test/test_shape_ops.py::TestShapeOpsCUDA::test_trace_cuda_float64, test/test_shape_ops.py::TestShapeOpsCUDA::test_trace_cuda_int16, test/test_shape_ops.py::TestShapeOpsCUDA::test_trace_cuda_int32, test/test_shape_ops.py::TestShapeOpsCUDA::test_trace_cuda_int64, test/test_shape_ops.py::TestShapeOpsCUDA::test_trace_cuda_int8, test/test_shape_ops.py::TestShapeOpsCUDA::test_trace_cuda_uint8, test/test_shape_ops.py::TestShapeOpsCUDA::test_unbind_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_unfold_all_devices_and_dtypes_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_unfold_backward_errors_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_unfold_errors_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_unfold_scalars_cuda
2025-12-04T17:10:30.7601023Z 
2025-12-04T17:10:30.7601318Z Finished test_shape_ops 1/1 ... [2025-12-04 17:10:30.753230][28659.136125766], took 0.14min
2025-12-04T17:10:30.7915005Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_shape_ops/test_shape_ops-a6160583c0856270.xml
2025-12-04T17:10:30.8781197Z Running nn/test_module_hooks 1/1 ... [2025-12-04 17:10:30.877796][28659.260687459]
2025-12-04T17:10:30.8781774Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T17:10:30.8784996Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'nn/test_module_hooks.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:10:30.878248]
2025-12-04T17:10:36.7518298Z 
2025-12-04T17:10:36.7519303Z nn/test_module_hooks 1/1 was successful, full logs can be found in artifacts with path test/test-reports/nn.test_module_hooks_1.1_b8e5016c3845034d_.log
2025-12-04T17:10:36.7542112Z Running 53 items in this shard: test/nn/test_module_hooks.py::TestModuleHooks::test_always_called_forward_hooks, test/nn/test_module_hooks.py::TestModuleHooks::test_bw_hook_warning_for_non_tensor_or_tuple, test/nn/test_module_hooks.py::TestModuleHooks::test_forward_hooks_named_tuple_False, test/nn/test_module_hooks.py::TestModuleHooks::test_forward_hooks_named_tuple_True, test/nn/test_module_hooks.py::TestModuleHooks::test_forward_pre_hooks_named_tuple_False, test/nn/test_module_hooks.py::TestModuleHooks::test_forward_pre_hooks_named_tuple_True, test/nn/test_module_hooks.py::TestModuleHooks::test_full_backward_hooks_named_tuple_False, test/nn/test_module_hooks.py::TestModuleHooks::test_full_backward_hooks_named_tuple_True, test/nn/test_module_hooks.py::TestModuleHooks::test_full_backward_pre_hooks_named_tuple_False, test/nn/test_module_hooks.py::TestModuleHooks::test_full_backward_pre_hooks_named_tuple_True, test/nn/test_module_hooks.py::TestModuleHooks::test_kwarg_hooks, test/nn/test_module_hooks.py::TestModuleHooks::test_mixed_hooks_named_tuple_False, test/nn/test_module_hooks.py::TestModuleHooks::test_mixed_hooks_named_tuple_True, test/nn/test_module_hooks.py::TestModuleHooks::test_remove_kwarg_hooks, test/nn/test_module_hooks.py::TestStateDictHooks::test_load_state_dict_module_pre_hook_swap_False, test/nn/test_module_hooks.py::TestStateDictHooks::test_load_state_dict_module_pre_hook_swap_True, test/nn/test_module_hooks.py::TestStateDictHooks::test_load_state_dict_post_hook_backward_compatibility_swap_False, test/nn/test_module_hooks.py::TestStateDictHooks::test_load_state_dict_post_hook_backward_compatibility_swap_True, test/nn/test_module_hooks.py::TestStateDictHooks::test_load_state_dict_post_hook_swap_False, test/nn/test_module_hooks.py::TestStateDictHooks::test_load_state_dict_post_hook_swap_True, test/nn/test_module_hooks.py::TestStateDictHooks::test_load_state_dict_pre_hook_swap_False, 
test/nn/test_module_hooks.py::TestStateDictHooks::test_load_state_dict_pre_hook_swap_True, test/nn/test_module_hooks.py::TestStateDictHooks::test_no_extra_ref_to_module, test/nn/test_module_hooks.py::TestStateDictHooks::test_pickled_hook, test/nn/test_module_hooks.py::TestStateDictHooks::test_register_state_dict_post_hook_private_False, test/nn/test_module_hooks.py::TestStateDictHooks::test_register_state_dict_post_hook_private_True, test/nn/test_module_hooks.py::TestStateDictHooks::test_register_state_dict_pre_hook, test/nn/test_module_hooks.py::TestStateDictHooks::test_register_state_dict_pre_hook_backward_compat, test/nn/test_module_hooks.py::TestStateDictHooks::test_register_state_dict_pre_hook_lazy_module, test/nn/test_module_hooks.py::TestModuleGlobalHooks::test_global_and_local_hooks_order, test/nn/test_module_hooks.py::TestModuleGlobalHooks::test_module_backward_global_hook_writeable, test/nn/test_module_hooks.py::TestModuleGlobalHooks::test_module_forward_forward_hook_removable, test/nn/test_module_hooks.py::TestModuleGlobalHooks::test_module_forward_preforward_hook_removable, test/nn/test_module_hooks.py::TestModuleGlobalHooks::test_module_global_forward_preforward_hook_writeable, test/nn/test_module_hooks.py::TestModuleGlobalHooks::test_module_global_hook_invalid_outputs, test/nn/test_module_hooks.py::TestModuleGlobalHooks::test_module_global_hooks, test/nn/test_module_hooks.py::TestModuleGlobalHooks::test_module_global_hooks_with_kwargs, test/nn/test_module_hooks.py::TestModuleHookNN::test_backward_hooks_interaction, test/nn/test_module_hooks.py::TestModuleHookNN::test_hook_backward_size, test/nn/test_module_hooks.py::TestModuleHookNN::test_hook_backward_writeable, test/nn/test_module_hooks.py::TestModuleHookNN::test_hook_buffer_registration, test/nn/test_module_hooks.py::TestModuleHookNN::test_hook_cpp, test/nn/test_module_hooks.py::TestModuleHookNN::test_hook_extra_input, 
test/nn/test_module_hooks.py::TestModuleHookNN::test_hook_forward_preforward_writable, test/nn/test_module_hooks.py::TestModuleHookNN::test_hook_inplace, test/nn/test_module_hooks.py::TestModuleHookNN::test_hook_invalid_outputs, test/nn/test_module_hooks.py::TestModuleHookNN::test_hook_last_arg_requires_grad, test/nn/test_module_hooks.py::TestModuleHookNN::test_hook_no_requires_grad, test/nn/test_module_hooks.py::TestModuleHookNN::test_hook_non_full_warning, test/nn/test_module_hooks.py::TestModuleHookNN::test_hook_parameter_registration, test/nn/test_module_hooks.py::TestModuleHookNN::test_hook_requires_grad, test/nn/test_module_hooks.py::TestModuleHookNN::test_hook_submodule_registration, test/nn/test_module_hooks.py::TestModuleHookNN::test_hooks
2025-12-04T17:10:36.7564247Z 
2025-12-04T17:10:36.7564585Z Finished nn/test_module_hooks 1/1 ... [2025-12-04 17:10:36.751660][28665.134557521], took 0.10min
2025-12-04T17:10:36.7900424Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/nn.test_module_hooks/nn.test_module_hooks-e13d4f4eb9af9666.xml
2025-12-04T17:10:36.8341161Z Running torch_np/numpy_tests/lib/test_twodim_base 1/1 ... [2025-12-04 17:10:36.833821][28665.216715404]
2025-12-04T17:10:36.8341842Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T17:10:36.8344963Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/lib/test_twodim_base.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:10:36.834248]
2025-12-04T17:10:42.8078617Z 
2025-12-04T17:10:42.8080079Z torch_np/numpy_tests/lib/test_twodim_base 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.lib.test_twodim_base_1.1_facf24e95ed5355d_.log
2025-12-04T17:10:42.8093999Z Running 34 items in this shard: test/torch_np/numpy_tests/lib/test_twodim_base.py::TestEye::test_2d, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestEye::test_basic, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestEye::test_bool, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestEye::test_diag, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestEye::test_diag2d, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestEye::test_eye_bounds, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestEye::test_order, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestDiag::test_diag_bounds, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestDiag::test_failure, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestDiag::test_fortran_order, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestDiag::test_matrix, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestDiag::test_vector, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestFliplr::test_basic, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestFlipud::test_basic, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestHistogram2d::test_all_outliers, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestHistogram2d::test_asym, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestHistogram2d::test_bad_length_x_len_10_y_len_11, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestHistogram2d::test_bad_length_x_len_20_y_len_19, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestHistogram2d::test_binparameter_combination, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestHistogram2d::test_density, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestHistogram2d::test_empty, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestHistogram2d::test_simple, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestTri::test_dtype, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestTri::test_mask_indices, 
test/torch_np/numpy_tests/lib/test_twodim_base.py::TestTri::test_tril_indices, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestTri::test_tril_triu_dtype, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestTri::test_tril_triu_ndim2, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestTri::test_tril_triu_ndim3, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestTri::test_tril_triu_with_inf, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestTriuIndices::test_triu_indices, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestTrilIndicesFrom::test_exceptions, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestTriuIndicesFrom::test_exceptions, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestVander::test_basic, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestVander::test_dtypes
2025-12-04T17:10:42.8107200Z 
2025-12-04T17:10:42.8107622Z Finished torch_np/numpy_tests/lib/test_twodim_base 1/1 ... [2025-12-04 17:10:42.807696][28671.1905916], took 0.10min
2025-12-04T17:10:42.8462973Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.numpy_tests.lib.test_twodim_base/torch_np.numpy_tests.lib.test_twodim_base-2da66c446de8da89.xml
2025-12-04T17:10:42.9146738Z Running profiler/test_memory_profiler 1/1 ... [2025-12-04 17:10:42.914328][28671.297222428]
2025-12-04T17:10:42.9147396Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T17:10:42.9150338Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'profiler/test_memory_profiler.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:10:42.914766]
2025-12-04T17:10:55.0471921Z 
2025-12-04T17:10:55.0473152Z profiler/test_memory_profiler 1/1 was successful, full logs can be found in artifacts with path test/test-reports/profiler.test_memory_profiler_1.1_70baf0213dbc5855_.log
2025-12-04T17:10:55.0489692Z Running 33 items in this shard: test/profiler/test_memory_profiler.py::TestMemoryProfiler::test_config_check, test/profiler/test_memory_profiler.py::TestIdentifyGradients::test_extract_gradients_from_module, test/profiler/test_memory_profiler.py::TestIdentifyGradients::test_extract_gradients_from_module_and_optimizer, test/profiler/test_memory_profiler.py::TestIdentifyGradients::test_extract_gradients_from_optimizer, test/profiler/test_memory_profiler.py::TestIdentifyGradients::test_extract_gradients_from_optimizer_set_to_none, test/profiler/test_memory_profiler.py::TestIdentifyGradients::test_extract_gradients_low_level, test/profiler/test_memory_profiler.py::TestDataFlow::test_data_flow_graph_complicated, test/profiler/test_memory_profiler.py::TestDataFlow::test_data_flow_graph_non_op_allocations, test/profiler/test_memory_profiler.py::TestDataFlow::test_data_flow_graph_simple, test/profiler/test_memory_profiler.py::TestDataFlow::test_data_flow_graph_simple_backward, test/profiler/test_memory_profiler.py::TestDataFlow::test_data_flow_graph_simple_inplace, test/profiler/test_memory_profiler.py::TestDataFlow::test_data_flow_graph_stacked, test/profiler/test_memory_profiler.py::TestDataFlow::test_data_flow_graph_with_annotations, test/profiler/test_memory_profiler.py::TestDataFlow::test_match_schemas, test/profiler/test_memory_profiler.py::TestDataFlow::test_match_schemas_backward, test/profiler/test_memory_profiler.py::TestDataFlow::test_match_schemas_tensorlist, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_categories_e2e_sequential_fwd, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_categories_e2e_sequential_fwd_bwd, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_categories_e2e_simple_fwd, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_categories_e2e_simple_fwd_bwd, 
test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_categories_e2e_simple_fwd_bwd_step, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_categories_e2e_simple_module_fwd, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_categories_e2e_simple_module_fwd_bwd, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_categories_e2e_simple_module_fwd_bwd_step, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_inputs_fwd, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_inputs_fwd_bwd, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_inputs_fwd_lazy, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_lazily_initialized, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_manual_optimizer_step, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_memory_timeline, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_parameters_and_gradients, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_parameters_and_gradients_set_to_none, test/profiler/test_memory_profiler.py::TestMemoryProfilerTimelineCUDA::test_memory_timeline_no_id_cuda
2025-12-04T17:10:55.0505407Z 
2025-12-04T17:10:55.0505789Z Finished profiler/test_memory_profiler 1/1 ... [2025-12-04 17:10:55.046984][28683.429881605], took 0.20min
2025-12-04T17:10:55.0857591Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/profiler.test_memory_profiler/profiler.test_memory_profiler-20f5e2eefecacaee.xml
2025-12-04T17:10:55.1542330Z Running test_jit_llga_fuser 1/1 ... [2025-12-04 17:10:55.153875][28683.536769]
2025-12-04T17:10:55.1542893Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T17:10:55.1545596Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_jit_llga_fuser.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:10:55.154301]
2025-12-04T17:11:23.5679669Z 
2025-12-04T17:11:23.5680957Z test_jit_llga_fuser 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_jit_llga_fuser_1.1_a67e637a7f701026_.log
2025-12-04T17:11:23.5722654Z Running 107 items in this shard: test/test_jit_llga_fuser.py::TestEnableDisableLlgaFuser::test_context_manager, test/test_jit_llga_fuser.py::TestDynamoAOT::test_dynamo_aot_ts_onednn, test/test_jit_llga_fuser.py::TestModel::test_vision_alexnet_bfloat16, test/test_jit_llga_fuser.py::TestModel::test_vision_alexnet_float32, test/test_jit_llga_fuser.py::TestModel::test_vision_densenet121_bfloat16, test/test_jit_llga_fuser.py::TestModel::test_vision_densenet121_float32, test/test_jit_llga_fuser.py::TestModel::test_vision_densenet161_bfloat16, test/test_jit_llga_fuser.py::TestModel::test_vision_densenet161_float32, test/test_jit_llga_fuser.py::TestModel::test_vision_densenet169_bfloat16, test/test_jit_llga_fuser.py::TestModel::test_vision_densenet169_float32, test/test_jit_llga_fuser.py::TestModel::test_vision_densenet201_bfloat16, test/test_jit_llga_fuser.py::TestModel::test_vision_densenet201_float32, test/test_jit_llga_fuser.py::TestModel::test_vision_efficientnet_b0_bfloat16, test/test_jit_llga_fuser.py::TestModel::test_vision_efficientnet_b0_float32, test/test_jit_llga_fuser.py::TestModel::test_vision_efficientnet_b1_bfloat16, test/test_jit_llga_fuser.py::TestModel::test_vision_efficientnet_b1_float32, test/test_jit_llga_fuser.py::TestModel::test_vision_efficientnet_b2_bfloat16, test/test_jit_llga_fuser.py::TestModel::test_vision_efficientnet_b2_float32, test/test_jit_llga_fuser.py::TestModel::test_vision_efficientnet_b3_bfloat16, test/test_jit_llga_fuser.py::TestModel::test_vision_efficientnet_b3_float32, test/test_jit_llga_fuser.py::TestModel::test_vision_efficientnet_b4_bfloat16, test/test_jit_llga_fuser.py::TestModel::test_vision_efficientnet_b4_float32, test/test_jit_llga_fuser.py::TestModel::test_vision_efficientnet_b5_bfloat16, test/test_jit_llga_fuser.py::TestModel::test_vision_efficientnet_b5_float32, test/test_jit_llga_fuser.py::TestModel::test_vision_efficientnet_b6_bfloat16, 
test/test_jit_llga_fuser.py::TestModel::test_vision_efficientnet_b6_float32, test/test_jit_llga_fuser.py::TestModel::test_vision_efficientnet_b7_bfloat16, test/test_jit_llga_fuser.py::TestModel::test_vision_efficientnet_b7_float32, test/test_jit_llga_fuser.py::TestModel::test_vision_googlenet_bfloat16, test/test_jit_llga_fuser.py::TestModel::test_vision_googlenet_float32, test/test_jit_llga_fuser.py::TestModel::test_vision_mnasnet1_0_bfloat16, test/test_jit_llga_fuser.py::TestModel::test_vision_mnasnet1_0_float32, test/test_jit_llga_fuser.py::TestModel::test_vision_mobilenet_v2_bfloat16, test/test_jit_llga_fuser.py::TestModel::test_vision_mobilenet_v2_float32, test/test_jit_llga_fuser.py::TestModel::test_vision_mobilenet_v3_large_bfloat16, test/test_jit_llga_fuser.py::TestModel::test_vision_mobilenet_v3_large_float32, test/test_jit_llga_fuser.py::TestModel::test_vision_regnet_y_400mf_bfloat16, test/test_jit_llga_fuser.py::TestModel::test_vision_regnet_y_400mf_float32, test/test_jit_llga_fuser.py::TestModel::test_vision_resnet50_bfloat16, test/test_jit_llga_fuser.py::TestModel::test_vision_resnet50_float32, test/test_jit_llga_fuser.py::TestModel::test_vision_resnext101_32x8d_bfloat16, test/test_jit_llga_fuser.py::TestModel::test_vision_resnext101_32x8d_float32, test/test_jit_llga_fuser.py::TestModel::test_vision_resnext50_32x4d_bfloat16, test/test_jit_llga_fuser.py::TestModel::test_vision_resnext50_32x4d_float32, test/test_jit_llga_fuser.py::TestModel::test_vision_shufflenet_v2_x1_0_bfloat16, test/test_jit_llga_fuser.py::TestModel::test_vision_shufflenet_v2_x1_0_float32, test/test_jit_llga_fuser.py::TestModel::test_vision_squeezenet1_0_bfloat16, test/test_jit_llga_fuser.py::TestModel::test_vision_squeezenet1_0_float32, test/test_jit_llga_fuser.py::TestModel::test_vision_vgg16_bfloat16, test/test_jit_llga_fuser.py::TestModel::test_vision_vgg16_float32, test/test_jit_llga_fuser.py::TestModel::test_vision_wide_resnet50_2_bfloat16, 
test/test_jit_llga_fuser.py::TestModel::test_vision_wide_resnet50_2_float32, test/test_jit_llga_fuser.py::TestFusionPatternCUDA::test_bn2d_eltwise_cuda_bfloat16, test/test_jit_llga_fuser.py::TestFusionPatternCUDA::test_bn2d_eltwise_cuda_float32, test/test_jit_llga_fuser.py::TestFusionPatternCUDA::test_conv2d_bn_cuda_bfloat16, test/test_jit_llga_fuser.py::TestFusionPatternCUDA::test_conv2d_bn_cuda_float32, test/test_jit_llga_fuser.py::TestFusionPatternCUDA::test_conv2d_bn_relu_cuda_bfloat16, test/test_jit_llga_fuser.py::TestFusionPatternCUDA::test_conv2d_bn_relu_cuda_float32, test/test_jit_llga_fuser.py::TestFusionPatternCUDA::test_conv2d_clamp_cuda_bfloat16, test/test_jit_llga_fuser.py::TestFusionPatternCUDA::test_conv2d_clamp_cuda_float32, test/test_jit_llga_fuser.py::TestFusionPatternCUDA::test_conv2d_eltwise_cuda_bfloat16, test/test_jit_llga_fuser.py::TestFusionPatternCUDA::test_conv2d_eltwise_cuda_float32, test/test_jit_llga_fuser.py::TestFusionPatternCUDA::test_conv2d_silu_cuda_bfloat16, test/test_jit_llga_fuser.py::TestFusionPatternCUDA::test_conv2d_silu_cuda_float32, test/test_jit_llga_fuser.py::TestFusionPatternCUDA::test_conv2d_sum_cuda_bfloat16, test/test_jit_llga_fuser.py::TestFusionPatternCUDA::test_conv2d_sum_cuda_float32, test/test_jit_llga_fuser.py::TestFusionPatternCUDA::test_ensure_tensor_is_rewrapped_cuda_bfloat16, test/test_jit_llga_fuser.py::TestFusionPatternCUDA::test_ensure_tensor_is_rewrapped_cuda_float32, test/test_jit_llga_fuser.py::TestFusionPatternCUDA::test_linear_eltwise_cuda_bfloat16, test/test_jit_llga_fuser.py::TestFusionPatternCUDA::test_linear_eltwise_cuda_float32, test/test_jit_llga_fuser.py::TestFusionPatternCUDA::test_rewrap_tensor_input_to_pytorch_cuda_bfloat16, test/test_jit_llga_fuser.py::TestFusionPatternCUDA::test_rewrap_tensor_input_to_pytorch_cuda_float32, test/test_jit_llga_fuser.py::TestFusionPatternCUDA::test_wildcard_cuda_bfloat16, test/test_jit_llga_fuser.py::TestFusionPatternCUDA::test_wildcard_cuda_float32, 
test/test_jit_llga_fuser.py::TestFusionPatternCUDA::test_wildcard_unsupported_dtype_cuda_int32, test/test_jit_llga_fuser.py::TestOpCUDA::test_add_cuda_bfloat16, test/test_jit_llga_fuser.py::TestOpCUDA::test_add_cuda_float32, test/test_jit_llga_fuser.py::TestOpCUDA::test_add_scalar_cuda_bfloat16, test/test_jit_llga_fuser.py::TestOpCUDA::test_add_scalar_cuda_float32, test/test_jit_llga_fuser.py::TestOpCUDA::test_addmm_cuda_bfloat16, test/test_jit_llga_fuser.py::TestOpCUDA::test_addmm_cuda_float32, test/test_jit_llga_fuser.py::TestOpCUDA::test_avg_pool2d_cuda_bfloat16, test/test_jit_llga_fuser.py::TestOpCUDA::test_avg_pool2d_cuda_float32, test/test_jit_llga_fuser.py::TestOpCUDA::test_bn2d_cuda_bfloat16, test/test_jit_llga_fuser.py::TestOpCUDA::test_bn2d_cuda_float32, test/test_jit_llga_fuser.py::TestOpCUDA::test_cat_cuda_bfloat16, test/test_jit_llga_fuser.py::TestOpCUDA::test_cat_cuda_float32, test/test_jit_llga_fuser.py::TestOpCUDA::test_conv2d_cuda_bfloat16, test/test_jit_llga_fuser.py::TestOpCUDA::test_conv2d_cuda_float32, test/test_jit_llga_fuser.py::TestOpCUDA::test_eltwise_cuda_bfloat16, test/test_jit_llga_fuser.py::TestOpCUDA::test_eltwise_cuda_float32, test/test_jit_llga_fuser.py::TestOpCUDA::test_identity_binary_cuda_bfloat16, test/test_jit_llga_fuser.py::TestOpCUDA::test_identity_binary_cuda_float32, test/test_jit_llga_fuser.py::TestOpCUDA::test_layer_norm_cuda_bfloat16, test/test_jit_llga_fuser.py::TestOpCUDA::test_layer_norm_cuda_float32, test/test_jit_llga_fuser.py::TestOpCUDA::test_linear_cuda_bfloat16, test/test_jit_llga_fuser.py::TestOpCUDA::test_linear_cuda_float32, test/test_jit_llga_fuser.py::TestOpCUDA::test_max_pool2d_cuda_bfloat16, test/test_jit_llga_fuser.py::TestOpCUDA::test_max_pool2d_cuda_float32, test/test_jit_llga_fuser.py::TestOpCUDA::test_mul_cuda_bfloat16, test/test_jit_llga_fuser.py::TestOpCUDA::test_mul_cuda_float32, test/test_jit_llga_fuser.py::TestOpCUDA::test_softmax_cuda_bfloat16, 
test/test_jit_llga_fuser.py::TestOpCUDA::test_softmax_cuda_float32, test/test_jit_llga_fuser.py::TestOpCUDA::test_typecheck_cuda_bfloat16, test/test_jit_llga_fuser.py::TestOpCUDA::test_typecheck_cuda_float32, test/test_jit_llga_fuser.py::TestOpCUDA::test_variable_kernel_avg_pool2d_cuda_bfloat16, test/test_jit_llga_fuser.py::TestOpCUDA::test_variable_kernel_avg_pool2d_cuda_float32
2025-12-04T17:11:23.5760229Z 
2025-12-04T17:11:23.5760547Z Finished test_jit_llga_fuser 1/1 ... [2025-12-04 17:11:23.567937][28711.950832179], took 0.47min
2025-12-04T17:11:23.6073223Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_jit_llga_fuser/test_jit_llga_fuser-b203cab2c461ce78.xml
2025-12-04T17:11:23.6836773Z Running optim/test_optim 1/1 ... [2025-12-04 17:11:23.683313][28712.06620721]
2025-12-04T17:11:23.6837339Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T17:11:23.6840695Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'optim/test_optim.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:11:23.683763]
2025-12-04T17:11:28.5287776Z 
2025-12-04T17:11:28.5288758Z optim/test_optim 1/1 was successful, full logs can be found in artifacts with path test/test-reports/optim.test_optim_1.1_e409dee8e8c07436_.log
2025-12-04T17:11:28.5289500Z 
2025-12-04T17:11:28.5289831Z Finished optim/test_optim 1/1 ... [2025-12-04 17:11:28.528541][28716.911438104], took 0.08min
2025-12-04T17:11:28.5681262Z Running torch_np/numpy_tests/core/test_getlimits 1/1 ... [2025-12-04 17:11:28.567812][28716.950706048]
2025-12-04T17:11:28.5682155Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T17:11:28.5685268Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/core/test_getlimits.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:11:28.568258]
2025-12-04T17:11:34.3913585Z 
2025-12-04T17:11:34.3914769Z torch_np/numpy_tests/core/test_getlimits 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.core.test_getlimits_1.1_827a2f053af78584_.log
2025-12-04T17:11:34.3922498Z Running 17 items in this shard: test/torch_np/numpy_tests/core/test_getlimits.py::TestPythonFloat::test_singleton, test/torch_np/numpy_tests/core/test_getlimits.py::TestHalf::test_singleton, test/torch_np/numpy_tests/core/test_getlimits.py::TestSingle::test_singleton, test/torch_np/numpy_tests/core/test_getlimits.py::TestDouble::test_singleton, test/torch_np/numpy_tests/core/test_getlimits.py::TestFinfo::test_basic, test/torch_np/numpy_tests/core/test_getlimits.py::TestFinfo::test_basic_missing, test/torch_np/numpy_tests/core/test_getlimits.py::TestIinfo::test_basic, test/torch_np/numpy_tests/core/test_getlimits.py::TestIinfo::test_unsigned_max_T0, test/torch_np/numpy_tests/core/test_getlimits.py::TestIinfo::test_unsigned_max_T1, test/torch_np/numpy_tests/core/test_getlimits.py::TestIinfo::test_unsigned_max_T2, test/torch_np/numpy_tests/core/test_getlimits.py::TestIinfo::test_unsigned_max_T3, test/torch_np/numpy_tests/core/test_getlimits.py::TestRepr::test_finfo_repr, test/torch_np/numpy_tests/core/test_getlimits.py::TestRepr::test_iinfo_repr, test/torch_np/numpy_tests/core/test_getlimits.py::TestMisc::test_instances, test/torch_np/numpy_tests/core/test_getlimits.py::TestMisc::test_known_types, test/torch_np/numpy_tests/core/test_getlimits.py::TestMisc::test_plausible_finfo, test/torch_np/numpy_tests/core/test_getlimits.py::TestMisc::test_subnormal_warning
2025-12-04T17:11:34.3929350Z 
2025-12-04T17:11:34.3929783Z Finished torch_np/numpy_tests/core/test_getlimits 1/1 ... [2025-12-04 17:11:34.391134][28722.774031195], took 0.10min
2025-12-04T17:11:34.4302204Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.numpy_tests.core.test_getlimits/torch_np.numpy_tests.core.test_getlimits-5149534e2555ec6f.xml
2025-12-04T17:11:34.5026541Z Running torch_np/test_ndarray_methods 1/1 ... [2025-12-04 17:11:34.502319][28722.885212469]
2025-12-04T17:11:34.5027163Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T17:11:34.5029929Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/test_ndarray_methods.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:11:34.502758]
2025-12-04T17:11:44.3318799Z 
2025-12-04T17:11:44.3319933Z torch_np/test_ndarray_methods 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.test_ndarray_methods_1.1_793e3aaaf30f7d3c_.log
2025-12-04T17:11:44.3480381Z Running 342 items in this shard: test/torch_np/test_ndarray_methods.py::TestIndexing::test_indexing_simple, test/torch_np/test_ndarray_methods.py::TestIndexing::test_setitem, test/torch_np/test_ndarray_methods.py::TestReshape::test_reshape_function, test/torch_np/test_ndarray_methods.py::TestReshape::test_reshape_method, test/torch_np/test_ndarray_methods.py::TestTranspose::test_transpose_function, test/torch_np/test_ndarray_methods.py::TestTranspose::test_transpose_method, test/torch_np/test_ndarray_methods.py::TestRavel::test_ravel_function, test/torch_np/test_ndarray_methods.py::TestRavel::test_ravel_method, test/torch_np/test_ndarray_methods.py::TestNonzero::test_array_method, test/torch_np/test_ndarray_methods.py::TestNonzero::test_nonzero_onedim, test/torch_np/test_ndarray_methods.py::TestNonzero::test_nonzero_trivial, test/torch_np/test_ndarray_methods.py::TestNonzero::test_nonzero_twodim, test/torch_np/test_ndarray_methods.py::TestNonzero::test_sparse, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_all_method_max, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_all_method_min, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size0_axis0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size0_axis0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size10_axis_-1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size10_axis_-1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size11_axis_0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size11_axis_0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size12_axis_1_method0, 
test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size12_axis_1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size13_axis13_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size13_axis13_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size14_axis_-2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size14_axis_-2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size15_axis_-1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size15_axis_-1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size16_axis_0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size16_axis_0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size17_axis_1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size17_axis_1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size18_axis18_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size18_axis18_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size19_axis_-3_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size19_axis_-3_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size1_axis_-1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size1_axis_-1_method1, 
test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size20_axis_-2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size20_axis_-2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size21_axis_-1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size21_axis_-1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size22_axis_0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size22_axis_0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size23_axis_1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size23_axis_1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size24_axis_2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size24_axis_2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size25_axis25_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size25_axis25_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size26_axis_-3_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size26_axis_-3_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size27_axis_-2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size27_axis_-2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size28_axis_-1_method0, 
test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size28_axis_-1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size29_axis_0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size29_axis_0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size2_axis_0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size2_axis_0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size30_axis_1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size30_axis_1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size31_axis_2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size31_axis_2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size32_axis32_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size32_axis32_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size33_axis_-4_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size33_axis_-4_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size34_axis_-3_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size34_axis_-3_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size35_axis_-2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size35_axis_-2_method1, 
test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size36_axis_-1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size36_axis_-1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size37_axis_0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size37_axis_0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size38_axis_1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size38_axis_1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size39_axis_2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size39_axis_2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size3_axis3_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size3_axis3_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size40_axis_3_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size40_axis_3_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size41_axis41_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size41_axis41_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size42_axis_-4_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size42_axis_-4_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size43_axis_-3_method0, 
test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size43_axis_-3_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size44_axis_-2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size44_axis_-2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size45_axis_-1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size45_axis_-1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size46_axis_0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size46_axis_0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size47_axis_1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size47_axis_1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size48_axis_2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size48_axis_2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size49_axis_3_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size49_axis_3_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size4_axis_-2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size4_axis_-2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size50_axis50_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size50_axis50_method1, 
test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size51_axis_-4_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size51_axis_-4_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size52_axis_-3_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size52_axis_-3_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size53_axis_-2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size53_axis_-2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size54_axis_-1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size54_axis_-1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size55_axis_0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size55_axis_0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size56_axis_1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size56_axis_1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size57_axis_2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size57_axis_2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size58_axis_3_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size58_axis_3_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size59_axis59_method0, 
test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size59_axis59_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size5_axis_-1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size5_axis_-1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size60_axis_-4_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size60_axis_-4_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size61_axis_-3_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size61_axis_-3_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size62_axis_-2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size62_axis_-2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size63_axis_-1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size63_axis_-1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size64_axis_0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size64_axis_0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size65_axis_1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size65_axis_1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size66_axis_2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size66_axis_2_method1, 
test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size67_axis_3_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size67_axis_3_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size68_axis68_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size68_axis68_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size69_axis_-1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size69_axis_-1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size6_axis_0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size6_axis_0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size70_axis_0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size70_axis_0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size71_axis71_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size71_axis71_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size72_axis_-1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size72_axis_-1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size73_axis_0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size73_axis_0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size74_axis74_method0, 
test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size74_axis74_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size75_axis_-1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size75_axis_-1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size76_axis_0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size76_axis_0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size77_axis77_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size77_axis77_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size7_axis_1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size7_axis_1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size8_axis8_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size8_axis8_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size9_axis_-2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size9_axis_-2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_vs_ndarray_arr_method_argmax_np_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_vs_ndarray_arr_method_argmin_np_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_vs_ndarray_positional_arr_method_argmax_np_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_vs_ndarray_positional_arr_method_argmin_np_method1, 
test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_output_shape_method_argmax, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_output_shape_method_argmin, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_ret_is_out_ndim_0_method_argmax, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_ret_is_out_ndim_0_method_argmin, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_ret_is_out_ndim_1_method_argmax, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_ret_is_out_ndim_1_method_argmin, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data0, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data1, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data10, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data11, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data12, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data13, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data14, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data15, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data16, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data17, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data18, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data19, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data2, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data20, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data21, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data22, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data23, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data24, 
test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data25, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data26, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data27, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data28, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data29, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data3, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data30, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data31, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data32, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data33, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data34, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data35, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data36, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data37, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data38, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data39, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data4, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data40, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data41, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data42, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data43, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data44, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data45, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data46, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data47, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data48, 
test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data49, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data5, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data50, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data51, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data52, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data53, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data54, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data55, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data56, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data57, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data58, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data59, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data6, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data60, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data61, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data62, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data63, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data64, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data65, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data66, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data67, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data68, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data69, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data7, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data70, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data71, 
test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data72, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data73, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data8, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data9, test/torch_np/test_ndarray_methods.py::TestArgmax::test_maximum_signed_integers, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data0, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data1, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data10, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data11, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data12, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data13, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data14, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data15, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data16, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data17, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data18, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data19, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data2, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data20, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data21, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data22, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data23, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data24, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data25, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data26, 
test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data27, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data28, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data29, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data3, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data30, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data31, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data32, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data33, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data34, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data35, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data36, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data37, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data38, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data39, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data4, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data40, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data41, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data42, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data43, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data44, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data45, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data46, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data47, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data48, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data49, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data5, 
test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data50, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data51, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data52, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data53, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data54, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data55, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data56, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data57, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data58, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data59, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data6, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data60, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data61, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data62, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data63, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data64, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data65, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data66, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data67, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data68, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data69, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data7, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data70, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data71, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data72, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data73, 
test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data8, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data9, test/torch_np/test_ndarray_methods.py::TestArgmin::test_minimum_signed_integers, test/torch_np/test_ndarray_methods.py::TestAmax::test_basic, test/torch_np/test_ndarray_methods.py::TestAmin::test_basic, test/torch_np/test_ndarray_methods.py::TestContains::test_contains, test/torch_np/test_ndarray_methods.py::TestNoExtraMethods::test_extra_methods_name_fn, test/torch_np/test_ndarray_methods.py::TestNoExtraMethods::test_extra_methods_name_ivar, test/torch_np/test_ndarray_methods.py::TestNoExtraMethods::test_extra_methods_name_method, test/torch_np/test_ndarray_methods.py::TestNoExtraMethods::test_extra_methods_name_name, test/torch_np/test_ndarray_methods.py::TestNoExtraMethods::test_extra_methods_name_plain, test/torch_np/test_ndarray_methods.py::TestNoExtraMethods::test_extra_methods_name_rvar, test/torch_np/test_ndarray_methods.py::TestIter::test_iter_1d, test/torch_np/test_ndarray_methods.py::TestIter::test_iter_2d
2025-12-04T17:11:44.3638413Z 
2025-12-04T17:11:44.3638815Z Finished torch_np/test_ndarray_methods 1/1 ... [2025-12-04 17:11:44.332351][28732.715245344], took 0.16min
2025-12-04T17:11:44.3717905Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.test_ndarray_methods/torch_np.test_ndarray_methods-fe7c638b86097b2d.xml
2025-12-04T17:11:44.4740225Z Running test_view_ops 1/1 ... [2025-12-04 17:11:44.473689][28732.85658216]
2025-12-04T17:11:44.4740745Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T17:11:44.4743807Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_view_ops.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:11:44.474145]
2025-12-04T17:12:07.4722256Z 
2025-12-04T17:12:07.4723164Z test_view_ops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_view_ops_1.1_405f53c81662ed35_.log
2025-12-04T17:12:07.4821986Z Running 279 items in this shard: test/test_view_ops.py::TestViewOpsCUDA::test_T_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_advanced_indexing_assignment_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_advanced_indexing_nonview_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_as_strided_gradients_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_as_strided_inplace_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_as_strided_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_basic_indexing_ellipses_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_basic_indexing_newaxis_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_basic_indexing_slice_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_chunk_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_conj_imag_view_cuda_complex128, test/test_view_ops.py::TestViewOpsCUDA::test_conj_imag_view_cuda_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_conj_self_cuda_bfloat16, test/test_view_ops.py::TestViewOpsCUDA::test_conj_self_cuda_float16, test/test_view_ops.py::TestViewOpsCUDA::test_conj_self_cuda_float32, test/test_view_ops.py::TestViewOpsCUDA::test_conj_self_cuda_float64, test/test_view_ops.py::TestViewOpsCUDA::test_conj_self_cuda_int16, test/test_view_ops.py::TestViewOpsCUDA::test_conj_self_cuda_int32, test/test_view_ops.py::TestViewOpsCUDA::test_conj_self_cuda_int64, test/test_view_ops.py::TestViewOpsCUDA::test_conj_self_cuda_int8, test/test_view_ops.py::TestViewOpsCUDA::test_conj_self_cuda_uint8, test/test_view_ops.py::TestViewOpsCUDA::test_conj_view_with_shared_memory_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_contiguous_nonview_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_contiguous_self_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_diagonal_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_expand_as_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_expand_view_cuda, 
test/test_view_ops.py::TestViewOpsCUDA::test_flatten_nonview_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_flatten_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_imag_noncomplex_cuda_bfloat16, test/test_view_ops.py::TestViewOpsCUDA::test_imag_noncomplex_cuda_float16, test/test_view_ops.py::TestViewOpsCUDA::test_imag_noncomplex_cuda_float32, test/test_view_ops.py::TestViewOpsCUDA::test_imag_noncomplex_cuda_float64, test/test_view_ops.py::TestViewOpsCUDA::test_imag_noncomplex_cuda_int16, test/test_view_ops.py::TestViewOpsCUDA::test_imag_noncomplex_cuda_int32, test/test_view_ops.py::TestViewOpsCUDA::test_imag_noncomplex_cuda_int64, test/test_view_ops.py::TestViewOpsCUDA::test_imag_noncomplex_cuda_int8, test/test_view_ops.py::TestViewOpsCUDA::test_imag_noncomplex_cuda_uint8, test/test_view_ops.py::TestViewOpsCUDA::test_movedim_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_narrow_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_permute_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_real_imag_view_cuda_complex128, test/test_view_ops.py::TestViewOpsCUDA::test_real_imag_view_cuda_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_reshape_as_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_reshape_nonview_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_reshape_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_select_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_bfloat16, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_bool, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_complex128, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_float16, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_float32, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_float64, 
test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_int16, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_int32, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_int64, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_int8, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_uint8, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_bfloat16, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_bool, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_complex128, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_float16, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_float32, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_float64, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_int16, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_int32, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_int64, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_int8, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_uint8, test/test_view_ops.py::TestViewOpsCUDA::test_split_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_squeeze_inplace_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_squeeze_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_t_inplace_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_t_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_transpose_inplace_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_transpose_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_unbind_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_unbind_view_cuda, 
test/test_view_ops.py::TestViewOpsCUDA::test_unfold_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_unsqueeze_inplace_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_unsqueeze_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_view_as_complex_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_view_as_real_cuda_complex128, test/test_view_ops.py::TestViewOpsCUDA::test_view_as_real_cuda_complex32, test/test_view_ops.py::TestViewOpsCUDA::test_view_as_real_cuda_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_view_as_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_view_copy_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_view_copy_out_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_view_copy_output_contiguous_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_bool, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_complex128, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_float16, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_float32, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_float64, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_int16, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_int32, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_int64, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_int8, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_uint8, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_bfloat16, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_bool, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_complex128, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_float16, 
test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_float32, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_float64, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_int16, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_int32, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_int64, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_int8, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_uint8, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_bfloat16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_bool, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_complex128, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_float16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_float32, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_float64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_int16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_int32, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_int64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_int8, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_uint8, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_bfloat16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_bool, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_complex128, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_float16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_float32, 
test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_float64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_int16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_int32, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_int64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_int8, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_uint8, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_bfloat16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_bool, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_complex128, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_float16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_float32, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_float64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_int16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_int32, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_int64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_int8, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_uint8, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_bfloat16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_bool, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_complex128, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_float16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_float32, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_float64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_int16, 
test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_int32, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_int64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_int8, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_uint8, test/test_view_ops.py::TestViewOpsCUDA::test_view_view_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_T_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_as_strided_overflow_storage_offset_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_cuda_complex128, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_cuda_complex64, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_cuda_float16, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_cuda_float32, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_cuda_float64, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_cuda_int16, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_cuda_int32, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_cuda_int64, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_cuda_int8, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_cuda_uint8, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_gradient_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_big_transpose_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_shapes_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_tensors_cuda_float32, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_bool, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_complex128, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_complex64, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_float16, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_float32, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_float64, 
test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_int16, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_int32, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_int64, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_int8, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_uint8, test/test_view_ops.py::TestOldViewOpsCUDA::test_chunk_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_conj_neg_view_numpy_error_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_contiguous_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_crow_col_indices_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_empty_reshape_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_expand_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_flatten_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_memory_format_resize__cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_memory_format_resize_as_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_narrow_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_narrow_tensor_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_python_types_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_ravel_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_bfloat16, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_bool, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_complex128, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_complex64, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_float16, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_float32, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_float64, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_int16, 
test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_int32, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_int64, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_int8, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_uint8, test/test_view_ops.py::TestOldViewOpsCUDA::test_resize_all_dtypes_and_devices_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_resize_as_all_dtypes_and_devices_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_resize_as_preserves_strides_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_resize_overflow_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_split_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_t_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_errors_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_bool, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_complex128, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_complex64, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_float16, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_float32, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_float64, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_int16, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_int32, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_int64, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_int8, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_uint8, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_bool, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_complex128, 
test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_complex64, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_float16, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_float32, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_float64, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_int16, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_int32, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_int64, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_int8, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_uint8, test/test_view_ops.py::TestOldViewOpsCUDA::test_transpose_invalid_cuda_complex128, test/test_view_ops.py::TestOldViewOpsCUDA::test_transpose_invalid_cuda_float32, test/test_view_ops.py::TestOldViewOpsCUDA::test_transpose_invalid_cuda_int64, test/test_view_ops.py::TestOldViewOpsCUDA::test_transpose_vs_numpy_cuda_complex128, test/test_view_ops.py::TestOldViewOpsCUDA::test_transpose_vs_numpy_cuda_float32, test/test_view_ops.py::TestOldViewOpsCUDA::test_transpose_vs_numpy_cuda_int64, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_bfloat16, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_bool, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_complex128, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_complex64, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_float16, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_float32, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_float64, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_int16, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_int32, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_int64, 
test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_int8, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_uint8, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_bfloat16, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_bool, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_complex128, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_complex64, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_float16, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_float32, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_float64, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_int16, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_int32, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_int64, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_int8, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_uint8, test/test_view_ops.py::TestOldViewOpsCUDA::test_unsqueeze_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_view_all_dtypes_and_devices_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_view_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_view_empty_cuda
2025-12-04T17:12:07.4918829Z 
2025-12-04T17:12:07.4919117Z Finished test_view_ops 1/1 ... [2025-12-04 17:12:07.472419][28755.85531383], took 0.38min
2025-12-04T17:12:07.5112374Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_view_ops/test_view_ops-f5d6b3525797eb50.xml
2025-12-04T17:12:07.6225157Z Running test_type_info 1/1 ... [2025-12-04 17:12:07.622224][28756.005117232]
2025-12-04T17:12:07.6225676Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T17:12:07.6228549Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_type_info.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:12:07.622649]
2025-12-04T17:12:13.0951746Z 
2025-12-04T17:12:13.0952654Z test_type_info 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_type_info_1.1_9ab09808df8277a9_.log
2025-12-04T17:12:13.0954924Z Running 5 items in this shard: test/test_type_info.py::TestDTypeInfo::test_finfo, test/test_type_info.py::TestDTypeInfo::test_iinfo, test/test_type_info.py::TestDTypeInfo::test_invalid_input, test/test_type_info.py::TestDTypeInfo::test_to_complex, test/test_type_info.py::TestDTypeInfo::test_to_real
2025-12-04T17:12:13.0956397Z 
2025-12-04T17:12:13.0956689Z Finished test_type_info 1/1 ... [2025-12-04 17:12:13.094942][28761.477839247], took 0.09min
2025-12-04T17:12:13.1340749Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_type_info/test_type_info-22600993e111f6f2.xml
2025-12-04T17:12:13.1678215Z Running functorch/test_aotdispatch 1/1 ... [2025-12-04 17:12:13.167527][28761.550421028]
2025-12-04T17:12:13.1678829Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T17:12:13.1682193Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_aotdispatch.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:12:13.167957]
2025-12-04T17:14:34.5974427Z 
2025-12-04T17:14:34.5976097Z functorch/test_aotdispatch 1/1 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_aotdispatch_1.1_9b74bc936a6dcdae_.log
2025-12-04T17:14:34.6518785Z Running 537 items in this shard: test/functorch/test_aotdispatch.py::TestAOTAutograd::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_False_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_False_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_True_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_True_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_False_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_False_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_True_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_True_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_autocast_disable_guard, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_backward_mutation_data, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_backward_mutation_forward_inputs, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_backward_mutation_forward_inputs_create_graph, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_backward_mutation_metadata, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_backward_mutation_on_grad_out, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_backward_pass_autocast_custom, 
test/functorch/test_aotdispatch.py::TestAOTAutograd::test_backward_pass_autocast_off, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_backward_pass_autocast_on, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_batch_norm_amp, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_batchnorm, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_batchnorm_inference, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_buffer_batch_norm, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_buffer_copied_in_graph, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_buffer_copied_in_graph_with_different_shapes, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_compilation_context, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_complex_linear, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_composite_impl_compile, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_custom_autograd, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_custom_tensor_metadata, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_default_partitioner_saves_symints_not_tensors_for_bw, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_dupe_arg, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_dupe_arg_returned_as_output, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_dupe_arg_torture, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_duplicated_arguments_on_tensor_overlap, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_dynamic_output_aliases_input_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_dynamic_shape_output_not_in_bw_graph, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_embedding_bag_view_dynamic, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_fw_bw_mutation_no_functionalization1, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_fw_bw_mutation_no_functionalization2, 
test/functorch/test_aotdispatch.py::TestAOTAutograd::test_grad_context, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_inference_mode, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_inner_grad, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_aliased_with_mutation_output_alias, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_data_and_metadata_mutation, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_data_and_metadata_mutation_aliases_other_input, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_inplace_requires_grad_true, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_metadata_mutation_aliases, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_alias_everything, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_aliases_and_none_require_gradients, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_aliases_and_output_alias, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_aliases_bases_out_of_order, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_aliases_other_input, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_aliases_other_input2, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_and_output_view, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_batchnorm, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_false_aliasing, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_hidden_from_autograd_aliasing, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_is_output, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_metadata, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_metadata2, 
test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_modifies_autograd_meta_of_aliases, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_multiple, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_noncontiguous, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_output_view_multiple, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_requires_grad_detach, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_requires_grad_no_grad, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_requires_grad_no_grad_detach_mixed, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_requires_grad_no_grad_inference_graph, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_return, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_set__input_mutation, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_set__nop, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_simple, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_simple_with_none_and_nontensor, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_storage_resize_before_set_, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_storage_resize_down, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_storage_resize_down_and_set_, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_storage_resize_up, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_output_aliase_custom_autograd_function, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_output_view_metadata_mutate_multiple, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_output_view_mutate_multiple, 
test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_output_view_simple, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_invalid_dupe, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_invalid_dupe_fake, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_invalid_dupe_left_bias, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_invalid_requires_grad, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_invalid_requires_grad_fake, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_list_codegen, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_mark_activations_dynamic, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_mark_activations_dynamic_with_nested, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_mark_outputs_dynamic_use_autograd_False, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_mark_outputs_dynamic_use_autograd_True, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_mem_leak_from_save_for_bw, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_module, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_multi_output, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_multi_output_list, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_mutates_input_noncontiguous, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_nested_subclasses, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_nested_subclasses_complicated_inps, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_nested_subclasses_complicated_inps_mixed, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_nested_subclasses_non_homogenous, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_nested_subclasses_non_nested_grad, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_new_inp_requires_grad_now, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_no_grad_input_output, 
test/functorch/test_aotdispatch.py::TestAOTAutograd::test_non_tensor_and_none_inputs, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_nonidempotent_amp, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_input_multi_output_view, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_input_multi_output_view_should_raise_autograd_error, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_input_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_and_returned, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_and_returned_different_grad, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_and_returned_flipped, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_inplace_view, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_inplace_view_and_view, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_inplace_view_with_detach, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_multi_output_view, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_multiple, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_multiple_mixed, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_mutation_linear, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_no_grad, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_returned_multiple_times, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_single, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_view_meta_replay, 
test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_multiple_inputs_get_correct_one, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_output_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_all_alias_types, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_dict, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_op_depending_on_symint, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_outputs_are_aliased, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_real_weights_in_symbolic_mode, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_real_weights_in_symbolic_mode_with_inplace_ops, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_saved_tensors_hooks_mutations_raise, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_set__and_data_mutation_bad, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_set__and_data_mutation_good, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_set__not_allowed, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_set__steals_view_chain, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_single_output, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_some_output_requires_grad_input_doesnt, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_some_outputs_dont_require_grad_non_view, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_some_outputs_dont_require_grad_view, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_squeeze_mutation, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_subclass_metadata_mutation_req_grad_False, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_subclass_metadata_mutation_req_grad_True, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_subclasses_mixed, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_subclasses_mixed_mode, 
test/functorch/test_aotdispatch.py::TestAOTAutograd::test_synthetic_base_base_attribute_is_none, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_view_and_inplace_view, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_view_detach, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_ban_dropout_mut_pre_dispatch, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_forward_mutation_multiple_mut, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_forward_mutation_no_buffer_mut, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_functionalized_rng_banned, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_input_dupes_banned, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_input_mutation, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_input_mutation_on_input_requiring_grad_banned, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_input_mutation_on_parameter_banned, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_metadata_mutation_banned, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_module_joint, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_multiple_outputs_require_grad_banned, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_buffer_mutation_metadata, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_composite_implicit_inplace, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_composite_implicit_linear, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_contiguous, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_conv_and_bn, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_func_composite_implicit, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_func_simple, 
test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_func_view, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_map_1, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_map_2, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_outdtype, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_reshape, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_with_autograd_op, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_with_cond, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_with_cond_nested, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_simplified_basic, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_simplified_pytrees_banned, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_synthetic_bases_banned, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_unbacked_arg, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_with_torch_cond, test/functorch/test_aotdispatch.py::TestPartitioning::test_autocast, test/functorch/test_aotdispatch.py::TestPartitioning::test_contiguous, test/functorch/test_aotdispatch.py::TestPartitioning::test_custom_partitioner_fn, test/functorch/test_aotdispatch.py::TestPartitioning::test_default_partitioner_getitem, test/functorch/test_aotdispatch.py::TestPartitioning::test_default_partitioner_output_tensor_shape_tensor, test/functorch/test_aotdispatch.py::TestPartitioning::test_generate_gives_inference_graph, test/functorch/test_aotdispatch.py::TestPartitioning::test_meta_tensor_inplace_op, test/functorch/test_aotdispatch.py::TestPartitioning::test_min_cut_partitioner, test/functorch/test_aotdispatch.py::TestPartitioning::test_min_cut_partitioner_output_tensor_shape_tensor, 
test/functorch/test_aotdispatch.py::TestPartitioning::test_min_cut_partitioner_raise_getitems, test/functorch/test_aotdispatch.py::TestPartitioning::test_min_cut_partitioner_save_shape, test/functorch/test_aotdispatch.py::TestPartitioning::test_preserve_random, test/functorch/test_aotdispatch.py::TestPartitioning::test_quantize_activation_duplicate_nodes, test/functorch/test_aotdispatch.py::TestPartitioning::test_recompute_partitioning, test/functorch/test_aotdispatch.py::TestAOTDispatch::test_aot_dispatch_incorrect_backward, test/functorch/test_aotdispatch.py::TestAOTDispatch::test_aot_dispatch_inference, test/functorch/test_aotdispatch.py::TestAOTDispatch::test_aot_dispatch_input_data_and_metadata_mutation, test/functorch/test_aotdispatch.py::TestAOTDispatch::test_aot_dispatch_input_metadata_mutation, test/functorch/test_aotdispatch.py::TestAOTDispatch::test_aot_dispatch_input_mutation, test/functorch/test_aotdispatch.py::TestAOTDispatch::test_aot_dispatch_input_mutation_and_output_alias, test/functorch/test_aotdispatch.py::TestAOTDispatch::test_aot_dispatch_output_alias, test/functorch/test_aotdispatch.py::TestAOTDispatch::test_aot_dispatch_output_requires_grad_in_no_grad, test/functorch/test_aotdispatch.py::TestAOTDispatch::test_aot_dispatch_output_requires_grad_in_no_grad_views, test/functorch/test_aotdispatch.py::TestAOTDispatch::test_aot_dispatch_simple, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_aot_module_simplified, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_aot_module_simplified_dynamic, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_aot_module_simplified_fake_tensor_gm_raises, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_aot_module_simplified_preserves_stack_trace, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_aot_module_simplified_preserves_stack_trace_from_mutation, 
test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_aot_test_subclasses_with_tensor_factories, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_flex_attn_noncontiguous_tangents, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_grads_no_force_contiguous_dense, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_grads_no_force_contiguous_nested_subclass, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_grads_no_force_contiguous_nested_tensor_tangent, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_grads_no_force_contiguous_subclass, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_inductor_freezing_with_subclasses, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_inference_python_dispatcher, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_layer_norm, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_lift_fresh_copy_in_graph, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_noncontig_nonmemformat_tangents_dynamic_shapes_False_test_subclasses_False_device_cpu, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_noncontig_nonmemformat_tangents_dynamic_shapes_False_test_subclasses_False_device_cuda, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_noncontig_nonmemformat_tangents_dynamic_shapes_False_test_subclasses_True_device_cpu, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_noncontig_nonmemformat_tangents_dynamic_shapes_False_test_subclasses_True_device_cuda, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_noncontig_nonmemformat_tangents_dynamic_shapes_True_test_subclasses_False_device_cpu, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_noncontig_nonmemformat_tangents_dynamic_shapes_True_test_subclasses_False_device_cuda, 
test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_noncontig_nonmemformat_tangents_dynamic_shapes_True_test_subclasses_True_device_cpu, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_noncontig_nonmemformat_tangents_dynamic_shapes_True_test_subclasses_True_device_cuda, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_rms_norm, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_rrelu, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_rrelu_with_noise_mutation, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_saved_tensors_hooks_base_saved_tensors_hooks_filtering_mode_all, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_saved_tensors_hooks_base_saved_tensors_hooks_filtering_mode_donated, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_saved_tensors_hooks_base_saved_tensors_hooks_filtering_mode_no_static, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_saved_tensors_hooks_donated_buffers, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_saved_tensors_hooks_params, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_saved_tensors_hooks_recompile, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_subclass_parameters, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_subclass_parameters_torture_case, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_tangent_type_coercion, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_wrong_guess_tangent_type, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_False_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_False_dynamic_shapes_True, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_True_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_True_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_False_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_False_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_True_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_True_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_autocast_disable_guard, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_backward_mutation_data, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_backward_mutation_forward_inputs, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_backward_mutation_forward_inputs_create_graph, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_backward_mutation_metadata, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_backward_mutation_on_grad_out, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_backward_pass_autocast_custom, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_backward_pass_autocast_off, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_backward_pass_autocast_on, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_batch_norm_amp, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_batchnorm, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_batchnorm_inference, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_buffer_batch_norm, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_buffer_copied_in_graph, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_buffer_copied_in_graph_with_different_shapes, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_compilation_context, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_complex_linear, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_composite_impl_compile, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_custom_autograd, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_custom_tensor_metadata, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_default_partitioner_saves_symints_not_tensors_for_bw, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_dupe_arg, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_dupe_arg_returned_as_output, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_dupe_arg_torture, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_duplicated_arguments_on_tensor_overlap, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_dynamic_output_aliases_input_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_dynamic_shape_output_not_in_bw_graph, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_embedding_bag_view_dynamic, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_fw_bw_mutation_no_functionalization1, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_fw_bw_mutation_no_functionalization2, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_grad_context, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_inference_mode, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_inner_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_aliased_with_mutation_output_alias, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_data_and_metadata_mutation, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_data_and_metadata_mutation_aliases_other_input, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_inplace_requires_grad_true, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_metadata_mutation_aliases, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_alias_everything, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_aliases_and_none_require_gradients, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_aliases_and_output_alias, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_aliases_bases_out_of_order, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_aliases_other_input, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_aliases_other_input2, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_and_output_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_batchnorm, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_false_aliasing, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_hidden_from_autograd_aliasing, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_is_output, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_metadata, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_metadata2, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_modifies_autograd_meta_of_aliases, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_multiple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_noncontiguous, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_output_view_multiple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_requires_grad_detach, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_requires_grad_no_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_requires_grad_no_grad_detach_mixed, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_requires_grad_no_grad_inference_graph, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_return, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_set__input_mutation, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_set__nop, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_simple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_simple_with_none_and_nontensor, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_storage_resize_before_set_, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_storage_resize_down, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_storage_resize_down_and_set_, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_storage_resize_up, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_output_aliase_custom_autograd_function, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_output_view_metadata_mutate_multiple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_output_view_mutate_multiple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_output_view_simple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_inputs_overlapping_unsqueeze_with_mutation, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_inputs_overlapping_with_mutation_guard_base, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_invalid_dupe, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_invalid_dupe_fake, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_invalid_dupe_left_bias, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_invalid_requires_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_invalid_requires_grad_fake, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_list_codegen, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_mark_activations_dynamic, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_mark_activations_dynamic_with_nested, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_mark_outputs_dynamic_use_autograd_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_mark_outputs_dynamic_use_autograd_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_mem_leak_from_save_for_bw, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_module, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_multi_output, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_multi_output_list, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_mutates_input_noncontiguous, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_mutation_of_input_in_fw_and_bw, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_mutations_in_bw_detached_from_tangent, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_nested_subclasses, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_nested_subclasses_complicated_inps, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_nested_subclasses_complicated_inps_mixed, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_nested_subclasses_non_homogenous, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_nested_subclasses_non_nested_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_new_inp_requires_grad_now, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_no_grad_input_output, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_non_tensor_and_none_inputs, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_nonidempotent_amp, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_input_multi_output_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_input_multi_output_view_should_raise_autograd_error, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_input_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_and_returned, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_and_returned_different_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_and_returned_flipped, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_inplace_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_inplace_view_and_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_inplace_view_with_detach, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_multi_output_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_multiple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_multiple_mixed, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_mutation_linear, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_no_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_returned_multiple_times, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_single, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_multiple_inputs_get_correct_one, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_output_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_all_alias_types, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_dict, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_op_depending_on_symint, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_outputs_are_aliased, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_real_weights_in_symbolic_mode, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_real_weights_in_symbolic_mode_with_inplace_ops, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_saved_tensors_hooks_mutations_raise, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_set__and_data_mutation_bad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_set__and_data_mutation_good, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_set__not_allowed, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_set__steals_view_chain, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_single_output, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_some_output_requires_grad_input_doesnt, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_some_outputs_dont_require_grad_non_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_some_outputs_dont_require_grad_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_squeeze_mutation, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_subclass_metadata_mutation_req_grad_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_subclass_metadata_mutation_req_grad_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_subclasses_mixed, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_subclasses_mixed_mode, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_synthetic_base_base_attribute_is_none, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_view_and_inplace_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_view_detach, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_False_dynamic_shapes_False, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_False_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_True_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_True_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_False_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_False_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_True_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_True_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_autocast_disable_guard, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_backward_mutation_data, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_backward_mutation_forward_inputs, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_backward_mutation_forward_inputs_create_graph, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_backward_mutation_metadata, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_backward_mutation_on_grad_out, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_backward_pass_autocast_custom, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_backward_pass_autocast_off, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_backward_pass_autocast_on, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_batch_norm_amp, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_batchnorm, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_batchnorm_inference, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_buffer_batch_norm, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_buffer_copied_in_graph, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_buffer_copied_in_graph_with_different_shapes, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_compilation_context, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_complex_linear, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_composite_impl_compile, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_custom_autograd, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_custom_tensor_metadata, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_default_partitioner_saves_symints_not_tensors_for_bw, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_dupe_arg, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_dupe_arg_returned_as_output, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_dupe_arg_torture, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_duplicated_arguments_on_tensor_overlap, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_dynamic_output_aliases_input_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_dynamic_shape_output_not_in_bw_graph, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_embedding_bag_view_dynamic, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_fw_bw_mutation_no_functionalization1, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_fw_bw_mutation_no_functionalization2, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_grad_context, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_inference_mode, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_inner_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_aliased_with_mutation_output_alias, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_data_and_metadata_mutation, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_data_and_metadata_mutation_aliases_other_input, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_inplace_requires_grad_true, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_metadata_mutation_aliases, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_alias_everything, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_aliases_and_none_require_gradients, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_aliases_and_output_alias, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_aliases_bases_out_of_order, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_aliases_other_input, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_aliases_other_input2, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_and_output_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_batchnorm, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_false_aliasing, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_hidden_from_autograd_aliasing, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_is_output, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_metadata, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_metadata2, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_modifies_autograd_meta_of_aliases, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_multiple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_noncontiguous, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_output_view_multiple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_requires_grad_detach, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_requires_grad_no_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_requires_grad_no_grad_detach_mixed, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_requires_grad_no_grad_inference_graph, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_return, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_set__input_mutation, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_set__nop, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_simple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_simple_with_none_and_nontensor, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_storage_resize_before_set_, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_storage_resize_down, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_storage_resize_down_and_set_, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_storage_resize_up, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_output_aliase_custom_autograd_function, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_output_view_metadata_mutate_multiple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_output_view_mutate_multiple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_output_view_simple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_inputs_overlapping_unsqueeze_with_mutation, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_inputs_overlapping_with_mutation_guard_base, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_invalid_dupe, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_invalid_dupe_fake, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_invalid_dupe_left_bias, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_invalid_requires_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_invalid_requires_grad_fake, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_list_codegen, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_mark_activations_dynamic, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_mark_activations_dynamic_with_nested, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_mark_outputs_dynamic_use_autograd_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_mark_outputs_dynamic_use_autograd_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_mem_leak_from_save_for_bw, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_module, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_multi_output, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_multi_output_list, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_mutates_input_noncontiguous, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_mutation_of_input_in_fw_and_bw, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_mutations_in_bw_detached_from_tangent, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_nested_subclasses, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_nested_subclasses_complicated_inps, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_nested_subclasses_complicated_inps_mixed, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_nested_subclasses_non_homogenous, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_nested_subclasses_non_nested_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_new_inp_requires_grad_now, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_no_grad_input_output, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_non_tensor_and_none_inputs, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_nonidempotent_amp, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_input_multi_output_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_input_multi_output_view_should_raise_autograd_error, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_input_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_and_returned, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_and_returned_different_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_and_returned_flipped, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_inplace_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_inplace_view_and_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_inplace_view_with_detach, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_multi_output_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_multiple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_multiple_mixed, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_mutation_linear, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_no_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_returned_multiple_times, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_single, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_multiple_inputs_get_correct_one, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_output_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_all_alias_types, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_dict, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_op_depending_on_symint, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_outputs_are_aliased, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_real_weights_in_symbolic_mode, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_real_weights_in_symbolic_mode_with_inplace_ops, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_saved_tensors_hooks_mutations_raise, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_set__and_data_mutation_bad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_set__and_data_mutation_good, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_set__not_allowed, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_set__steals_view_chain, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_single_output, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_some_output_requires_grad_input_doesnt, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_some_outputs_dont_require_grad_non_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_some_outputs_dont_require_grad_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_squeeze_mutation, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_subclass_metadata_mutation_req_grad_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_subclass_metadata_mutation_req_grad_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_subclasses_mixed, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_subclasses_mixed_mode, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_synthetic_base_base_attribute_is_none, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_view_and_inplace_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_view_detach
2025-12-04T17:14:34.6867134Z 
2025-12-04T17:14:34.6867549Z Finished functorch/test_aotdispatch 1/1 ... [2025-12-04 17:14:34.598421][28902.981313059], took 2.36min
2025-12-04T17:14:34.6868910Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/functorch.test_aotdispatch/functorch.test_aotdispatch-efb7e0b79840fa38.xml
2025-12-04T17:14:34.7453241Z Running test_scatter_gather_ops 1/1 ... [2025-12-04 17:14:34.745038][28903.127932174]
2025-12-04T17:14:34.7453846Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T17:14:34.7457203Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_scatter_gather_ops.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:14:34.745477]
2025-12-04T17:14:57.9962995Z 
2025-12-04T17:14:57.9963978Z test_scatter_gather_ops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_scatter_gather_ops_1.1_f3de59c3735d2471_.log
2025-12-04T17:14:58.0000109Z Running 76 items in this shard: test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_backward_with_empty_index_tensor_sparse_grad_False_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_backward_with_empty_index_tensor_sparse_grad_False_cuda_float64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_backward_with_empty_index_tensor_sparse_grad_True_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_backward_with_empty_index_tensor_sparse_grad_True_cuda_float64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_bool_cuda_bool, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_cuda_complex64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_expanded_index_cuda_bfloat16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_expanded_index_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_expanded_index_cuda_float64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_large_cuda_bfloat16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_large_cuda_int8, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter__cuda_complex64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter__cuda_float16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter__cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter__reductions_cuda_float16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter__reductions_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter__scalar_cuda_complex64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter__scalar_cuda_float16, 
test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter__scalar_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_add__cuda_complex64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_add__cuda_float16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_add__cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_add_broadcasted_index_deterministic_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_add_mult_index_base_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_expanded_index_cuda_bfloat16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_expanded_index_cuda_float16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_expanded_index_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_expanded_index_cuda_float64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amax_cuda_bfloat16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amax_cuda_float16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amax_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amax_cuda_float64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amax_cuda_int16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amax_cuda_int32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amax_cuda_int64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amax_cuda_int8, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amax_cuda_uint8, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amin_cuda_bfloat16, 
test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amin_cuda_float16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amin_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amin_cuda_float64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amin_cuda_int16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amin_cuda_int32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amin_cuda_int64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amin_cuda_int8, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amin_cuda_uint8, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_mean_cuda_bfloat16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_mean_cuda_float16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_mean_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_mean_cuda_float64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_mean_cuda_int16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_mean_cuda_int32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_mean_cuda_int64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_mean_cuda_int8, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_mean_cuda_uint8, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_prod_cuda_bfloat16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_prod_cuda_float16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_prod_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_prod_cuda_float64, 
test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_prod_cuda_int16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_prod_cuda_int32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_prod_cuda_int64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_prod_cuda_int8, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_prod_cuda_uint8, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_bfloat16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_complex128, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_complex64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_float16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_float64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_int16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_int32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_int64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_int8, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_uint8
2025-12-04T17:14:58.0035030Z 
2025-12-04T17:14:58.0035374Z Finished test_scatter_gather_ops 1/1 ... [2025-12-04 17:14:57.996119][28926.379014958], took 0.39min
2025-12-04T17:14:58.0368430Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_scatter_gather_ops/test_scatter_gather_ops-5274bd99c8a0619f.xml
2025-12-04T17:14:58.1148029Z Running test_cuda_multigpu 1/1 ... [2025-12-04 17:14:58.114496][28926.49739045]
2025-12-04T17:14:58.1148576Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T17:14:58.1151854Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_cuda_multigpu.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:14:58.114943]
2025-12-04T17:15:06.0409023Z 
2025-12-04T17:15:06.0410137Z test_cuda_multigpu 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_cuda_multigpu_1.1_5809c25d23c9a947_.log
2025-12-04T17:15:06.0432121Z Running 61 items in this shard: test/test_cuda_multigpu.py::TestCudaMultiGPU::test_autogpu, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_caching_pinned_memory_multi_gpu, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_cat_autogpu, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_copy_device, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_copy_streams, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_cuda_device_memory_allocated, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_cuda_init_race, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_cuda_memory_leak_detection, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_cuda_set_device, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_cuda_synchronize, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_current_stream, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_default_stream, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_events_multi_gpu_elapsed_time, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_events_multi_gpu_query, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_events_wait, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_external_streams, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_external_streams_multi_device, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_get_set_rng_state_all, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_grad_scaling_device_as_key, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_grad_scaling_multigpu, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_grad_scaling_scale, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_load_nonexistent_device, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_mem_get_info, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_memory_stats, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_memory_stats_multigpu, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_multigpu_serialization_remap, 
test/test_cuda_multigpu.py::TestCudaMultiGPU::test_multigpu_serialization_remap_dict, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_multigpu_storage_clone, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_new, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_rng_state_offset, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_stream_context, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_stream_event_device, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_stream_event_nogil, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_streaming_backwards_device_transfer, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_streams_multi_gpu, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_streams_multi_gpu_eq, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_streams_multi_gpu_query, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_streams_priority, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_tensor_device, test/test_cuda_multigpu.py::TestCudaComm::test_broadcast_coalesced, test/test_cuda_multigpu.py::TestCudaComm::test_broadcast_coalesced_dense_only, test/test_cuda_multigpu.py::TestCudaComm::test_broadcast_coalesced_empty_tensors, test/test_cuda_multigpu.py::TestCudaComm::test_broadcast_cpu, test/test_cuda_multigpu.py::TestCudaComm::test_broadcast_gpu, test/test_cuda_multigpu.py::TestCudaComm::test_gather, test/test_cuda_multigpu.py::TestCudaComm::test_gather_dim, test/test_cuda_multigpu.py::TestCudaComm::test_gather_namedtuple, test/test_cuda_multigpu.py::TestCudaComm::test_gather_neg_dim, test/test_cuda_multigpu.py::TestCudaComm::test_memory_format_scatter_gather, test/test_cuda_multigpu.py::TestCudaComm::test_reduce_add, test/test_cuda_multigpu.py::TestCudaComm::test_reduce_add_coalesced, test/test_cuda_multigpu.py::TestCudaComm::test_reduce_add_coalesced_dense_only, test/test_cuda_multigpu.py::TestCudaComm::test_scatter_cpu, test/test_cuda_multigpu.py::TestCudaComm::test_scatter_cpu_dim, 
test/test_cuda_multigpu.py::TestCudaComm::test_scatter_cpu_neg_dim, test/test_cuda_multigpu.py::TestCudaComm::test_scatter_cpu_sizes, test/test_cuda_multigpu.py::TestCudaComm::test_scatter_gpu, test/test_cuda_multigpu.py::TestCudaComm::test_scatter_gpu_dim, test/test_cuda_multigpu.py::TestCudaComm::test_scatter_gpu_neg_dim, test/test_cuda_multigpu.py::TestCudaComm::test_scatter_gpu_sizes, test/test_cuda_multigpu.py::TestCudaComm::test_scatter_namedtuple
2025-12-04T17:15:06.0452719Z 
2025-12-04T17:15:06.0453046Z Finished test_cuda_multigpu 1/1 ... [2025-12-04 17:15:06.040758][28934.423651614], took 0.13min
2025-12-04T17:15:06.0808021Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_multigpu/test_cuda_multigpu-a9a26e79d8868522.xml
2025-12-04T17:15:06.1535451Z Running torch_np/numpy_tests/lib/test_index_tricks 1/1 ... [2025-12-04 17:15:06.153137][28934.536030664]
2025-12-04T17:15:06.1536153Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T17:15:06.1538189Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/lib/test_index_tricks.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:15:06.153567]
2025-12-04T17:15:12.0768955Z 
2025-12-04T17:15:12.0770124Z torch_np/numpy_tests/lib/test_index_tricks 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.lib.test_index_tricks_1.1_e2c692e766f99011_.log
2025-12-04T17:15:12.0792230Z Running 47 items in this shard: test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_0d, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_basic, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_big_indices, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_clipmodes, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_dtypes, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_empty_array_ravel_mode_clip, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_empty_array_ravel_mode_raise, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_empty_array_ravel_mode_wrap, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_empty_array_unravel, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_empty_indices, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_writeability, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_accepts_longdouble, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_accepts_npcomplexfloating, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_accepts_npfloating, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_basic, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_linspace_equivalence, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_mgrid_size_none_handling_start0_stop_10_step0_expected0, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_mgrid_size_none_handling_start_-10_stop_20_step1_expected1, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_nd, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_sparse, 
test/torch_np/numpy_tests/lib/test_index_tricks.py::TestConcatenator::test_0d, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestConcatenator::test_1d, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestConcatenator::test_2d, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestConcatenator::test_complex_step, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestConcatenator::test_mixed_type, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestConcatenator::test_more_mixed_type, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestNdenumerate::test_basic, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestIndexExpression::test_regression_1, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestIndexExpression::test_simple_1, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestIx_::test_1d_only, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestIx_::test_bool, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestIx_::test_regression_1, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestIx_::test_repeated_input, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestIx_::test_shape_and_dtype, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestC::test_c_, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestFillDiagonal::test_basic, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestFillDiagonal::test_hetero_shape_handling, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestFillDiagonal::test_low_dim_handling, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestFillDiagonal::test_operate_4d_array, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestFillDiagonal::test_tall_matrix, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestFillDiagonal::test_tall_matrix_wrap, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestFillDiagonal::test_wide_matrix, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestDiagIndices::test_diag_indices, 
test/torch_np/numpy_tests/lib/test_index_tricks.py::TestDiagIndicesFrom::test_diag_indices_from, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestDiagIndicesFrom::test_error_shape_mismatch, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestDiagIndicesFrom::test_error_small_input, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestNdIndex::test_ndindex
2025-12-04T17:15:12.0812806Z 
2025-12-04T17:15:12.0813248Z Finished torch_np/numpy_tests/lib/test_index_tricks 1/1 ... [2025-12-04 17:15:12.076737][28940.459632773], took 0.10min
2025-12-04T17:15:12.1170680Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.numpy_tests.lib.test_index_tricks/torch_np.numpy_tests.lib.test_index_tricks-43f32c31fbfc43cd.xml
2025-12-04T17:15:12.1501175Z Running test_jit_autocast 1/1 ... [2025-12-04 17:15:12.149853][28940.532747438]
2025-12-04T17:15:12.1501698Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T17:15:12.1504812Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_jit_autocast.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:15:12.150262]
2025-12-04T17:15:31.6997373Z 
2025-12-04T17:15:31.6998320Z test_jit_autocast 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_jit_autocast_1.1_9af7b4b8017e3406_.log
2025-12-04T17:15:31.7016759Z Running 54 items in this shard: test/test_jit_autocast.py::TestAutocast::test_autocast_api, test/test_jit_autocast.py::TestAutocast::test_autocast_api_not_supported, test/test_jit_autocast.py::TestAutocast::test_autocast_autodiff, test/test_jit_autocast.py::TestAutocast::test_autocast_decorator, test/test_jit_autocast.py::TestAutocast::test_autocast_decorator_outside_jit, test/test_jit_autocast.py::TestAutocast::test_autocast_mixed_dtypes, test/test_jit_autocast.py::TestAutocast::test_callees, test/test_jit_autocast.py::TestAutocast::test_callees_with_autocast_off, test/test_jit_autocast.py::TestAutocast::test_callees_with_autocast_on, test/test_jit_autocast.py::TestAutocast::test_conditional_autocast, test/test_jit_autocast.py::TestAutocast::test_control_flow, test/test_jit_autocast.py::TestAutocast::test_divergent_autocast, test/test_jit_autocast.py::TestAutocast::test_divergent_types, test/test_jit_autocast.py::TestAutocast::test_duplicate_inputs, test/test_jit_autocast.py::TestAutocast::test_eager_and_script, test/test_jit_autocast.py::TestAutocast::test_explicit_casts, test/test_jit_autocast.py::TestAutocast::test_fp32_policy, test/test_jit_autocast.py::TestAutocast::test_fp32_policy_with_fp64, test/test_jit_autocast.py::TestAutocast::test_fp32_set_opt_dtype_policy, test/test_jit_autocast.py::TestAutocast::test_fp32_set_opt_dtype_policy_fp64, test/test_jit_autocast.py::TestAutocast::test_ignore_amp, test/test_jit_autocast.py::TestAutocast::test_implicitly_nested_autocast, test/test_jit_autocast.py::TestAutocast::test_inplace, test/test_jit_autocast.py::TestAutocast::test_jit_autocast_softmax_cpu, test/test_jit_autocast.py::TestAutocast::test_jit_autocast_softmax_gpu, test/test_jit_autocast.py::TestAutocast::test_jit_call_method_under_autocast, test/test_jit_autocast.py::TestAutocast::test_jit_executor_under_autocast, test/test_jit_autocast.py::TestAutocast::test_jit_freeze_autocast_basic, 
test/test_jit_autocast.py::TestAutocast::test_jit_freeze_autocast_constants, test/test_jit_autocast.py::TestAutocast::test_jit_generic_autocast, test/test_jit_autocast.py::TestAutocast::test_linear_bf16, test/test_jit_autocast.py::TestAutocast::test_minimal, test/test_jit_autocast.py::TestAutocast::test_minimal_cpu, test/test_jit_autocast.py::TestAutocast::test_minimal_off, test/test_jit_autocast.py::TestAutocast::test_nested_autocast, test/test_jit_autocast.py::TestAutocast::test_promote_policy, test/test_jit_autocast.py::TestAutocast::test_promote_policy_fp64, test/test_jit_autocast.py::TestAutocast::test_reused_autocast, test/test_jit_autocast.py::TestAutocast::test_reused_autocast_expr, test/test_jit_autocast.py::TestAutocast::test_runtime_autocast_state, test/test_jit_autocast.py::TestAutocast::test_runtime_autocast_state_expr, test/test_jit_autocast.py::TestAutocast::test_script_and_tracing, test/test_jit_autocast.py::TestAutocast::test_script_and_tracing_with_autocast, test/test_jit_autocast.py::TestAutocast::test_script_module, test/test_jit_autocast.py::TestAutocast::test_tracing_and_script, test/test_jit_autocast.py::TestAutocast::test_tracing_with_autocast_and_script, test/test_jit_autocast.py::TestJitTraceAutocast::test_cat_promote, test/test_jit_autocast.py::TestJitTraceAutocast::test_generate_autocast_jit_trace_model, test/test_jit_autocast.py::TestJitTraceAutocast::test_nchw_autocast_jit_trace_model, test/test_jit_autocast.py::TestJitTraceAutocast::test_nhwc_autocast_jit_trace_model, test/test_jit_autocast.py::TestJitTraceAutocast::test_script_autocast_cpu, test/test_jit_autocast.py::TestJitTraceAutocast::test_script_autocast_cuda, test/test_jit_autocast.py::TestJitTraceAutocast::test_script_autocast_enable_and_check, test/test_jit_autocast.py::TestJitTraceAutocast::test_scripted_aliasing
2025-12-04T17:15:31.7034735Z 
2025-12-04T17:15:31.7035046Z Finished test_jit_autocast 1/1 ... [2025-12-04 17:15:31.699542][28960.082437457], took 0.33min
2025-12-04T17:15:31.7401890Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_jit_autocast/test_jit_autocast-9b5e22ff1077135a.xml
2025-12-04T17:15:31.8234723Z Running nn/test_pooling 1/1 ... [2025-12-04 17:15:31.823144][28960.206037859]
2025-12-04T17:15:31.8235491Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T17:15:31.8238186Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'nn/test_pooling.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:15:31.823580]
2025-12-04T17:15:47.0613491Z 
2025-12-04T17:15:47.0614566Z nn/test_pooling 1/1 was successful, full logs can be found in artifacts with path test/test-reports/nn.test_pooling_1.1_e8e935ea909a1883_.log
2025-12-04T17:15:47.0682844Z Running 147 items in this shard: test/nn/test_pooling.py::TestAvgPool::test_avg_pool1d_ceil_mode, test/nn/test_pooling.py::TestAvgPool::test_avg_pool2d_ceil_mode, test/nn/test_pooling.py::TestAvgPool::test_avg_pool3d_ceil_mode, test/nn/test_pooling.py::TestAvgPool::test_doubletensor_avg_pool2d, test/nn/test_pooling.py::TestAvgPool::test_doubletensor_avg_pool2d_with_divisor, test/nn/test_pooling.py::TestAvgPool::test_doubletensor_avg_pool3d, test/nn/test_pooling.py::TestAvgPool::test_doubletensor_avg_pool3d_with_divisor, test/nn/test_pooling.py::TestPoolingNN::test_MaxUnpool2d_output_size, test/nn/test_pooling.py::TestPoolingNN::test_adaptive_avg_pooling_nhwc_overflow, test/nn/test_pooling.py::TestPoolingNN::test_adaptive_avg_pooling_overflow, test/nn/test_pooling.py::TestPoolingNN::test_adaptive_pooling_avg_nhwc, test/nn/test_pooling.py::TestPoolingNN::test_adaptive_pooling_avg_nhwc_launch_config_backward, test/nn/test_pooling.py::TestPoolingNN::test_adaptive_pooling_avg_nhwc_launch_config_forward, test/nn/test_pooling.py::TestPoolingNN::test_adaptive_pooling_avg_nhwc_non_contiguous, test/nn/test_pooling.py::TestPoolingNN::test_adaptive_pooling_lower_precision, test/nn/test_pooling.py::TestPoolingNN::test_adaptive_pooling_size_none, test/nn/test_pooling.py::TestPoolingNN::test_adaptive_pooling_size_overflow, test/nn/test_pooling.py::TestPoolingNN::test_max_unpool, test/nn/test_pooling.py::TestPoolingNN::test_max_unpool2d_nhwc_cpu, test/nn/test_pooling.py::TestPoolingNN::test_max_unpool3d_input_check, test/nn/test_pooling.py::TestPoolingNN::test_quantized_max_pool1d_empty_kernel, test/nn/test_pooling.py::TestPoolingNN::test_quantized_max_pool3d, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool1d_indices_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool1d_indices_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool1d_indices_cuda_float32, 
test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool1d_indices_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool2d_indices_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool2d_indices_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool2d_indices_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool2d_indices_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool3d_indices_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool3d_indices_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool3d_indices_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool3d_indices_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool_zero_batch_dim_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AvgPool2d_empty_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AvgPool3d_backward_after_cat_dim1_device_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_FractionalMaxPool2d_zero_batch_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_FractionalMaxPool2d_zero_out_size_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_FractionalMaxPool2d_zero_samples_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_FractionalMaxPool3d_errors_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_FractionalMaxPool3d_zero_batch_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_FractionalMaxPool3d_zero_out_size_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_FractionalMaxPool3d_zero_samples_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_LPPool1d_kernel_size_overflow_large_cuda, 
test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool1d_indices_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool1d_indices_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool1d_indices_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool1d_indices_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool2d_indices_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool2d_indices_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool2d_indices_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool2d_indices_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool3d_errors_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool3d_indices_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool3d_indices_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool3d_indices_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool3d_indices_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool_zero_batch_dim_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case10_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case1_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case2_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case3_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case4_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case5_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case6_cuda, 
test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case7_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case8_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case9_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_invalid_output_size_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_zero_batch_dim_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_avg_pool2d_output_size_one_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_avg_pool3d_output_size_one_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_avg_pooling_backward_fails_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_max_pooling_backward_fails_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pool_odd_size_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_empty_output_size_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_empty_output_size_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_empty_output_size_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_empty_output_size_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_max_nhwc_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_max_nhwc_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_no_suppot_input_cuda_int16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_no_suppot_input_cuda_int32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_no_suppot_input_cuda_int64, 
test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_no_suppot_input_cuda_int8, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_no_suppot_input_cuda_uint8, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_zero_batch_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_zero_batch_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_avg_pool2d_nhwc_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_avg_pool2d_nhwc_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_avg_pool2d_nhwc_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_avg_pool2d_reduced_floating_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_avg_pool2d_reduced_floating_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_fractional_max_pool2d_backward_fails_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_fractional_max_pool2d_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_fractional_max_pool3d_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_fractional_max_pool_nan_inf_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_fractional_max_pool_nan_inf_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_fractional_max_pool_nan_inf_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool1d_corner_cases_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool1d_corner_cases_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool1d_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool1d_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool2d_corner_cases_cuda_int32, 
test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool2d_corner_cases_cuda_int64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool2d_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool2d_indices_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool2d_nhwc_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool2d_nhwc_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool2d_nhwc_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool2d_with_indices_backward_fails_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool3d_ndhwc_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool3d_ndhwc_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool3d_ndhwc_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool_bfloat16_half_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool_bfloat16_half_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool_nan_inf_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool_nan_inf_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool_nan_inf_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_unpool_invalid_indices_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_maxpool3d_non_square_backward_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_maxpool_indices_no_batch_dim_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_maxpool_indices_no_batch_dim_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_maxpool_indices_no_batch_dim_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_maxpool_indices_no_batch_dim_cuda_float64, 
test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool3d_large_size_int64_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool3d_size_one_feature_dim_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool_invalid_size_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool_invalid_size_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool_invalid_size_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool_invalid_size_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool_large_size_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool_large_size_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool_large_size_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool_large_size_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_bfloat16_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_large_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_max_nhwc_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_max_nhwc_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_shape_kernel_avg_pooling_dims_1_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_shape_kernel_avg_pooling_dims_2_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_shape_kernel_avg_pooling_dims_3_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_shape_kernel_max_pooling_dims_1_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_shape_kernel_max_pooling_dims_2_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_shape_kernel_max_pooling_dims_3_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_zero_stride_cuda
2025-12-04T17:15:47.0750283Z 
2025-12-04T17:15:47.0750661Z Finished nn/test_pooling 1/1 ... [2025-12-04 17:15:47.061345][28975.444240019], took 0.25min
2025-12-04T17:15:47.1015471Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/nn.test_pooling/nn.test_pooling-2151df52b065bbdf.xml
2025-12-04T17:15:47.1855714Z Running nn/test_embedding 1/1 ... [2025-12-04 17:15:47.185235][28975.568129139]
2025-12-04T17:15:47.1856276Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T17:15:47.1859353Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'nn/test_embedding.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:15:47.185684]
2025-12-04T17:16:06.4275801Z 
2025-12-04T17:16:06.4276905Z nn/test_embedding 1/1 was successful, full logs can be found in artifacts with path test/test-reports/nn.test_embedding_1.1_dc9119745a665b44_.log
2025-12-04T17:16:06.4368478Z Running 156 items in this shard: test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_bag_from_pretrained, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_bag_from_pretrained_padding_idx, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_bag_functional, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_bag_padding_idx_error, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_from_pretrained_float32, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_from_pretrained_float64, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_from_pretrained_int16, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_from_pretrained_int32, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_from_pretrained_int64, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_from_pretrained_int8, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_from_pretrained_options, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_from_pretrained_padding_idx, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_from_pretrained_uint8, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_functional, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_max_norm, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_max_norm_unsorted_repeating_indices, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_sparse_basic, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_sparse_empty_tensor, test/nn/test_embedding.py::TestEmbeddingNN::test_embeddingbag_2d_include_last_offset, test/nn/test_embedding.py::TestEmbeddingNN::test_embeddingbag_from_pretrained, test/nn/test_embedding.py::TestEmbeddingNN::test_embeddingbag_from_pretrained_options, test/nn/test_embedding.py::TestEmbeddingNN::test_embeddingbag_include_last_offset, test/nn/test_embedding.py::TestEmbeddingNN::test_large_tensors, 
test/nn/test_embedding.py::TestEmbeddingNN::test_move_sparse_half_embedding, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_empty_per_sample_weights_and_offsets_cuda_int32_int32_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_empty_per_sample_weights_and_offsets_cuda_int32_int32_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_empty_per_sample_weights_and_offsets_cuda_int32_int32_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_empty_per_sample_weights_and_offsets_cuda_int32_int64_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_empty_per_sample_weights_and_offsets_cuda_int32_int64_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_empty_per_sample_weights_and_offsets_cuda_int32_int64_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_empty_per_sample_weights_and_offsets_cuda_int64_int32_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_empty_per_sample_weights_and_offsets_cuda_int64_int32_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_empty_per_sample_weights_and_offsets_cuda_int64_int32_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_empty_per_sample_weights_and_offsets_cuda_int64_int64_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_empty_per_sample_weights_and_offsets_cuda_int64_int64_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_empty_per_sample_weights_and_offsets_cuda_int64_int64_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int32_int32_float16, 
test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int32_int32_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int32_int32_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int32_int64_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int32_int64_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int32_int64_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int64_int32_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int64_int32_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int64_int32_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int64_int64_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int64_int64_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int64_int64_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_no_offsets_cuda_int32_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_no_offsets_cuda_int32_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_no_offsets_cuda_int32_float64, 
test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_no_offsets_cuda_int64_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_no_offsets_cuda_int64_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_no_offsets_cuda_int64_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int32_int32_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int32_int32_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int32_int32_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int32_int64_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int32_int64_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int32_int64_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int64_int32_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int64_int32_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int64_int32_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int64_int64_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int64_int64_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int64_int64_float64, 
test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_failures_cuda_int32_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_failures_cuda_int32_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_failures_cuda_int64_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_failures_cuda_int64_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_backward_cuda_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_backward_cuda_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_backward_large_batch_overflow_cuda_bfloat16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_1D_padding_idx_cuda_bfloat16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_1D_padding_idx_cuda_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_2D_padding_idx_cuda_bfloat16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_2D_padding_idx_cuda_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_bfloat16_cuda_int32_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_bfloat16_cuda_int32_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_bfloat16_cuda_int64_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_bfloat16_cuda_int64_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_device_cuda_int32_int32_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_device_cuda_int32_int32_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_device_cuda_int32_int32_float64, 
test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_device_cuda_int32_int64_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_device_cuda_int32_int64_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_device_cuda_int32_int64_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_device_cuda_int64_int32_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_device_cuda_int64_int32_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_device_cuda_int64_int32_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_device_cuda_int64_int64_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_device_cuda_int64_int64_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_device_cuda_int64_int64_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_dimension_errors_cuda, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_empty_input_cuda_int32_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_empty_input_cuda_int32_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_empty_input_cuda_int64_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_empty_input_cuda_int64_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_half_cuda_int32_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_half_cuda_int32_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_half_cuda_int64_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_half_cuda_int64_int64, 
test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_non_contiguous_weight_cuda_int32_int32_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_non_contiguous_weight_cuda_int32_int32_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_non_contiguous_weight_cuda_int32_int32_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_non_contiguous_weight_cuda_int32_int64_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_non_contiguous_weight_cuda_int32_int64_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_non_contiguous_weight_cuda_int32_int64_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_non_contiguous_weight_cuda_int64_int32_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_non_contiguous_weight_cuda_int64_int32_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_non_contiguous_weight_cuda_int64_int32_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_non_contiguous_weight_cuda_int64_int64_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_non_contiguous_weight_cuda_int64_int64_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_non_contiguous_weight_cuda_int64_int64_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx0_mode_max_cuda_float32_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx0_mode_max_cuda_float32_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx0_mode_max_cuda_float64_int32, 
test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx0_mode_max_cuda_float64_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx0_mode_mean_cuda_float32_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx0_mode_mean_cuda_float32_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx0_mode_mean_cuda_float64_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx0_mode_mean_cuda_float64_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx0_mode_sum_cuda_float32_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx0_mode_sum_cuda_float32_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx0_mode_sum_cuda_float64_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx0_mode_sum_cuda_float64_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx_0_mode_max_cuda_float32_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx_0_mode_max_cuda_float32_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx_0_mode_max_cuda_float64_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx_0_mode_max_cuda_float64_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx_0_mode_mean_cuda_float32_int32, 
test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx_0_mode_mean_cuda_float32_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx_0_mode_mean_cuda_float64_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx_0_mode_mean_cuda_float64_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx_0_mode_sum_cuda_float32_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx_0_mode_sum_cuda_float32_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx_0_mode_sum_cuda_float64_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx_0_mode_sum_cuda_float64_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_per_sample_weights_grad_bag_use_grad_False_per_sample_weights_use_grad_False_cuda, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_per_sample_weights_grad_bag_use_grad_False_per_sample_weights_use_grad_True_cuda, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_per_sample_weights_grad_bag_use_grad_True_per_sample_weights_use_grad_False_cuda, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_per_sample_weights_grad_bag_use_grad_True_per_sample_weights_use_grad_True_cuda, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_dense_grad_cuda, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_max_norm_backward_cuda_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_max_norm_backward_cuda_float32, 
test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_max_norm_backward_cuda_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_max_norm_device_cuda_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_max_norm_device_cuda_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_max_norm_device_cuda_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_max_norm_fwd_AD_cuda_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_max_norm_fwd_AD_cuda_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_max_norm_fwd_AD_cuda_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_padding_idx_cuda_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_padding_idx_cuda_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_padding_idx_cuda_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_scalar_weight_error_cuda
2025-12-04T17:16:06.4459712Z 
2025-12-04T17:16:06.4460059Z Finished nn/test_embedding 1/1 ... [2025-12-04 17:16:06.427721][28994.810613375], took 0.32min
2025-12-04T17:16:06.4684201Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/nn.test_embedding/nn.test_embedding-d055fd5d393643fe.xml
2025-12-04T17:16:06.5496435Z Running test_xnnpack_integration 1/1 ... [2025-12-04 17:16:06.549336][28994.932229904]
2025-12-04T17:16:06.5497024Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T17:16:06.5500360Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_xnnpack_integration.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:16:06.549802]
2025-12-04T17:16:19.7881778Z 
2025-12-04T17:16:19.7883439Z test_xnnpack_integration 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_xnnpack_integration_1.1_5b815e7820ba690d_.log
2025-12-04T17:16:19.7891915Z Running 12 items in this shard: test/test_xnnpack_integration.py::TestXNNPACKOps::test_conv2d, test/test_xnnpack_integration.py::TestXNNPACKOps::test_conv2d_transpose, test/test_xnnpack_integration.py::TestXNNPACKOps::test_linear, test/test_xnnpack_integration.py::TestXNNPACKOps::test_linear_1d_input, test/test_xnnpack_integration.py::TestXNNPACKSerDes::test_combined_model, test/test_xnnpack_integration.py::TestXNNPACKSerDes::test_conv2d, test/test_xnnpack_integration.py::TestXNNPACKSerDes::test_conv2d_transpose, test/test_xnnpack_integration.py::TestXNNPACKSerDes::test_linear, test/test_xnnpack_integration.py::TestXNNPACKRewritePass::test_decomposed_linear, test/test_xnnpack_integration.py::TestXNNPACKRewritePass::test_linear, test/test_xnnpack_integration.py::TestXNNPACKConv1dTransformPass::test_conv1d_basic, test/test_xnnpack_integration.py::TestXNNPACKConv1dTransformPass::test_conv1d_with_relu_fc
2025-12-04T17:16:19.7896712Z 
2025-12-04T17:16:19.7897078Z Finished test_xnnpack_integration 1/1 ... [2025-12-04 17:16:19.787918][29008.170811951], took 0.22min
2025-12-04T17:16:19.8288549Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_xnnpack_integration/test_xnnpack_integration-ed8e38bda9a33f4f.xml
2025-12-04T17:16:19.9144138Z Running test_cuda_trace 1/1 ... [2025-12-04 17:16:19.914076][29008.296970052]
2025-12-04T17:16:19.9144687Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T17:16:19.9148442Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_cuda_trace.py', '--shard-id=1', '--num-shards=1', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:16:19.914570]
2025-12-04T17:17:54.4414203Z 
2025-12-04T17:17:54.4417305Z test_cuda_trace 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_cuda_trace_1.1_70d30feb0b9acc89_.log
2025-12-04T17:17:54.4422470Z Running 12 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_all_trace_callbacks_called, test/test_cuda_trace.py::TestCudaTrace::test_device_synchronization_callback, test/test_cuda_trace.py::TestCudaTrace::test_event_creation_callback, test/test_cuda_trace.py::TestCudaTrace::test_event_deletion_callback, test/test_cuda_trace.py::TestCudaTrace::test_event_record_callback, test/test_cuda_trace.py::TestCudaTrace::test_event_synchronization_callback, test/test_cuda_trace.py::TestCudaTrace::test_event_wait_callback, test/test_cuda_trace.py::TestCudaTrace::test_memcpy_synchronization, test/test_cuda_trace.py::TestCudaTrace::test_memory_allocation_callback, test/test_cuda_trace.py::TestCudaTrace::test_memory_deallocation_callback, test/test_cuda_trace.py::TestCudaTrace::test_stream_creation_callback, test/test_cuda_trace.py::TestCudaTrace::test_stream_synchronization_callback
2025-12-04T17:17:54.4427206Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_all_trace_callbacks_called
2025-12-04T17:17:54.4428152Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_device_synchronization_callback
2025-12-04T17:17:54.4429063Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_event_creation_callback
2025-12-04T17:17:54.4429941Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_event_deletion_callback
2025-12-04T17:17:54.4430806Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_event_record_callback
2025-12-04T17:17:54.4431710Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_event_synchronization_callback
2025-12-04T17:17:54.4432588Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_event_wait_callback
2025-12-04T17:17:54.4433450Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_memcpy_synchronization
2025-12-04T17:17:54.4434340Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_memory_allocation_callback
2025-12-04T17:17:54.4435333Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_memory_deallocation_callback
2025-12-04T17:17:54.4436235Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_stream_creation_callback
2025-12-04T17:17:54.4437165Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_stream_synchronization_callback
2025-12-04T17:17:54.4437706Z 
2025-12-04T17:17:54.4438017Z Finished test_cuda_trace 1/1 ... [2025-12-04 17:17:54.441447][29102.824340688], took 1.58min
2025-12-04T17:17:54.4828433Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-5e38c3c197506de5.xml
2025-12-04T17:17:54.5762360Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-4795d7c5159b6e03.xml
2025-12-04T17:17:54.6204411Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-6c76df2a5666e90f.xml
2025-12-04T17:17:54.6512720Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-c93df3ae687a8e58.xml
2025-12-04T17:17:54.6980158Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-524a9565fc6ac576.xml
2025-12-04T17:17:54.7320230Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-b18b3c9d4ddc6b34.xml
2025-12-04T17:17:54.7636999Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-319074c2014cbf3e.xml
2025-12-04T17:17:54.7940600Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-2483ba726355768c.xml
2025-12-04T17:17:54.8236701Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-f2eed9c29ea8eac7.xml
2025-12-04T17:17:54.8535114Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-c778cc218c519690.xml
2025-12-04T17:17:54.8945457Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-7897a7f1d03cdaa3.xml
2025-12-04T17:17:54.9231712Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-d4fb26045e698199.xml
2025-12-04T17:17:54.9627499Z Running torch_np/test_reductions 1/1 ... [2025-12-04 17:17:54.962504][29103.345397789]
2025-12-04T17:17:54.9628067Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T17:17:54.9631801Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/test_reductions.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:17:54.962946]
2025-12-04T17:18:03.8908253Z 
2025-12-04T17:18:03.8909293Z torch_np/test_reductions 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.test_reductions_1.1_b720aba5a84f607c_.log
2025-12-04T17:18:03.9397499Z Running 966 items in this shard: test/torch_np/test_reductions.py::TestFlatnonzero::test_basic, test/torch_np/test_reductions.py::TestAny::test_basic, test/torch_np/test_reductions.py::TestAny::test_method_vs_function, test/torch_np/test_reductions.py::TestAny::test_nd, test/torch_np/test_reductions.py::TestAll::test_basic, test/torch_np/test_reductions.py::TestAll::test_method_vs_function, test/torch_np/test_reductions.py::TestAll::test_nd, test/torch_np/test_reductions.py::TestMean::test_mean, test/torch_np/test_reductions.py::TestMean::test_mean_float16, test/torch_np/test_reductions.py::TestMean::test_mean_values, test/torch_np/test_reductions.py::TestMean::test_mean_where, test/torch_np/test_reductions.py::TestSum::test_sum, test/torch_np/test_reductions.py::TestSum::test_sum_boolean, test/torch_np/test_reductions.py::TestSum::test_sum_complex_1_dt0, test/torch_np/test_reductions.py::TestSum::test_sum_complex_1_dt1, test/torch_np/test_reductions.py::TestSum::test_sum_complex_2_dt0, test/torch_np/test_reductions.py::TestSum::test_sum_complex_2_dt1, test/torch_np/test_reductions.py::TestSum::test_sum_dtypes_2, test/torch_np/test_reductions.py::TestSum::test_sum_dtypes_warnings, test/torch_np/test_reductions.py::TestSum::test_sum_initial, test/torch_np/test_reductions.py::TestSum::test_sum_stability, test/torch_np/test_reductions.py::TestSum::test_sum_where, test/torch_np/test_reductions.py::TestGenericReductions::test_array_axis_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_array_axis_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_array_axis_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_array_axis_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_array_axis_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_array_axis_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_array_axis_func4, 
test/torch_np/test_reductions.py::TestGenericReductions::test_array_axis_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_array_axis_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_array_axis_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_array_axis_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_array_axis_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_bad_tuple_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_bad_tuple_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_bad_tuple_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_bad_tuple_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_bad_tuple_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_bad_tuple_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_bad_tuple_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_bad_tuple_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_bad_tuple_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_bad_tuple_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_bad_tuple_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_bad_tuple_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_empty_generic_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_empty_generic_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_empty_generic_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_empty_generic_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_empty_generic_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_empty_generic_func3, 
test/torch_np/test_reductions.py::TestGenericReductions::test_axis_empty_generic_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_empty_generic_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_empty_generic_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_empty_generic_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_empty_generic_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_empty_generic_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_bad_axis_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_bad_axis_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_bad_axis_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_bad_axis_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_bad_axis_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_bad_axis_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_bad_axis_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_bad_axis_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_bad_axis_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_bad_axis_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_bad_axis_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_bad_axis_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis5_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis5_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis5_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis5_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis5_func2, 
test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis5_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis5_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis5_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis5_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis5_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis5_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis5_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis6_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis6_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis6_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis6_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis6_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis6_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis6_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis6_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis6_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis6_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis6_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis6_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis7_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis7_func1, 
test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis7_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis7_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis7_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis7_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis7_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis7_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis7_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis7_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis7_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis7_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis8_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis8_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis8_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis8_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis8_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis8_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis8_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis8_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis8_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis8_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis8_func8, 
test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis8_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-1_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-1_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-1_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-1_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-1_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-1_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-1_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-1_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-1_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-1_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-1_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-1_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-2_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-2_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-2_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-2_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-2_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-2_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-2_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-2_func5, 
test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-2_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-2_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-2_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-2_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_0_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_0_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_0_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_0_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_0_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_0_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_0_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_0_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_0_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_0_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_0_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_0_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_1_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_1_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_1_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_1_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_1_func2, 
test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_1_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_1_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_1_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_1_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_1_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_1_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_1_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_2_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_2_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_2_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_2_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_2_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_2_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_2_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_2_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_2_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_2_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_2_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_2_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_none_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_none_func1, 
test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_none_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_none_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_none_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_none_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_none_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_none_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_none_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_none_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_none_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_none_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func0_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func0_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func0_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func0_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func0_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func0_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func0_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func0_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func0_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func10_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func10_axis6, 
test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func10_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func10_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func10_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func10_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func10_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func10_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func10_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func11_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func11_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func11_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func11_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func11_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func11_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func11_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func11_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func11_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func1_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func1_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func1_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func1_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func1_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func1_axis_-2, 
test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func1_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func1_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func1_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func2_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func2_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func2_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func2_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func2_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func2_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func2_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func2_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func2_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func3_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func3_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func3_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func3_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func3_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func3_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func3_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func3_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func3_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func4_axis5, 
test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func4_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func4_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func4_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func4_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func4_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func4_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func4_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func4_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func5_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func5_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func5_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func5_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func5_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func5_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func5_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func5_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func5_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func6_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func6_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func6_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func6_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func6_axis_-1, 
test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func6_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func6_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func6_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func6_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func7_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func7_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func7_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func7_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func7_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func7_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func7_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func7_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func7_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func8_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func8_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func8_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func8_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func8_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func8_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func8_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func8_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func8_axis_2, 
test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func9_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func9_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func9_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func9_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func9_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func9_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func9_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func9_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func9_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func0_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func0_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func0_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func0_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func0_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func0_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func0_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func0_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func0_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func10_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func10_axis6, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func10_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func10_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func10_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func10_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func10_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func10_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func10_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func11_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func11_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func11_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func11_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func11_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func11_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func11_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func11_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func11_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func1_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func1_axis6, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func1_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func1_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func1_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func1_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func1_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func1_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func1_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func2_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func2_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func2_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func2_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func2_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func2_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func2_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func2_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func2_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func3_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func3_axis6, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func3_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func3_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func3_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func3_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func3_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func3_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func3_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func4_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func4_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func4_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func4_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func4_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func4_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func4_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func4_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func4_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func5_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func5_axis6, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func5_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func5_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func5_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func5_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func5_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func5_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func5_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func6_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func6_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func6_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func6_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func6_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func6_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func6_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func6_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func6_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func7_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func7_axis6, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func7_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func7_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func7_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func7_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func7_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func7_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func7_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func8_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func8_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func8_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func8_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func8_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func8_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func8_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func8_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func8_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func9_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func9_axis6, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func9_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func9_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func9_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func9_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func9_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func9_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func9_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func0_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func0_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func0_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func0_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func0_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func0_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func0_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func0_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func0_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func10_axis5, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func10_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func10_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func10_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func10_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func10_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func10_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func10_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func10_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func11_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func11_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func11_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func11_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func11_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func11_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func11_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func11_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func11_axis_2, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func1_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func1_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func1_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func1_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func1_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func1_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func1_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func1_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func1_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func2_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func2_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func2_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func2_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func2_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func2_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func2_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func2_axis_1, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func2_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func3_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func3_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func3_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func3_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func3_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func3_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func3_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func3_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func3_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func4_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func4_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func4_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func4_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func4_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func4_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func4_axis_0, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func4_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func4_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func5_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func5_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func5_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func5_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func5_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func5_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func5_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func5_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func5_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func6_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func6_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func6_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func6_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func6_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func6_axis_-2, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func6_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func6_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func6_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func7_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func7_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func7_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func7_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func7_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func7_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func7_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func7_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func7_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func8_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func8_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func8_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func8_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func8_axis_-1, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func8_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func8_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func8_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func8_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func9_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func9_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func9_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func9_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func9_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func9_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func9_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func9_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func9_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func0_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func0_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func0_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func0_axis8, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func0_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func0_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func0_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func0_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func0_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func10_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func10_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func10_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func10_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func10_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func10_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func10_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func10_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func10_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func11_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func11_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func11_axis7, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func11_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func11_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func11_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func11_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func11_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func11_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func1_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func1_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func1_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func1_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func1_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func1_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func1_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func1_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func1_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func2_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func2_axis6, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func2_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func2_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func2_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func2_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func2_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func2_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func2_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func3_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func3_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func3_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func3_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func3_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func3_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func3_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func3_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func3_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func4_axis5, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func4_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func4_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func4_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func4_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func4_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func4_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func4_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func4_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func5_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func5_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func5_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func5_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func5_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func5_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func5_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func5_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func5_axis_2, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func6_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func6_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func6_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func6_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func6_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func6_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func6_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func6_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func6_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func7_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func7_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func7_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func7_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func7_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func7_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func7_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func7_axis_1, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func7_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func8_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func8_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func8_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func8_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func8_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func8_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func8_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func8_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func8_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func9_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func9_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func9_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func9_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func9_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func9_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func9_axis_0, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func9_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func9_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func0_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func0_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func0_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func0_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func0_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func0_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func0_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func0_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func0_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func10_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func10_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func10_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func10_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func10_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func10_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func10_axis_0, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func10_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func10_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func11_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func11_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func11_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func11_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func11_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func11_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func11_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func11_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func11_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func1_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func1_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func1_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func1_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func1_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func1_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func1_axis_0, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func1_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func1_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func2_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func2_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func2_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func2_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func2_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func2_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func2_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func2_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func2_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func3_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func3_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func3_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func3_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func3_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func3_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func3_axis_0, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func3_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func3_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func4_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func4_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func4_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func4_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func4_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func4_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func4_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func4_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func4_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func5_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func5_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func5_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func5_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func5_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func5_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func5_axis_0, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func5_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func5_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func6_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func6_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func6_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func6_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func6_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func6_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func6_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func6_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func6_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func7_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func7_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func7_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func7_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func7_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func7_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func7_axis_0, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func7_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func7_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func8_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func8_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func8_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func8_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func8_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func8_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func8_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func8_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func8_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func9_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func9_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func9_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func9_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func9_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func9_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func9_axis_0, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func9_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func9_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func0_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func0_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func0_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func0_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func0_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func0_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func0_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func0_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func0_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func10_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func10_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func10_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func10_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func10_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func10_axis_-2, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func10_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func10_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func10_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func11_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func11_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func11_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func11_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func11_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func11_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func11_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func11_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func11_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func1_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func1_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func1_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func1_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func1_axis_-1, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func1_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func1_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func1_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func1_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func2_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func2_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func2_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func2_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func2_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func2_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func2_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func2_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func2_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func3_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func3_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func3_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func3_axis8, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func3_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func3_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func3_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func3_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func3_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func4_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func4_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func4_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func4_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func4_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func4_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func4_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func4_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func4_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func5_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func5_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func5_axis7, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func5_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func5_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func5_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func5_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func5_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func5_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func6_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func6_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func6_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func6_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func6_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func6_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func6_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func6_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func6_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func7_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func7_axis6, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func7_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func7_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func7_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func7_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func7_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func7_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func7_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func8_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func8_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func8_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func8_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func8_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func8_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func8_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func8_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func8_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func9_axis5, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func9_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func9_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func9_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func9_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func9_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func9_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func9_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func9_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func0_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func0_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func0_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func0_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func0_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func0_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func0_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func0_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func0_axis_2, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func10_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func10_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func10_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func10_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func10_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func10_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func10_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func10_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func10_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func11_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func11_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func11_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func11_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func11_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func11_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func11_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func11_axis_1, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func11_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func1_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func1_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func1_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func1_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func1_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func1_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func1_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func1_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func1_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func2_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func2_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func2_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func2_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func2_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func2_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func2_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func2_axis_1, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func2_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func3_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func3_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func3_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func3_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func3_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func3_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func3_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func3_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func3_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func4_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func4_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func4_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func4_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func4_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func4_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func4_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func4_axis_1, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func4_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func5_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func5_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func5_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func5_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func5_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func5_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func5_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func5_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func5_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func6_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func6_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func6_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func6_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func6_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func6_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func6_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func6_axis_1, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func6_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func7_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func7_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func7_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func7_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func7_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func7_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func7_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func7_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func7_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func8_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func8_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func8_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func8_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func8_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func8_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func8_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func8_axis_1, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func8_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func9_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func9_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func9_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func9_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func9_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func9_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func9_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func9_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func9_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_scalar_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_scalar_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_scalar_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_out_scalar_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_out_scalar_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_scalar_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_out_scalar_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_out_scalar_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_scalar_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_scalar_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_scalar_func8, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_scalar_func9, test/torch_np/test_reductions.py::TestGenericCumSumProd::test_array_axis_func0, test/torch_np/test_reductions.py::TestGenericCumSumProd::test_array_axis_func1, test/torch_np/test_reductions.py::TestGenericCumSumProd::test_axis_bad_tuple_func0, test/torch_np/test_reductions.py::TestGenericCumSumProd::test_axis_bad_tuple_func1, test/torch_np/test_reductions.py::TestGenericCumSumProd::test_axis_empty_generic_func0, test/torch_np/test_reductions.py::TestGenericCumSumProd::test_axis_empty_generic_func1, test/torch_np/test_reductions.py::TestGenericCumSumProd::test_bad_axis_func0, test/torch_np/test_reductions.py::TestGenericCumSumProd::test_bad_axis_func1
2025-12-04T17:18:03.9874376Z 
2025-12-04T17:18:03.9874753Z Finished torch_np/test_reductions 1/1 ... [2025-12-04 17:18:03.892288][29112.275179736], took 0.15min
2025-12-04T17:18:03.9876041Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.test_reductions/torch_np.test_reductions-73a2026a6cdfd4dc.xml
2025-12-04T17:18:04.0300891Z Running torch_np/numpy_tests/core/test_scalar_ctors 1/1 ... [2025-12-04 17:18:04.029755][29112.41264687]
2025-12-04T17:18:04.0301841Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T17:18:04.0305433Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/core/test_scalar_ctors.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:18:04.030275]
2025-12-04T17:18:09.8034036Z 
2025-12-04T17:18:09.8035288Z torch_np/numpy_tests/core/test_scalar_ctors 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.core.test_scalar_ctors_1.1_4168b5c3b3d7f9be_.log
2025-12-04T17:18:09.8067902Z Running 65 items in this shard: test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestFromString::test_bool, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestFromString::test_floating, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestFromString::test_floating_overflow, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestFromInt::test_intp, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestFromInt::test_uint64_from_negative, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_complex_t10_t20, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_complex_t10_t21, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_complex_t10_t22, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_complex_t11_t20, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_complex_t11_t21, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_complex_t11_t22, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_byte_np_byte, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_byte_np_int_, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_byte_np_intc, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_byte_np_longlong, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_byte_np_short, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_byte_t25, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_byte_t26, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_int__np_byte, 
test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_int__np_int_, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_int__np_intc, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_int__np_longlong, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_int__np_short, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_int__t25, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_int__t26, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_intc_np_byte, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_intc_np_int_, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_intc_np_intc, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_intc_np_longlong, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_intc_np_short, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_intc_t25, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_intc_t26, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_longlong_np_byte, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_longlong_np_int_, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_longlong_np_intc, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_longlong_np_longlong, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_longlong_np_short, 
test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_longlong_t25, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_longlong_t26, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_short_np_byte, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_short_np_int_, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_short_np_intc, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_short_np_longlong, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_short_np_short, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_short_t25, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_short_t26, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_t15_np_byte, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_t15_np_int_, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_t15_np_intc, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_t15_np_longlong, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_t15_np_short, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_t15_t25, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_t15_t26, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_reals_t10_t20, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_reals_t10_t21, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_reals_t10_t22, 
test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_reals_t10_t23, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_reals_t11_t20, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_reals_t11_t21, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_reals_t11_t22, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_reals_t11_t23, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_reals_t12_t20, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_reals_t12_t21, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_reals_t12_t22, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_reals_t12_t23
2025-12-04T17:18:09.8100010Z 
2025-12-04T17:18:09.8100529Z Finished torch_np/numpy_tests/core/test_scalar_ctors 1/1 ... [2025-12-04 17:18:09.803289][29118.186183807], took 0.10min
2025-12-04T17:18:09.8453747Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.numpy_tests.core.test_scalar_ctors/torch_np.numpy_tests.core.test_scalar_ctors-e23576bcb06b5d61.xml
2025-12-04T17:18:09.8772530Z Running torch_np/numpy_tests/lib/test_arraypad 1/1 ... [2025-12-04 17:18:09.876979][29118.259874076]
2025-12-04T17:18:09.8773178Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T17:18:09.8776835Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/lib/test_arraypad.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:18:09.877438]
2025-12-04T17:18:15.5505561Z 
2025-12-04T17:18:15.5506760Z torch_np/numpy_tests/lib/test_arraypad 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.lib.test_arraypad_1.1_867803734c4a045d_.log
2025-12-04T17:18:15.5511644Z Running 9 items in this shard: test/torch_np/numpy_tests/lib/test_arraypad.py::TestConstant::test_check_constant, test/torch_np/numpy_tests/lib/test_arraypad.py::TestConstant::test_check_constant_float, test/torch_np/numpy_tests/lib/test_arraypad.py::TestConstant::test_check_constant_float2, test/torch_np/numpy_tests/lib/test_arraypad.py::TestConstant::test_check_constant_float3, test/torch_np/numpy_tests/lib/test_arraypad.py::TestConstant::test_check_constant_odd_pad_amount, test/torch_np/numpy_tests/lib/test_arraypad.py::TestConstant::test_check_constant_pad_2d, test/torch_np/numpy_tests/lib/test_arraypad.py::TestConstant::test_check_constant_zeros, test/torch_np/numpy_tests/lib/test_arraypad.py::TestConstant::test_check_large_integers, test/torch_np/numpy_tests/lib/test_arraypad.py::TestConstant::test_pad_empty_dimension
2025-12-04T17:18:15.5515678Z 
2025-12-04T17:18:15.5516099Z Finished torch_np/numpy_tests/lib/test_arraypad 1/1 ... [2025-12-04 17:18:15.550341][29123.933237058], took 0.09min
2025-12-04T17:18:15.5924931Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.numpy_tests.lib.test_arraypad/torch_np.numpy_tests.lib.test_arraypad-f4e46a1506be78e1.xml
2025-12-04T17:18:15.6260865Z Running test_prims 1/1 ... [2025-12-04 17:18:15.625817][29124.008710886]
2025-12-04T17:18:15.6261384Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T17:18:15.6264785Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_prims.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:18:15.626231]
2025-12-04T17:18:24.7540596Z 
2025-12-04T17:18:24.7541533Z test_prims 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_prims_1.1_8a7702ff07b7da5d_.log
2025-12-04T17:18:24.7551837Z Running 26 items in this shard: test/test_prims.py::TestPrimsBasic::test_check_deprecation_warning, test/test_prims.py::TestPrimsBasic::test_clone_complex, test/test_prims.py::TestPrimsBasic::test_clone_meta_stride_preservation_dense, test/test_prims.py::TestPrimsBasic::test_clone_meta_stride_preservation_sparse, test/test_prims.py::TestPrimsBasic::test_mul_complex, test/test_prims.py::TestPrimsBasic::test_torch_ops, test/test_prims.py::TestPrimsCUDA::test_aten_overload_to_prims_cuda, test/test_prims.py::TestPrimsCUDA::test_broadcast_in_dim_cuda_float32, test/test_prims.py::TestPrimsCUDA::test_broadcast_in_dim_sum_cuda_float32, test/test_prims.py::TestPrimsCUDA::test_cbrt_prim_cuda_float64, test/test_prims.py::TestPrimsCUDA::test_cbrt_prim_cuda_int64, test/test_prims.py::TestPrimsCUDA::test_collapse_cuda_float32, test/test_prims.py::TestPrimsCUDA::test_functional_rng_wrappers_cuda_float32, test/test_prims.py::TestPrimsCUDA::test_memory_format_strides_cuda_float32, test/test_prims.py::TestPrimsCUDA::test_philox_rand_cuda_float32, test/test_prims.py::TestPrimsCUDA::test_reshape_view_method_cuda_float32, test/test_prims.py::TestPrimsCUDA::test_var_correction_0_cuda_float32, test/test_prims.py::TestPrimsCUDA::test_var_correction_1_cuda_float32, test/test_prims.py::TestRefsCUDA::test_constant_pad_nd_memory_format_cuda_float32, test/test_prims.py::TestRefsCUDA::test_inferred_tags_cuda, test/test_prims.py::TestRefsCUDA::test_infinite_loop_from_py_dispatcher_cuda, test/test_prims.py::TestRefsCUDA::test_linspace_with_complex_input_cuda, test/test_prims.py::TestRefsCUDA::test_logspace_with_complex_input_cuda, test/test_prims.py::TestRefsCUDA::test_unbind_cuda, test/test_prims.py::TestDecompCUDA::test_decomposition_method_vararg_ones_cuda_float32, test/test_prims.py::TestDecompCUDA::test_decomposition_method_vararg_permute_cuda_float32
2025-12-04T17:18:24.7560877Z 
2025-12-04T17:18:24.7561151Z Finished test_prims 1/1 ... [2025-12-04 17:18:24.753865][29133.136760746], took 0.15min
2025-12-04T17:18:24.7967842Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_prims/test_prims-38188698633a9bb5.xml
2025-12-04T17:18:24.8860550Z Running test_spectral_ops 1/1 ... [2025-12-04 17:18:24.885676][29133.268569242]
2025-12-04T17:18:24.8861139Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T17:18:24.8863835Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_spectral_ops.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:18:24.886128]
2025-12-04T17:19:02.6575822Z 
2025-12-04T17:19:02.6577337Z test_spectral_ops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_spectral_ops_1.1_434231ff814fe9e8_.log
2025-12-04T17:19:02.6717879Z Running 347 items in this shard: test/test_spectral_ops.py::TestFFTCUDA::test_batch_istft_cuda, test/test_spectral_ops.py::TestFFTCUDA::test_complex_istft_real_equiv_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_complex_stft_definition_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_complex_stft_onesided_cuda, test/test_spectral_ops.py::TestFFTCUDA::test_complex_stft_real_equiv_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_complex_stft_roundtrip_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_complex_stft_roundtrip_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_cufft_context_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_cufft_context_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_cufft_plan_cache_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_fft2_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_fft2_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_fft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_fft2_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_fft_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_fft_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_fft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_fft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_fftn_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_fftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_fftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_fftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_hfft2_cuda_complex32, 
test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_hfft2_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_hfft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_hfft2_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_hfft_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_hfft_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_hfft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_hfft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_hfftn_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_hfftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_hfftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_hfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ifft2_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ifft2_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ifft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ifft2_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ifft_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ifft_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ifft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ifft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ifftn_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ifftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ifftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ifftn_cuda_float32, 
test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ihfft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ihfft2_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ihfft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ihfft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ihfftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ihfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_irfft2_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_irfft2_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_irfft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_irfft2_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_irfft_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_irfft_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_irfft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_irfft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_irfftn_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_irfftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_irfftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_irfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_rfft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_rfft2_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_rfft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_rfft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_rfftn_cuda_float16, 
test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_rfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_fft2_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_fft2_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_fft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_fft2_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_fft_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_fft_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_fft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_fft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_fftn_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_fftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_fftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_fftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_hfft2_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_hfft2_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_hfft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_hfft2_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_hfft_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_hfft_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_hfft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_hfft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_hfftn_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_hfftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_hfftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_hfftn_cuda_float32, 
test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ifft2_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ifft2_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ifft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ifft2_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ifft_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ifft_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ifft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ifft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ifftn_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ifftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ifftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ifftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ihfft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ihfft2_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ihfft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ihfft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ihfftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ihfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_irfft2_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_irfft2_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_irfft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_irfft2_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_irfft_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_irfft_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_irfft_cuda_float16, 
test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_irfft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_irfftn_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_irfftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_irfftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_irfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_rfft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_rfft2_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_rfft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_rfft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_rfftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_rfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_ifft_cuda, test/test_spectral_ops.py::TestFFTCUDA::test_fft2_fftn_equivalence_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_fft2_fftn_equivalence_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fft2_invalid_cuda, test/test_spectral_ops.py::TestFFTCUDA::test_fft2_numpy_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_fft2_numpy_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_fft2_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_fft_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_fftn_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_hfft2_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_hfft_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_hfftn_cuda_bfloat16, 
test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_ifft2_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_ifft_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_ifftn_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_ihfft2_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_ihfft_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_ihfftn_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_irfft2_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_irfft_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_irfftn_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_rfft2_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_rfft_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_rfftn_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_fft2_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_fft_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_fftn_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_hfft2_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_hfft_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_hfftn_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_ifft2_cuda_bfloat16, 
test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_ifft_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_ifftn_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_ihfft2_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_ihfft_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_ihfftn_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_irfft2_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_irfft_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_irfftn_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_rfft2_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_rfft_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_rfftn_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_fft2_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_fft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_fft_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_fft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_fftn_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_fftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_hfft2_cuda_complex32, 
test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_hfft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_hfft_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_hfft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_hfftn_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_hfftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_ifft2_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_ifft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_ifft_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_ifft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_ifftn_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_ifftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_ihfft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_ihfft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_ihfftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_irfft2_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_irfft2_cuda_float16, 
test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_irfft_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_irfft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_irfftn_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_irfftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_rfft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_rfft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_rfftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_fft2_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_fft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_fft_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_fft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_fftn_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_fftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_hfft2_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_hfft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_hfft_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_hfft_cuda_float16, 
test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_hfftn_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_hfftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_ifft2_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_ifft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_ifft_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_ifft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_ifftn_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_ifftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_ihfft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_ihfft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_ihfftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_irfft2_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_irfft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_irfft_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_irfft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_irfftn_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_irfftn_cuda_float16, 
test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_rfft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_rfft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_rfftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_ifft_rfft_irfft_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_fft_input_modification_cuda, test/test_spectral_ops.py::TestFFTCUDA::test_fft_invalid_dtypes_cuda, test/test_spectral_ops.py::TestFFTCUDA::test_fft_plan_repeatable_cuda, test/test_spectral_ops.py::TestFFTCUDA::test_fft_round_trip_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_fft_round_trip_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_round_trip_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_fft_round_trip_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_round_trip_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_round_trip_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_fft_type_promotion_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_fft_type_promotion_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_type_promotion_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_fft_type_promotion_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_type_promotion_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_type_promotion_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_fft_type_promotion_cuda_int8, test/test_spectral_ops.py::TestFFTCUDA::test_fftfreq_numpy_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fftfreq_numpy_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_fftfreq_out_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fftfreq_out_cuda_float64, 
test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid__refs_fft_fftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid__refs_fft_fftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid__refs_fft_hfftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid__refs_fft_hfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid__refs_fft_ifftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid__refs_fft_ifftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid__refs_fft_ihfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid__refs_fft_irfftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid__refs_fft_irfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid__refs_fft_rfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid_fft_fftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid_fft_fftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid_fft_hfftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid_fft_hfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid_fft_ifftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid_fft_ifftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid_fft_ihfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid_fft_irfftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid_fft_irfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid_fft_rfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_noop_transform_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_noop_transform_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_noop_transform_cuda_float16, 
test/test_spectral_ops.py::TestFFTCUDA::test_fftn_noop_transform_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_noop_transform_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_round_trip_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_round_trip_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_round_trip_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_round_trip_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_round_trip_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_round_trip_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_fftshift_frequencies_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fftshift_frequencies_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_fftshift_numpy_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_fftshift_numpy_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_fftshift_numpy_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fftshift_numpy_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_hfftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_hfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_hfftn_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_ihfftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_ihfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_ihfftn_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_istft_against_librosa_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_istft_linearity_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_istft_of_sine_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_istft_requires_window_cuda, test/test_spectral_ops.py::TestFFTCUDA::test_istft_round_trip_simple_cases_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_istft_round_trip_various_params_cuda_float64, 
test/test_spectral_ops.py::TestFFTCUDA::test_istft_round_trip_with_padding_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_istft_throws_cuda, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d__refs_fft_fft_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d__refs_fft_fft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d__refs_fft_hfft_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d__refs_fft_hfft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d__refs_fft_ifft_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d__refs_fft_ifft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d__refs_fft_ihfft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d__refs_fft_irfft_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d__refs_fft_irfft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d__refs_fft_rfft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d_fft_fft_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d_fft_fft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d_fft_hfft_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d_fft_hfft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d_fft_ifft_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d_fft_ifft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d_fft_ihfft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d_fft_irfft_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d_fft_irfft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d_fft_rfft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_reference_nd__refs_fft_fftn_cuda_complex128, 
test/test_spectral_ops.py::TestFFTCUDA::test_reference_nd__refs_fft_fftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_reference_nd__refs_fft_hfftn_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_reference_nd__refs_fft_hfftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_reference_nd__refs_fft_ifftn_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_reference_nd__refs_fft_ifftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_reference_nd__refs_fft_irfftn_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_reference_nd__refs_fft_irfftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_reference_nd_fft_fftn_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_reference_nd_fft_fftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_reference_nd_fft_hfftn_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_reference_nd_fft_hfftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_reference_nd_fft_ifftn_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_reference_nd_fft_ifftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_reference_nd_fft_irfftn_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_reference_nd_fft_irfftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_stft_align_to_window_only_requires_non_center_cuda, test/test_spectral_ops.py::TestFFTCUDA::test_stft_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_stft_requires_complex_cuda, test/test_spectral_ops.py::TestFFTCUDA::test_stft_requires_window_cuda, test/test_spectral_ops.py::TestFFTCUDA::test_stft_roundtrip_complex_window_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_stft_roundtrip_complex_window_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_stft_window_device_cuda
2025-12-04T17:19:02.6856900Z 
2025-12-04T17:19:02.6857221Z Finished test_spectral_ops 1/1 ... [2025-12-04 17:19:02.657942][29171.040835748], took 0.63min
2025-12-04T17:19:02.7011963Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_spectral_ops/test_spectral_ops-ae86fbbf23286ef9.xml
2025-12-04T17:19:02.7813066Z Running test_autoload_disable 1/1 ... [2025-12-04 17:19:02.780994][29171.163888872]
2025-12-04T17:19:03.1536324Z Processing /var/lib/jenkins/workspace/test/cpp_extensions
2025-12-04T17:19:08.1219827Z   Preparing metadata (pyproject.toml) ... done
2025-12-04T17:19:08.1242714Z Building wheels for collected packages: torch_test_cpp_extension
2025-12-04T17:20:47.6583179Z   Building wheel for torch_test_cpp_extension (pyproject.toml) ... done
2025-12-04T17:20:47.6959136Z   Created wheel for torch_test_cpp_extension: filename=torch_test_cpp_extension-0.0.0-cp310-cp310-linux_x86_64.whl size=13197897 sha256=cc46647de823d8374b96f8523d19d867e9d39561b9e921d3b31efe3d71f46120
2025-12-04T17:20:47.6961552Z   Stored in directory: /tmp/pip-ephem-wheel-cache-dyxdiszk/wheels/2b/79/8d/635cf291e138cfea331292ca746c62b61fade208eb55a7e3a1
2025-12-04T17:20:47.6982453Z Successfully built torch_test_cpp_extension
2025-12-04T17:20:48.3159868Z Installing collected packages: torch_test_cpp_extension
2025-12-04T17:20:48.5636691Z Successfully installed torch_test_cpp_extension-0.0.0
2025-12-04T17:20:52.8264014Z 
2025-12-04T17:20:52.8264441Z Running tests...
2025-12-04T17:20:52.8264859Z ----------------------------------------------------------------------
2025-12-04T17:20:54.7280014Z .
2025-12-04T17:20:54.7280439Z ----------------------------------------------------------------------
2025-12-04T17:20:54.7280916Z Ran 1 test in 1.902s
2025-12-04T17:20:54.7281113Z 
2025-12-04T17:20:54.7281204Z OK
2025-12-04T17:20:54.7281341Z 
2025-12-04T17:20:54.7281462Z Generating XML reports...
2025-12-04T17:20:55.7018699Z Finished test_autoload_disable 1/1 ... [2025-12-04 17:20:55.701457][29284.084344964], took 1.88min
2025-12-04T17:20:55.7449158Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-unittest/test_autoload/TEST-TestDeviceBackendAutoload-20251204172052.xml
2025-12-04T17:20:55.9315414Z Running test_cpp_extensions_aot_ninja 1/1 ... [2025-12-04 17:20:55.931186][29284.314079145]
2025-12-04T17:20:56.3558820Z Processing /var/lib/jenkins/workspace/test/cpp_extensions
2025-12-04T17:21:01.4060441Z   Preparing metadata (pyproject.toml) ... done
2025-12-04T17:21:01.4083652Z Building wheels for collected packages: torch_test_cpp_extension
2025-12-04T17:22:41.3790833Z   Building wheel for torch_test_cpp_extension (pyproject.toml) ... done
2025-12-04T17:22:41.4243503Z   Created wheel for torch_test_cpp_extension: filename=torch_test_cpp_extension-0.0.0-cp310-cp310-linux_x86_64.whl size=16079182 sha256=3f76972410efd549ce5ce0fc42c4d311b1eda2e2cc15bab4ad63434d9d6c315d
2025-12-04T17:22:41.4245117Z   Stored in directory: /tmp/pip-ephem-wheel-cache-4s2_754p/wheels/2b/79/8d/635cf291e138cfea331292ca746c62b61fade208eb55a7e3a1
2025-12-04T17:22:41.4268024Z Successfully built torch_test_cpp_extension
2025-12-04T17:22:42.0449811Z Installing collected packages: torch_test_cpp_extension
2025-12-04T17:22:42.3468390Z Successfully installed torch_test_cpp_extension-0.0.0
2025-12-04T17:22:42.8069819Z Processing /var/lib/jenkins/workspace/test/cpp_extensions/no_python_abi_suffix_test
2025-12-04T17:22:46.1406001Z   Preparing metadata (pyproject.toml) ... done
2025-12-04T17:22:46.1429199Z Building wheels for collected packages: no_python_abi_suffix_test
2025-12-04T17:22:49.9000589Z   Building wheel for no_python_abi_suffix_test (pyproject.toml) ... done
2025-12-04T17:22:49.9009007Z   Created wheel for no_python_abi_suffix_test: filename=no_python_abi_suffix_test-0.0.0-cp310-cp310-linux_x86_64.whl size=2944 sha256=9e1be669c02aec48f2b8fbc04478a32fe38a4139d4ab0c15944290f8b27df031
2025-12-04T17:22:49.9010617Z   Stored in directory: /tmp/pip-ephem-wheel-cache-6q62uft5/wheels/8c/c7/11/bcf2bfbdebb3cf78b8211ac54acc945a8fdf1732548d147a80
2025-12-04T17:22:49.9032746Z Successfully built no_python_abi_suffix_test
2025-12-04T17:22:50.5211536Z Installing collected packages: no_python_abi_suffix_test
2025-12-04T17:22:50.5311625Z Successfully installed no_python_abi_suffix_test-0.0.0
2025-12-04T17:22:50.6738679Z * Getting build dependencies for wheel...
2025-12-04T17:22:53.5095367Z running egg_info
2025-12-04T17:22:53.5183467Z creating python_agnostic.egg-info
2025-12-04T17:22:53.5184680Z writing python_agnostic.egg-info/PKG-INFO
2025-12-04T17:22:53.5189449Z writing dependency_links to python_agnostic.egg-info/dependency_links.txt
2025-12-04T17:22:53.5192481Z writing top-level names to python_agnostic.egg-info/top_level.txt
2025-12-04T17:22:53.5195009Z writing manifest file 'python_agnostic.egg-info/SOURCES.txt'
2025-12-04T17:22:53.5713466Z reading manifest file 'python_agnostic.egg-info/SOURCES.txt'
2025-12-04T17:22:53.5722907Z writing manifest file 'python_agnostic.egg-info/SOURCES.txt'
2025-12-04T17:22:54.0383236Z * Building wheel...
2025-12-04T17:22:56.8940752Z running bdist_wheel
2025-12-04T17:22:56.9631689Z running build
2025-12-04T17:22:56.9632018Z running build_ext
2025-12-04T17:22:56.9670725Z building 'python_agnostic._C' extension
2025-12-04T17:22:56.9674995Z creating /var/lib/jenkins/workspace/test/cpp_extensions/python_agnostic_extension/build/temp.linux-x86_64-cpython-310/var/lib/jenkins/workspace/test/cpp_extensions/python_agnostic_extension/python_agnostic/csrc
2025-12-04T17:23:09.7489297Z [1/1] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /var/lib/jenkins/workspace/test/cpp_extensions/python_agnostic_extension/build/temp.linux-x86_64-cpython-310/var/lib/jenkins/workspace/test/cpp_extensions/python_agnostic_extension/python_agnostic/csrc/ultra_norm.o.d -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/conda/envs/py_3.10/include/python3.10 -c -c /var/lib/jenkins/workspace/test/cpp_extensions/python_agnostic_extension/python_agnostic/csrc/ultra_norm.cu -o /var/lib/jenkins/workspace/test/cpp_extensions/python_agnostic_extension/build/temp.linux-x86_64-cpython-310/var/lib/jenkins/workspace/test/cpp_extensions/python_agnostic_extension/python_agnostic/csrc/ultra_norm.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x030A0000 -DTORCH_EXTENSION_NAME=_C -gencode=arch=compute_75,code=sm_75 -std=c++17
2025-12-04T17:23:09.7552731Z creating build/lib.linux-x86_64-cpython-310/python_agnostic
2025-12-04T17:23:09.7559487Z g++ -pthread -B /opt/conda/envs/py_3.10/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /opt/conda/envs/py_3.10/include -fPIC -O2 -isystem /opt/conda/envs/py_3.10/include -pthread -B /opt/conda/envs/py_3.10/compiler_compat -shared /var/lib/jenkins/workspace/test/cpp_extensions/python_agnostic_extension/build/temp.linux-x86_64-cpython-310/var/lib/jenkins/workspace/test/cpp_extensions/python_agnostic_extension/python_agnostic/csrc/ultra_norm.o -L/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -ltorch -ltorch_cpu -lcudart -lc10_cuda -ltorch_cuda -o build/lib.linux-x86_64-cpython-310/python_agnostic/_C.so
2025-12-04T17:23:10.3029453Z installing to build/bdist.linux-x86_64/wheel
2025-12-04T17:23:10.3029899Z running install
2025-12-04T17:23:10.3086233Z running install_lib
2025-12-04T17:23:10.3169041Z creating build/bdist.linux-x86_64/wheel
2025-12-04T17:23:10.3171622Z creating build/bdist.linux-x86_64/wheel/python_agnostic
2025-12-04T17:23:10.3173163Z copying build/lib.linux-x86_64-cpython-310/python_agnostic/_C.so -> build/bdist.linux-x86_64/wheel/./python_agnostic
2025-12-04T17:23:10.3179926Z running install_egg_info
2025-12-04T17:23:10.3263441Z running egg_info
2025-12-04T17:23:10.3340783Z writing python_agnostic.egg-info/PKG-INFO
2025-12-04T17:23:10.3345188Z writing dependency_links to python_agnostic.egg-info/dependency_links.txt
2025-12-04T17:23:10.3358915Z writing top-level names to python_agnostic.egg-info/top_level.txt
2025-12-04T17:23:10.3449672Z reading manifest file 'python_agnostic.egg-info/SOURCES.txt'
2025-12-04T17:23:10.3461333Z writing manifest file 'python_agnostic.egg-info/SOURCES.txt'
2025-12-04T17:23:10.3475224Z Copying python_agnostic.egg-info to build/bdist.linux-x86_64/wheel/./python_agnostic-0.0-py3.10.egg-info
2025-12-04T17:23:10.3481813Z running install_scripts
2025-12-04T17:23:10.3609366Z creating build/bdist.linux-x86_64/wheel/python_agnostic-0.0.dist-info/WHEEL
2025-12-04T17:23:10.3615380Z creating '/var/lib/jenkins/workspace/test/cpp_extensions/python_agnostic_extension/dist/.tmp-2zipdfqj/python_agnostic-0.0-cp39-abi3-linux_x86_64.whl' and adding 'build/bdist.linux-x86_64/wheel' to it
2025-12-04T17:23:10.3821680Z adding 'python_agnostic/_C.so'
2025-12-04T17:23:10.3838895Z adding 'python_agnostic-0.0.dist-info/METADATA'
2025-12-04T17:23:10.3840609Z adding 'python_agnostic-0.0.dist-info/WHEEL'
2025-12-04T17:23:10.3841603Z adding 'python_agnostic-0.0.dist-info/top_level.txt'
2025-12-04T17:23:10.3843175Z adding 'python_agnostic-0.0.dist-info/RECORD'
2025-12-04T17:23:10.3843987Z removing build/bdist.linux-x86_64/wheel
2025-12-04T17:23:10.8300273Z Successfully built python_agnostic-0.0-cp39-abi3-linux_x86_64.whl
2025-12-04T17:23:11.2246337Z Processing /var/lib/jenkins/workspace/test/cpp_extensions/libtorch_agnostic_2_9_extension
2025-12-04T17:23:14.6566486Z   Preparing metadata (pyproject.toml) ... done
2025-12-04T17:23:14.6591375Z Requirement already satisfied: torch in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from libtorch_agnostic_2_9==0.0) (2.10.0a0+gitffd9b0f)
2025-12-04T17:23:14.6621608Z Requirement already satisfied: filelock in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic_2_9==0.0) (3.18.0)
2025-12-04T17:23:14.6627228Z Requirement already satisfied: typing-extensions>=4.10.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic_2_9==0.0) (4.12.2)
2025-12-04T17:23:14.6632692Z Requirement already satisfied: sympy>=1.13.3 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic_2_9==0.0) (1.13.3)
2025-12-04T17:23:14.6637963Z Requirement already satisfied: networkx>=2.5.1 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic_2_9==0.0) (2.8.8)
2025-12-04T17:23:14.6642139Z Requirement already satisfied: jinja2 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic_2_9==0.0) (3.1.6)
2025-12-04T17:23:14.6647457Z Requirement already satisfied: fsspec>=0.8.5 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic_2_9==0.0) (2025.10.0)
2025-12-04T17:23:14.7083864Z Requirement already satisfied: mpmath<1.4,>=1.1.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from sympy>=1.13.3->torch->libtorch_agnostic_2_9==0.0) (1.3.0)
2025-12-04T17:23:14.7149222Z Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from jinja2->torch->libtorch_agnostic_2_9==0.0) (3.0.3)
2025-12-04T17:23:14.7160552Z Building wheels for collected packages: libtorch_agnostic_2_9
2025-12-04T17:23:22.0021819Z   Building wheel for libtorch_agnostic_2_9 (pyproject.toml) ... done
2025-12-04T17:23:22.0031377Z   Created wheel for libtorch_agnostic_2_9: filename=libtorch_agnostic_2_9-0.0-cp39-abi3-linux_x86_64.whl size=54876 sha256=63993cdfa4505b36917d28d00ca4398f906ad02fe7a4cab3e5d2f92a48153c55
2025-12-04T17:23:22.0033014Z   Stored in directory: /tmp/pip-ephem-wheel-cache-ap_pvjar/wheels/e1/56/0d/91ac1e918c8015b48f6a77f66abeeb8427a8788f7d37715e0e
2025-12-04T17:23:22.0053091Z Successfully built libtorch_agnostic_2_9
2025-12-04T17:23:22.5774690Z Installing collected packages: libtorch_agnostic_2_9
2025-12-04T17:23:22.5913546Z Successfully installed libtorch_agnostic_2_9-0.0
2025-12-04T17:23:23.0578068Z Processing /var/lib/jenkins/workspace/test/cpp_extensions/libtorch_agnostic_2_10_extension
2025-12-04T17:23:26.3638137Z   Preparing metadata (pyproject.toml) ... done
2025-12-04T17:23:26.3661454Z Requirement already satisfied: torch in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from libtorch_agnostic_2_10==0.0) (2.10.0a0+gitffd9b0f)
2025-12-04T17:23:26.3691378Z Requirement already satisfied: filelock in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic_2_10==0.0) (3.18.0)
2025-12-04T17:23:26.3696974Z Requirement already satisfied: typing-extensions>=4.10.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic_2_10==0.0) (4.12.2)
2025-12-04T17:23:26.3702409Z Requirement already satisfied: sympy>=1.13.3 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic_2_10==0.0) (1.13.3)
2025-12-04T17:23:26.3707727Z Requirement already satisfied: networkx>=2.5.1 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic_2_10==0.0) (2.8.8)
2025-12-04T17:23:26.3711765Z Requirement already satisfied: jinja2 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic_2_10==0.0) (3.1.6)
2025-12-04T17:23:26.3717056Z Requirement already satisfied: fsspec>=0.8.5 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic_2_10==0.0) (2025.10.0)
2025-12-04T17:23:26.4151055Z Requirement already satisfied: mpmath<1.4,>=1.1.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from sympy>=1.13.3->torch->libtorch_agnostic_2_10==0.0) (1.3.0)
2025-12-04T17:23:26.4215588Z Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from jinja2->torch->libtorch_agnostic_2_10==0.0) (3.0.3)
2025-12-04T17:23:26.4226702Z Building wheels for collected packages: libtorch_agnostic_2_10
2025-12-04T17:23:34.2679908Z   Building wheel for libtorch_agnostic_2_10 (pyproject.toml) ... done
2025-12-04T17:23:34.2689618Z   Created wheel for libtorch_agnostic_2_10: filename=libtorch_agnostic_2_10-0.0-cp39-abi3-linux_x86_64.whl size=81593 sha256=5f00068c251b607df4407edd8072b05735d778ebe5fb0a9e5a9f6d2828e50e3c
2025-12-04T17:23:34.2691135Z   Stored in directory: /tmp/pip-ephem-wheel-cache-6rmob9iy/wheels/03/17/c4/d9b9dbd12b271a9a317a75e944d0966701385d67eac86f2c1a
2025-12-04T17:23:34.2712315Z Successfully built libtorch_agnostic_2_10
2025-12-04T17:23:34.8337332Z Installing collected packages: libtorch_agnostic_2_10
2025-12-04T17:23:34.8478112Z Successfully installed libtorch_agnostic_2_10-0.0
2025-12-04T17:23:34.9367114Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T17:23:34.9371457Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_cpp_extensions_aot_ninja.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:23:34.936857]
2025-12-04T17:23:43.5986616Z 
2025-12-04T17:23:43.5987795Z test_cpp_extensions_aot_ninja 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_cpp_extensions_aot_ninja_1.1_f69ae0466baae8e0_.log
2025-12-04T17:23:43.5998650Z Running 21 items in this shard: test/test_cpp_extensions_aot_ninja.py::TestCppExtensionAOT::test_backward, test/test_cpp_extensions_aot_ninja.py::TestCppExtensionAOT::test_cublas_extension, test/test_cpp_extensions_aot_ninja.py::TestCppExtensionAOT::test_cuda_dlink_libs, test/test_cpp_extensions_aot_ninja.py::TestCppExtensionAOT::test_cuda_extension, test/test_cpp_extensions_aot_ninja.py::TestCppExtensionAOT::test_cusolver_extension, test/test_cpp_extensions_aot_ninja.py::TestCppExtensionAOT::test_extension_function, test/test_cpp_extensions_aot_ninja.py::TestCppExtensionAOT::test_extension_module, test/test_cpp_extensions_aot_ninja.py::TestCppExtensionAOT::test_mps_extension, test/test_cpp_extensions_aot_ninja.py::TestCppExtensionAOT::test_no_python_abi_suffix_sets_the_correct_library_name, test/test_cpp_extensions_aot_ninja.py::TestCppExtensionAOT::test_optional, test/test_cpp_extensions_aot_ninja.py::TestCppExtensionAOT::test_sycl_extension, test/test_cpp_extensions_aot_ninja.py::TestPybindTypeCasters::test_pybind_return_types, test/test_cpp_extensions_aot_ninja.py::TestMAIATensor::test_add, test/test_cpp_extensions_aot_ninja.py::TestMAIATensor::test_autocast_apis_for_maia_device, test/test_cpp_extensions_aot_ninja.py::TestMAIATensor::test_conv_backend_override, test/test_cpp_extensions_aot_ninja.py::TestMAIATensor::test_matmul_autocast_default_precision, test/test_cpp_extensions_aot_ninja.py::TestMAIATensor::test_matmul_autocast_float16_precision, test/test_cpp_extensions_aot_ninja.py::TestMAIATensor::test_unregistered, test/test_cpp_extensions_aot_ninja.py::TestMAIATensor::test_zeros, test/test_cpp_extensions_aot_ninja.py::TestRNGExtension::test_rng, test/test_cpp_extensions_aot_ninja.py::TestTorchLibrary::test_torch_library
2025-12-04T17:23:43.6007578Z 
2025-12-04T17:23:43.6007953Z Finished test_cpp_extensions_aot_ninja 1/1 ... [2025-12-04 17:23:43.598541][29451.98143494], took 2.79min
2025-12-04T17:23:43.6425061Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cpp_extensions_aot_ninja/test_cpp_extensions_aot_ninja-5c9ab2f003415ced.xml
2025-12-04T17:23:43.7209053Z Running test_cpp_extensions_aot_no_ninja 1/1 ... [2025-12-04 17:23:43.720588][29452.10348363]
2025-12-04T17:23:44.1137318Z Processing /var/lib/jenkins/workspace/test/cpp_extensions
2025-12-04T17:23:49.0856887Z   Preparing metadata (pyproject.toml) ... done
2025-12-04T17:23:49.0880602Z Building wheels for collected packages: torch_test_cpp_extension
2025-12-04T17:25:27.3206304Z   Building wheel for torch_test_cpp_extension (pyproject.toml) ... done
2025-12-04T17:25:27.3584232Z   Created wheel for torch_test_cpp_extension: filename=torch_test_cpp_extension-0.0.0-cp310-cp310-linux_x86_64.whl size=13197897 sha256=d3bb43a8de5ff621a65010cc20d8c1ac3de2b334ada123116dd85af039ed0652
2025-12-04T17:25:27.3585862Z   Stored in directory: /tmp/pip-ephem-wheel-cache-5rvnqec8/wheels/2b/79/8d/635cf291e138cfea331292ca746c62b61fade208eb55a7e3a1
2025-12-04T17:25:27.3606700Z Successfully built torch_test_cpp_extension
2025-12-04T17:25:27.9727292Z Installing collected packages: torch_test_cpp_extension
2025-12-04T17:25:28.2260458Z Successfully installed torch_test_cpp_extension-0.0.0
2025-12-04T17:25:28.6883320Z Processing /var/lib/jenkins/workspace/test/cpp_extensions/no_python_abi_suffix_test
2025-12-04T17:25:32.0294575Z   Preparing metadata (pyproject.toml) ... done
2025-12-04T17:25:32.0317997Z Building wheels for collected packages: no_python_abi_suffix_test
2025-12-04T17:25:35.8430880Z   Building wheel for no_python_abi_suffix_test (pyproject.toml) ... done
2025-12-04T17:25:35.8439493Z   Created wheel for no_python_abi_suffix_test: filename=no_python_abi_suffix_test-0.0.0-cp310-cp310-linux_x86_64.whl size=2944 sha256=da515172478958bd0a69f7a1366f3fe56fc2eef5b1de4d4bc2e7d1388f072f02
2025-12-04T17:25:35.8441079Z   Stored in directory: /tmp/pip-ephem-wheel-cache-r4jbm4xg/wheels/8c/c7/11/bcf2bfbdebb3cf78b8211ac54acc945a8fdf1732548d147a80
2025-12-04T17:25:35.8464029Z Successfully built no_python_abi_suffix_test
2025-12-04T17:25:36.4691998Z Installing collected packages: no_python_abi_suffix_test
2025-12-04T17:25:36.4811270Z Successfully installed no_python_abi_suffix_test-0.0.0
2025-12-04T17:25:36.6296596Z * Getting build dependencies for wheel...
2025-12-04T17:25:39.4404257Z running egg_info
2025-12-04T17:25:39.4494373Z writing python_agnostic.egg-info/PKG-INFO
2025-12-04T17:25:39.4499279Z writing dependency_links to python_agnostic.egg-info/dependency_links.txt
2025-12-04T17:25:39.4502594Z writing top-level names to python_agnostic.egg-info/top_level.txt
2025-12-04T17:25:39.5024833Z reading manifest file 'python_agnostic.egg-info/SOURCES.txt'
2025-12-04T17:25:39.5035508Z writing manifest file 'python_agnostic.egg-info/SOURCES.txt'
2025-12-04T17:25:39.9703280Z * Building wheel...
2025-12-04T17:25:43.0101796Z running bdist_wheel
2025-12-04T17:25:43.0805250Z running build
2025-12-04T17:25:43.0805546Z running build_ext
2025-12-04T17:25:43.0844648Z building 'python_agnostic._C' extension
2025-12-04T17:25:43.1611382Z ninja: no work to do.
2025-12-04T17:25:43.1663775Z g++ -pthread -B /opt/conda/envs/py_3.10/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /opt/conda/envs/py_3.10/include -fPIC -O2 -isystem /opt/conda/envs/py_3.10/include -pthread -B /opt/conda/envs/py_3.10/compiler_compat -shared /var/lib/jenkins/workspace/test/cpp_extensions/python_agnostic_extension/build/temp.linux-x86_64-cpython-310/var/lib/jenkins/workspace/test/cpp_extensions/python_agnostic_extension/python_agnostic/csrc/ultra_norm.o -L/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -ltorch -ltorch_cpu -lcudart -lc10_cuda -ltorch_cuda -o build/lib.linux-x86_64-cpython-310/python_agnostic/_C.so
2025-12-04T17:25:43.7452553Z installing to build/bdist.linux-x86_64/wheel
2025-12-04T17:25:43.7452987Z running install
2025-12-04T17:25:43.7508359Z running install_lib
2025-12-04T17:25:43.7593703Z creating build/bdist.linux-x86_64/wheel
2025-12-04T17:25:43.7595402Z creating build/bdist.linux-x86_64/wheel/python_agnostic
2025-12-04T17:25:43.7596826Z copying build/lib.linux-x86_64-cpython-310/python_agnostic/_C.so -> build/bdist.linux-x86_64/wheel/./python_agnostic
2025-12-04T17:25:43.7603533Z running install_egg_info
2025-12-04T17:25:43.7687771Z running egg_info
2025-12-04T17:25:43.7765088Z writing python_agnostic.egg-info/PKG-INFO
2025-12-04T17:25:43.7769995Z writing dependency_links to python_agnostic.egg-info/dependency_links.txt
2025-12-04T17:25:43.7773669Z writing top-level names to python_agnostic.egg-info/top_level.txt
2025-12-04T17:25:43.7855178Z reading manifest file 'python_agnostic.egg-info/SOURCES.txt'
2025-12-04T17:25:43.7866452Z writing manifest file 'python_agnostic.egg-info/SOURCES.txt'
2025-12-04T17:25:43.7868418Z Copying python_agnostic.egg-info to build/bdist.linux-x86_64/wheel/./python_agnostic-0.0-py3.10.egg-info
2025-12-04T17:25:43.7876513Z running install_scripts
2025-12-04T17:25:43.8004515Z creating build/bdist.linux-x86_64/wheel/python_agnostic-0.0.dist-info/WHEEL
2025-12-04T17:25:43.8010283Z creating '/var/lib/jenkins/workspace/test/cpp_extensions/python_agnostic_extension/dist/.tmp-p4c6246r/python_agnostic-0.0-cp39-abi3-linux_x86_64.whl' and adding 'build/bdist.linux-x86_64/wheel' to it
2025-12-04T17:25:43.8214874Z adding 'python_agnostic/_C.so'
2025-12-04T17:25:43.8232202Z adding 'python_agnostic-0.0.dist-info/METADATA'
2025-12-04T17:25:43.8233490Z adding 'python_agnostic-0.0.dist-info/WHEEL'
2025-12-04T17:25:43.8234742Z adding 'python_agnostic-0.0.dist-info/top_level.txt'
2025-12-04T17:25:43.8236256Z adding 'python_agnostic-0.0.dist-info/RECORD'
2025-12-04T17:25:43.8237071Z removing build/bdist.linux-x86_64/wheel
2025-12-04T17:25:44.2516429Z Successfully built python_agnostic-0.0-cp39-abi3-linux_x86_64.whl
2025-12-04T17:25:44.6409568Z Processing /var/lib/jenkins/workspace/test/cpp_extensions/libtorch_agnostic_2_9_extension
2025-12-04T17:25:47.9354864Z   Preparing metadata (pyproject.toml) ... done
2025-12-04T17:25:47.9379863Z Requirement already satisfied: torch in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from libtorch_agnostic_2_9==0.0) (2.10.0a0+gitffd9b0f)
2025-12-04T17:25:47.9409782Z Requirement already satisfied: filelock in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic_2_9==0.0) (3.18.0)
2025-12-04T17:25:47.9415715Z Requirement already satisfied: typing-extensions>=4.10.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic_2_9==0.0) (4.12.2)
2025-12-04T17:25:47.9421427Z Requirement already satisfied: sympy>=1.13.3 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic_2_9==0.0) (1.13.3)
2025-12-04T17:25:47.9427269Z Requirement already satisfied: networkx>=2.5.1 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic_2_9==0.0) (2.8.8)
2025-12-04T17:25:47.9431438Z Requirement already satisfied: jinja2 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic_2_9==0.0) (3.1.6)
2025-12-04T17:25:47.9437162Z Requirement already satisfied: fsspec>=0.8.5 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic_2_9==0.0) (2025.10.0)
2025-12-04T17:25:47.9871239Z Requirement already satisfied: mpmath<1.4,>=1.1.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from sympy>=1.13.3->torch->libtorch_agnostic_2_9==0.0) (1.3.0)
2025-12-04T17:25:47.9935899Z Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from jinja2->torch->libtorch_agnostic_2_9==0.0) (3.0.3)
2025-12-04T17:25:47.9947517Z Building wheels for collected packages: libtorch_agnostic_2_9
2025-12-04T17:25:51.8812873Z   Building wheel for libtorch_agnostic_2_9 (pyproject.toml) ... done
2025-12-04T17:25:51.8822117Z   Created wheel for libtorch_agnostic_2_9: filename=libtorch_agnostic_2_9-0.0-cp39-abi3-linux_x86_64.whl size=54876 sha256=82babd3bcf2045972a618ae7912fb61a0a5e5f4e6e94fc7cdfdab7eeb43e57c2
2025-12-04T17:25:51.8823699Z   Stored in directory: /tmp/pip-ephem-wheel-cache-yxx7mpj8/wheels/e1/56/0d/91ac1e918c8015b48f6a77f66abeeb8427a8788f7d37715e0e
2025-12-04T17:25:51.8844069Z Successfully built libtorch_agnostic_2_9
2025-12-04T17:25:52.4531318Z Installing collected packages: libtorch_agnostic_2_9
2025-12-04T17:25:52.4666938Z Successfully installed libtorch_agnostic_2_9-0.0
2025-12-04T17:25:52.9282891Z Processing /var/lib/jenkins/workspace/test/cpp_extensions/libtorch_agnostic_2_10_extension
2025-12-04T17:25:56.3839340Z   Preparing metadata (pyproject.toml) ... done
2025-12-04T17:25:56.3863694Z Requirement already satisfied: torch in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from libtorch_agnostic_2_10==0.0) (2.10.0a0+gitffd9b0f)
2025-12-04T17:25:56.3894348Z Requirement already satisfied: filelock in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic_2_10==0.0) (3.18.0)
2025-12-04T17:25:56.3900553Z Requirement already satisfied: typing-extensions>=4.10.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic_2_10==0.0) (4.12.2)
2025-12-04T17:25:56.3906036Z Requirement already satisfied: sympy>=1.13.3 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic_2_10==0.0) (1.13.3)
2025-12-04T17:25:56.3911534Z Requirement already satisfied: networkx>=2.5.1 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic_2_10==0.0) (2.8.8)
2025-12-04T17:25:56.3915532Z Requirement already satisfied: jinja2 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic_2_10==0.0) (3.1.6)
2025-12-04T17:25:56.3921070Z Requirement already satisfied: fsspec>=0.8.5 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic_2_10==0.0) (2025.10.0)
2025-12-04T17:25:56.4362006Z Requirement already satisfied: mpmath<1.4,>=1.1.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from sympy>=1.13.3->torch->libtorch_agnostic_2_10==0.0) (1.3.0)
2025-12-04T17:25:56.4428038Z Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from jinja2->torch->libtorch_agnostic_2_10==0.0) (3.0.3)
2025-12-04T17:25:56.4439584Z Building wheels for collected packages: libtorch_agnostic_2_10
2025-12-04T17:26:00.6296282Z   Building wheel for libtorch_agnostic_2_10 (pyproject.toml) ... done
2025-12-04T17:26:00.6306491Z   Created wheel for libtorch_agnostic_2_10: filename=libtorch_agnostic_2_10-0.0-cp39-abi3-linux_x86_64.whl size=81593 sha256=a10451cbfb056d0b0c8c2a62b0a026ccfaf49e43556f6e1867a4e59bfd96dff1
2025-12-04T17:26:00.6308195Z   Stored in directory: /tmp/pip-ephem-wheel-cache-5l5_jzuz/wheels/03/17/c4/d9b9dbd12b271a9a317a75e944d0966701385d67eac86f2c1a
2025-12-04T17:26:00.6328218Z Successfully built libtorch_agnostic_2_10
2025-12-04T17:26:01.1988983Z Installing collected packages: libtorch_agnostic_2_10
2025-12-04T17:26:01.2133811Z Successfully installed libtorch_agnostic_2_10-0.0
2025-12-04T17:26:01.3007918Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T17:26:01.3012241Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_cpp_extensions_aot_no_ninja.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:26:01.300941]
2025-12-04T17:26:09.8033158Z 
2025-12-04T17:26:09.8034386Z test_cpp_extensions_aot_no_ninja 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_cpp_extensions_aot_no_ninja_1.1_8356099a97b89d55_.log
2025-12-04T17:26:09.8044028Z Running 21 items in this shard: test/test_cpp_extensions_aot_no_ninja.py::TestCppExtensionAOT::test_backward, test/test_cpp_extensions_aot_no_ninja.py::TestCppExtensionAOT::test_cublas_extension, test/test_cpp_extensions_aot_no_ninja.py::TestCppExtensionAOT::test_cuda_dlink_libs, test/test_cpp_extensions_aot_no_ninja.py::TestCppExtensionAOT::test_cuda_extension, test/test_cpp_extensions_aot_no_ninja.py::TestCppExtensionAOT::test_cusolver_extension, test/test_cpp_extensions_aot_no_ninja.py::TestCppExtensionAOT::test_extension_function, test/test_cpp_extensions_aot_no_ninja.py::TestCppExtensionAOT::test_extension_module, test/test_cpp_extensions_aot_no_ninja.py::TestCppExtensionAOT::test_mps_extension, test/test_cpp_extensions_aot_no_ninja.py::TestCppExtensionAOT::test_no_python_abi_suffix_sets_the_correct_library_name, test/test_cpp_extensions_aot_no_ninja.py::TestCppExtensionAOT::test_optional, test/test_cpp_extensions_aot_no_ninja.py::TestCppExtensionAOT::test_sycl_extension, test/test_cpp_extensions_aot_no_ninja.py::TestPybindTypeCasters::test_pybind_return_types, test/test_cpp_extensions_aot_no_ninja.py::TestMAIATensor::test_add, test/test_cpp_extensions_aot_no_ninja.py::TestMAIATensor::test_autocast_apis_for_maia_device, test/test_cpp_extensions_aot_no_ninja.py::TestMAIATensor::test_conv_backend_override, test/test_cpp_extensions_aot_no_ninja.py::TestMAIATensor::test_matmul_autocast_default_precision, test/test_cpp_extensions_aot_no_ninja.py::TestMAIATensor::test_matmul_autocast_float16_precision, test/test_cpp_extensions_aot_no_ninja.py::TestMAIATensor::test_unregistered, test/test_cpp_extensions_aot_no_ninja.py::TestMAIATensor::test_zeros, test/test_cpp_extensions_aot_no_ninja.py::TestRNGExtension::test_rng, test/test_cpp_extensions_aot_no_ninja.py::TestTorchLibrary::test_torch_library
2025-12-04T17:26:09.8053017Z 
2025-12-04T17:26:09.8053387Z Finished test_cpp_extensions_aot_no_ninja 1/1 ... [2025-12-04 17:26:09.803163][29598.18605664], took 2.43min
2025-12-04T17:26:09.8471572Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cpp_extensions_aot_no_ninja/test_cpp_extensions_aot_no_ninja-dc0b3ab1cc30279c.xml
2025-12-04T17:26:11.5773483Z Uploading artifacts took 1.65 seconds
2025-12-04T17:26:18.8605662Z Running test batch 'tests to run' cost 27651.08 seconds
2025-12-04T17:26:18.8620448Z Emitting td_test_failure_stats_v2
2025-12-04T17:26:18.8624184Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764869178_5850e89cd13611f09b240242ac110002
2025-12-04T17:26:18.9615302Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764869178_5850e89cd13611f09b240242ac110002 
2025-12-04T17:26:18.9630024Z Emitting td_test_failure_stats_v2
2025-12-04T17:26:18.9632718Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764869178_58604cb0d13611f09b240242ac110002
2025-12-04T17:26:18.9967555Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764869178_58604cb0d13611f09b240242ac110002 
2025-12-04T17:26:18.9983682Z Emitting td_test_failure_stats_v2
2025-12-04T17:26:18.9986442Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764869178_5865b312d13611f09b240242ac110002
2025-12-04T17:26:19.0324791Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764869178_5865b312d13611f09b240242ac110002 
2025-12-04T17:26:19.0341150Z Emitting td_test_failure_stats_v2
2025-12-04T17:26:19.0344078Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764869179_586b2798d13611f09b240242ac110002
2025-12-04T17:26:19.0714313Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764869179_586b2798d13611f09b240242ac110002 
2025-12-04T17:26:19.0730442Z Emitting td_test_failure_stats_v2
2025-12-04T17:26:19.0733117Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764869179_587117acd13611f09b240242ac110002
2025-12-04T17:26:19.1094534Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764869179_587117acd13611f09b240242ac110002 
2025-12-04T17:26:19.1111649Z Emitting td_test_failure_stats_v2
2025-12-04T17:26:19.1114509Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764869179_5876e736d13611f09b240242ac110002
2025-12-04T17:26:19.1468105Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764869179_5876e736d13611f09b240242ac110002 
2025-12-04T17:26:19.1484907Z Emitting td_test_failure_stats_v2
2025-12-04T17:26:19.1487526Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764869179_587c987ad13611f09b240242ac110002
2025-12-04T17:26:19.1863554Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764869179_587c987ad13611f09b240242ac110002 
2025-12-04T17:26:19.1880012Z Emitting td_test_failure_stats_v2
2025-12-04T17:26:19.1883223Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764869179_5882a238d13611f09b240242ac110002
2025-12-04T17:26:19.2238927Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764869179_5882a238d13611f09b240242ac110002 
2025-12-04T17:26:19.2255286Z Emitting td_test_failure_stats_v2
2025-12-04T17:26:19.2258702Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764869179_58885c82d13611f09b240242ac110002
2025-12-04T17:26:19.2614033Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764869179_58885c82d13611f09b240242ac110002 
2025-12-04T17:26:19.2630330Z Emitting td_test_failure_stats_v2
2025-12-04T17:26:19.2633590Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764869179_588e156ed13611f09b240242ac110002
2025-12-04T17:26:19.3176402Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764869179_588e156ed13611f09b240242ac110002 
2025-12-04T17:26:19.3192773Z Emitting td_test_failure_stats_v2
2025-12-04T17:26:19.3196125Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764869179_5896aad0d13611f09b240242ac110002
2025-12-04T17:26:19.3551613Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764869179_5896aad0d13611f09b240242ac110002 
2025-12-04T17:26:19.3552809Z inductor/test_aot_inductor 1/6 failed!
2025-12-04T17:26:19.3553531Z inductor/test_aot_inductor 6/6 failed!
2025-12-04T17:26:19.3554216Z inductor/test_torchinductor_codegen_dynamic_shapes 2/4 failed!
2025-12-04T17:26:19.3554742Z inductor/test_cuda_select_algorithm 3/5 failed!
2025-12-04T17:26:19.3555186Z inductor/test_compile_subprocess 3/3 failed!
2025-12-04T17:26:19.3555599Z inductor/test_deterministic 5/8 failed!
2025-12-04T17:26:19.3555961Z inductor/test_fp8 1/1 failed!
2025-12-04T17:26:19.3556299Z dynamo/test_model_output 1/1 failed!
2025-12-04T17:26:19.3556795Z inductor/test_loop_ordering 1/1 failed!
2025-12-04T17:26:19.3557160Z dynamo/test_backends 1/1 failed!
2025-12-04T17:26:19.3557541Z inductor/test_aot_inductor_package 1/1 failed!
2025-12-04T17:26:20.2353282Z 
2025-12-04T17:26:20.2353766Z real	460m59.836s
2025-12-04T17:26:20.2354144Z user	454m32.649s
2025-12-04T17:26:20.2354397Z sys	59m41.849s
2025-12-04T17:26:20.2354659Z + sccache_epilogue
2025-12-04T17:26:20.2354999Z + echo '::group::Sccache Compilation Log'
2025-12-04T17:26:20.2355833Z ##[group]Sccache Compilation Log
2025-12-04T17:26:20.2356250Z + echo '=================== sccache compilation log ==================='
2025-12-04T17:26:20.2356720Z =================== sccache compilation log ===================
2025-12-04T17:26:20.2357460Z + python /var/lib/jenkins/workspace/.ci/pytorch/print_sccache_log.py /var/lib/jenkins/sccache_error.log
2025-12-04T17:26:20.2507813Z + echo '=========== If your build fails, please take a look at the log above for possible reasons ==========='
2025-12-04T17:26:20.2508640Z =========== If your build fails, please take a look at the log above for possible reasons ===========
2025-12-04T17:26:20.2509205Z + sccache --show-stats
2025-12-04T17:26:20.2543267Z Compile requests                   6749
2025-12-04T17:26:20.2543680Z Compile requests executed           604
2025-12-04T17:26:20.2544039Z Cache hits                          314
2025-12-04T17:26:20.2545634Z Cache hits (C/C++)                  314
2025-12-04T17:26:20.2546036Z Cache misses                        290
2025-12-04T17:26:20.2546495Z Cache misses (C/C++)                290
2025-12-04T17:26:20.2546851Z Cache hits rate                   51.99 %
2025-12-04T17:26:20.2547231Z Cache hits rate (C/C++)           51.99 %
2025-12-04T17:26:20.2547603Z Cache timeouts                        0
2025-12-04T17:26:20.2547954Z Cache read errors                     0
2025-12-04T17:26:20.2548316Z Forced recaches                       0
2025-12-04T17:26:20.2548674Z Cache write errors                    0
2025-12-04T17:26:20.2549020Z Cache errors                          0
2025-12-04T17:26:20.2549373Z Compilations                        290
2025-12-04T17:26:20.2549737Z Compilation failures                  0
2025-12-04T17:26:20.2550101Z Non-cacheable compilations            0
2025-12-04T17:26:20.2550522Z Non-cacheable calls                 339
2025-12-04T17:26:20.2550899Z Non-compilation calls              5806
2025-12-04T17:26:20.2551285Z Unsupported compiler calls            0
2025-12-04T17:26:20.2551658Z Average cache write               0.044 s
2025-12-04T17:26:20.2552044Z Average compiler                  6.073 s
2025-12-04T17:26:20.2552427Z Average cache read hit            0.036 s
2025-12-04T17:26:20.2552821Z Failed distributed compilations       0
2025-12-04T17:26:20.2553076Z 
2025-12-04T17:26:20.2553191Z Non-cacheable reasons:
2025-12-04T17:26:20.2553511Z unknown source language             258
2025-12-04T17:26:20.2553873Z -E                                   81
2025-12-04T17:26:20.2554112Z 
2025-12-04T17:26:20.2554383Z Cache location                  s3, name: ossci-compiler-cache-circleci-v2, prefix: /
2025-12-04T17:26:20.2554924Z Version (client)                0.10.0
2025-12-04T17:26:20.2555287Z + sccache --stop-server
2025-12-04T17:26:20.2569994Z Stopping sccache server...
2025-12-04T17:26:20.2573875Z Compile requests                   6749
2025-12-04T17:26:20.2574554Z Compile requests executed           604
2025-12-04T17:26:20.2574967Z Cache hits                          314
2025-12-04T17:26:20.2575317Z Cache hits (C/C++)                  314
2025-12-04T17:26:20.2575680Z Cache misses                        290
2025-12-04T17:26:20.2576048Z Cache misses (C/C++)                290
2025-12-04T17:26:20.2576599Z Cache hits rate                   51.99 %
2025-12-04T17:26:20.2577089Z Cache hits rate (C/C++)           51.99 %
2025-12-04T17:26:20.2577567Z Cache timeouts                        0
2025-12-04T17:26:20.2577926Z Cache read errors                     0
2025-12-04T17:26:20.2578269Z Forced recaches                       0
2025-12-04T17:26:20.2578739Z Cache write errors                    0
2025-12-04T17:26:20.2579094Z Cache errors                          0
2025-12-04T17:26:20.2579437Z Compilations                        290
2025-12-04T17:26:20.2579805Z Compilation failures                  0
2025-12-04T17:26:20.2580181Z Non-cacheable compilations            0
2025-12-04T17:26:20.2580545Z Non-cacheable calls                 339
2025-12-04T17:26:20.2580916Z Non-compilation calls              5806
2025-12-04T17:26:20.2581292Z Unsupported compiler calls            0
2025-12-04T17:26:20.2581670Z Average cache write               0.044 s
2025-12-04T17:26:20.2582036Z Average compiler                  6.073 s
2025-12-04T17:26:20.2582417Z Average cache read hit            0.036 s
2025-12-04T17:26:20.2582808Z Failed distributed compilations       0
2025-12-04T17:26:20.2583060Z 
2025-12-04T17:26:20.2583171Z Non-cacheable reasons:
2025-12-04T17:26:20.2583488Z unknown source language             258
2025-12-04T17:26:20.2583849Z -E                                   81
2025-12-04T17:26:20.2584086Z 
2025-12-04T17:26:20.2584360Z Cache location                  s3, name: ossci-compiler-cache-circleci-v2, prefix: /
2025-12-04T17:26:20.2584902Z Version (client)                0.10.0
2025-12-04T17:26:20.2585293Z + echo ::endgroup::
2025-12-04T17:26:20.2585860Z ##[endgroup]
2025-12-04T17:26:20.2586131Z + cleanup_workspace
2025-12-04T17:26:20.2586796Z + echo 'sudo may print the following warning message that can be ignored. The chown command will still run.'
2025-12-04T17:26:20.2587981Z sudo may print the following warning message that can be ignored. The chown command will still run.
2025-12-04T17:26:20.2588736Z + echo '    sudo: setrlimit(RLIMIT_STACK): Operation not permitted'
2025-12-04T17:26:20.2589400Z     sudo: setrlimit(RLIMIT_STACK): Operation not permitted
2025-12-04T17:26:20.2590054Z + echo 'For more details refer to https://github.com/sudo-project/sudo/issues/42'
2025-12-04T17:26:20.2590760Z For more details refer to https://github.com/sudo-project/sudo/issues/42
2025-12-04T17:26:20.2591457Z + sudo chown -R 1000 /var/lib/jenkins/workspace
2025-12-04T17:26:21.0454603Z ##[error]Process completed with exit code 1.
2025-12-04T17:26:21.0546909Z Prepare all required actions
2025-12-04T17:26:21.0547390Z Getting action download info
2025-12-04T17:26:21.2514905Z ##[group]Run ./.github/actions/pytest-cache-upload
2025-12-04T17:26:21.2515312Z with:
2025-12-04T17:26:21.2515569Z   cache_dir: .pytest_cache
2025-12-04T17:26:21.2515885Z   shard: 1
2025-12-04T17:26:21.2516162Z   sha: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T17:26:21.2516564Z   test_config: legacy_nvidia_driver
2025-12-04T17:26:21.2517008Z   job_identifier: periodic_linux-jammy-cuda12.4-py3.10-gcc11
2025-12-04T17:26:21.2517449Z env:
2025-12-04T17:26:21.2517679Z   GIT_DEFAULT_BRANCH: main
2025-12-04T17:26:21.2517987Z   HAS_NVIDIA_GPU: true
2025-12-04T17:26:21.2518352Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T17:26:21.2518982Z   DOCKER_CONTAINER_ID: 764ff984146fd3268e049644ccb47d7d8238fae8138055ae0a6928cb5da435ad
2025-12-04T17:26:21.2519570Z ##[endgroup]
2025-12-04T17:26:21.2557490Z ##[group]Run nick-fields/retry@v3.0.0
2025-12-04T17:26:21.2557888Z with:
2025-12-04T17:26:21.2558113Z   shell: bash
2025-12-04T17:26:21.2558370Z   timeout_minutes: 5
2025-12-04T17:26:21.2558651Z   max_attempts: 5
2025-12-04T17:26:21.2558911Z   retry_wait_seconds: 30
2025-12-04T17:26:21.2559296Z   command: set -eu
python3 -m pip install boto3==1.35.42

2025-12-04T17:26:21.2559747Z   polling_interval_seconds: 1
2025-12-04T17:26:21.2560075Z   warning_on_retry: true
2025-12-04T17:26:21.2560368Z   continue_on_error: false
2025-12-04T17:26:21.2560659Z env:
2025-12-04T17:26:21.2560898Z   GIT_DEFAULT_BRANCH: main
2025-12-04T17:26:21.2561188Z   HAS_NVIDIA_GPU: true
2025-12-04T17:26:21.2561550Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T17:26:21.2562197Z   DOCKER_CONTAINER_ID: 764ff984146fd3268e049644ccb47d7d8238fae8138055ae0a6928cb5da435ad
2025-12-04T17:26:21.2562764Z ##[endgroup]
2025-12-04T17:26:21.6664692Z Defaulting to user installation because normal site-packages is not writeable
2025-12-04T17:26:22.9494346Z Collecting boto3==1.35.42
2025-12-04T17:26:22.9701631Z   Downloading boto3-1.35.42-py3-none-any.whl (139 kB)
2025-12-04T17:26:22.9875622Z Requirement already satisfied: jmespath<2.0.0,>=0.7.1 in /usr/lib/python3.9/site-packages (from boto3==1.35.42) (0.10.0)
2025-12-04T17:26:24.4419975Z Collecting botocore<1.36.0,>=1.35.42
2025-12-04T17:26:24.4469693Z   Downloading botocore-1.35.99-py3-none-any.whl (13.3 MB)
2025-12-04T17:26:24.6967727Z Collecting s3transfer<0.11.0,>=0.10.0
2025-12-04T17:26:24.7036897Z   Downloading s3transfer-0.10.4-py3-none-any.whl (83 kB)
2025-12-04T17:26:24.7145041Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /usr/lib/python3.9/site-packages (from botocore<1.36.0,>=1.35.42->boto3==1.35.42) (2.8.1)
2025-12-04T17:26:24.7157907Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /usr/lib/python3.9/site-packages (from botocore<1.36.0,>=1.35.42->boto3==1.35.42) (1.25.10)
2025-12-04T17:26:24.9206718Z Requirement already satisfied: six>=1.5 in /usr/lib/python3.9/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.36.0,>=1.35.42->boto3==1.35.42) (1.15.0)
2025-12-04T17:26:25.0248522Z Installing collected packages: botocore, s3transfer, boto3
2025-12-04T17:26:25.6766098Z Successfully installed boto3-1.35.42 botocore-1.35.99 s3transfer-0.10.4
2025-12-04T17:26:26.3496261Z Command completed after 1 attempt(s).
2025-12-04T17:26:26.3554203Z ##[group]Run python3 .github/scripts/pytest_cache.py \
2025-12-04T17:26:26.3554762Z [36;1mpython3 .github/scripts/pytest_cache.py \[0m
2025-12-04T17:26:26.3555189Z [36;1m  --upload \[0m
2025-12-04T17:26:26.3555559Z [36;1m  --cache_dir "$GITHUB_WORKSPACE/$CACHE_DIR" \[0m
2025-12-04T17:26:26.3556009Z [36;1m  --pr_identifier "$GITHUB_REF" \[0m
2025-12-04T17:26:26.3556439Z [36;1m  --job_identifier "$JOB_IDENTIFIER" \[0m
2025-12-04T17:26:26.3556837Z [36;1m  --sha "$SHA" \[0m
2025-12-04T17:26:26.3557189Z [36;1m  --test_config "$TEST_CONFIG" \[0m
2025-12-04T17:26:26.3557553Z [36;1m  --shard "$SHARD" \[0m
2025-12-04T17:26:26.3558079Z [36;1m  --repo "$REPO" \[0m
2025-12-04T17:26:26.3558427Z [36;1m  --temp_dir "$RUNNER_TEMP" \[0m
2025-12-04T17:26:26.3569833Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T17:26:26.3570287Z env:
2025-12-04T17:26:26.3570549Z   GIT_DEFAULT_BRANCH: main
2025-12-04T17:26:26.3570851Z   HAS_NVIDIA_GPU: true
2025-12-04T17:26:26.3571530Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T17:26:26.3572190Z   DOCKER_CONTAINER_ID: 764ff984146fd3268e049644ccb47d7d8238fae8138055ae0a6928cb5da435ad
2025-12-04T17:26:26.3572790Z   CACHE_DIR: .pytest_cache
2025-12-04T17:26:26.3573196Z   JOB_IDENTIFIER: periodic_linux-jammy-cuda12.4-py3.10-gcc11
2025-12-04T17:26:26.3573697Z   SHA: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T17:26:26.3574106Z   TEST_CONFIG: legacy_nvidia_driver
2025-12-04T17:26:26.3574433Z   SHARD: 1
2025-12-04T17:26:26.3574697Z   REPO: pytorch/pytorch
2025-12-04T17:26:26.3574992Z ##[endgroup]
2025-12-04T17:26:26.8688011Z PR identifier for `refs/heads/main` is `96e092540d6b3c4076e3d2bc6f1f9013`
2025-12-04T17:26:26.8690408Z Uploading cache with args Namespace(upload=True, download=False, cache_dir='/home/ec2-user/actions-runner/_work/pytorch/pytorch/.pytest_cache', pr_identifier='refs/heads/main', job_identifier='periodic_linux-jammy-cuda12.4-py3.10-gcc11', sha='ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32', test_config='legacy_nvidia_driver', shard='1', repo='pytorch/pytorch', temp_dir='/home/ec2-user/actions-runner/_work/_temp', bucket=None)
2025-12-04T17:26:26.8692779Z Zipping /home/ec2-user/actions-runner/_work/pytorch/pytorch/.pytest_cache
2025-12-04T17:26:26.8694331Z      to /home/ec2-user/actions-runner/_work/_temp/zip-upload/pytest_cache/pytorch/pytorch/96e092540d6b3c4076e3d2bc6f1f9013/periodic_linux-jammy-cuda12_4-py3_10-gcc11/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32/legacy_nvidia_driver/1
2025-12-04T17:26:26.8697052Z Uploading /home/ec2-user/actions-runner/_work/_temp/zip-upload/pytest_cache/pytorch/pytorch/96e092540d6b3c4076e3d2bc6f1f9013/periodic_linux-jammy-cuda12_4-py3_10-gcc11/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32/legacy_nvidia_driver/1.zip
2025-12-04T17:26:26.8699363Z        to s3://gha-artifacts/pytest_cache/pytorch/pytorch/96e092540d6b3c4076e3d2bc6f1f9013/periodic_linux-jammy-cuda12_4-py3_10-gcc11/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32/legacy_nvidia_driver/1.zip
2025-12-04T17:26:26.9389190Z ##[group]Run cat test/**/*_toprint.log || true
2025-12-04T17:26:26.9389667Z [36;1mcat test/**/*_toprint.log || true[0m
2025-12-04T17:26:26.9397596Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T17:26:26.9398038Z env:
2025-12-04T17:26:26.9398300Z   GIT_DEFAULT_BRANCH: main
2025-12-04T17:26:26.9398613Z   HAS_NVIDIA_GPU: true
2025-12-04T17:26:26.9398968Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T17:26:26.9399617Z   DOCKER_CONTAINER_ID: 764ff984146fd3268e049644ccb47d7d8238fae8138055ae0a6928cb5da435ad
2025-12-04T17:26:26.9400213Z ##[endgroup]
2025-12-04T17:26:26.9504427Z cat: 'test/**/*_toprint.log': No such file or directory
2025-12-04T17:26:26.9536253Z ##[group]Run kill "$MONITOR_SCRIPT_PID"
2025-12-04T17:26:26.9536823Z [36;1mkill "$MONITOR_SCRIPT_PID"[0m
2025-12-04T17:26:26.9543560Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T17:26:26.9544004Z env:
2025-12-04T17:26:26.9544261Z   GIT_DEFAULT_BRANCH: main
2025-12-04T17:26:26.9544671Z   HAS_NVIDIA_GPU: true
2025-12-04T17:26:26.9545045Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T17:26:26.9545703Z   DOCKER_CONTAINER_ID: 764ff984146fd3268e049644ccb47d7d8238fae8138055ae0a6928cb5da435ad
2025-12-04T17:26:26.9546299Z   MONITOR_SCRIPT_PID: 68781
2025-12-04T17:26:26.9546601Z ##[endgroup]
2025-12-04T17:26:26.9574416Z /home/ec2-user/actions-runner/_work/_temp/3915b074-498c-4516-9932-2a86b1af7de7.sh: line 1: kill: (68781) - No such process
2025-12-04T17:26:26.9576878Z ##[error]Process completed with exit code 1.
2025-12-04T17:26:26.9722444Z Prepare all required actions
2025-12-04T17:26:26.9723023Z Getting action download info
2025-12-04T17:26:27.1423223Z Download action repository 'seemethere/upload-artifact-s3@v5' (SHA:baba72d0712b404f646cebe0730933554ebce96a)
2025-12-04T17:26:27.3785519Z Download action repository 'actions/upload-artifact@v4' (SHA:ea165f8d65b6e75b540449e92b4886f43607fa02)
2025-12-04T17:26:27.7942311Z ##[group]Run ./.github/actions/upload-test-artifacts
2025-12-04T17:26:27.7942750Z with:
2025-12-04T17:26:27.7943212Z   file-suffix: test-legacy_nvidia_driver-1-5-linux.g4dn.4xlarge.nvidia.gpu_57119749248
2025-12-04T17:26:27.7943810Z   s3-bucket: gha-artifacts
2025-12-04T17:26:27.7944109Z env:
2025-12-04T17:26:27.7944340Z   GIT_DEFAULT_BRANCH: main
2025-12-04T17:26:27.7944649Z   HAS_NVIDIA_GPU: true
2025-12-04T17:26:27.7945019Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T17:26:27.7945653Z   DOCKER_CONTAINER_ID: 764ff984146fd3268e049644ccb47d7d8238fae8138055ae0a6928cb5da435ad
2025-12-04T17:26:27.7946301Z ##[endgroup]
2025-12-04T17:26:27.7974069Z ##[group]Run # Remove any previous test jsons if they exist
2025-12-04T17:26:27.7974626Z [36;1m# Remove any previous test jsons if they exist[0m
2025-12-04T17:26:27.7975057Z [36;1mrm -f test-jsons-*.zip[0m
2025-12-04T17:26:27.7975567Z [36;1mzip -r "test-jsons-${FILE_SUFFIX}.zip" test/test-reports -i '*.json'[0m
2025-12-04T17:26:27.7982650Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T17:26:27.7983109Z env:
2025-12-04T17:26:27.7983353Z   GIT_DEFAULT_BRANCH: main
2025-12-04T17:26:27.7983667Z   HAS_NVIDIA_GPU: true
2025-12-04T17:26:27.7984033Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T17:26:27.7984672Z   DOCKER_CONTAINER_ID: 764ff984146fd3268e049644ccb47d7d8238fae8138055ae0a6928cb5da435ad
2025-12-04T17:26:27.7985479Z   FILE_SUFFIX: test-legacy_nvidia_driver-1-5-linux.g4dn.4xlarge.nvidia.gpu_57119749248
2025-12-04T17:26:27.7986049Z ##[endgroup]
2025-12-04T17:26:27.8198526Z   adding: test/test-reports/td_exclusions-308e0cc0998f3631517c.json (deflated 16%)
2025-12-04T17:26:27.8199546Z   adding: test/test-reports/python-pytest/lazy.test_ts_opinfo/lazy.test_ts_opinfo-8eadd60536af3632.json (deflated 76%)
2025-12-04T17:26:27.8210013Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-f2c58a9dfc31919e.json (deflated 93%)
2025-12-04T17:26:27.8213808Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-a793ea186f6e0edb.json (deflated 91%)
2025-12-04T17:26:27.8216126Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-fcd1db8f24799401.json (deflated 91%)
2025-12-04T17:26:27.8219972Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-bb08f25297cc596b.json (deflated 94%)
2025-12-04T17:26:27.8225299Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-bf15e775351f3d84.json (deflated 92%)
2025-12-04T17:26:27.8234486Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-cd1c50b62bb47a1b.json (deflated 95%)
2025-12-04T17:26:27.8243422Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-3e5313e420476f15.json (deflated 95%)
2025-12-04T17:26:27.8246305Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-b23b654b51890d24.json (deflated 90%)
2025-12-04T17:26:27.8248416Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-2e7c8f13f7be0603.json (deflated 91%)
2025-12-04T17:26:27.8250600Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-8d6cdce6581fa448.json (deflated 91%)
2025-12-04T17:26:27.8254806Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-9dce38c1d023996d.json (deflated 92%)
2025-12-04T17:26:27.8257279Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-b570798f966501a4.json (deflated 91%)
2025-12-04T17:26:27.8259148Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-be9e2a318f1480ff.json (deflated 91%)
2025-12-04T17:26:27.8262502Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-e62e290dfdad5699.json (deflated 94%)
2025-12-04T17:26:27.8290980Z   adding: test/test-reports/python-pytest/inductor.test_torchinductor_codegen_dynamic_shapes/inductor.test_torchinductor_codegen_dynamic_shapes-0c75da116b2f10f8.json (deflated 94%)
2025-12-04T17:26:27.8292781Z   adding: test/test-reports/python-pytest/inductor.test_torchinductor_codegen_dynamic_shapes/inductor.test_torchinductor_codegen_dynamic_shapes-fd0863b8a222871a.json (deflated 88%)
2025-12-04T17:26:27.8294618Z   adding: test/test-reports/python-pytest/inductor.test_torchinductor_codegen_dynamic_shapes/inductor.test_torchinductor_codegen_dynamic_shapes-6fcb35b3fc35a71c.json (deflated 88%)
2025-12-04T17:26:27.8302107Z   adding: test/test-reports/python-pytest/inductor.test_torchinductor_codegen_dynamic_shapes/inductor.test_torchinductor_codegen_dynamic_shapes-f8b2416e9d43ac69.json (deflated 94%)
2025-12-04T17:26:27.8307002Z   adding: test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-8ad43f769763d7e0.json (deflated 95%)
2025-12-04T17:26:27.8312756Z   adding: test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-6495f5d67df68869.json (deflated 95%)
2025-12-04T17:26:27.8317986Z   adding: test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-f9f6352517dfd8be.json (deflated 96%)
2025-12-04T17:26:27.8323558Z   adding: test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-25ab0fa1230b07b5.json (deflated 95%)
2025-12-04T17:26:27.8325062Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-74cab4bdcde89184.json (deflated 86%)
2025-12-04T17:26:27.8326716Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-77e37a2f8b75b3d9.json (deflated 85%)
2025-12-04T17:26:27.8328198Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3ba19b390afd5854.json (deflated 85%)
2025-12-04T17:26:27.8329677Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4ad317a243ecdd30.json (deflated 85%)
2025-12-04T17:26:27.8331133Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f482798b2b39d897.json (deflated 85%)
2025-12-04T17:26:27.8332599Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cbe2514f89eef609.json (deflated 85%)
2025-12-04T17:26:27.8334077Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3707d31910126ebf.json (deflated 85%)
2025-12-04T17:26:27.8335562Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-dedaec5daecec784.json (deflated 85%)
2025-12-04T17:26:27.8337115Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2f4f0e9c4ac682e4.json (deflated 85%)
2025-12-04T17:26:27.8338666Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-580d25229e34cb07.json (deflated 86%)
2025-12-04T17:26:27.8340136Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9d15e1ab064c4537.json (deflated 85%)
2025-12-04T17:26:27.8341771Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e6d909bcc6975bf8.json (deflated 85%)
2025-12-04T17:26:27.8343247Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0a612698d44183a1.json (deflated 85%)
2025-12-04T17:26:27.8344712Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-90f2ceb88314c75a.json (deflated 85%)
2025-12-04T17:26:27.8346188Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9644b19a5203c0ee.json (deflated 85%)
2025-12-04T17:26:27.8347669Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1a6f999e52eb1904.json (deflated 86%)
2025-12-04T17:26:27.8349159Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-547414903ca204e9.json (deflated 85%)
2025-12-04T17:26:27.8350630Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-41f0f199b083e6d2.json (deflated 85%)
2025-12-04T17:26:27.8352083Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-438f9d52209526cc.json (deflated 85%)
2025-12-04T17:26:27.8353555Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-98df9406c6e0faf3.json (deflated 85%)
2025-12-04T17:26:27.8355024Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f706cf73cc88a5b8.json (deflated 85%)
2025-12-04T17:26:27.8356499Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c67f05de6c39b0d8.json (deflated 85%)
2025-12-04T17:26:27.8357969Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0e30a339afee7d22.json (deflated 85%)
2025-12-04T17:26:27.8359475Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4a14a2e6be65f97f.json (deflated 85%)
2025-12-04T17:26:27.8360946Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-82a3db4b14f41cd2.json (deflated 85%)
2025-12-04T17:26:27.8362420Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a02c7191ab69f431.json (deflated 85%)
2025-12-04T17:26:27.8363895Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e37b8ebc7938792f.json (deflated 85%)
2025-12-04T17:26:27.8365360Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ee37665d187f9309.json (deflated 86%)
2025-12-04T17:26:27.8366840Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-511047743df1b08e.json (deflated 85%)
2025-12-04T17:26:27.8368315Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4d9221d5ac70ff44.json (deflated 85%)
2025-12-04T17:26:27.8369797Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-af9a500a606c950b.json (deflated 85%)
2025-12-04T17:26:27.8371615Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e3ba96547605fc4e.json (deflated 85%)
2025-12-04T17:26:27.8373190Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ce470e45644e1cc6.json (deflated 85%)
2025-12-04T17:26:27.8374665Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cece0bb00c5477e6.json (deflated 85%)
2025-12-04T17:26:27.8376253Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4e672e5e3ae6046c.json (deflated 85%)
2025-12-04T17:26:27.8377824Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-65775801d71c7290.json (deflated 85%)
2025-12-04T17:26:27.8379288Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9ed754aaaf490f98.json (deflated 86%)
2025-12-04T17:26:27.8380774Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-36a3a8a6a9d0a436.json (deflated 85%)
2025-12-04T17:26:27.8382256Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f55b1076fbed9be9.json (deflated 85%)
2025-12-04T17:26:27.8383740Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6062b5e411b734f8.json (deflated 85%)
2025-12-04T17:26:27.8385221Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-21fa07c752a411ad.json (deflated 85%)
2025-12-04T17:26:27.8386686Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4c09ee7a97c51183.json (deflated 85%)
2025-12-04T17:26:27.8388152Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c713b1dc3f3923ec.json (stored 0%)
2025-12-04T17:26:27.8392685Z   adding: test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-84a2c5e5cdda7bdd.json (deflated 96%)
2025-12-04T17:26:27.8395072Z   adding: test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-97e49e1b6070e822.json (deflated 92%)
2025-12-04T17:26:27.8397412Z   adding: test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-aaac502093c587a7.json (deflated 92%)
2025-12-04T17:26:27.8407222Z   adding: test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-decce829c4432557.json (deflated 95%)
2025-12-04T17:26:27.8408681Z   adding: test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-491de48d6c983340.json (deflated 74%)
2025-12-04T17:26:27.8418577Z   adding: test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-35b1cdd46f4129e6.json (deflated 96%)
2025-12-04T17:26:27.8419948Z   adding: test/test-reports/python-pytest/inductor.test_flex_decoding/inductor.test_flex_decoding-4523fe803428b665.json (stored 0%)
2025-12-04T17:26:27.8421276Z   adding: test/test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-ccc55353a2e77d8f.json (deflated 91%)
2025-12-04T17:26:27.8425479Z   adding: test/test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-cbc1aeff512c7b0d.json (deflated 94%)
2025-12-04T17:26:27.8429969Z   adding: test/test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-b35d65d1a2e42e4e.json (deflated 94%)
2025-12-04T17:26:27.8431318Z   adding: test/test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-feba5ff46dbc30dd.json (deflated 74%)
2025-12-04T17:26:27.8432541Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-dff864e79f1bf91b.json (deflated 88%)
2025-12-04T17:26:27.8434371Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-053a0e10a178eff6.json (deflated 88%)
2025-12-04T17:26:27.8436476Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-966288eeb3fe785e.json (deflated 88%)
2025-12-04T17:26:27.8438439Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-47dd8058babbbd0d.json (deflated 88%)
2025-12-04T17:26:27.8440524Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e92e228ccdafe934.json (deflated 88%)
2025-12-04T17:26:27.8442249Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0328fb4bc2fb022d.json (deflated 88%)
2025-12-04T17:26:27.8444281Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4ecceae3d20d3515.json (deflated 88%)
2025-12-04T17:26:27.8446207Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-af3f0411f43ffff1.json (deflated 88%)
2025-12-04T17:26:27.8448146Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5f646abfecfc34db.json (deflated 88%)
2025-12-04T17:26:27.8450094Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2f71fa45f6063b14.json (deflated 88%)
2025-12-04T17:26:27.8452074Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3d881319a967678f.json (deflated 88%)
2025-12-04T17:26:27.8454040Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7b45d70025cf6016.json (deflated 88%)
2025-12-04T17:26:27.8456035Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b4a285d41fdad5fc.json (deflated 88%)
2025-12-04T17:26:27.8458046Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-9b24822b6f23300e.json (deflated 88%)
2025-12-04T17:26:27.8459966Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-642548938a706c13.json (deflated 88%)
2025-12-04T17:26:27.8462252Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3087aa3d89d0a96b.json (deflated 89%)
2025-12-04T17:26:27.8464241Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5fb4c628c04a1cdc.json (deflated 88%)
2025-12-04T17:26:27.8466197Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0d752e0bfa5071ea.json (deflated 88%)
2025-12-04T17:26:27.8468136Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-fedbc7df4b1c2869.json (deflated 88%)
2025-12-04T17:26:27.8470103Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3cf6d62b643bfad7.json (deflated 88%)
2025-12-04T17:26:27.8472325Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e2e744a24cd2751e.json (deflated 88%)
2025-12-04T17:26:27.8474174Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6de5a411a3f65f82.json (deflated 88%)
2025-12-04T17:26:27.8476161Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d2f3621583fff098.json (deflated 88%)
2025-12-04T17:26:27.8478092Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-319cee3df6121e1a.json (deflated 88%)
2025-12-04T17:26:27.8479993Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-452be63c68b4eb35.json (deflated 88%)
2025-12-04T17:26:27.8481928Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5a49841d6a2b730b.json (deflated 88%)
2025-12-04T17:26:27.8483851Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f1313a025d30dc09.json (deflated 88%)
2025-12-04T17:26:27.8485805Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-03aedafc0832726c.json (deflated 88%)
2025-12-04T17:26:27.8487694Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-89171bcc48f05a69.json (deflated 88%)
2025-12-04T17:26:27.8489645Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6450e334481f0131.json (deflated 88%)
2025-12-04T17:26:27.8490956Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f7999da795e3cf34.json (deflated 87%)
2025-12-04T17:26:27.8498427Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ad7a38726bbc8b50.json (deflated 96%)
2025-12-04T17:26:27.8505588Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b434424093647de3.json (deflated 96%)
2025-12-04T17:26:27.8507762Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ccd966f4e119e833.json (deflated 88%)
2025-12-04T17:26:27.8510354Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d16f18ba4de45d90.json (deflated 90%)
2025-12-04T17:26:27.8513029Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4078dca354f1c797.json (deflated 90%)
2025-12-04T17:26:27.8515293Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7591ded94ad5fda9.json (deflated 89%)
2025-12-04T17:26:27.8517317Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4955a88ef6b89264.json (deflated 88%)
2025-12-04T17:26:27.8519367Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2fae1650dec37ec0.json (deflated 88%)
2025-12-04T17:26:27.8521376Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0893388d06071d35.json (deflated 88%)
2025-12-04T17:26:27.8523425Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b62e3abe6013e6ef.json (deflated 88%)
2025-12-04T17:26:27.8525437Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0fe50dbde6f69754.json (deflated 88%)
2025-12-04T17:26:27.8527476Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1381d94cd6abec18.json (deflated 88%)
2025-12-04T17:26:27.8529493Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2b9aebe063e8f7ef.json (deflated 88%)
2025-12-04T17:26:27.8531520Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-649ba93d0ac5919c.json (deflated 88%)
2025-12-04T17:26:27.8533577Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-df60bd1ca7e6baab.json (deflated 88%)
2025-12-04T17:26:27.8535639Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-927fdf8f8ff6280c.json (deflated 88%)
2025-12-04T17:26:27.8537797Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2f380c761dc75570.json (deflated 88%)
2025-12-04T17:26:27.8539843Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-db3aa4c2f1c0f2c1.json (deflated 88%)
2025-12-04T17:26:27.8541839Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7c20e7902388541e.json (deflated 88%)
2025-12-04T17:26:27.8543871Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-43cf13c151388d8e.json (deflated 88%)
2025-12-04T17:26:27.8545883Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-27661fe34019a4f8.json (deflated 88%)
2025-12-04T17:26:27.8547911Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-63ef36c446edecf7.json (deflated 88%)
2025-12-04T17:26:27.8549891Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-818cc5e6f257d295.json (deflated 88%)
2025-12-04T17:26:27.8551970Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b552d5ebf2a766dc.json (deflated 88%)
2025-12-04T17:26:27.8553966Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-08c28ac73e77007a.json (deflated 88%)
2025-12-04T17:26:27.8555995Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-df1b42bf8f6cd06e.json (deflated 88%)
2025-12-04T17:26:27.8558019Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-97d6c66aee44b097.json (deflated 88%)
2025-12-04T17:26:27.8560003Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-232f2d4b09cdec77.json (deflated 88%)
2025-12-04T17:26:27.8562066Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6add3d31a0a55a66.json (deflated 88%)
2025-12-04T17:26:27.8564132Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-fa52f41f0c0be4e5.json (deflated 88%)
2025-12-04T17:26:27.8566325Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-38b24c1b21208356.json (deflated 88%)
2025-12-04T17:26:27.8568258Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b1ae24833396f782.json (deflated 88%)
2025-12-04T17:26:27.8570391Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-80996ba6b8c32f81.json (deflated 88%)
2025-12-04T17:26:27.8572555Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8b26ba548538abde.json (deflated 88%)
2025-12-04T17:26:27.8574552Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d73817a3e5f02a06.json (deflated 88%)
2025-12-04T17:26:27.8578145Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-90e37d7f0968dad1.json (deflated 92%)
2025-12-04T17:26:27.8580116Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6ba281452d587f38.json (deflated 88%)
2025-12-04T17:26:27.8582068Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-85d1d6e9267cc116.json (deflated 88%)
2025-12-04T17:26:27.8584037Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7a610c26dd7fa0e9.json (deflated 88%)
2025-12-04T17:26:27.8585963Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-269f6089cafc9f3b.json (deflated 88%)
2025-12-04T17:26:27.8587877Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f11fe18ee197cc1f.json (deflated 88%)
2025-12-04T17:26:27.8590321Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0b8acd36d7258295.json (deflated 89%)
2025-12-04T17:26:27.8592369Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-babe12520ea62fea.json (deflated 88%)
2025-12-04T17:26:27.8594258Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-08a6bb29b776e6ca.json (deflated 88%)
2025-12-04T17:26:27.8596160Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ef8db3fa00c6c1d7.json (deflated 88%)
2025-12-04T17:26:27.8598002Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7a3ac84fc91fa02b.json (deflated 88%)
2025-12-04T17:26:27.8599976Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e162f70cb76e49ff.json (deflated 88%)
2025-12-04T17:26:27.8602416Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e70a5c274fb86b8e.json (deflated 89%)
2025-12-04T17:26:27.8604520Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0c17434f07767682.json (deflated 89%)
2025-12-04T17:26:27.8606573Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3815c1aa47a06d85.json (deflated 89%)
2025-12-04T17:26:27.8608657Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-69850f25ab7699fd.json (deflated 89%)
2025-12-04T17:26:27.8610725Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-da23a1d59c747be6.json (deflated 89%)
2025-12-04T17:26:27.8612898Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-36993cd4956a89fe.json (deflated 89%)
2025-12-04T17:26:27.8614965Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-78153e5fcd212bc6.json (deflated 89%)
2025-12-04T17:26:27.8617155Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-04b538cf09549803.json (deflated 89%)
2025-12-04T17:26:27.8619198Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d91f9f6b0d5ec125.json (deflated 89%)
2025-12-04T17:26:27.8621280Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ebbb316cfb6210df.json (deflated 89%)
2025-12-04T17:26:27.8623330Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6f1b13e751374b5d.json (deflated 89%)
2025-12-04T17:26:27.8625395Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f2c09e3279cd971a.json (deflated 89%)
2025-12-04T17:26:27.8627387Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5c0e3bac2edd6805.json (deflated 88%)
2025-12-04T17:26:27.8629030Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1c004f486086cbb5.json (deflated 88%)
2025-12-04T17:26:27.8631022Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5e05e7b060f911b0.json (deflated 88%)
2025-12-04T17:26:27.8635548Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4d9849432c7f5caf.json (deflated 98%)
2025-12-04T17:26:27.8636717Z   adding: test/test-reports/python-pytest/dynamo.test_model_output/dynamo.test_model_output-d1e6e78eb0372411.json (deflated 83%)
2025-12-04T17:26:27.8637964Z   adding: test/test-reports/python-pytest/dynamo.test_model_output/dynamo.test_model_output-0e0432c8246f889e.json (deflated 83%)
2025-12-04T17:26:27.8639197Z   adding: test/test-reports/python-pytest/dynamo.test_model_output/dynamo.test_model_output-1d824658578ee605.json (deflated 83%)
2025-12-04T17:26:27.8640436Z   adding: test/test-reports/python-pytest/dynamo.test_model_output/dynamo.test_model_output-a0141b45c0b55065.json (deflated 85%)
2025-12-04T17:26:27.8663834Z   adding: test/test-reports/python-pytest/inductor.test_triton_kernels/inductor.test_triton_kernels-498ce8e3e7c25595.json (deflated 95%)
2025-12-04T17:26:27.8667385Z   adding: test/test-reports/python-pytest/inductor.test_loop_ordering/inductor.test_loop_ordering-264346bf50f4314b.json (deflated 87%)
2025-12-04T17:26:27.8668775Z   adding: test/test-reports/python-pytest/inductor.test_loop_ordering/inductor.test_loop_ordering-f4f4ac9590e83730.json (deflated 87%)
2025-12-04T17:26:27.8670204Z   adding: test/test-reports/python-pytest/inductor.test_loop_ordering/inductor.test_loop_ordering-193da166d9e268ac.json (deflated 87%)
2025-12-04T17:26:27.8672202Z   adding: test/test-reports/python-pytest/inductor.test_loop_ordering/inductor.test_loop_ordering-d4eac70f931f6c8b.json (deflated 94%)
2025-12-04T17:26:27.8775181Z   adding: test/test-reports/python-pytest/export.test_serdes/export.test_serdes-191fd84c43c29743.json (deflated 95%)
2025-12-04T17:26:27.8776989Z   adding: test/test-reports/python-pytest/dynamo.test_backends/dynamo.test_backends-7c3220f8bc842d2f.json (deflated 84%)
2025-12-04T17:26:27.8778170Z   adding: test/test-reports/python-pytest/dynamo.test_backends/dynamo.test_backends-7281b232f81c7a26.json (deflated 87%)
2025-12-04T17:26:27.8779343Z   adding: test/test-reports/python-pytest/dynamo.test_backends/dynamo.test_backends-0c21d337a20b3a01.json (deflated 87%)
2025-12-04T17:26:27.8780518Z   adding: test/test-reports/python-pytest/dynamo.test_backends/dynamo.test_backends-62a9a8755ec319d6.json (deflated 84%)
2025-12-04T17:26:27.8786043Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor_package/inductor.test_aot_inductor_package-b1ca468dab29d0d8.json (deflated 95%)
2025-12-04T17:26:27.8788321Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor_package/inductor.test_aot_inductor_package-69f64b5320fd797d.json (deflated 91%)
2025-12-04T17:26:27.8790618Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor_package/inductor.test_aot_inductor_package-e41e403fca9b1188.json (deflated 91%)
2025-12-04T17:26:27.8792078Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor_package/inductor.test_aot_inductor_package-07f86488d4cce1d3.json (deflated 90%)
2025-12-04T17:26:27.8796140Z   adding: test/test-reports/python-pytest/inductor.test_padding/inductor.test_padding-be250a10b53bb058.json (deflated 91%)
2025-12-04T17:26:27.8797696Z   adding: test/test-reports/python-pytest/dynamo.test_aot_compile/dynamo.test_aot_compile-10a88b68c9603fe3.json (deflated 88%)
2025-12-04T17:26:27.8800490Z   adding: test/test-reports/python-pytest/dynamo.test_sets/dynamo.test_sets-f0cb58e83c4ea8ef.json (deflated 94%)
2025-12-04T17:26:27.8802802Z   adding: test/test-reports/python-pytest/dynamo.test_wrap_inductor_compiled_regions/dynamo.test_wrap_inductor_compiled_regions-2f1d9c362e038030.json (deflated 87%)
2025-12-04T17:26:27.8834101Z   adding: test/test-reports/python-pytest/test_sparse/test_sparse-598e6683c5cfc22a.json (deflated 97%)
2025-12-04T17:26:27.8846642Z   adding: test/test-reports/python-pytest/test_decomp/test_decomp-5879e0e26736617e.json (deflated 95%)
2025-12-04T17:26:27.8859043Z   adding: test/test-reports/python-pytest/test_decomp/test_decomp-c4519c63d1395608.json (deflated 95%)
2025-12-04T17:26:27.8871678Z   adding: test/test-reports/python-pytest/test_decomp/test_decomp-fd1a91e45a41098b.json (deflated 95%)
2025-12-04T17:26:27.8916014Z   adding: test/test-reports/python-pytest/test_ops_fwd_gradients/test_ops_fwd_gradients-dac273fbaf67ad10.json (deflated 97%)
2025-12-04T17:26:27.9089659Z   adding: test/test-reports/python-pytest/test_meta/test_meta-cbc50d7c3e0b1b6a.json (deflated 97%)
2025-12-04T17:26:27.9106082Z   adding: test/test-reports/python-pytest/test_ops_jit/test_ops_jit-2f4faab6a29e642c.json (deflated 95%)
2025-12-04T17:26:27.9138325Z   adding: test/test-reports/python-pytest/test_nestedtensor/test_nestedtensor-8372b6917771ca4c.json (deflated 98%)
2025-12-04T17:26:27.9218030Z   adding: test/test-reports/python-pytest/test_ops/test_ops-d95bfbe57b5d2d89.json (deflated 96%)
2025-12-04T17:26:27.9309669Z   adding: test/test-reports/python-pytest/test_ops/test_ops-75f8d45594e24741.json (deflated 97%)
2025-12-04T17:26:27.9311168Z   adding: test/test-reports/python-pytest/functorch.test_dims/functorch.test_dims-e2a9e671430fd99e.json (deflated 93%)
2025-12-04T17:26:27.9354780Z   adding: test/test-reports/python-pytest/functorch.test_ops/functorch.test_ops-caabf5583dae6043.json (deflated 95%)
2025-12-04T17:26:27.9398771Z   adding: test/test-reports/python-pytest/functorch.test_ops/functorch.test_ops-b6190fae5240f1fb.json (deflated 95%)
2025-12-04T17:26:27.9414526Z   adding: test/test-reports/python-pytest/inductor.test_cpu_repro/inductor.test_cpu_repro-e45fcbaf6c1a2b2c.json (deflated 97%)
2025-12-04T17:26:27.9415859Z   adding: test/test-reports/python-pytest/inductor.test_custom_lowering/inductor.test_custom_lowering-f90a8c2a1b7dd9b0.json (deflated 83%)
2025-12-04T17:26:27.9422010Z   adding: test/test-reports/python-pytest/inductor.test_perf/inductor.test_perf-34de9a09a2935f8d.json (deflated 93%)
2025-12-04T17:26:27.9423246Z   adding: test/test-reports/python-pytest/inductor.test_binary_folding/inductor.test_binary_folding-0c797ad2be676af7.json (deflated 83%)
2025-12-04T17:26:27.9430600Z   adding: test/test-reports/python-pytest/inductor.test_mkldnn_pattern_matcher/inductor.test_mkldnn_pattern_matcher-c93031a5b8f8293d.json (deflated 95%)
2025-12-04T17:26:27.9443677Z   adding: test/test-reports/python-pytest/inductor.test_gpu_cpp_wrapper/inductor.test_gpu_cpp_wrapper-c206afd337165094.json (deflated 95%)
2025-12-04T17:26:27.9445055Z   adding: test/test-reports/python-pytest/inductor.test_cutedsl_template/inductor.test_cutedsl_template-1780c0291e7a0397.json (deflated 92%)
2025-12-04T17:26:27.9446447Z   adding: test/test-reports/python-pytest/inductor.test_benchmark_fusion/inductor.test_benchmark_fusion-33e3c50f2f02127c.json (deflated 84%)
2025-12-04T17:26:27.9453616Z   adding: test/test-reports/python-pytest/dynamo.test_modules/dynamo.test_modules-f3674dc870090d50.json (deflated 91%)
2025-12-04T17:26:27.9454935Z   adding: test/test-reports/python-pytest/dynamo.test_recompiles/dynamo.test_recompiles-755ec9793479e2dd.json (deflated 82%)
2025-12-04T17:26:27.9456158Z   adding: test/test-reports/python-pytest/export.test_tree_utils/export.test_tree_utils-4b33de82582b2e92.json (deflated 61%)
2025-12-04T17:26:27.9458125Z   adding: test/test-reports/python-pytest/inductor.test_triton_wrapper/inductor.test_triton_wrapper-7697274370716365.json (deflated 51%)
2025-12-04T17:26:27.9459523Z   adding: test/test-reports/python-pytest/inductor.test_static_cuda_launcher/inductor.test_static_cuda_launcher-96effba66b878950.json (deflated 90%)
2025-12-04T17:26:27.9460878Z   adding: test/test-reports/python-pytest/export.test_dynamic_shapes/export.test_dynamic_shapes-6f817f896f94c83c.json (deflated 63%)
2025-12-04T17:26:27.9462223Z   adding: test/test-reports/python-pytest/dynamo.test_sdpa/dynamo.test_sdpa-3e0149796a415876.json (deflated 79%)
2025-12-04T17:26:27.9463340Z   adding: test/test-reports/python-pytest/dynamo.test_utils/dynamo.test_utils-e6d94f5c34c685f8.json (deflated 84%)
2025-12-04T17:26:27.9464563Z   adding: test/test-reports/python-pytest/inductor.test_codegen_triton/inductor.test_codegen_triton-f741c3b21cf28e3b.json (deflated 36%)
2025-12-04T17:26:27.9465823Z   adding: test/test-reports/python-pytest/dynamo.test_frame_init/dynamo.test_frame_init-c2e1024fb8a07387.json (deflated 37%)
2025-12-04T17:26:27.9467084Z   adding: test/test-reports/python-pytest/inductor.test_device_assert/inductor.test_device_assert-451c7142fcd9d62b.json (deflated 84%)
2025-12-04T17:26:27.9468387Z   adding: test/test-reports/python-pytest/dynamo.test_skip_non_tensor/dynamo.test_skip_non_tensor-f190ace25428cb94.json (deflated 80%)
2025-12-04T17:26:27.9469757Z   adding: test/test-reports/python-pytest/dynamo.test_skip_guard_eval_unsafe/dynamo.test_skip_guard_eval_unsafe-aa1ded9d0a4e400e.json (deflated 80%)
2025-12-04T17:26:27.9471389Z   adding: test/test-reports/python-pytest/inductor.test_control_deps/inductor.test_control_deps-23047419ffe03376.json (deflated 50%)
2025-12-04T17:26:27.9472702Z   adding: test/test-reports/python-pytest/inductor.test_benchmarking/inductor.test_benchmarking-53f04a03954d2058.json (deflated 91%)
2025-12-04T17:26:27.9474039Z   adding: test/test-reports/python-pytest/inductor.test_helion_kernels/inductor.test_helion_kernels-0df86f8cd24ea26a.json (deflated 69%)
2025-12-04T17:26:27.9475373Z   adding: test/test-reports/python-pytest/inductor.test_quantization/inductor.test_quantization-951156711359c867.json (deflated 66%)
2025-12-04T17:26:27.9476582Z   adding: test/test-reports/python-pytest/export.test_tools/export.test_tools-c033e9415dabe65c.json (deflated 56%)
2025-12-04T17:26:27.9484004Z   adding: test/test-reports/python-pytest/inductor.test_compiled_optimizers/inductor.test_compiled_optimizers-1745c9b9e5fc7ed3.json (deflated 96%)
2025-12-04T17:26:27.9485532Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor_utils/inductor.test_aot_inductor_utils-e7355f16ccb52d23.json (stored 0%)
2025-12-04T17:26:27.9504754Z   adding: test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-0b2081966a192cef.json (deflated 97%)
2025-12-04T17:26:27.9509446Z   adding: test/test-reports/python-pytest/inductor.test_minifier_isolate/inductor.test_minifier_isolate-f50615d1a1981661.json (deflated 93%)
2025-12-04T17:26:27.9636017Z   adding: test/test-reports/python-pytest/dynamo.test_error_messages/dynamo.test_error_messages-36d8e363c2770c16.json (deflated 95%)
2025-12-04T17:26:27.9637334Z   adding: test/test-reports/python-pytest/dynamo.test_fake_distributed/dynamo.test_fake_distributed-b0f5d6fe6c345e8f.json (deflated 73%)
2025-12-04T17:26:27.9638594Z   adding: test/test-reports/python-pytest/dynamo.test_tree_map/dynamo.test_tree_map-39d9c68e899fe910.json (deflated 93%)
2025-12-04T17:26:27.9647405Z   adding: test/test-reports/python-pytest/dynamo.test_minifier/dynamo.test_minifier-9124cc51e1c5e7b6.json (deflated 94%)
2025-12-04T17:26:27.9649006Z   adding: test/test-reports/python-pytest/dynamo.test_guard_manager/dynamo.test_guard_manager-f0dd8a549f18516b.json (deflated 93%)
2025-12-04T17:26:27.9650251Z   adding: test/test-reports/python-pytest/export.test_schema/export.test_schema-98e7fce7714746ab.json (deflated 82%)
2025-12-04T17:26:27.9651630Z   adding: test/test-reports/python-pytest/export.test_pass_infra/export.test_pass_infra-0489f34d1d482c78.json (deflated 81%)
2025-12-04T17:26:27.9652885Z   adding: test/test-reports/python-pytest/dynamo.test_recompile_ux/dynamo.test_recompile_ux-5436245cbc75fddd.json (deflated 82%)
2025-12-04T17:26:27.9654151Z   adding: test/test-reports/python-pytest/export.test_experimental/export.test_experimental-4743e9a7200af635.json (deflated 90%)
2025-12-04T17:26:27.9657043Z   adding: test/test-reports/python-pytest/export.test_converter/export.test_converter-a6e4e9ebcfaea6df.json (deflated 92%)
2025-12-04T17:26:27.9658482Z   adding: test/test-reports/python-pytest/dynamo.test_reorder_logs/dynamo.test_reorder_logs-d530254831fe0a21.json (deflated 87%)
2025-12-04T17:26:27.9664185Z   adding: test/test-reports/python-pytest/dynamo.test_subclasses/dynamo.test_subclasses-90ae20717b7fd572.json (deflated 93%)
2025-12-04T17:26:27.9665480Z   adding: test/test-reports/python-pytest/dynamo.test_python_autograd/dynamo.test_python_autograd-b76a60537c2ba691.json (deflated 83%)
2025-12-04T17:26:27.9666780Z   adding: test/test-reports/python-pytest/export.test_draft_export/export.test_draft_export-0c8a812115433a7d.json (deflated 89%)
2025-12-04T17:26:27.9669365Z   adding: test/test-reports/python-pytest/test_package/test_package-523d81f0792170f1.json (deflated 93%)
2025-12-04T17:26:27.9670407Z   adding: test/test-reports/python-pytest/test_mkl_verbose/test_mkl_verbose-874cbf06946f8b3e.json (deflated 64%)
2025-12-04T17:26:27.9671686Z   adding: test/test-reports/python-pytest/test_comparison_utils/test_comparison_utils-ce770324779d51b3.json (deflated 87%)
2025-12-04T17:26:27.9672939Z   adding: test/test-reports/python-pytest/functorch.test_ac_logging/functorch.test_ac_logging-f1c79a1c8c74be66.json (deflated 77%)
2025-12-04T17:26:27.9674144Z   adding: test/test-reports/python-pytest/test_mkldnn_verbose/test_mkldnn_verbose-e983273d29ed8e1e.json (deflated 64%)
2025-12-04T17:26:27.9681193Z   adding: test/test-reports/python-pytest/test_cpp_api_parity/test_cpp_api_parity-c6b7300fef8db168.json (deflated 97%)
2025-12-04T17:26:27.9682304Z   adding: test/test-reports/python-pytest/test_autoload/test_autoload-21f1eacf8f4a4d28.json (deflated 36%)
2025-12-04T17:26:27.9683519Z   adding: test/test-reports/python-pytest/nn.attention.test_open_registry/nn.attention.test_open_registry-bacfee0084c93992.json (deflated 63%)
2025-12-04T17:26:27.9684744Z   adding: test/test-reports/python-pytest/test_as_strided/test_as_strided-4555079064233d7d.json (deflated 61%)
2025-12-04T17:26:27.9744528Z   adding: test/test-reports/python-pytest/test_foreach/test_foreach-aa4419a4e7b6d381.json (deflated 98%)
2025-12-04T17:26:27.9745687Z   adding: test/test-reports/python-pytest/xpu.test_gemm/xpu.test_gemm-2cb9cf39de6aa2cf.json (stored 0%)
2025-12-04T17:26:27.9746737Z   adding: test/test-reports/python-pytest/test_numpy_interop/test_numpy_interop-660870d95235d56d.json (deflated 93%)
2025-12-04T17:26:27.9747923Z   adding: test/test-reports/python-pytest/profiler.test_cpp_thread/profiler.test_cpp_thread-31559e2ba96f64a3.json (deflated 85%)
2025-12-04T17:26:27.9748992Z   adding: test/test-reports/python-pytest/test_hub/test_hub-33a47573ff45c77e.json (deflated 86%)
2025-12-04T17:26:27.9751229Z   adding: test/test-reports/python-pytest/test_segment_reductions/test_segment_reductions-ad616dd6940e0de0.json (deflated 97%)
2025-12-04T17:26:27.9752465Z   adding: test/test-reports/python-pytest/test_autograd_fallback/test_autograd_fallback-e1a7bbd98afc63dc.json (deflated 94%)
2025-12-04T17:26:27.9753602Z   adding: test/test-reports/python-pytest/test_type_hints/test_type_hints-d14fd0906e097d86.json (deflated 58%)
2025-12-04T17:26:27.9754937Z   adding: test/test-reports/python-pytest/functorch.test_aot_joint_with_descriptors/functorch.test_aot_joint_with_descriptors-79fd9b229bc0c00b.json (deflated 91%)
2025-12-04T17:26:27.9756315Z   adding: test/test-reports/python-pytest/test_fx_reinplace_pass/test_fx_reinplace_pass-047146b9ff22e4f6.json (deflated 87%)
2025-12-04T17:26:27.9781852Z   adding: test/test-reports/python-pytest/functorch.test_control_flow/functorch.test_control_flow-922a9914156e0312.json (deflated 97%)
2025-12-04T17:26:27.9783586Z   adding: test/test-reports/python-pytest/test_subclass/test_subclass-68565895e4fc66ea.json (deflated 96%)
2025-12-04T17:26:27.9827249Z   adding: test/test-reports/python-pytest/functorch.test_vmap_registrations/functorch.test_vmap_registrations-40d5b566ee6986dc.json (deflated 98%)
2025-12-04T17:26:27.9828783Z   adding: test/test-reports/python-pytest/nn.test_parametrization/nn.test_parametrization-ed4e97080833ff92.json (deflated 94%)
2025-12-04T17:26:27.9846807Z   adding: test/test-reports/python-pytest/test_dynamic_shapes/test_dynamic_shapes-07075f000d166d21.json (deflated 95%)
2025-12-04T17:26:27.9847901Z   adding: test/test-reports/python-pytest/test_dispatch/test_dispatch-bf1fd68f7abb7228.json (deflated 93%)
2025-12-04T17:26:27.9849031Z   adding: test/test-reports/python-pytest/test_numba_integration/test_numba_integration-edcc49db775b9990.json (deflated 87%)
2025-12-04T17:26:27.9850237Z   adding: test/test-reports/python-pytest/test_functional_optim/test_functional_optim-389cbc1bb3d61470.json (deflated 79%)
2025-12-04T17:26:27.9866776Z   adding: test/test-reports/python-pytest/test_maskedtensor/test_maskedtensor-1089c4e953521eec.json (deflated 97%)
2025-12-04T17:26:27.9868092Z   adding: test/test-reports/python-pytest/benchmark_utils.test_benchmark_utils/benchmark_utils.test_benchmark_utils-76c10c33afe299c4.json (deflated 87%)
2025-12-04T17:26:27.9898683Z   adding: test/test-reports/python-pytest/test_scaled_matmul_cuda/test_scaled_matmul_cuda-d1f8763e6c1869e6.json (deflated 99%)
2025-12-04T17:26:27.9902024Z   adding: test/test-reports/python-pytest/torch_np.numpy_tests.core.test_shape_base/torch_np.numpy_tests.core.test_shape_base-b9eed7c143bc9bc3.json (deflated 97%)
2025-12-04T17:26:27.9903301Z   adding: test/test-reports/python-pytest/test_vulkan/test_vulkan-b25d187bf3baa78a.json (deflated 44%)
2025-12-04T17:26:27.9904355Z   adding: test/test-reports/python-pytest/lazy.test_generator/lazy.test_generator-42072a3593c4e25d.json (deflated 63%)
2025-12-04T17:26:27.9909537Z   adding: test/test-reports/python-pytest/torch_np.numpy_tests.linalg.test_linalg/torch_np.numpy_tests.linalg.test_linalg-2974f2048ff6a577.json (deflated 97%)
2025-12-04T17:26:27.9912999Z   adding: test/test-reports/python-pytest/torch_np.numpy_tests.core.test_dtype/torch_np.numpy_tests.core.test_dtype-4b8c4285965a7813.json (deflated 97%)
2025-12-04T17:26:27.9914324Z   adding: test/test-reports/python-pytest/lazy.test_debug_util/lazy.test_debug_util-7c02b1e3dfee61bd.json (deflated 33%)
2025-12-04T17:26:27.9915598Z   adding: test/test-reports/python-pytest/nn.test_load_state_dict/nn.test_load_state_dict-e81d6ed8d3f8789f.json (deflated 94%)
2025-12-04T17:26:27.9916707Z   adding: test/test-reports/python-pytest/test_shape_ops/test_shape_ops-a6160583c0856270.json (deflated 96%)
2025-12-04T17:26:27.9918328Z   adding: test/test-reports/python-pytest/nn.test_module_hooks/nn.test_module_hooks-e13d4f4eb9af9666.json (deflated 92%)
2025-12-04T17:26:27.9919765Z   adding: test/test-reports/python-pytest/torch_np.numpy_tests.lib.test_twodim_base/torch_np.numpy_tests.lib.test_twodim_base-2da66c446de8da89.json (deflated 93%)
2025-12-04T17:26:27.9921255Z   adding: test/test-reports/python-pytest/profiler.test_memory_profiler/profiler.test_memory_profiler-20f5e2eefecacaee.json (deflated 88%)
2025-12-04T17:26:27.9923836Z   adding: test/test-reports/python-pytest/test_jit_llga_fuser/test_jit_llga_fuser-b203cab2c461ce78.json (deflated 96%)
2025-12-04T17:26:27.9925181Z   adding: test/test-reports/python-pytest/torch_np.numpy_tests.core.test_getlimits/torch_np.numpy_tests.core.test_getlimits-5149534e2555ec6f.json (deflated 91%)
2025-12-04T17:26:27.9932214Z   adding: test/test-reports/python-pytest/torch_np.test_ndarray_methods/torch_np.test_ndarray_methods-fe7c638b86097b2d.json (deflated 98%)
2025-12-04T17:26:27.9937924Z   adding: test/test-reports/python-pytest/test_view_ops/test_view_ops-f5d6b3525797eb50.json (deflated 95%)
2025-12-04T17:26:27.9939031Z   adding: test/test-reports/python-pytest/test_type_info/test_type_info-22600993e111f6f2.json (deflated 83%)
2025-12-04T17:26:27.9960126Z   adding: test/test-reports/python-pytest/functorch.test_aotdispatch/functorch.test_aotdispatch-efb7e0b79840fa38.json (deflated 95%)
2025-12-04T17:26:27.9961838Z   adding: test/test-reports/python-pytest/test_scatter_gather_ops/test_scatter_gather_ops-5274bd99c8a0619f.json (deflated 95%)
2025-12-04T17:26:27.9964412Z   adding: test/test-reports/python-pytest/test_cuda_multigpu/test_cuda_multigpu-a9a26e79d8868522.json (deflated 94%)
2025-12-04T17:26:27.9966016Z   adding: test/test-reports/python-pytest/torch_np.numpy_tests.lib.test_index_tricks/torch_np.numpy_tests.lib.test_index_tricks-43f32c31fbfc43cd.json (deflated 94%)
2025-12-04T17:26:27.9967673Z   adding: test/test-reports/python-pytest/test_jit_autocast/test_jit_autocast-9b5e22ff1077135a.json (deflated 91%)
2025-12-04T17:26:27.9971594Z   adding: test/test-reports/python-pytest/nn.test_pooling/nn.test_pooling-2151df52b065bbdf.json (deflated 96%)
2025-12-04T17:26:27.9975364Z   adding: test/test-reports/python-pytest/nn.test_embedding/nn.test_embedding-d055fd5d393643fe.json (deflated 97%)
2025-12-04T17:26:27.9976618Z   adding: test/test-reports/python-pytest/test_xnnpack_integration/test_xnnpack_integration-ed8e38bda9a33f4f.json (deflated 88%)
2025-12-04T17:26:27.9977789Z   adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-5e38c3c197506de5.json (deflated 36%)
2025-12-04T17:26:27.9978840Z   adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-4795d7c5159b6e03.json (deflated 33%)
2025-12-04T17:26:27.9979878Z   adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-6c76df2a5666e90f.json (deflated 34%)
2025-12-04T17:26:27.9980929Z   adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-c93df3ae687a8e58.json (deflated 34%)
2025-12-04T17:26:27.9981984Z   adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-524a9565fc6ac576.json (deflated 34%)
2025-12-04T17:26:27.9983026Z   adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-b18b3c9d4ddc6b34.json (deflated 34%)
2025-12-04T17:26:27.9984052Z   adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-319074c2014cbf3e.json (deflated 34%)
2025-12-04T17:26:27.9985082Z   adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-2483ba726355768c.json (deflated 33%)
2025-12-04T17:26:27.9986131Z   adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-f2eed9c29ea8eac7.json (deflated 34%)
2025-12-04T17:26:27.9987289Z   adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-c778cc218c519690.json (deflated 33%)
2025-12-04T17:26:27.9988315Z   adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-7897a7f1d03cdaa3.json (deflated 34%)
2025-12-04T17:26:27.9989356Z   adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-d4fb26045e698199.json (deflated 33%)
2025-12-04T17:26:28.0006113Z   adding: test/test-reports/python-pytest/torch_np.test_reductions/torch_np.test_reductions-73a2026a6cdfd4dc.json (deflated 98%)
2025-12-04T17:26:28.0007799Z   adding: test/test-reports/python-pytest/torch_np.numpy_tests.core.test_scalar_ctors/torch_np.numpy_tests.core.test_scalar_ctors-e23576bcb06b5d61.json (deflated 97%)
2025-12-04T17:26:28.0009339Z   adding: test/test-reports/python-pytest/torch_np.numpy_tests.lib.test_arraypad/torch_np.numpy_tests.lib.test_arraypad-f4e46a1506be78e1.json (deflated 87%)
2025-12-04T17:26:28.0010589Z   adding: test/test-reports/python-pytest/test_prims/test_prims-38188698633a9bb5.json (deflated 89%)
2025-12-04T17:26:28.0016556Z   adding: test/test-reports/python-pytest/test_spectral_ops/test_spectral_ops-ae86fbbf23286ef9.json (deflated 96%)
2025-12-04T17:26:28.0017789Z   adding: test/test-reports/python-pytest/test_cpp_extensions_aot_ninja/test_cpp_extensions_aot_ninja-5c9ab2f003415ced.json (deflated 90%)
2025-12-04T17:26:28.0019147Z   adding: test/test-reports/python-pytest/test_cpp_extensions_aot_no_ninja/test_cpp_extensions_aot_no_ninja-dc0b3ab1cc30279c.json (deflated 90%)
2025-12-04T17:26:28.0023284Z   adding: test/test-reports/td_exclusions-c770a75ce015dc406ab1.json (deflated 82%)
2025-12-04T17:26:28.0024314Z   adding: test/test-reports/python-unittest/test_autoload/TEST-TestDeviceBackendAutoload-20251204172052.json (deflated 37%)
2025-12-04T17:26:28.0056530Z ##[group]Run # Remove any previous test reports if they exist
2025-12-04T17:26:28.0057102Z # Remove any previous test reports if they exist
2025-12-04T17:26:28.0057746Z rm -f test-reports-*.zip
2025-12-04T17:26:28.0058311Z zip -r "test-reports-${FILE_SUFFIX}.zip" test/test-reports -i '*.xml' -i '*.csv'
2025-12-04T17:26:28.0065447Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T17:26:28.0065888Z env:
2025-12-04T17:26:28.0066144Z   GIT_DEFAULT_BRANCH: main
2025-12-04T17:26:28.0066463Z   HAS_NVIDIA_GPU: true
2025-12-04T17:26:28.0066823Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T17:26:28.0067485Z   DOCKER_CONTAINER_ID: 764ff984146fd3268e049644ccb47d7d8238fae8138055ae0a6928cb5da435ad
2025-12-04T17:26:28.0068297Z   FILE_SUFFIX: test-legacy_nvidia_driver-1-5-linux.g4dn.4xlarge.nvidia.gpu_57119749248
2025-12-04T17:26:28.0068883Z ##[endgroup]
2025-12-04T17:26:28.0208188Z   adding: test/test-reports/python-pytest/lazy.test_ts_opinfo/lazy.test_ts_opinfo-8eadd60536af3632.xml (deflated 62%)
2025-12-04T17:26:28.0215016Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-f2c58a9dfc31919e.xml (deflated 92%)
2025-12-04T17:26:28.0217311Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-a793ea186f6e0edb.xml (deflated 90%)
2025-12-04T17:26:28.0219367Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-fcd1db8f24799401.xml (deflated 90%)
2025-12-04T17:26:28.0222741Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-bb08f25297cc596b.xml (deflated 92%)
2025-12-04T17:26:28.0227955Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-bf15e775351f3d84.xml (deflated 90%)
2025-12-04T17:26:28.0239045Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-cd1c50b62bb47a1b.xml (deflated 93%)
2025-12-04T17:26:28.0275051Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-3e5313e420476f15.xml (deflated 93%)
2025-12-04T17:26:28.0277296Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-b23b654b51890d24.xml (deflated 89%)
2025-12-04T17:26:28.0278589Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-2e7c8f13f7be0603.xml (deflated 91%)
2025-12-04T17:26:28.0279866Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-8d6cdce6581fa448.xml (deflated 91%)
2025-12-04T17:26:28.0281156Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-9dce38c1d023996d.xml (deflated 91%)
2025-12-04T17:26:28.0282430Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-b570798f966501a4.xml (deflated 90%)
2025-12-04T17:26:28.0283705Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-be9e2a318f1480ff.xml (deflated 90%)
2025-12-04T17:26:28.0284972Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-e62e290dfdad5699.xml (deflated 91%)
2025-12-04T17:26:28.0293738Z   adding: test/test-reports/python-pytest/inductor.test_torchinductor_codegen_dynamic_shapes/inductor.test_torchinductor_codegen_dynamic_shapes-0c75da116b2f10f8.xml (deflated 93%)
2025-12-04T17:26:28.0296125Z   adding: test/test-reports/python-pytest/inductor.test_torchinductor_codegen_dynamic_shapes/inductor.test_torchinductor_codegen_dynamic_shapes-fd0863b8a222871a.xml (deflated 87%)
2025-12-04T17:26:28.0298545Z   adding: test/test-reports/python-pytest/inductor.test_torchinductor_codegen_dynamic_shapes/inductor.test_torchinductor_codegen_dynamic_shapes-6fcb35b3fc35a71c.xml (deflated 87%)
2025-12-04T17:26:28.0304087Z   adding: test/test-reports/python-pytest/inductor.test_torchinductor_codegen_dynamic_shapes/inductor.test_torchinductor_codegen_dynamic_shapes-f8b2416e9d43ac69.xml (deflated 93%)
2025-12-04T17:26:28.0308016Z   adding: test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-8ad43f769763d7e0.xml (deflated 92%)
2025-12-04T17:26:28.0312322Z   adding: test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-6495f5d67df68869.xml (deflated 92%)
2025-12-04T17:26:28.0316391Z   adding: test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-f9f6352517dfd8be.xml (deflated 92%)
2025-12-04T17:26:28.0320765Z   adding: test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-25ab0fa1230b07b5.xml (deflated 92%)
2025-12-04T17:26:28.0323246Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-74cab4bdcde89184.xml (deflated 85%)
2025-12-04T17:26:28.0325706Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-77e37a2f8b75b3d9.xml (deflated 84%)
2025-12-04T17:26:28.0328269Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3ba19b390afd5854.xml (deflated 84%)
2025-12-04T17:26:28.0330723Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4ad317a243ecdd30.xml (deflated 85%)
2025-12-04T17:26:28.0333251Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f482798b2b39d897.xml (deflated 84%)
2025-12-04T17:26:28.0335796Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cbe2514f89eef609.xml (deflated 84%)
2025-12-04T17:26:28.0338374Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3707d31910126ebf.xml (deflated 85%)
2025-12-04T17:26:28.0341029Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-dedaec5daecec784.xml (deflated 84%)
2025-12-04T17:26:28.0343618Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2f4f0e9c4ac682e4.xml (deflated 84%)
2025-12-04T17:26:28.0346092Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-580d25229e34cb07.xml (deflated 85%)
2025-12-04T17:26:28.0348547Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9d15e1ab064c4537.xml (deflated 84%)
2025-12-04T17:26:28.0351109Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e6d909bcc6975bf8.xml (deflated 84%)
2025-12-04T17:26:28.0353731Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0a612698d44183a1.xml (deflated 85%)
2025-12-04T17:26:28.0356307Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-90f2ceb88314c75a.xml (deflated 84%)
2025-12-04T17:26:28.0358878Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9644b19a5203c0ee.xml (deflated 84%)
2025-12-04T17:26:28.0361366Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1a6f999e52eb1904.xml (deflated 85%)
2025-12-04T17:26:28.0363837Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-547414903ca204e9.xml (deflated 84%)
2025-12-04T17:26:28.0366492Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-41f0f199b083e6d2.xml (deflated 84%)
2025-12-04T17:26:28.0369033Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-438f9d52209526cc.xml (deflated 85%)
2025-12-04T17:26:28.0371711Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-98df9406c6e0faf3.xml (deflated 84%)
2025-12-04T17:26:28.0373183Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f706cf73cc88a5b8.xml (deflated 84%)
2025-12-04T17:26:28.0374650Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c67f05de6c39b0d8.xml (deflated 85%)
2025-12-04T17:26:28.0376117Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0e30a339afee7d22.xml (deflated 84%)
2025-12-04T17:26:28.0377661Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4a14a2e6be65f97f.xml (deflated 84%)
2025-12-04T17:26:28.0379115Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-82a3db4b14f41cd2.xml (deflated 85%)
2025-12-04T17:26:28.0380586Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a02c7191ab69f431.xml (deflated 84%)
2025-12-04T17:26:28.0382056Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e37b8ebc7938792f.xml (deflated 84%)
2025-12-04T17:26:28.0383525Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ee37665d187f9309.xml (deflated 85%)
2025-12-04T17:26:28.0384974Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-511047743df1b08e.xml (deflated 84%)
2025-12-04T17:26:28.0386427Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4d9221d5ac70ff44.xml (deflated 84%)
2025-12-04T17:26:28.0387927Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-af9a500a606c950b.xml (deflated 85%)
2025-12-04T17:26:28.0389967Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e3ba96547605fc4e.xml (deflated 84%)
2025-12-04T17:26:28.0391942Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ce470e45644e1cc6.xml (deflated 84%)
2025-12-04T17:26:28.0393730Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cece0bb00c5477e6.xml (deflated 85%)
2025-12-04T17:26:28.0395370Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4e672e5e3ae6046c.xml (deflated 84%)
2025-12-04T17:26:28.0396831Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-65775801d71c7290.xml (deflated 84%)
2025-12-04T17:26:28.0398292Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9ed754aaaf490f98.xml (deflated 85%)
2025-12-04T17:26:28.0399752Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-36a3a8a6a9d0a436.xml (deflated 84%)
2025-12-04T17:26:28.0401220Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f55b1076fbed9be9.xml (deflated 84%)
2025-12-04T17:26:28.0403739Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6062b5e411b734f8.xml (deflated 85%)
2025-12-04T17:26:28.0405197Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-21fa07c752a411ad.xml (deflated 84%)
2025-12-04T17:26:28.0406657Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4c09ee7a97c51183.xml (deflated 84%)
2025-12-04T17:26:28.0408263Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c713b1dc3f3923ec.xml (deflated 28%)
2025-12-04T17:26:28.0409721Z   adding: test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-84a2c5e5cdda7bdd.xml (deflated 95%)
2025-12-04T17:26:28.0411174Z   adding: test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-97e49e1b6070e822.xml (deflated 92%)
2025-12-04T17:26:28.0412606Z   adding: test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-aaac502093c587a7.xml (deflated 92%)
2025-12-04T17:26:28.0414596Z   adding: test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-decce829c4432557.xml (deflated 94%)
2025-12-04T17:26:28.0416432Z   adding: test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-491de48d6c983340.xml (deflated 73%)
2025-12-04T17:26:28.0418073Z   adding: test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-35b1cdd46f4129e6.xml (deflated 96%)
2025-12-04T17:26:28.0419446Z   adding: test/test-reports/python-pytest/inductor.test_flex_decoding/inductor.test_flex_decoding-4523fe803428b665.xml (deflated 28%)
2025-12-04T17:26:28.0420774Z   adding: test/test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-ccc55353a2e77d8f.xml (deflated 90%)
2025-12-04T17:26:28.0422656Z   adding: test/test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-cbc1aeff512c7b0d.xml (deflated 93%)
2025-12-04T17:26:28.0427233Z   adding: test/test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-b35d65d1a2e42e4e.xml (deflated 93%)
2025-12-04T17:26:28.0428804Z   adding: test/test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-feba5ff46dbc30dd.xml (deflated 72%)
2025-12-04T17:26:28.0430409Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-dff864e79f1bf91b.xml (deflated 88%)
2025-12-04T17:26:28.0432135Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-053a0e10a178eff6.xml (deflated 88%)
2025-12-04T17:26:28.0433733Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-966288eeb3fe785e.xml (deflated 88%)
2025-12-04T17:26:28.0435467Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-47dd8058babbbd0d.xml (deflated 88%)
2025-12-04T17:26:28.0437486Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e92e228ccdafe934.xml (deflated 88%)
2025-12-04T17:26:28.0439472Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0328fb4bc2fb022d.xml (deflated 88%)
2025-12-04T17:26:28.0441469Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4ecceae3d20d3515.xml (deflated 88%)
2025-12-04T17:26:28.0443420Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-af3f0411f43ffff1.xml (deflated 88%)
2025-12-04T17:26:28.0445417Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5f646abfecfc34db.xml (deflated 88%)
2025-12-04T17:26:28.0447380Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2f71fa45f6063b14.xml (deflated 88%)
2025-12-04T17:26:28.0449393Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3d881319a967678f.xml (deflated 88%)
2025-12-04T17:26:28.0451382Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7b45d70025cf6016.xml (deflated 88%)
2025-12-04T17:26:28.0453354Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b4a285d41fdad5fc.xml (deflated 88%)
2025-12-04T17:26:28.0455280Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-9b24822b6f23300e.xml (deflated 88%)
2025-12-04T17:26:28.0457429Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-642548938a706c13.xml (deflated 88%)
2025-12-04T17:26:28.0459608Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3087aa3d89d0a96b.xml (deflated 89%)
2025-12-04T17:26:28.0461584Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5fb4c628c04a1cdc.xml (deflated 88%)
2025-12-04T17:26:28.0463488Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0d752e0bfa5071ea.xml (deflated 88%)
2025-12-04T17:26:28.0465442Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-fedbc7df4b1c2869.xml (deflated 88%)
2025-12-04T17:26:28.0467390Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3cf6d62b643bfad7.xml (deflated 88%)
2025-12-04T17:26:28.0469405Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e2e744a24cd2751e.xml (deflated 88%)
2025-12-04T17:26:28.0471522Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6de5a411a3f65f82.xml (deflated 88%)
2025-12-04T17:26:28.0473551Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d2f3621583fff098.xml (deflated 88%)
2025-12-04T17:26:28.0475484Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-319cee3df6121e1a.xml (deflated 88%)
2025-12-04T17:26:28.0477476Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-452be63c68b4eb35.xml (deflated 88%)
2025-12-04T17:26:28.0479440Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5a49841d6a2b730b.xml (deflated 88%)
2025-12-04T17:26:28.0481418Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f1313a025d30dc09.xml (deflated 88%)
2025-12-04T17:26:28.0483412Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-03aedafc0832726c.xml (deflated 88%)
2025-12-04T17:26:28.0485399Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-89171bcc48f05a69.xml (deflated 88%)
2025-12-04T17:26:28.0487617Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6450e334481f0131.xml (deflated 88%)
2025-12-04T17:26:28.0489297Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f7999da795e3cf34.xml (deflated 86%)
2025-12-04T17:26:28.0496870Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ad7a38726bbc8b50.xml (deflated 96%)
2025-12-04T17:26:28.0504976Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b434424093647de3.xml (deflated 96%)
2025-12-04T17:26:28.0506896Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ccd966f4e119e833.xml (deflated 88%)
2025-12-04T17:26:28.0509860Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d16f18ba4de45d90.xml (deflated 90%)
2025-12-04T17:26:28.0512727Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4078dca354f1c797.xml (deflated 90%)
2025-12-04T17:26:28.0514881Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7591ded94ad5fda9.xml (deflated 88%)
2025-12-04T17:26:28.0516871Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4955a88ef6b89264.xml (deflated 88%)
2025-12-04T17:26:28.0518974Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2fae1650dec37ec0.xml (deflated 88%)
2025-12-04T17:26:28.0521002Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0893388d06071d35.xml (deflated 88%)
2025-12-04T17:26:28.0522991Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b62e3abe6013e6ef.xml (deflated 88%)
2025-12-04T17:26:28.0525070Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0fe50dbde6f69754.xml (deflated 88%)
2025-12-04T17:26:28.0527227Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1381d94cd6abec18.xml (deflated 88%)
2025-12-04T17:26:28.0529327Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2b9aebe063e8f7ef.xml (deflated 88%)
2025-12-04T17:26:28.0531132Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-649ba93d0ac5919c.xml (deflated 88%)
2025-12-04T17:26:28.0533179Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-df60bd1ca7e6baab.xml (deflated 88%)
2025-12-04T17:26:28.0535316Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-927fdf8f8ff6280c.xml (deflated 88%)
2025-12-04T17:26:28.0537383Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2f380c761dc75570.xml (deflated 88%)
2025-12-04T17:26:28.0539450Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-db3aa4c2f1c0f2c1.xml (deflated 88%)
2025-12-04T17:26:28.0541474Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7c20e7902388541e.xml (deflated 88%)
2025-12-04T17:26:28.0543441Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-43cf13c151388d8e.xml (deflated 88%)
2025-12-04T17:26:28.0545437Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-27661fe34019a4f8.xml (deflated 88%)
2025-12-04T17:26:28.0547553Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-63ef36c446edecf7.xml (deflated 88%)
2025-12-04T17:26:28.0549493Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-818cc5e6f257d295.xml (deflated 88%)
2025-12-04T17:26:28.0551479Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b552d5ebf2a766dc.xml (deflated 88%)
2025-12-04T17:26:28.0553566Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-08c28ac73e77007a.xml (deflated 88%)
2025-12-04T17:26:28.0555550Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-df1b42bf8f6cd06e.xml (deflated 88%)
2025-12-04T17:26:28.0557605Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-97d6c66aee44b097.xml (deflated 88%)
2025-12-04T17:26:28.0559696Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-232f2d4b09cdec77.xml (deflated 88%)
2025-12-04T17:26:28.0561582Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6add3d31a0a55a66.xml (deflated 88%)
2025-12-04T17:26:28.0563630Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-fa52f41f0c0be4e5.xml (deflated 88%)
2025-12-04T17:26:28.0565711Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-38b24c1b21208356.xml (deflated 88%)
2025-12-04T17:26:28.0567824Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b1ae24833396f782.xml (deflated 88%)
2025-12-04T17:26:28.0569864Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-80996ba6b8c32f81.xml (deflated 88%)
2025-12-04T17:26:28.0572012Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8b26ba548538abde.xml (deflated 88%)
2025-12-04T17:26:28.0574082Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d73817a3e5f02a06.xml (deflated 88%)
2025-12-04T17:26:28.0577918Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-90e37d7f0968dad1.xml (deflated 92%)
2025-12-04T17:26:28.0579842Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6ba281452d587f38.xml (deflated 88%)
2025-12-04T17:26:28.0581703Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-85d1d6e9267cc116.xml (deflated 88%)
2025-12-04T17:26:28.0583700Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7a610c26dd7fa0e9.xml (deflated 88%)
2025-12-04T17:26:28.0585606Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-269f6089cafc9f3b.xml (deflated 88%)
2025-12-04T17:26:28.0587536Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f11fe18ee197cc1f.xml (deflated 88%)
2025-12-04T17:26:28.0590424Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0b8acd36d7258295.xml (deflated 87%)
2025-12-04T17:26:28.0592110Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-babe12520ea62fea.xml (deflated 88%)
2025-12-04T17:26:28.0593996Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-08a6bb29b776e6ca.xml (deflated 88%)
2025-12-04T17:26:28.0595966Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ef8db3fa00c6c1d7.xml (deflated 88%)
2025-12-04T17:26:28.0597896Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7a3ac84fc91fa02b.xml (deflated 88%)
2025-12-04T17:26:28.0599901Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e162f70cb76e49ff.xml (deflated 88%)
2025-12-04T17:26:28.0602406Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e70a5c274fb86b8e.xml (deflated 88%)
2025-12-04T17:26:28.0604368Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0c17434f07767682.xml (deflated 88%)
2025-12-04T17:26:28.0606626Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3815c1aa47a06d85.xml (deflated 88%)
2025-12-04T17:26:28.0608637Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-69850f25ab7699fd.xml (deflated 88%)
2025-12-04T17:26:28.0610773Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-da23a1d59c747be6.xml (deflated 88%)
2025-12-04T17:26:28.0612924Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-36993cd4956a89fe.xml (deflated 88%)
2025-12-04T17:26:28.0614946Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-78153e5fcd212bc6.xml (deflated 88%)
2025-12-04T17:26:28.0617182Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-04b538cf09549803.xml (deflated 88%)
2025-12-04T17:26:28.0619441Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d91f9f6b0d5ec125.xml (deflated 88%)
2025-12-04T17:26:28.0621479Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ebbb316cfb6210df.xml (deflated 88%)
2025-12-04T17:26:28.0623514Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6f1b13e751374b5d.xml (deflated 88%)
2025-12-04T17:26:28.0625633Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f2c09e3279cd971a.xml (deflated 88%)
2025-12-04T17:26:28.0627519Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5c0e3bac2edd6805.xml (deflated 88%)
2025-12-04T17:26:28.0629500Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1c004f486086cbb5.xml (deflated 88%)
2025-12-04T17:26:28.0631485Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5e05e7b060f911b0.xml (deflated 88%)
2025-12-04T17:26:28.0634972Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4d9849432c7f5caf.xml (deflated 97%)
2025-12-04T17:26:28.0637413Z   adding: test/test-reports/python-pytest/dynamo.test_model_output/dynamo.test_model_output-d1e6e78eb0372411.xml (deflated 82%)
2025-12-04T17:26:28.0639272Z   adding: test/test-reports/python-pytest/dynamo.test_model_output/dynamo.test_model_output-0e0432c8246f889e.xml (deflated 82%)
2025-12-04T17:26:28.0640634Z   adding: test/test-reports/python-pytest/dynamo.test_model_output/dynamo.test_model_output-1d824658578ee605.xml (deflated 82%)
2025-12-04T17:26:28.0642099Z   adding: test/test-reports/python-pytest/dynamo.test_model_output/dynamo.test_model_output-a0141b45c0b55065.xml (deflated 78%)
2025-12-04T17:26:28.0660456Z   adding: test/test-reports/python-pytest/inductor.test_triton_kernels/inductor.test_triton_kernels-498ce8e3e7c25595.xml (deflated 94%)
2025-12-04T17:26:28.0663839Z   adding: test/test-reports/python-pytest/inductor.test_loop_ordering/inductor.test_loop_ordering-264346bf50f4314b.xml (deflated 85%)
2025-12-04T17:26:28.0666423Z   adding: test/test-reports/python-pytest/inductor.test_loop_ordering/inductor.test_loop_ordering-f4f4ac9590e83730.xml (deflated 87%)
2025-12-04T17:26:28.0668397Z   adding: test/test-reports/python-pytest/inductor.test_loop_ordering/inductor.test_loop_ordering-193da166d9e268ac.xml (deflated 87%)
2025-12-04T17:26:28.0669859Z   adding: test/test-reports/python-pytest/inductor.test_loop_ordering/inductor.test_loop_ordering-d4eac70f931f6c8b.xml (deflated 92%)
2025-12-04T17:26:28.0758138Z   adding: test/test-reports/python-pytest/export.test_serdes/export.test_serdes-191fd84c43c29743.xml (deflated 95%)
2025-12-04T17:26:28.0760083Z   adding: test/test-reports/python-pytest/dynamo.test_backends/dynamo.test_backends-7c3220f8bc842d2f.xml (deflated 82%)
2025-12-04T17:26:28.0762422Z   adding: test/test-reports/python-pytest/dynamo.test_backends/dynamo.test_backends-7281b232f81c7a26.xml (deflated 86%)
2025-12-04T17:26:28.0763821Z   adding: test/test-reports/python-pytest/dynamo.test_backends/dynamo.test_backends-0c21d337a20b3a01.xml (deflated 86%)
2025-12-04T17:26:28.0765118Z   adding: test/test-reports/python-pytest/dynamo.test_backends/dynamo.test_backends-62a9a8755ec319d6.xml (deflated 77%)
2025-12-04T17:26:28.0768164Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor_package/inductor.test_aot_inductor_package-b1ca468dab29d0d8.xml (deflated 94%)
2025-12-04T17:26:28.0770393Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor_package/inductor.test_aot_inductor_package-69f64b5320fd797d.xml (deflated 90%)
2025-12-04T17:26:28.0772914Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor_package/inductor.test_aot_inductor_package-e41e403fca9b1188.xml (deflated 90%)
2025-12-04T17:26:28.0774863Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor_package/inductor.test_aot_inductor_package-07f86488d4cce1d3.xml (deflated 88%)
2025-12-04T17:26:28.0778217Z   adding: test/test-reports/python-pytest/inductor.test_padding/inductor.test_padding-be250a10b53bb058.xml (deflated 89%)
2025-12-04T17:26:28.0780048Z   adding: test/test-reports/python-pytest/dynamo.test_aot_compile/dynamo.test_aot_compile-10a88b68c9603fe3.xml (deflated 83%)
2025-12-04T17:26:28.0781803Z   adding: test/test-reports/python-pytest/dynamo.test_sets/dynamo.test_sets-f0cb58e83c4ea8ef.xml (deflated 88%)
2025-12-04T17:26:28.0783845Z   adding: test/test-reports/python-pytest/dynamo.test_wrap_inductor_compiled_regions/dynamo.test_wrap_inductor_compiled_regions-2f1d9c362e038030.xml (deflated 84%)
2025-12-04T17:26:28.0808854Z   adding: test/test-reports/python-pytest/test_sparse/test_sparse-598e6683c5cfc22a.xml (deflated 95%)
2025-12-04T17:26:28.0818882Z   adding: test/test-reports/python-pytest/test_decomp/test_decomp-5879e0e26736617e.xml (deflated 91%)
2025-12-04T17:26:28.0828673Z   adding: test/test-reports/python-pytest/test_decomp/test_decomp-c4519c63d1395608.xml (deflated 91%)
2025-12-04T17:26:28.0838858Z   adding: test/test-reports/python-pytest/test_decomp/test_decomp-fd1a91e45a41098b.xml (deflated 91%)
2025-12-04T17:26:28.0876525Z   adding: test/test-reports/python-pytest/test_ops_fwd_gradients/test_ops_fwd_gradients-dac273fbaf67ad10.xml (deflated 96%)
2025-12-04T17:26:28.1027082Z   adding: test/test-reports/python-pytest/test_meta/test_meta-cbc50d7c3e0b1b6a.xml (deflated 96%)
2025-12-04T17:26:28.1039912Z   adding: test/test-reports/python-pytest/test_ops_jit/test_ops_jit-2f4faab6a29e642c.xml (deflated 93%)
2025-12-04T17:26:28.1069195Z   adding: test/test-reports/python-pytest/test_nestedtensor/test_nestedtensor-8372b6917771ca4c.xml (deflated 98%)
2025-12-04T17:26:28.1134019Z   adding: test/test-reports/python-pytest/test_ops/test_ops-d95bfbe57b5d2d89.xml (deflated 94%)
2025-12-04T17:26:28.1211884Z   adding: test/test-reports/python-pytest/test_ops/test_ops-75f8d45594e24741.xml (deflated 95%)
2025-12-04T17:26:28.1213538Z   adding: test/test-reports/python-pytest/functorch.test_dims/functorch.test_dims-e2a9e671430fd99e.xml (deflated 86%)
2025-12-04T17:26:28.1248695Z   adding: test/test-reports/python-pytest/functorch.test_ops/functorch.test_ops-caabf5583dae6043.xml (deflated 93%)
2025-12-04T17:26:28.1284410Z   adding: test/test-reports/python-pytest/functorch.test_ops/functorch.test_ops-b6190fae5240f1fb.xml (deflated 93%)
2025-12-04T17:26:28.1299029Z   adding: test/test-reports/python-pytest/inductor.test_cpu_repro/inductor.test_cpu_repro-e45fcbaf6c1a2b2c.xml (deflated 96%)
2025-12-04T17:26:28.1300465Z   adding: test/test-reports/python-pytest/inductor.test_custom_lowering/inductor.test_custom_lowering-f90a8c2a1b7dd9b0.xml (deflated 69%)
2025-12-04T17:26:28.1305819Z   adding: test/test-reports/python-pytest/inductor.test_perf/inductor.test_perf-34de9a09a2935f8d.xml (deflated 92%)
2025-12-04T17:26:28.1307453Z   adding: test/test-reports/python-pytest/inductor.test_binary_folding/inductor.test_binary_folding-0c797ad2be676af7.xml (deflated 79%)
2025-12-04T17:26:28.1313789Z   adding: test/test-reports/python-pytest/inductor.test_mkldnn_pattern_matcher/inductor.test_mkldnn_pattern_matcher-c93031a5b8f8293d.xml (deflated 94%)
2025-12-04T17:26:28.1325590Z   adding: test/test-reports/python-pytest/inductor.test_gpu_cpp_wrapper/inductor.test_gpu_cpp_wrapper-c206afd337165094.xml (deflated 94%)
2025-12-04T17:26:28.1327612Z   adding: test/test-reports/python-pytest/inductor.test_cutedsl_template/inductor.test_cutedsl_template-1780c0291e7a0397.xml (deflated 88%)
2025-12-04T17:26:28.1328969Z   adding: test/test-reports/python-pytest/inductor.test_benchmark_fusion/inductor.test_benchmark_fusion-33e3c50f2f02127c.xml (deflated 80%)
2025-12-04T17:26:28.1334107Z   adding: test/test-reports/python-pytest/dynamo.test_modules/dynamo.test_modules-f3674dc870090d50.xml (deflated 88%)
2025-12-04T17:26:28.1336134Z   adding: test/test-reports/python-pytest/dynamo.test_recompiles/dynamo.test_recompiles-755ec9793479e2dd.xml (deflated 76%)
2025-12-04T17:26:28.1338522Z   adding: test/test-reports/python-pytest/export.test_tree_utils/export.test_tree_utils-4b33de82582b2e92.xml (deflated 48%)
2025-12-04T17:26:28.1340573Z   adding: test/test-reports/python-pytest/inductor.test_triton_wrapper/inductor.test_triton_wrapper-7697274370716365.xml (deflated 50%)
2025-12-04T17:26:28.1343265Z   adding: test/test-reports/python-pytest/inductor.test_static_cuda_launcher/inductor.test_static_cuda_launcher-96effba66b878950.xml (deflated 85%)
2025-12-04T17:26:28.1345939Z   adding: test/test-reports/python-pytest/export.test_dynamic_shapes/export.test_dynamic_shapes-6f817f896f94c83c.xml (deflated 50%)
2025-12-04T17:26:28.1347156Z   adding: test/test-reports/python-pytest/dynamo.test_sdpa/dynamo.test_sdpa-3e0149796a415876.xml (deflated 73%)
2025-12-04T17:26:28.1348245Z   adding: test/test-reports/python-pytest/dynamo.test_utils/dynamo.test_utils-e6d94f5c34c685f8.xml (deflated 80%)
2025-12-04T17:26:28.1349463Z   adding: test/test-reports/python-pytest/inductor.test_codegen_triton/inductor.test_codegen_triton-f741c3b21cf28e3b.xml (deflated 35%)
2025-12-04T17:26:28.1350727Z   adding: test/test-reports/python-pytest/dynamo.test_frame_init/dynamo.test_frame_init-c2e1024fb8a07387.xml (deflated 38%)
2025-12-04T17:26:28.1351973Z   adding: test/test-reports/python-pytest/inductor.test_device_assert/inductor.test_device_assert-451c7142fcd9d62b.xml (deflated 80%)
2025-12-04T17:26:28.1353258Z   adding: test/test-reports/python-pytest/dynamo.test_skip_non_tensor/dynamo.test_skip_non_tensor-f190ace25428cb94.xml (deflated 71%)
2025-12-04T17:26:28.1354591Z   adding: test/test-reports/python-pytest/dynamo.test_skip_guard_eval_unsafe/dynamo.test_skip_guard_eval_unsafe-aa1ded9d0a4e400e.xml (deflated 73%)
2025-12-04T17:26:28.1356059Z   adding: test/test-reports/python-pytest/inductor.test_control_deps/inductor.test_control_deps-23047419ffe03376.xml (deflated 48%)
2025-12-04T17:26:28.1357357Z   adding: test/test-reports/python-pytest/inductor.test_benchmarking/inductor.test_benchmarking-53f04a03954d2058.xml (deflated 87%)
2025-12-04T17:26:28.1358796Z   adding: test/test-reports/python-pytest/inductor.test_helion_kernels/inductor.test_helion_kernels-0df86f8cd24ea26a.xml (deflated 62%)
2025-12-04T17:26:28.1360120Z   adding: test/test-reports/python-pytest/inductor.test_quantization/inductor.test_quantization-951156711359c867.xml (deflated 62%)
2025-12-04T17:26:28.1361665Z   adding: test/test-reports/python-pytest/export.test_tools/export.test_tools-c033e9415dabe65c.xml (deflated 47%)
2025-12-04T17:26:28.1363191Z   adding: test/test-reports/python-pytest/inductor.test_compiled_optimizers/inductor.test_compiled_optimizers-1745c9b9e5fc7ed3.xml (deflated 96%)
2025-12-04T17:26:28.1364618Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor_utils/inductor.test_aot_inductor_utils-e7355f16ccb52d23.xml (deflated 28%)
2025-12-04T17:26:28.1382075Z   adding: test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-0b2081966a192cef.xml (deflated 97%)
2025-12-04T17:26:28.1386332Z   adding: test/test-reports/python-pytest/inductor.test_minifier_isolate/inductor.test_minifier_isolate-f50615d1a1981661.xml (deflated 93%)
2025-12-04T17:26:28.1509754Z   adding: test/test-reports/python-pytest/dynamo.test_error_messages/dynamo.test_error_messages-36d8e363c2770c16.xml (deflated 95%)
2025-12-04T17:26:28.1511639Z   adding: test/test-reports/python-pytest/dynamo.test_fake_distributed/dynamo.test_fake_distributed-b0f5d6fe6c345e8f.xml (deflated 58%)
2025-12-04T17:26:28.1512879Z   adding: test/test-reports/python-pytest/dynamo.test_tree_map/dynamo.test_tree_map-39d9c68e899fe910.xml (deflated 90%)
2025-12-04T17:26:28.1520593Z   adding: test/test-reports/python-pytest/dynamo.test_minifier/dynamo.test_minifier-9124cc51e1c5e7b6.xml (deflated 94%)
2025-12-04T17:26:28.1522729Z   adding: test/test-reports/python-pytest/dynamo.test_guard_manager/dynamo.test_guard_manager-f0dd8a549f18516b.xml (deflated 86%)
2025-12-04T17:26:28.1524509Z   adding: test/test-reports/python-pytest/export.test_schema/export.test_schema-98e7fce7714746ab.xml (deflated 67%)
2025-12-04T17:26:28.1526038Z   adding: test/test-reports/python-pytest/export.test_pass_infra/export.test_pass_infra-0489f34d1d482c78.xml (deflated 75%)
2025-12-04T17:26:28.1527411Z   adding: test/test-reports/python-pytest/dynamo.test_recompile_ux/dynamo.test_recompile_ux-5436245cbc75fddd.xml (deflated 79%)
2025-12-04T17:26:28.1529136Z   adding: test/test-reports/python-pytest/export.test_experimental/export.test_experimental-4743e9a7200af635.xml (deflated 86%)
2025-12-04T17:26:28.1530858Z   adding: test/test-reports/python-pytest/export.test_converter/export.test_converter-a6e4e9ebcfaea6df.xml (deflated 90%)
2025-12-04T17:26:28.1532076Z   adding: test/test-reports/python-pytest/dynamo.test_reorder_logs/dynamo.test_reorder_logs-d530254831fe0a21.xml (deflated 85%)
2025-12-04T17:26:28.1535373Z   adding: test/test-reports/python-pytest/dynamo.test_subclasses/dynamo.test_subclasses-90ae20717b7fd572.xml (deflated 91%)
2025-12-04T17:26:28.1537340Z   adding: test/test-reports/python-pytest/dynamo.test_python_autograd/dynamo.test_python_autograd-b76a60537c2ba691.xml (deflated 75%)
2025-12-04T17:26:28.1538605Z   adding: test/test-reports/python-pytest/export.test_draft_export/export.test_draft_export-0c8a812115433a7d.xml (deflated 83%)
2025-12-04T17:26:28.1540703Z   adding: test/test-reports/python-pytest/test_package/test_package-523d81f0792170f1.xml (deflated 87%)
2025-12-04T17:26:28.1542090Z   adding: test/test-reports/python-pytest/test_mkl_verbose/test_mkl_verbose-874cbf06946f8b3e.xml (deflated 50%)
2025-12-04T17:26:28.1543207Z   adding: test/test-reports/python-pytest/test_comparison_utils/test_comparison_utils-ce770324779d51b3.xml (deflated 76%)
2025-12-04T17:26:28.1544531Z   adding: test/test-reports/python-pytest/functorch.test_ac_logging/functorch.test_ac_logging-f1c79a1c8c74be66.xml (deflated 63%)
2025-12-04T17:26:28.1545722Z   adding: test/test-reports/python-pytest/test_mkldnn_verbose/test_mkldnn_verbose-e983273d29ed8e1e.xml (deflated 50%)
2025-12-04T17:26:28.1548731Z   adding: test/test-reports/python-pytest/test_cpp_api_parity/test_cpp_api_parity-c6b7300fef8db168.xml (deflated 94%)
2025-12-04T17:26:28.1550515Z   adding: test/test-reports/python-pytest/test_autoload/test_autoload-21f1eacf8f4a4d28.xml (deflated 38%)
2025-12-04T17:26:28.1551701Z   adding: test/test-reports/python-pytest/nn.attention.test_open_registry/nn.attention.test_open_registry-bacfee0084c93992.xml (deflated 51%)
2025-12-04T17:26:28.1552903Z   adding: test/test-reports/python-pytest/test_as_strided/test_as_strided-4555079064233d7d.xml (deflated 49%)
2025-12-04T17:26:28.1600519Z   adding: test/test-reports/python-pytest/test_foreach/test_foreach-aa4419a4e7b6d381.xml (deflated 96%)
2025-12-04T17:26:28.1601958Z   adding: test/test-reports/python-pytest/xpu.test_gemm/xpu.test_gemm-2cb9cf39de6aa2cf.xml (deflated 28%)
2025-12-04T17:26:28.1603601Z   adding: test/test-reports/python-pytest/test_numpy_interop/test_numpy_interop-660870d95235d56d.xml (deflated 88%)
2025-12-04T17:26:28.1605245Z   adding: test/test-reports/python-pytest/profiler.test_cpp_thread/profiler.test_cpp_thread-31559e2ba96f64a3.xml (deflated 82%)
2025-12-04T17:26:28.1607075Z   adding: test/test-reports/python-pytest/test_hub/test_hub-33a47573ff45c77e.xml (deflated 83%)
2025-12-04T17:26:28.1609166Z   adding: test/test-reports/python-pytest/test_segment_reductions/test_segment_reductions-ad616dd6940e0de0.xml (deflated 95%)
2025-12-04T17:26:28.1610381Z   adding: test/test-reports/python-pytest/test_autograd_fallback/test_autograd_fallback-e1a7bbd98afc63dc.xml (deflated 89%)
2025-12-04T17:26:28.1611492Z   adding: test/test-reports/python-pytest/test_type_hints/test_type_hints-d14fd0906e097d86.xml (deflated 58%)
2025-12-04T17:26:28.1612809Z   adding: test/test-reports/python-pytest/functorch.test_aot_joint_with_descriptors/functorch.test_aot_joint_with_descriptors-79fd9b229bc0c00b.xml (deflated 83%)
2025-12-04T17:26:28.1614189Z   adding: test/test-reports/python-pytest/test_fx_reinplace_pass/test_fx_reinplace_pass-047146b9ff22e4f6.xml (deflated 76%)
2025-12-04T17:26:28.1632137Z   adding: test/test-reports/python-pytest/functorch.test_control_flow/functorch.test_control_flow-922a9914156e0312.xml (deflated 95%)
2025-12-04T17:26:28.1633921Z   adding: test/test-reports/python-pytest/test_subclass/test_subclass-68565895e4fc66ea.xml (deflated 93%)
2025-12-04T17:26:28.1658320Z   adding: test/test-reports/python-pytest/functorch.test_vmap_registrations/functorch.test_vmap_registrations-40d5b566ee6986dc.xml (deflated 97%)
2025-12-04T17:26:28.1660162Z   adding: test/test-reports/python-pytest/nn.test_parametrization/nn.test_parametrization-ed4e97080833ff92.xml (deflated 90%)
2025-12-04T17:26:28.1675759Z   adding: test/test-reports/python-pytest/test_dynamic_shapes/test_dynamic_shapes-07075f000d166d21.xml (deflated 94%)
2025-12-04T17:26:28.1677812Z   adding: test/test-reports/python-pytest/test_dispatch/test_dispatch-bf1fd68f7abb7228.xml (deflated 85%)
2025-12-04T17:26:28.1678911Z   adding: test/test-reports/python-pytest/test_numba_integration/test_numba_integration-edcc49db775b9990.xml (deflated 80%)
2025-12-04T17:26:28.1680188Z   adding: test/test-reports/python-pytest/test_functional_optim/test_functional_optim-389cbc1bb3d61470.xml (deflated 74%)
2025-12-04T17:26:28.1692042Z   adding: test/test-reports/python-pytest/test_maskedtensor/test_maskedtensor-1089c4e953521eec.xml (deflated 95%)
2025-12-04T17:26:28.1693593Z   adding: test/test-reports/python-pytest/benchmark_utils.test_benchmark_utils/benchmark_utils.test_benchmark_utils-76c10c33afe299c4.xml (deflated 79%)
2025-12-04T17:26:28.1714636Z   adding: test/test-reports/python-pytest/test_scaled_matmul_cuda/test_scaled_matmul_cuda-d1f8763e6c1869e6.xml (deflated 99%)
2025-12-04T17:26:28.1716974Z   adding: test/test-reports/python-pytest/torch_np.numpy_tests.core.test_shape_base/torch_np.numpy_tests.core.test_shape_base-b9eed7c143bc9bc3.xml (deflated 95%)
2025-12-04T17:26:28.1718454Z   adding: test/test-reports/python-pytest/test_vulkan/test_vulkan-b25d187bf3baa78a.xml (deflated 45%)
2025-12-04T17:26:28.1719512Z   adding: test/test-reports/python-pytest/lazy.test_generator/lazy.test_generator-42072a3593c4e25d.xml (deflated 52%)
2025-12-04T17:26:28.1722745Z   adding: test/test-reports/python-pytest/torch_np.numpy_tests.linalg.test_linalg/torch_np.numpy_tests.linalg.test_linalg-2974f2048ff6a577.xml (deflated 94%)
2025-12-04T17:26:28.1725429Z   adding: test/test-reports/python-pytest/torch_np.numpy_tests.core.test_dtype/torch_np.numpy_tests.core.test_dtype-4b8c4285965a7813.xml (deflated 95%)
2025-12-04T17:26:28.1727400Z   adding: test/test-reports/python-pytest/lazy.test_debug_util/lazy.test_debug_util-7c02b1e3dfee61bd.xml (deflated 35%)
2025-12-04T17:26:28.1728868Z   adding: test/test-reports/python-pytest/nn.test_load_state_dict/nn.test_load_state_dict-e81d6ed8d3f8789f.xml (deflated 89%)
2025-12-04T17:26:28.1730811Z   adding: test/test-reports/python-pytest/test_shape_ops/test_shape_ops-a6160583c0856270.xml (deflated 91%)
2025-12-04T17:26:28.1732375Z   adding: test/test-reports/python-pytest/nn.test_module_hooks/nn.test_module_hooks-e13d4f4eb9af9666.xml (deflated 88%)
2025-12-04T17:26:28.1734391Z   adding: test/test-reports/python-pytest/torch_np.numpy_tests.lib.test_twodim_base/torch_np.numpy_tests.lib.test_twodim_base-2da66c446de8da89.xml (deflated 87%)
2025-12-04T17:26:28.1736044Z   adding: test/test-reports/python-pytest/profiler.test_memory_profiler/profiler.test_memory_profiler-20f5e2eefecacaee.xml (deflated 79%)
2025-12-04T17:26:28.1737361Z   adding: test/test-reports/python-pytest/test_jit_llga_fuser/test_jit_llga_fuser-b203cab2c461ce78.xml (deflated 94%)
2025-12-04T17:26:28.1738772Z   adding: test/test-reports/python-pytest/torch_np.numpy_tests.core.test_getlimits/torch_np.numpy_tests.core.test_getlimits-5149534e2555ec6f.xml (deflated 86%)
2025-12-04T17:26:28.1740629Z   adding: test/test-reports/python-pytest/torch_np.test_ndarray_methods/torch_np.test_ndarray_methods-fe7c638b86097b2d.xml (deflated 96%)
2025-12-04T17:26:28.1743845Z   adding: test/test-reports/python-pytest/test_view_ops/test_view_ops-f5d6b3525797eb50.xml (deflated 92%)
2025-12-04T17:26:28.1745069Z   adding: test/test-reports/python-pytest/test_type_info/test_type_info-22600993e111f6f2.xml (deflated 67%)
2025-12-04T17:26:28.1763142Z   adding: test/test-reports/python-pytest/functorch.test_aotdispatch/functorch.test_aotdispatch-efb7e0b79840fa38.xml (deflated 93%)
2025-12-04T17:26:28.1764917Z   adding: test/test-reports/python-pytest/test_scatter_gather_ops/test_scatter_gather_ops-5274bd99c8a0619f.xml (deflated 91%)
2025-12-04T17:26:28.1766550Z   adding: test/test-reports/python-pytest/test_cuda_multigpu/test_cuda_multigpu-a9a26e79d8868522.xml (deflated 91%)
2025-12-04T17:26:28.1768480Z   adding: test/test-reports/python-pytest/torch_np.numpy_tests.lib.test_index_tricks/torch_np.numpy_tests.lib.test_index_tricks-43f32c31fbfc43cd.xml (deflated 90%)
2025-12-04T17:26:28.1770212Z   adding: test/test-reports/python-pytest/test_jit_autocast/test_jit_autocast-9b5e22ff1077135a.xml (deflated 86%)
2025-12-04T17:26:28.1772234Z   adding: test/test-reports/python-pytest/nn.test_pooling/nn.test_pooling-2151df52b065bbdf.xml (deflated 92%)
2025-12-04T17:26:28.1775325Z   adding: test/test-reports/python-pytest/nn.test_embedding/nn.test_embedding-d055fd5d393643fe.xml (deflated 95%)
2025-12-04T17:26:28.1777667Z   adding: test/test-reports/python-pytest/test_xnnpack_integration/test_xnnpack_integration-ed8e38bda9a33f4f.xml (deflated 81%)
2025-12-04T17:26:28.1779900Z   adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-5e38c3c197506de5.xml (deflated 38%)
2025-12-04T17:26:28.1781456Z   adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-4795d7c5159b6e03.xml (deflated 35%)
2025-12-04T17:26:28.1782598Z   adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-6c76df2a5666e90f.xml (deflated 36%)
2025-12-04T17:26:28.1783634Z   adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-c93df3ae687a8e58.xml (deflated 35%)
2025-12-04T17:26:28.1784663Z   adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-524a9565fc6ac576.xml (deflated 36%)
2025-12-04T17:26:28.1785807Z   adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-b18b3c9d4ddc6b34.xml (deflated 35%)
2025-12-04T17:26:28.1786845Z   adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-319074c2014cbf3e.xml (deflated 36%)
2025-12-04T17:26:28.1787860Z   adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-2483ba726355768c.xml (deflated 36%)
2025-12-04T17:26:28.1788890Z   adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-f2eed9c29ea8eac7.xml (deflated 36%)
2025-12-04T17:26:28.1789932Z   adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-c778cc218c519690.xml (deflated 36%)
2025-12-04T17:26:28.1790961Z   adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-7897a7f1d03cdaa3.xml (deflated 37%)
2025-12-04T17:26:28.1791979Z   adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-d4fb26045e698199.xml (deflated 35%)
2025-12-04T17:26:28.1799383Z   adding: test/test-reports/python-pytest/torch_np.test_reductions/torch_np.test_reductions-73a2026a6cdfd4dc.xml (deflated 98%)
2025-12-04T17:26:28.1802044Z   adding: test/test-reports/python-pytest/torch_np.numpy_tests.core.test_scalar_ctors/torch_np.numpy_tests.core.test_scalar_ctors-e23576bcb06b5d61.xml (deflated 94%)
2025-12-04T17:26:28.1803588Z   adding: test/test-reports/python-pytest/torch_np.numpy_tests.lib.test_arraypad/torch_np.numpy_tests.lib.test_arraypad-f4e46a1506be78e1.xml (deflated 78%)
2025-12-04T17:26:28.1804803Z   adding: test/test-reports/python-pytest/test_prims/test_prims-38188698633a9bb5.xml (deflated 81%)
2025-12-04T17:26:28.1807712Z   adding: test/test-reports/python-pytest/test_spectral_ops/test_spectral_ops-ae86fbbf23286ef9.xml (deflated 93%)
2025-12-04T17:26:28.1809892Z   adding: test/test-reports/python-pytest/test_cpp_extensions_aot_ninja/test_cpp_extensions_aot_ninja-5c9ab2f003415ced.xml (deflated 81%)
2025-12-04T17:26:28.1811236Z   adding: test/test-reports/python-pytest/test_cpp_extensions_aot_no_ninja/test_cpp_extensions_aot_no_ninja-dc0b3ab1cc30279c.xml (deflated 82%)
2025-12-04T17:26:28.1812643Z   adding: test/test-reports/python-unittest/test_autoload/TEST-TestDeviceBackendAutoload-20251204172052.xml (deflated 42%)
2025-12-04T17:26:28.1841167Z ##[group]Run # Remove any previous usage logs if they exist
2025-12-04T17:26:28.1841723Z # Remove any previous usage logs if they exist
2025-12-04T17:26:28.1842159Z rm -f logs-*.zip
2025-12-04T17:26:28.1842569Z zip "logs-${FILE_SUFFIX}.zip" 'usage_log.txt' || true
2025-12-04T17:26:28.1843188Z zip -r "logs-${FILE_SUFFIX}.zip" test/test-reports -i '*.log' || true
2025-12-04T17:26:28.1850064Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T17:26:28.1850515Z env:
2025-12-04T17:26:28.1850771Z   GIT_DEFAULT_BRANCH: main
2025-12-04T17:26:28.1851073Z   HAS_NVIDIA_GPU: true
2025-12-04T17:26:28.1851447Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T17:26:28.1852101Z   DOCKER_CONTAINER_ID: 764ff984146fd3268e049644ccb47d7d8238fae8138055ae0a6928cb5da435ad
2025-12-04T17:26:28.1852925Z   FILE_SUFFIX: test-legacy_nvidia_driver-1-5-linux.g4dn.4xlarge.nvidia.gpu_57119749248
2025-12-04T17:26:28.1853488Z ##[endgroup]
2025-12-04T17:26:28.1933402Z   adding: usage_log.txt (deflated 58%)
2025-12-04T17:26:28.1996432Z   adding: test/test-reports/lazy.test_ts_opinfo_1.1_4d268f7078430bdf_.log (deflated 60%)
2025-12-04T17:26:28.2009549Z   adding: test/test-reports/inductor.test_aot_inductor_1.6_cf1c969272c5d084_.log (deflated 93%)
2025-12-04T17:26:28.2042103Z   adding: test/test-reports/inductor.test_aot_inductor_6.6_462385258b0b1d27_.log (deflated 95%)
2025-12-04T17:26:28.2066064Z   adding: test/test-reports/inductor.test_torchinductor_codegen_dynamic_shapes_2.4_37f84ce4dcc870f4_.log (deflated 93%)
2025-12-04T17:26:28.2070236Z   adding: test/test-reports/nn.test_pooling_1.1_e8e935ea909a1883_.log (deflated 90%)
2025-12-04T17:26:28.2077076Z   adding: test/test-reports/inductor.test_torchinductor_opinfo_2.17_595df7515ef47f8b_.log (deflated 91%)
2025-12-04T17:26:28.2081947Z   adding: test/test-reports/nn.test_embedding_1.1_dc9119745a665b44_.log (deflated 93%)
2025-12-04T17:26:28.2092032Z   adding: test/test-reports/inductor.test_torchinductor_opinfo_7.17_bf87dc9c512027f2_.log (deflated 92%)
2025-12-04T17:26:28.2095108Z   adding: test/test-reports/torch_np.numpy_tests.core.test_dtype_1.1_7868c6a3dd1e371a_.log (deflated 91%)
2025-12-04T17:26:28.2101683Z   adding: test/test-reports/inductor.test_torchinductor_opinfo_12.17_a032934f54d29036_.log (deflated 91%)
2025-12-04T17:26:28.2102595Z   adding: test/test-reports/lazy.test_debug_util_1.1_1fda475a5f9a06f5_.log (deflated 51%)
2025-12-04T17:26:28.2112229Z   adding: test/test-reports/inductor.test_torchinductor_opinfo_17.17_0b4f962be1a8215a_.log (deflated 92%)
2025-12-04T17:26:28.2113159Z   adding: test/test-reports/test_xnnpack_integration_1.1_5b815e7820ba690d_.log (deflated 72%)
2025-12-04T17:26:28.2167139Z   adding: test/test-reports/inductor.test_cuda_select_algorithm_3.5_e3565bc7025c1889_.log (deflated 96%)
2025-12-04T17:26:28.2168586Z   adding: test/test-reports/test_cuda_trace_1.1_70d30feb0b9acc89_.log (deflated 92%)
2025-12-04T17:26:28.2213383Z   adding: test/test-reports/inductor.test_compile_subprocess_3.3_92ce494afd455b37_.log (deflated 95%)
2025-12-04T17:26:28.2214314Z   adding: test/test-reports/inductor.test_flex_decoding_1.1_a47e1c88f2ff3c9a_.log (deflated 50%)
2025-12-04T17:26:28.2223621Z   adding: test/test-reports/inductor.test_deterministic_5.8_04041ff7a6ce6208_.log (deflated 94%)
2025-12-04T17:26:28.2580857Z   adding: test/test-reports/inductor.test_fp8_1.1_5b24deb545871ee8_.log (deflated 95%)
2025-12-04T17:26:28.2582930Z   adding: test/test-reports/dynamo.test_model_output_1.1_9f288500c4a144e5_.log (deflated 89%)
2025-12-04T17:26:28.2603209Z   adding: test/test-reports/torch_np.test_reductions_1.1_b720aba5a84f607c_.log (deflated 96%)
2025-12-04T17:26:28.2616750Z   adding: test/test-reports/inductor.test_triton_kernels_1.1_80e8269e9d3330b3_.log (deflated 92%)
2025-12-04T17:26:28.2626077Z   adding: test/test-reports/inductor.test_loop_ordering_1.1_ca0aee6babe9c71a_.log (deflated 92%)
2025-12-04T17:26:28.2674298Z   adding: test/test-reports/export.test_serdes_1.1_d6753111c4d56d4f_.log (deflated 91%)
2025-12-04T17:26:28.2677119Z   adding: test/test-reports/dynamo.test_backends_1.1_0248c6271c37d6dd_.log (deflated 91%)
2025-12-04T17:26:28.2678148Z   adding: test/test-reports/test_prims_1.1_8a7702ff07b7da5d_.log (deflated 77%)
2025-12-04T17:26:28.2764497Z   adding: test/test-reports/inductor.test_aot_inductor_package_1.1_5509f9f54e762912_.log (deflated 97%)
2025-12-04T17:26:28.2766394Z   adding: test/test-reports/inductor.test_padding_1.1_4d224b6d5f4af5af_.log (deflated 86%)
2025-12-04T17:26:28.2767641Z   adding: test/test-reports/dynamo.test_aot_compile_1.1_232ed44e0e50b87e_.log (deflated 78%)
2025-12-04T17:26:28.2770881Z   adding: test/test-reports/dynamo.test_sets_1.1_e77962cd1c25fe47_.log (deflated 87%)
2025-12-04T17:26:28.2772308Z   adding: test/test-reports/nn.test_load_state_dict_1.1_54a686ad2f48d7f9_.log (deflated 85%)
2025-12-04T17:26:28.2773703Z   adding: test/test-reports/dynamo.test_wrap_inductor_compiled_regions_1.1_1c64e72dd7c0888e_.log (deflated 81%)
2025-12-04T17:26:28.2816923Z   adding: test/test-reports/test_sparse_2.2_a491ad82f72502f4_.log (deflated 94%)
2025-12-04T17:26:28.2833103Z   adding: test/test-reports/test_decomp_3.17_3a5dd6feb399010e_.log (deflated 89%)
2025-12-04T17:26:28.2849290Z   adding: test/test-reports/test_decomp_8.17_26b4abb8a1042a34_.log (deflated 89%)
2025-12-04T17:26:28.2865999Z   adding: test/test-reports/test_decomp_13.17_a52400f805dcf5ec_.log (deflated 89%)
2025-12-04T17:26:28.2912578Z   adding: test/test-reports/test_ops_fwd_gradients_1.2_4abfc4ee1bccdea9_.log (deflated 94%)
2025-12-04T17:26:28.3122439Z   adding: test/test-reports/test_meta_2.5_dad2a564d06ce93f_.log (deflated 93%)
2025-12-04T17:26:28.3139469Z   adding: test/test-reports/test_ops_jit_2.2_814c1a8715769c60_.log (deflated 91%)
2025-12-04T17:26:28.3153844Z   adding: test/test-reports/test_nestedtensor_3.4_8e55fc0245a5aec0_.log (deflated 91%)
2025-12-04T17:26:28.3243106Z   adding: test/test-reports/test_ops_2.11_06c992f175cc3a27_.log (deflated 91%)
2025-12-04T17:26:28.3330553Z   adding: test/test-reports/test_ops_7.11_97114ebb7b0ad963_.log (deflated 91%)
2025-12-04T17:26:28.3332433Z   adding: test/test-reports/functorch.test_dims_1.1_a45bb86ae199f167_.log (deflated 83%)
2025-12-04T17:26:28.3373525Z   adding: test/test-reports/functorch.test_ops_1.7_2b66798f0700c47b_.log (deflated 92%)
2025-12-04T17:26:28.3415011Z   adding: test/test-reports/functorch.test_ops_6.7_b2e5f87489ea3e61_.log (deflated 92%)
2025-12-04T17:26:28.3424791Z   adding: test/test-reports/test_spectral_ops_1.1_434231ff814fe9e8_.log (deflated 93%)
2025-12-04T17:26:28.3425668Z   adding: test/test-reports/inductor.test_select_algorithm_1.1_7db4d246e17eb863_.log (deflated 7%)
2025-12-04T17:26:28.3435072Z   adding: test/test-reports/inductor.test_cpu_repro_1.3_45e7fcc9d89e84f9_.log (deflated 93%)
2025-12-04T17:26:28.3436125Z   adding: test/test-reports/test_cpp_extensions_aot_no_ninja_1.1_8356099a97b89d55_.log (deflated 78%)
2025-12-04T17:26:28.3437065Z   adding: test/test-reports/inductor.test_custom_lowering_1.1_b51e0c13dc286ed6_.log (deflated 67%)
2025-12-04T17:26:28.3445703Z   adding: test/test-reports/inductor.test_perf_1.1_8b1dd16368b2df6e_.log (deflated 91%)
2025-12-04T17:26:28.3446785Z   adding: test/test-reports/test_cpp_extensions_aot_ninja_1.1_f69ae0466baae8e0_.log (deflated 78%)
2025-12-04T17:26:28.3447862Z   adding: test/test-reports/inductor.test_binary_folding_1.1_181cb55db6266036_.log (deflated 66%)
2025-12-04T17:26:28.3450222Z   adding: test/test-reports/test_shape_ops_1.1_4cd0c635a81aa180_.log (deflated 87%)
2025-12-04T17:26:28.3454631Z   adding: test/test-reports/inductor.test_mkldnn_pattern_matcher_3.3_de8f963f0fd4260a_.log (deflated 91%)
2025-12-04T17:26:28.3455644Z   adding: test/test-reports/inductor.test_cutlass_backend_1.1_15c862b0fcbdbc05_.log (deflated 33%)
2025-12-04T17:26:28.3456602Z   adding: test/test-reports/inductor.test_ck_backend_1.1_578c7dfc11700a2c_.log (deflated 33%)
2025-12-04T17:26:28.3465530Z   adding: test/test-reports/inductor.test_gpu_cpp_wrapper_1.1_e2281895ade7355a_.log (deflated 93%)
2025-12-04T17:26:28.3466575Z   adding: test/test-reports/inductor.test_cutedsl_template_1.1_431b05ccc7f3aa92_.log (deflated 77%)
2025-12-04T17:26:28.3467631Z   adding: test/test-reports/inductor.test_benchmark_fusion_1.1_06ce66c290620934_.log (deflated 75%)
2025-12-04T17:26:28.3473445Z   adding: test/test-reports/dynamo.test_modules_1.1_8a3e7afe44c0508c_.log (deflated 87%)
2025-12-04T17:26:28.3474680Z   adding: test/test-reports/dynamo.test_recompiles_1.1_781d5b3da7b99916_.log (deflated 79%)
2025-12-04T17:26:28.3475649Z   adding: test/test-reports/export.test_tree_utils_1.1_01fdd9412c3dc291_.log (deflated 55%)
2025-12-04T17:26:28.3476553Z   adding: test/test-reports/inductor.test_triton_wrapper_1.1_aad0f3987661a0f9_.log (deflated 53%)
2025-12-04T17:26:28.3477596Z   adding: test/test-reports/inductor.test_static_cuda_launcher_1.1_aa705837cbb50573_.log (deflated 79%)
2025-12-04T17:26:28.3478655Z   adding: test/test-reports/export.test_dynamic_shapes_1.1_fa1beed2f0eed81a_.log (deflated 55%)
2025-12-04T17:26:28.3479595Z   adding: test/test-reports/dynamo.test_sdpa_1.1_5570cc8ef25d14ab_.log (deflated 63%)
2025-12-04T17:26:28.3480431Z   adding: test/test-reports/dynamo.test_utils_1.1_31a21332cf86ab83_.log (deflated 76%)
2025-12-04T17:26:28.3481416Z   adding: test/test-reports/inductor.test_codegen_triton_1.1_8e8a3c1b0bc12db7_.log (deflated 53%)
2025-12-04T17:26:28.3482523Z   adding: test/test-reports/dynamo.test_frame_init_1.1_2f60459938295159_.log (deflated 51%)
2025-12-04T17:26:28.3483421Z   adding: test/test-reports/inductor.test_device_assert_1.1_d916ba60ad9d20e5_.log (deflated 74%)
2025-12-04T17:26:28.3484387Z   adding: test/test-reports/dynamo.test_skip_non_tensor_1.1_5109354b2e4bf091_.log (deflated 70%)
2025-12-04T17:26:28.3485380Z   adding: test/test-reports/dynamo.test_skip_guard_eval_unsafe_1.1_b141b115e14ff53c_.log (deflated 66%)
2025-12-04T17:26:28.3486554Z   adding: test/test-reports/inductor.test_control_deps_1.1_e3804afa5ea10bb1_.log (deflated 51%)
2025-12-04T17:26:28.3487451Z   adding: test/test-reports/inductor.test_benchmarking_1.1_f947c0362e7ea45b_.log (deflated 78%)
2025-12-04T17:26:28.3488347Z   adding: test/test-reports/inductor.test_helion_kernels_1.1_7576dd76567d0db5_.log (deflated 56%)
2025-12-04T17:26:28.3489240Z   adding: test/test-reports/inductor.test_quantization_1.1_84a522d95ca6c1ae_.log (deflated 56%)
2025-12-04T17:26:28.3490081Z   adding: test/test-reports/export.test_tools_1.1_7b301d5abd4a995c_.log (deflated 63%)
2025-12-04T17:26:28.3495921Z   adding: test/test-reports/inductor.test_compiled_optimizers_1.3_8b95325a31b7233d_.log (deflated 92%)
2025-12-04T17:26:28.3496979Z   adding: test/test-reports/inductor.test_aot_inductor_utils_1.1_6e3c972b94953db6_.log (deflated 51%)
2025-12-04T17:26:28.4137987Z   adding: test/test-reports/inductor.test_control_flow_3.4_41808f1ad591b77f_.log (deflated 96%)
2025-12-04T17:26:28.4139200Z   adding: test/test-reports/inductor.test_minifier_isolate_1.1_057329f0cdaf132f_.log (deflated 55%)
2025-12-04T17:26:28.4140852Z   adding: test/test-reports/dynamo.test_error_messages_1.1_69ccbdbb7b8c4f0d_.log (deflated 85%)
2025-12-04T17:26:28.4141851Z   adding: test/test-reports/dynamo.test_fake_distributed_1.1_14aa9693a6d04f2f_.log (deflated 60%)
2025-12-04T17:26:28.4143128Z   adding: test/test-reports/dynamo.test_tree_map_1.1_63649f1aa127b381_.log (deflated 87%)
2025-12-04T17:26:28.4147261Z   adding: test/test-reports/dynamo.test_minifier_1.1_70592d9088ca13b1_.log (deflated 93%)
2025-12-04T17:26:28.4148689Z   adding: test/test-reports/dynamo.test_guard_manager_1.1_bfbfec93ec272b46_.log (deflated 83%)
2025-12-04T17:26:28.4149756Z   adding: test/test-reports/export.test_schema_1.1_81eb22b4e3e11516_.log (deflated 62%)
2025-12-04T17:26:28.4150572Z   adding: test/test-reports/export.test_pass_infra_1.1_d5838225a9a8bb31_.log (deflated 62%)
2025-12-04T17:26:28.4151557Z   adding: test/test-reports/dynamo.test_recompile_ux_1.1_ac1d0051161f3db2_.log (deflated 78%)
2025-12-04T17:26:28.4152736Z   adding: test/test-reports/export.test_experimental_1.1_01776c650d6c59b4_.log (deflated 79%)
2025-12-04T17:26:28.4154821Z   adding: test/test-reports/export.test_converter_1.1_96408107873dd104_.log (deflated 87%)
2025-12-04T17:26:28.4155860Z   adding: test/test-reports/dynamo.test_reorder_logs_1.1_c9bc43c050335e8d_.log (deflated 78%)
2025-12-04T17:26:28.4162286Z   adding: test/test-reports/dynamo.test_subclasses_1.1_2bde93c2c59c5c84_.log (deflated 89%)
2025-12-04T17:26:28.4163267Z   adding: test/test-reports/dynamo.test_python_autograd_1.1_3d66bfb1c1737055_.log (deflated 65%)
2025-12-04T17:26:28.4167864Z   adding: test/test-reports/export.test_draft_export_1.1_dc9e6c5dfafe9a68_.log (deflated 92%)
2025-12-04T17:26:28.4173171Z   adding: test/test-reports/test_package_1.1_34eeddca63aecf34_.log (deflated 87%)
2025-12-04T17:26:28.4174008Z   adding: test/test-reports/test_mkl_verbose_1.1_8df5a0c4f0a0ed8d_.log (deflated 54%)
2025-12-04T17:26:28.4174993Z   adding: test/test-reports/test_comparison_utils_1.1_bef8586b0834f006_.log (deflated 68%)
2025-12-04T17:26:28.4175875Z   adding: test/test-reports/functorch.test_ac_logging_1.1_7064fc1f81d9dc21_.log (deflated 63%)
2025-12-04T17:26:28.4176783Z   adding: test/test-reports/test_mkldnn_verbose_1.1_7178d5eae573783e_.log (deflated 55%)
2025-12-04T17:26:28.4190422Z   adding: test/test-reports/test_cpp_api_parity_1.1_286b24be771dc4b7_.log (deflated 94%)
2025-12-04T17:26:28.4191271Z   adding: test/test-reports/test_autoload_1.1_4b58ab9cd8e50318_.log (deflated 50%)
2025-12-04T17:26:28.4192361Z   adding: test/test-reports/nn.attention.test_open_registry_1.1_52b8c107579dfb04_.log (deflated 58%)
2025-12-04T17:26:28.4193209Z   adding: test/test-reports/test_as_strided_1.1_915ecc12abd3e105_.log (deflated 53%)
2025-12-04T17:26:28.4283578Z   adding: test/test-reports/test_foreach_1.1_754d93a1205d9df5_.log (deflated 95%)
2025-12-04T17:26:28.4284445Z   adding: test/test-reports/xpu.test_gemm_1.1_f9c98ad78a8f930f_.log (deflated 48%)
2025-12-04T17:26:28.4285992Z   adding: test/test-reports/test_numpy_interop_1.1_0cfaaa8b9ef10506_.log (deflated 85%)
2025-12-04T17:26:28.4287348Z   adding: test/test-reports/profiler.test_cpp_thread_1.1_6bc17e34ef07b5a0_.log (deflated 82%)
2025-12-04T17:26:28.4288299Z   adding: test/test-reports/test_hub_1.1_af317e8677316cdb_.log (deflated 70%)
2025-12-04T17:26:28.4290813Z   adding: test/test-reports/test_segment_reductions_1.1_c6d7e787931576c3_.log (deflated 91%)
2025-12-04T17:26:28.4291974Z   adding: test/test-reports/test_autograd_fallback_1.1_60e7b253f9787096_.log (deflated 85%)
2025-12-04T17:26:28.4292896Z   adding: test/test-reports/test_type_hints_1.1_d9336b501fe8992b_.log (deflated 49%)
2025-12-04T17:26:28.4294577Z   adding: test/test-reports/nn.test_module_hooks_1.1_b8e5016c3845034d_.log (deflated 86%)
2025-12-04T17:26:28.4295597Z   adding: test/test-reports/functorch.test_aot_joint_with_descriptors_1.1_948ec5a85f7c1f8f_.log (deflated 80%)
2025-12-04T17:26:28.4296752Z   adding: test/test-reports/test_fx_reinplace_pass_1.1_8f7033a49b0aaa2e_.log (deflated 75%)
2025-12-04T17:26:28.4707971Z   adding: test/test-reports/functorch.test_control_flow_2.2_2e5432104edc7835_.log (deflated 96%)
2025-12-04T17:26:28.4710061Z   adding: test/test-reports/test_subclass_1.1_b65d4f741f14f053_.log (deflated 90%)
2025-12-04T17:26:28.4753100Z   adding: test/test-reports/functorch.test_vmap_registrations_1.1_8a0424ce5b3ca65e_.log (deflated 96%)
2025-12-04T17:26:28.4755076Z   adding: test/test-reports/nn.test_parametrization_1.1_0b836fe205c49662_.log (deflated 89%)
2025-12-04T17:26:28.4768103Z   adding: test/test-reports/test_dynamic_shapes_1.1_f2bbcf4caeac0628_.log (deflated 91%)
2025-12-04T17:26:28.4769769Z   adding: test/test-reports/test_dispatch_1.1_a7d630610c114c46_.log (deflated 77%)
2025-12-04T17:26:28.4770724Z   adding: test/test-reports/test_numba_integration_1.1_4248037d4c172e88_.log (deflated 71%)
2025-12-04T17:26:28.4771950Z   adding: test/test-reports/test_functional_optim_1.1_82fdba90420e8f47_.log (deflated 65%)
2025-12-04T17:26:28.4794703Z   adding: test/test-reports/test_maskedtensor_1.1_4e0623e742dfe084_.log (deflated 94%)
2025-12-04T17:26:28.4796134Z   adding: test/test-reports/torch_np.numpy_tests.lib.test_twodim_base_1.1_facf24e95ed5355d_.log (deflated 82%)
2025-12-04T17:26:28.4797287Z   adding: test/test-reports/benchmark_utils.test_benchmark_utils_1.1_63175fb80c7f9ea7_.log (deflated 72%)
2025-12-04T17:26:28.4823539Z   adding: test/test-reports/test_scaled_matmul_cuda_1.1_751f5e87909cbd5d_.log (deflated 97%)
2025-12-04T17:26:28.4825024Z   adding: test/test-reports/profiler.test_memory_profiler_1.1_70baf0213dbc5855_.log (deflated 82%)
2025-12-04T17:26:28.4828740Z   adding: test/test-reports/torch_np.numpy_tests.core.test_shape_base_1.1_0a0e6d68a930787e_.log (deflated 92%)
2025-12-04T17:26:28.4829701Z   adding: test/test-reports/test_vulkan_1.1_2892328dc9a2ec74_.log (deflated 48%)
2025-12-04T17:26:28.4830462Z   adding: test/test-reports/lazy.test_generator_1.1_9fb7d5917fd83b83_.log (deflated 55%)
2025-12-04T17:26:28.4831540Z   adding: test/test-reports/torch_np.numpy_tests.lib.test_index_tricks_1.1_e2c692e766f99011_.log (deflated 85%)
2025-12-04T17:26:28.4838585Z   adding: test/test-reports/torch_np.numpy_tests.linalg.test_linalg_1.1_f8a6a4a0c07965ac_.log (deflated 92%)
2025-12-04T17:26:28.4841464Z   adding: test/test-reports/test_jit_llga_fuser_1.1_a67e637a7f701026_.log (deflated 88%)
2025-12-04T17:26:28.4842269Z   adding: test/test-reports/optim.test_optim_1.1_e409dee8e8c07436_.log (deflated 7%)
2025-12-04T17:26:28.4843670Z   adding: test/test-reports/test_jit_autocast_1.1_9af7b4b8017e3406_.log (deflated 81%)
2025-12-04T17:26:28.4844895Z   adding: test/test-reports/torch_np.numpy_tests.core.test_getlimits_1.1_827a2f053af78584_.log (deflated 77%)
2025-12-04T17:26:28.4853117Z   adding: test/test-reports/torch_np.test_ndarray_methods_1.1_793e3aaaf30f7d3c_.log (deflated 94%)
2025-12-04T17:26:28.4859897Z   adding: test/test-reports/test_view_ops_1.1_405f53c81662ed35_.log (deflated 91%)
2025-12-04T17:26:28.4860893Z   adding: test/test-reports/test_type_info_1.1_9ab09808df8277a9_.log (deflated 61%)
2025-12-04T17:26:28.4877376Z   adding: test/test-reports/functorch.test_aotdispatch_1.1_9b74bc936a6dcdae_.log (deflated 92%)
2025-12-04T17:26:28.4879674Z   adding: test/test-reports/test_scatter_gather_ops_1.1_f3de59c3735d2471_.log (deflated 89%)
2025-12-04T17:26:28.4881722Z   adding: test/test-reports/test_cuda_multigpu_1.1_5809c25d23c9a947_.log (deflated 85%)
2025-12-04T17:26:28.4883799Z   adding: test/test-reports/torch_np.numpy_tests.core.test_scalar_ctors_1.1_4168b5c3b3d7f9be_.log (deflated 90%)
2025-12-04T17:26:28.4884978Z   adding: test/test-reports/torch_np.numpy_tests.lib.test_arraypad_1.1_867803734c4a045d_.log (deflated 72%)
2025-12-04T17:26:28.4915180Z ##[group]Run # Remove any previous debugging artifacts if they exist
2025-12-04T17:26:28.4915801Z # Remove any previous debugging artifacts if they exist
2025-12-04T17:26:28.4916285Z rm -f debug-*.zip
2025-12-04T17:26:28.4916619Z if [ -d 'test/debug' ]; then
2025-12-04T17:26:28.4917060Z   zip -r "debug-${FILE_SUFFIX}.zip" test/debug
2025-12-04T17:26:28.4917451Z fi
2025-12-04T17:26:28.4924246Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T17:26:28.4924691Z env:
2025-12-04T17:26:28.4924934Z   GIT_DEFAULT_BRANCH: main
2025-12-04T17:26:28.4925253Z   HAS_NVIDIA_GPU: true
2025-12-04T17:26:28.4925615Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T17:26:28.4926251Z   DOCKER_CONTAINER_ID: 764ff984146fd3268e049644ccb47d7d8238fae8138055ae0a6928cb5da435ad
2025-12-04T17:26:28.4927053Z   FILE_SUFFIX: test-legacy_nvidia_driver-1-5-linux.g4dn.4xlarge.nvidia.gpu_57119749248
2025-12-04T17:26:28.4927625Z ##[endgroup]
2025-12-04T17:26:28.5019853Z ##[group]Run seemethere/upload-artifact-s3@v5
2025-12-04T17:26:28.5020237Z with:
2025-12-04T17:26:28.5020492Z   s3-bucket: gha-artifacts
2025-12-04T17:26:28.5020865Z   s3-prefix: pytorch/pytorch/19922826259/1/artifact
2025-12-04T17:26:28.5021278Z   retention-days: 14
2025-12-04T17:26:28.5021801Z   if-no-files-found: warn
2025-12-04T17:26:28.5022385Z   path: test-jsons-*.zip
2025-12-04T17:26:28.5022816Z   name: artifact
2025-12-04T17:26:28.5023197Z   region: us-east-1
2025-12-04T17:26:28.5023639Z env:
2025-12-04T17:26:28.5023999Z   GIT_DEFAULT_BRANCH: main
2025-12-04T17:26:28.5024579Z   HAS_NVIDIA_GPU: true
2025-12-04T17:26:28.5025129Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T17:26:28.5025905Z   DOCKER_CONTAINER_ID: 764ff984146fd3268e049644ccb47d7d8238fae8138055ae0a6928cb5da435ad
2025-12-04T17:26:28.5026619Z ##[endgroup]
2025-12-04T17:26:28.9133280Z NOTE: s3-prefix specified, ignoring name parameter
2025-12-04T17:26:28.9133918Z With the provided path, there will be 1 file uploaded
2025-12-04T17:26:28.9134648Z Uploading to s3 prefix: pytorch/pytorch/19922826259/1/artifact
2025-12-04T17:26:28.9190866Z Starting upload of test-jsons-test-legacy_nvidia_driver-1-5-linux.g4dn.4xlarge.nvidia.gpu_57119749248.zip
2025-12-04T17:26:29.1147237Z Finished upload of test-jsons-test-legacy_nvidia_driver-1-5-linux.g4dn.4xlarge.nvidia.gpu_57119749248.zip
2025-12-04T17:26:29.1379339Z ##[group]Run seemethere/upload-artifact-s3@v5
2025-12-04T17:26:29.1379741Z with:
2025-12-04T17:26:29.1379988Z   s3-bucket: gha-artifacts
2025-12-04T17:26:29.1380365Z   s3-prefix: pytorch/pytorch/19922826259/1/artifact
2025-12-04T17:26:29.1380781Z   retention-days: 14
2025-12-04T17:26:29.1381063Z   if-no-files-found: error
2025-12-04T17:26:29.1381387Z   path: test-reports-*.zip
2025-12-04T17:26:29.1381804Z   name: artifact
2025-12-04T17:26:29.1382053Z   region: us-east-1
2025-12-04T17:26:29.1382316Z env:
2025-12-04T17:26:29.1382564Z   GIT_DEFAULT_BRANCH: main
2025-12-04T17:26:29.1382875Z   HAS_NVIDIA_GPU: true
2025-12-04T17:26:29.1383232Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T17:26:29.1383885Z   DOCKER_CONTAINER_ID: 764ff984146fd3268e049644ccb47d7d8238fae8138055ae0a6928cb5da435ad
2025-12-04T17:26:29.1384461Z ##[endgroup]
2025-12-04T17:26:29.5556382Z NOTE: s3-prefix specified, ignoring name parameter
2025-12-04T17:26:29.5556950Z With the provided path, there will be 1 file uploaded
2025-12-04T17:26:29.5557477Z Uploading to s3 prefix: pytorch/pytorch/19922826259/1/artifact
2025-12-04T17:26:29.5612200Z Starting upload of test-reports-test-legacy_nvidia_driver-1-5-linux.g4dn.4xlarge.nvidia.gpu_57119749248.zip
2025-12-04T17:26:29.8075705Z Finished upload of test-reports-test-legacy_nvidia_driver-1-5-linux.g4dn.4xlarge.nvidia.gpu_57119749248.zip
2025-12-04T17:26:29.8289130Z ##[group]Run seemethere/upload-artifact-s3@v5
2025-12-04T17:26:29.8289543Z with:
2025-12-04T17:26:29.8289807Z   s3-bucket: gha-artifacts
2025-12-04T17:26:29.8290195Z   s3-prefix: pytorch/pytorch/19922826259/1/artifact
2025-12-04T17:26:29.8290608Z   retention-days: 14
2025-12-04T17:26:29.8290896Z   if-no-files-found: ignore
2025-12-04T17:26:29.8291222Z   path: logs-*.zip
2025-12-04T17:26:29.8291497Z   name: artifact
2025-12-04T17:26:29.8291753Z   region: us-east-1
2025-12-04T17:26:29.8292020Z env:
2025-12-04T17:26:29.8292291Z   GIT_DEFAULT_BRANCH: main
2025-12-04T17:26:29.8292588Z   HAS_NVIDIA_GPU: true
2025-12-04T17:26:29.8292962Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T17:26:29.8293783Z   DOCKER_CONTAINER_ID: 764ff984146fd3268e049644ccb47d7d8238fae8138055ae0a6928cb5da435ad
2025-12-04T17:26:29.8294355Z ##[endgroup]
2025-12-04T17:26:30.2082494Z NOTE: s3-prefix specified, ignoring name parameter
2025-12-04T17:26:30.2083046Z With the provided path, there will be 1 file uploaded
2025-12-04T17:26:30.2083590Z Uploading to s3 prefix: pytorch/pytorch/19922826259/1/artifact
2025-12-04T17:26:30.2138543Z Starting upload of logs-test-legacy_nvidia_driver-1-5-linux.g4dn.4xlarge.nvidia.gpu_57119749248.zip
2025-12-04T17:26:30.4403939Z Finished upload of logs-test-legacy_nvidia_driver-1-5-linux.g4dn.4xlarge.nvidia.gpu_57119749248.zip
2025-12-04T17:26:30.4614787Z ##[group]Run seemethere/upload-artifact-s3@v5
2025-12-04T17:26:30.4615170Z with:
2025-12-04T17:26:30.4615430Z   s3-bucket: gha-artifacts
2025-12-04T17:26:30.4615953Z   s3-prefix: pytorch/pytorch/19922826259/1/artifact
2025-12-04T17:26:30.4616481Z   retention-days: 14
2025-12-04T17:26:30.4616775Z   if-no-files-found: ignore
2025-12-04T17:26:30.4617098Z   path: debug-*.zip
2025-12-04T17:26:30.4617373Z   name: artifact
2025-12-04T17:26:30.4617630Z   region: us-east-1
2025-12-04T17:26:30.4617894Z env:
2025-12-04T17:26:30.4618137Z   GIT_DEFAULT_BRANCH: main
2025-12-04T17:26:30.4618431Z   HAS_NVIDIA_GPU: true
2025-12-04T17:26:30.4618810Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T17:26:30.4619457Z   DOCKER_CONTAINER_ID: 764ff984146fd3268e049644ccb47d7d8238fae8138055ae0a6928cb5da435ad
2025-12-04T17:26:30.4620017Z ##[endgroup]
2025-12-04T17:26:30.8339120Z No files were found with the provided path: debug-*.zip. No artifacts will be uploaded.
2025-12-04T17:26:30.8556195Z ##[group]Run # shellcheck disable=SC2156
2025-12-04T17:26:30.8556657Z # shellcheck disable=SC2156
2025-12-04T17:26:30.8557345Z find . -iname "core.[1-9]*" -exec docker exec "${DOCKER_CONTAINER_ID}" sh -c "gdb python {} -ex 'bt' -ex 'q'" \;
2025-12-04T17:26:30.8564494Z shell: /usr/bin/bash -e {0}
2025-12-04T17:26:30.8564820Z env:
2025-12-04T17:26:30.8565070Z   GIT_DEFAULT_BRANCH: main
2025-12-04T17:26:30.8565383Z   HAS_NVIDIA_GPU: true
2025-12-04T17:26:30.8565740Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T17:26:30.8566392Z   DOCKER_CONTAINER_ID: 764ff984146fd3268e049644ccb47d7d8238fae8138055ae0a6928cb5da435ad
2025-12-04T17:26:30.8567075Z ##[endgroup]
2025-12-04T17:26:31.2346710Z ##[group]Run seemethere/upload-artifact-s3@baba72d0712b404f646cebe0730933554ebce96a
2025-12-04T17:26:31.2347300Z with:
2025-12-04T17:26:31.2347719Z   name: coredumps-legacy_nvidia_driver-1-5-linux.g4dn.4xlarge.nvidia.gpu
2025-12-04T17:26:31.2348242Z   retention-days: 14
2025-12-04T17:26:31.2348545Z   if-no-files-found: ignore
2025-12-04T17:26:31.2348850Z   path: ./**/core.[1-9]*
2025-12-04T17:26:31.2349155Z   s3-bucket: gha-artifacts
2025-12-04T17:26:31.2349472Z   region: us-east-1
2025-12-04T17:26:31.2349724Z env:
2025-12-04T17:26:31.2349965Z   GIT_DEFAULT_BRANCH: main
2025-12-04T17:26:31.2350271Z   HAS_NVIDIA_GPU: true
2025-12-04T17:26:31.2350627Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T17:26:31.2351276Z   DOCKER_CONTAINER_ID: 764ff984146fd3268e049644ccb47d7d8238fae8138055ae0a6928cb5da435ad
2025-12-04T17:26:31.2351848Z ##[endgroup]
2025-12-04T17:26:40.9348455Z No files were found with the provided path: ./**/core.[1-9]*. No artifacts will be uploaded.
2025-12-04T17:26:40.9676380Z Prepare all required actions
2025-12-04T17:26:40.9676842Z Getting action download info
2025-12-04T17:26:41.1182041Z Download action repository 'actions/setup-python@v6' (SHA:83679a892e2d95755f2dac6acb0bfd1e9ac5d548)
2025-12-04T17:26:41.5839914Z ##[group]Run ./.github/actions/upload-utilization-stats
2025-12-04T17:26:41.5840333Z with:
2025-12-04T17:26:41.5840582Z   job_id: 57119749248
2025-12-04T17:26:41.5841306Z   job_name: linux-jammy-cuda12.4-py3.10-gcc11 / test (legacy_nvidia_driver, 1, 5, linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check, unstable)
2025-12-04T17:26:41.5842122Z   workflow_name: periodic
2025-12-04T17:26:41.5842424Z   workflow_run_id: 19922826259
2025-12-04T17:26:41.5842744Z   workflow_attempt: 1
2025-12-04T17:26:41.5843013Z env:
2025-12-04T17:26:41.5843245Z   GIT_DEFAULT_BRANCH: main
2025-12-04T17:26:41.5843551Z   HAS_NVIDIA_GPU: true
2025-12-04T17:26:41.5843924Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T17:26:41.5844708Z   DOCKER_CONTAINER_ID: 764ff984146fd3268e049644ccb47d7d8238fae8138055ae0a6928cb5da435ad
2025-12-04T17:26:41.5845269Z ##[endgroup]
2025-12-04T17:26:41.5904366Z ##[group]Run actions/setup-python@v6
2025-12-04T17:26:41.5904723Z with:
2025-12-04T17:26:41.5904979Z   python-version: 3.10
2025-12-04T17:26:41.5905264Z   check-latest: false
2025-12-04T17:26:41.5905686Z   token: ***
2025-12-04T17:26:41.5905956Z   update-environment: true
2025-12-04T17:26:41.5906281Z   allow-prereleases: false
2025-12-04T17:26:41.5906704Z   freethreaded: false
2025-12-04T17:26:41.5906979Z env:
2025-12-04T17:26:41.5907221Z   GIT_DEFAULT_BRANCH: main
2025-12-04T17:26:41.5907513Z   HAS_NVIDIA_GPU: true
2025-12-04T17:26:41.5907884Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T17:26:41.5908539Z   DOCKER_CONTAINER_ID: 764ff984146fd3268e049644ccb47d7d8238fae8138055ae0a6928cb5da435ad
2025-12-04T17:26:41.5909104Z ##[endgroup]
2025-12-04T17:26:41.7579925Z ##[group]Installed versions
2025-12-04T17:26:41.7590806Z Version 3.10 was not found in the local cache
2025-12-04T17:26:41.7796261Z (node:368369) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead.
2025-12-04T17:26:41.7797227Z (Use `node --trace-deprecation ...` to show where the warning was created)
2025-12-04T17:26:42.1058796Z ##[error]The version '3.10' with architecture 'x64' was not found for this operating system.
The list of all available versions can be found here: https://raw.githubusercontent.com/actions/python-versions/main/versions-manifest.json
2025-12-04T17:26:42.1235178Z ##[group]Run pytorch/test-infra/.github/actions/teardown-linux@main
2025-12-04T17:26:42.1235698Z with:
2025-12-04T17:26:42.1235922Z env:
2025-12-04T17:26:42.1236167Z   GIT_DEFAULT_BRANCH: main
2025-12-04T17:26:42.1236479Z   HAS_NVIDIA_GPU: true
2025-12-04T17:26:42.1236850Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T17:26:42.1237490Z   DOCKER_CONTAINER_ID: 764ff984146fd3268e049644ccb47d7d8238fae8138055ae0a6928cb5da435ad
2025-12-04T17:26:42.1238174Z ##[endgroup]
2025-12-04T17:26:42.1256690Z ##[group]Run set -eou pipefail
2025-12-04T17:26:42.1257063Z set -eou pipefail
2025-12-04T17:26:42.1257372Z 
2025-12-04T17:26:42.1257800Z echo "Holding runner for 2 hours until all ssh sessions have logged out"
2025-12-04T17:26:42.1258346Z for _ in $(seq 1440); do
2025-12-04T17:26:42.1258723Z     # Break if no ssh session exists anymore
2025-12-04T17:26:42.1259145Z     if [ "$(who)" = "" ]; then
2025-12-04T17:26:42.1259536Z       break
2025-12-04T17:26:42.1259812Z     fi
2025-12-04T17:26:42.1260075Z     echo "."
2025-12-04T17:26:42.1260355Z     sleep 5
2025-12-04T17:26:42.1260617Z done
2025-12-04T17:26:42.1267528Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T17:26:42.1267976Z env:
2025-12-04T17:26:42.1268217Z   GIT_DEFAULT_BRANCH: main
2025-12-04T17:26:42.1268542Z   HAS_NVIDIA_GPU: true
2025-12-04T17:26:42.1268910Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T17:26:42.1269541Z   DOCKER_CONTAINER_ID: 764ff984146fd3268e049644ccb47d7d8238fae8138055ae0a6928cb5da435ad
2025-12-04T17:26:42.1270123Z ##[endgroup]
2025-12-04T17:26:42.1299439Z Holding runner for 2 hours until all ssh sessions have logged out
2025-12-04T17:26:42.1381137Z ##[group]Run # ignore expansion of "docker ps -q" since it could be empty
2025-12-04T17:26:42.1381815Z [36;1m# ignore expansion of "docker ps -q" since it could be empty[0m
2025-12-04T17:26:42.1382356Z [36;1m# shellcheck disable=SC2046[0m
2025-12-04T17:26:42.1382749Z [36;1mdocker stop $(docker ps -q) || true[0m
2025-12-04T17:26:42.1383141Z [36;1m# Prune all of the docker images[0m
2025-12-04T17:26:42.1383527Z [36;1mdocker system prune -af[0m
2025-12-04T17:26:42.1391767Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T17:26:42.1392217Z env:
2025-12-04T17:26:42.1392481Z   GIT_DEFAULT_BRANCH: main
2025-12-04T17:26:42.1392782Z   HAS_NVIDIA_GPU: true
2025-12-04T17:26:42.1393155Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T17:26:42.1393812Z   DOCKER_CONTAINER_ID: 764ff984146fd3268e049644ccb47d7d8238fae8138055ae0a6928cb5da435ad
2025-12-04T17:26:42.1394396Z ##[endgroup]
2025-12-04T17:26:53.2018792Z 764ff984146f
2025-12-04T17:26:58.5413707Z Deleted Containers:
2025-12-04T17:26:58.5414211Z 764ff984146fd3268e049644ccb47d7d8238fae8138055ae0a6928cb5da435ad
2025-12-04T17:26:58.5414866Z 
2025-12-04T17:27:07.5200864Z Deleted Images:
2025-12-04T17:27:07.5201899Z untagged: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T17:27:07.5203434Z untagged: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image@sha256:ae30f11a5b50741bd652aa0c94ad89ef791c4e50157eff642748620825cf7940
2025-12-04T17:27:07.5204497Z deleted: sha256:5465aa79632b68f6240c23f0d0b021df4d0fd595333b61a40d36a0cf73656024
2025-12-04T17:27:07.5205254Z deleted: sha256:f57a578c46f36a858c2be92210a89558688ee36b619af78c698952c0e3ef05ad
2025-12-04T17:27:07.5206015Z deleted: sha256:ce0698bd1efc811ccead0ecdad944b4839bf17bff387495b58e64cf8db0e210c
2025-12-04T17:27:07.5206774Z deleted: sha256:f0ee66f328fa98c40f336c64fee9a4b42e51a793cceea7f81932068bdc7bd315
2025-12-04T17:27:07.5207545Z deleted: sha256:ea24b30a25c161bd4bd564bfd90c36d88674a1aa59ef3e65647e926c76685be0
2025-12-04T17:27:07.5208312Z deleted: sha256:15bc0847ce5e60cc1a9b36d25283dc5648fb45e04aa9a8dec984af3c193e2f0b
2025-12-04T17:27:07.5209467Z deleted: sha256:3639aa26691090ef45641c75bffcb2e3f427f5e282abc93d607de4433bf90488
2025-12-04T17:27:07.5210206Z deleted: sha256:86258272ba477934c917d08b21e0da6000c268b60f5a9ae907038e7bf3236532
2025-12-04T17:27:07.5210960Z deleted: sha256:ba8e0040c98ddbf87acbc3ae6575b2933c09421ac7094a96e027d1fc9356fbb6
2025-12-04T17:27:07.5211728Z deleted: sha256:ca0176fc0de6cc059c4dbfc313434b5dea2c90dc24f2dc3a1061b941c7b3e6ca
2025-12-04T17:27:07.5212580Z deleted: sha256:cc6a480ab9e6091c6c206bc9b340611b3863258975e835769bd8f2a38b5d8c13
2025-12-04T17:27:07.5213333Z deleted: sha256:8465c24f0b284d8589ea191edeb80d1da07e4a59dfcfdcfa153bdf3d5d678d3e
2025-12-04T17:27:07.5214097Z deleted: sha256:b93bfbd3b55899c606fb98c5edbd21fd63114862a4f5a5b67c7aa63fc9ada9a3
2025-12-04T17:27:07.5214861Z deleted: sha256:6b7582e3ce445d82e9d2ae7769502119c39c1edbf5fe11c195615db8da846931
2025-12-04T17:27:07.5215597Z deleted: sha256:9d79615a9d9ae67110cc9da697933492b385b1e4708d30c2211625bea5d42f27
2025-12-04T17:27:07.5216429Z deleted: sha256:7132c6db5e7d5692786167dfb22dea62d8203dc7837b2d1de435c6e5c85e906e
2025-12-04T17:27:07.5217186Z deleted: sha256:d61bc13a0957d633ff633186c6cbdf48da1c551991d814281262e58709e225a8
2025-12-04T17:27:07.5217943Z deleted: sha256:0c348bbc3988acd329b3e42de4d2c73d5dc4942618716ca312d389d4f704f4bb
2025-12-04T17:27:07.5218684Z deleted: sha256:28d30dd15686ab6819c2f03388c9999bbdaef35e8756817297d795e00dd623fc
2025-12-04T17:27:07.5219449Z deleted: sha256:0a57608df6cffb31a0b24f2537b4dfe7a55bbe6ea02216703cc3172062ab9d75
2025-12-04T17:27:07.5220211Z deleted: sha256:43d23f49f4d70a54b4aff6f4f10d5c5a3d75b100abbbf281ad510177cc80cd99
2025-12-04T17:27:07.5220964Z deleted: sha256:f9e33c2e4c7b8e7179fba052da4d7c4acdc8287f253c95328ae04055755f88a4
2025-12-04T17:27:07.5221728Z deleted: sha256:cfce0930cf33c7136fc92511b9bcad570958363b55e9e0c82e9b8ebc29301356
2025-12-04T17:27:07.5222483Z deleted: sha256:9a709ae20528f500f51271ad2ce6a3d7196fe814a28ae73881901ecef9748c2a
2025-12-04T17:27:07.5223238Z deleted: sha256:68a1d16e9392be6fe939a58c5f941a0919408b5852e52cb04027b0b8777e2b0e
2025-12-04T17:27:07.5223975Z deleted: sha256:042a0022b3eea78f54015f4cf2888bcfa3b91deb0b08830a33c2814b93285dd9
2025-12-04T17:27:07.5224735Z deleted: sha256:a7ba703ff0aa305a608f3b4afd89c2ecd0d1244b127629145a2e691490abb271
2025-12-04T17:27:07.5225510Z deleted: sha256:be44f5fbae55066faba60eebf7065a082abf517ab8f2ebf8ece69e74d45def07
2025-12-04T17:27:07.5226290Z deleted: sha256:a01f1b0d88a8936d648f78787f56579bdb6617edf4620d0410ab6b118351bbb2
2025-12-04T17:27:07.5227031Z deleted: sha256:dc93f45553adafb5c6e7473711c833996f6884dab2da708ffc76b5cf65b8db9d
2025-12-04T17:27:07.5227802Z deleted: sha256:ffdba9ecb5890a9cb23368d781ff5484270b7f13c6d5629feca3512b58b9a0ac
2025-12-04T17:27:07.5228551Z deleted: sha256:268a91c420865628895871795b524436f5cc4403aa53d71f457db21bf42dd530
2025-12-04T17:27:07.5229297Z deleted: sha256:72450bfd97986ccc53d8fa76252130b464fdb3c5fd8e688546e8c3ce0b9d4394
2025-12-04T17:27:07.5230060Z deleted: sha256:63954235d3be0420af6ad2dae2b24849e3eee1edb10cf86d29137c3e19621f47
2025-12-04T17:27:07.5230914Z deleted: sha256:1c4e2d3e68e8a166d1965962077fe194ea00cad2ee636399c0c17ba5a94bdb9c
2025-12-04T17:27:07.5231687Z deleted: sha256:361cacbab7154a0cb62486f57d75b112feedbcc751a7d8f7bb02ec7a61b1fe0d
2025-12-04T17:27:07.5232454Z deleted: sha256:e653f6af92265f4300717bd617aab954cfbf049d4be32e890e57c2e8135be7f9
2025-12-04T17:27:07.5233212Z deleted: sha256:bfffeb2974ffc58c0669724812f701df860257ac3d047a7315a100beb0ea0507
2025-12-04T17:27:07.5233966Z deleted: sha256:6ae48d8efc75420f721058928fe8b1ccf48aa1bdc92de539b1f0db9248a41fcf
2025-12-04T17:27:07.5234725Z deleted: sha256:535c7026785a690366fc69ecbc9a81f1b58a46f63c782620591c1297406a2731
2025-12-04T17:27:07.5235482Z deleted: sha256:8462076c3cc8db6030f38e1137bfbef1aad85404ed4231285c1e06cd414d3e57
2025-12-04T17:27:07.5236234Z deleted: sha256:fe340d63ccb66e5b395b7900c1002a513e4afd7f610e9df5e7262c4f71e93bef
2025-12-04T17:27:07.5236994Z deleted: sha256:b61085386114396fe42144a4aa739b2a0b45f0c30a083462a2ea7b9b675c02aa
2025-12-04T17:27:07.5237856Z deleted: sha256:7772f25c05bcd5ede631d287b826aa108db67c773e377db98ffa73b0917f3629
2025-12-04T17:27:07.5238631Z deleted: sha256:3ea8a43d8193d05ecd6aa473b523a3569e11ae691eed9e6ffd693f23b0106035
2025-12-04T17:27:07.5239375Z deleted: sha256:34647b4087d29cf48a18668bb935a95fc8b2dac3522c2581397f0f27227047fd
2025-12-04T17:27:07.5240192Z deleted: sha256:b6a169f1ab01281c16562ad43b462a1a47a33be8d3cfae0a117ffa5c47d0b532
2025-12-04T17:27:07.5240992Z deleted: sha256:664173a33cd21248a2d73d2eba7887602e36fbc96002d991eb0bd0a2d574ac88
2025-12-04T17:27:07.5241751Z deleted: sha256:d67fdfe94c9a0228f17991cd3e958e36da96d4d597b46773cb7eed98c489f947
2025-12-04T17:27:07.5242560Z deleted: sha256:f2be0722250908742f067756b56ed3fa169daa2f1c8201a7ed4335b2fed2cae5
2025-12-04T17:27:07.5243302Z deleted: sha256:8614db257d8dc9e0f0ee8398a4a4d3c061b2797d6017daaf0696dd7f87633b3e
2025-12-04T17:27:07.5244063Z deleted: sha256:23ee0908a1bf254f1d4dd0591cc0c6801571b4d93950b6fd4fee57ca7e361da0
2025-12-04T17:27:07.5244836Z deleted: sha256:f627a99df4c0f370bd7fc8ea6be7695d8027f988aed52b65233cbcf78b01989b
2025-12-04T17:27:07.5245576Z deleted: sha256:d5e92389b59d4134cdb96113af964186602e98c392e76a8f26d4ea6e54056ccc
2025-12-04T17:27:07.5246337Z deleted: sha256:cbfccf44b9dc670c109634fbf19c2bfff2a3d5243bfa351c851d9fad3f1acfc2
2025-12-04T17:27:07.5247099Z deleted: sha256:1242535e81ad4bd713910a6c5e1b38375b12ed1bcd1b48419813a5ef28a5c84c
2025-12-04T17:27:07.5247848Z deleted: sha256:10b1394079cfe756a1ad9aa9aa3a2995bd5e46ef1e18029eb9eae0398f6d4e88
2025-12-04T17:27:07.5248589Z deleted: sha256:1d32da9a5f10e10c4a97a839151a1943d4db18494e8080bea91a6c9784fde067
2025-12-04T17:27:07.5249340Z deleted: sha256:af2fd59653ebd685a032ef800f8227c0d7b9b0e5ef397b30d4301e001c943e8b
2025-12-04T17:27:07.5250101Z deleted: sha256:c48d351980e3bd24d533ae55d1acc6a27911dffcbb03b2ae552d7ccc3e4cd74f
2025-12-04T17:27:07.5250849Z deleted: sha256:e663afac609b1b6c812ab45265c27d870b92c9fc6849939f0b8635da83cbfb53
2025-12-04T17:27:07.5251595Z deleted: sha256:f79dc17668331d4214ef24000d5c54a0bb2ba70f152d8523f571e2b76a303f4f
2025-12-04T17:27:07.5252351Z deleted: sha256:00de9606a6cd2a2dfb4ceffcb076474d027a1f6273894677090aee7478035865
2025-12-04T17:27:07.5253108Z deleted: sha256:cf35fe1d0317253b75ee17c12783c2561faebf9bf2c59c07ad4712c053246586
2025-12-04T17:27:07.5253841Z deleted: sha256:06622801490739d9db884c23c05a31a1ee86c41e888b34c3ccef23d37f2bdbb5
2025-12-04T17:27:07.5254594Z deleted: sha256:df5dafcaee865ddfb66e22075c63769836e01a627d6fe46658b6f4b4a25318d3
2025-12-04T17:27:07.5255365Z deleted: sha256:7949ae5c4df921feb0e2cd7bac1e402e1ab9135e758fa41cd567880b354b40bc
2025-12-04T17:27:07.5256119Z deleted: sha256:9f19148d820adb1d6e86d0ce68e21fbcedafa7c7ec6c45c9004fa3a607096923
2025-12-04T17:27:07.5256970Z deleted: sha256:1d37d963e85ce22ffaab56a1cf35b3411f34f9432dc5e49ebbdf6f30816cdfa8
2025-12-04T17:27:07.5257740Z deleted: sha256:bac6d91e3830e51e96879deaa3e6d0d39da076fa802ebda68f81bdf7ef8342d5
2025-12-04T17:27:07.5258495Z deleted: sha256:ffd496b07151c90e7ddd68a81a36471f51a544187982db5e34621358e1b29681
2025-12-04T17:27:07.5259418Z deleted: sha256:890b2042bdb9e22a614cea1be88366cd3ae15159bf78ac510b9daa6f802493a6
2025-12-04T17:27:07.5260188Z deleted: sha256:ddd9a57b20a8b45ae0e8e350ec266d50a1b9e9a7ff4921470eb38f004d50eb20
2025-12-04T17:27:07.5260950Z deleted: sha256:2f4f91684b8221bc5cbc3f14c7e00bb693854027a1a6de5ad6bdcd000bb579f2
2025-12-04T17:27:07.5261717Z deleted: sha256:9c01ec5e73233284a0f9bb42de59696a1fa61caacacdf63d04df5ebd73895d77
2025-12-04T17:27:07.5262465Z deleted: sha256:f6153a90f0f5316b03f1464826325a1578231b89b3c1f1c83cc7cebdd41cee2a
2025-12-04T17:27:07.5263209Z deleted: sha256:4e89cd2181813af7fd2219923bae493e33111d8b4ebd76f257b7fb26744fda28
2025-12-04T17:27:07.5263968Z deleted: sha256:a0b77eb4054db8f2ea2ec957b3941b4aeee14b59e94a99a1521f90d6e41faf0e
2025-12-04T17:27:07.5264704Z deleted: sha256:1a1b2848f15aa5114f5a67e3705439512880bf1a7a6436cc67760c59b5f10c46
2025-12-04T17:27:07.5265437Z deleted: sha256:004fc01362840c164664c18580e479546fa0b7f9599487558f80190aec30e2b5
2025-12-04T17:27:07.5266269Z deleted: sha256:35f36e20799f0a0dead81bc3701732e43489264e6bee9fcb789b376a99e17e78
2025-12-04T17:27:07.5267020Z deleted: sha256:1207fd2ede86015c3f105620cb491e8199d2060a4a87490de358286d0ae52e4e
2025-12-04T17:27:07.5267768Z deleted: sha256:02dccb85ee744d1fbb819c6da618b2c52a3e4affc89e407f79b875e7b3bbb7df
2025-12-04T17:27:07.5268551Z deleted: sha256:d22e6ff9c3ac9dabbcc6052e1459f8dc4ebd19bd057bd0688615d6cc3ebb5cf0
2025-12-04T17:27:07.5269320Z deleted: sha256:73974f74b436f39a2fdb6461b1e3f7c3e41c73325776fa71d16b942a5b4a365b
2025-12-04T17:27:07.5269977Z untagged: public.ecr.aws/docker/library/python:3.13
2025-12-04T17:27:07.5270810Z untagged: public.ecr.aws/docker/library/python@sha256:3f986299a7b8b44b0d8cf9bda2b22361ce5c3058ef5d7cb17fb7452506680ab0
2025-12-04T17:27:07.5271995Z deleted: sha256:44438aecfedf7b6086fce506dae0db5ba7fc0027f9b743f1a75a6b5cbc7de70a
2025-12-04T17:27:07.5272769Z deleted: sha256:6f09a1f5d8a107c2532fbd116e75116cb75fa77b1a7d72d3bdf1ac12de152acd
2025-12-04T17:27:07.5273524Z deleted: sha256:fe5f3ac0be086125eb1e3cd10cc33e8e426f4e079381f7ce5a987b626e99fa67
2025-12-04T17:27:07.5274292Z deleted: sha256:79dd2061a22cf919cfc4f1f02704bfda09afadb017265e670ee54441d296c06c
2025-12-04T17:27:07.5275063Z deleted: sha256:9447ad402aafdbee17e999b0ec84ad89c2646dbebf054d469d4f8bee77f66212
2025-12-04T17:27:07.5275815Z deleted: sha256:7a4909f3c1975be52292f53107495ee1b41c17494918767ccedf1cf1688ae318
2025-12-04T17:27:07.5276541Z deleted: sha256:3474923d97f1f498237650a7d51bd4aea37d5e6b9d8a778777920584af5dd560
2025-12-04T17:27:07.5277291Z deleted: sha256:683afd1773444401a9cbd24842ee5d9154a11abb4fab63ddea5c03df788597ee
2025-12-04T17:27:07.5277741Z 
2025-12-04T17:27:07.5277882Z Total reclaimed space: 36GB
2025-12-04T17:27:07.5316525Z ##[group]Run set +e
2025-12-04T17:27:07.5316942Z [36;1mset +e[0m
2025-12-04T17:27:07.5317204Z [36;1mset -x[0m
2025-12-04T17:27:07.5317465Z [36;1m[0m
2025-12-04T17:27:07.5317717Z [36;1mnvidia-smi[0m
2025-12-04T17:27:07.5318254Z [36;1m# NB: Surprisingly, nvidia-smi command returns successfully with return code 0 even in[0m
2025-12-04T17:27:07.5319081Z [36;1m# the case where the driver has already crashed as it still can get the driver version[0m
2025-12-04T17:27:07.5319883Z [36;1m# and some basic information like the bus ID.  However, the rest of the information[0m
2025-12-04T17:27:07.5320491Z [36;1m# would be missing (ERR!), for example:[0m
2025-12-04T17:27:07.5320856Z [36;1m#[0m
2025-12-04T17:27:07.5321217Z [36;1m# +-----------------------------------------------------------------------------+[0m
2025-12-04T17:27:07.5321850Z [36;1m# | NVIDIA-SMI 525.89.02    Driver Version: 525.89.02    CUDA Version: 12.0     |[0m
2025-12-04T17:27:07.5322507Z [36;1m# |-------------------------------+----------------------+----------------------+[0m
2025-12-04T17:27:07.5323139Z [36;1m# | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |[0m
2025-12-04T17:27:07.5323834Z [36;1m# | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |[0m
2025-12-04T17:27:07.5324504Z [36;1m# |                               |                      |               MIG M. |[0m
2025-12-04T17:27:07.5324931Z [36;1m# |===============================+======================+======================|[0m
2025-12-04T17:27:07.5325427Z [36;1m# |   0  ERR!                Off  | 00000000:00:1E.0 Off |                 ERR! |[0m
2025-12-04T17:27:07.5326000Z [36;1m# |ERR!  ERR! ERR!    ERR! / ERR! |   4184MiB / 23028MiB |    ERR!      Default |[0m
2025-12-04T17:27:07.5326525Z [36;1m# |                               |                      |                 ERR! |[0m
2025-12-04T17:27:07.5327018Z [36;1m# +-------------------------------+----------------------+----------------------+[0m
2025-12-04T17:27:07.5327469Z [36;1m#[0m
2025-12-04T17:27:07.5327824Z [36;1m# +-----------------------------------------------------------------------------+[0m
2025-12-04T17:27:07.5328370Z [36;1m# | Processes:                                                                  |[0m
2025-12-04T17:27:07.5328925Z [36;1m# |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |[0m
2025-12-04T17:27:07.5329454Z [36;1m# |        ID   ID                                                   Usage      |[0m
2025-12-04T17:27:07.5329899Z [36;1m# |=============================================================================|[0m
2025-12-04T17:27:07.5330396Z [36;1m# +-----------------------------------------------------------------------------+[0m
2025-12-04T17:27:07.5330903Z [36;1m#[0m
2025-12-04T17:27:07.5331354Z [36;1m# This should be reported as a failure instead as it will guarantee to fail when[0m
2025-12-04T17:27:07.5331955Z [36;1m# Docker tries to run with --gpus all[0m
2025-12-04T17:27:07.5332331Z [36;1m#[0m
2025-12-04T17:27:07.5332750Z [36;1m# So, the correct check here is to query one of the missing piece of info like[0m
2025-12-04T17:27:07.5333371Z [36;1m# GPU name, so that the command can fail accordingly[0m
2025-12-04T17:27:07.5333953Z [36;1mnvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0[0m
2025-12-04T17:27:07.5334448Z [36;1mNVIDIA_SMI_STATUS=$?[0m
2025-12-04T17:27:07.5334768Z [36;1m[0m
2025-12-04T17:27:07.5335287Z [36;1m# These are acceptable return code from nvidia-smi as copied from setup-nvidia GitHub action[0m
2025-12-04T17:27:07.5336064Z [36;1mif [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then[0m
2025-12-04T17:27:07.5336861Z [36;1m  echo "NVIDIA driver installation has failed, shutting down the runner..."[0m
2025-12-04T17:27:07.5337464Z [36;1m  .github/scripts/stop_runner_service.sh[0m
2025-12-04T17:27:07.5337850Z [36;1mfi[0m
2025-12-04T17:27:07.5338085Z [36;1m[0m
2025-12-04T17:27:07.5338674Z [36;1m# For runner with multiple GPUs, we also want to confirm that the number of GPUs are the[0m
2025-12-04T17:27:07.5339421Z [36;1m# power of 2, i.e. 1, 2, 4, or 8. This is to avoid flaky test issue when one GPU fails[0m
2025-12-04T17:27:07.5340045Z [36;1m# https://github.com/pytorch/test-infra/issues/4000[0m
2025-12-04T17:27:07.5340549Z [36;1mGPU_COUNT=$(nvidia-smi --list-gpus | wc -l)[0m
2025-12-04T17:27:07.5340970Z [36;1mNVIDIA_SMI_STATUS=$?[0m
2025-12-04T17:27:07.5341286Z [36;1m[0m
2025-12-04T17:27:07.5341789Z [36;1m# These are acceptable return code from nvidia-smi as copied from setup-nvidia GitHub action[0m
2025-12-04T17:27:07.5342558Z [36;1mif [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then[0m
2025-12-04T17:27:07.5343258Z [36;1m  echo "NVIDIA driver installation has failed, shutting down the runner..."[0m
2025-12-04T17:27:07.5343859Z [36;1m  .github/scripts/stop_runner_service.sh[0m
2025-12-04T17:27:07.5344232Z [36;1mfi[0m
2025-12-04T17:27:07.5344479Z [36;1m[0m
2025-12-04T17:27:07.5344769Z [36;1m# Check the GPU count to be a power of 2[0m
2025-12-04T17:27:07.5345429Z [36;1mif [ "$GPU_COUNT" -le 8 ] && [ "$GPU_COUNT" -ne 1 ] && [ "$GPU_COUNT" -ne 2 ] && [ "$GPU_COUNT" -ne 4 ] && [ "$GPU_COUNT" -ne 8 ]; then[0m
2025-12-04T17:27:07.5346315Z [36;1m  echo "NVIDIA driver detects $GPU_COUNT GPUs. The runner has a broken GPU, shutting it down..."[0m
2025-12-04T17:27:07.5347039Z [36;1m  .github/scripts/stop_runner_service.sh[0m
2025-12-04T17:27:07.5347423Z [36;1mfi[0m
2025-12-04T17:27:07.5356303Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T17:27:07.5356753Z env:
2025-12-04T17:27:07.5357008Z   GIT_DEFAULT_BRANCH: main
2025-12-04T17:27:07.5357310Z   HAS_NVIDIA_GPU: true
2025-12-04T17:27:07.5357688Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T17:27:07.5358333Z   DOCKER_CONTAINER_ID: 764ff984146fd3268e049644ccb47d7d8238fae8138055ae0a6928cb5da435ad
2025-12-04T17:27:07.5358911Z ##[endgroup]
2025-12-04T17:27:07.5391016Z + nvidia-smi
2025-12-04T17:27:07.5597303Z Thu Dec  4 17:27:07 2025       
2025-12-04T17:27:07.5597747Z +-----------------------------------------------------------------------------+
2025-12-04T17:27:07.5598355Z | NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
2025-12-04T17:27:07.5598949Z |-------------------------------+----------------------+----------------------+
2025-12-04T17:27:07.5599542Z | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
2025-12-04T17:27:07.5600194Z | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
2025-12-04T17:27:07.5600723Z |                               |                      |               MIG M. |
2025-12-04T17:27:07.5601117Z |===============================+======================+======================|
2025-12-04T17:27:07.5760129Z |   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
2025-12-04T17:27:07.5760673Z | N/A   27C    P8    13W /  70W |      2MiB / 15360MiB |      0%      Default |
2025-12-04T17:27:07.5761125Z |                               |                      |                  N/A |
2025-12-04T17:27:07.5761592Z +-------------------------------+----------------------+----------------------+
2025-12-04T17:27:07.5762065Z                                                                                
2025-12-04T17:27:07.5762531Z +-----------------------------------------------------------------------------+
2025-12-04T17:27:07.5763025Z | Processes:                                                                  |
2025-12-04T17:27:07.5763547Z |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
2025-12-04T17:27:07.5764036Z |        ID   ID                                                   Usage      |
2025-12-04T17:27:07.5764458Z |=============================================================================|
2025-12-04T17:27:07.5765461Z |  No running processes found                                                 |
2025-12-04T17:27:07.5766035Z +-----------------------------------------------------------------------------+
2025-12-04T17:27:07.6584397Z + nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0
2025-12-04T17:27:07.6762989Z Tesla T4
2025-12-04T17:27:07.6805480Z + NVIDIA_SMI_STATUS=0
2025-12-04T17:27:07.6805858Z + '[' 0 -ne 0 ']'
2025-12-04T17:27:07.6811582Z ++ nvidia-smi --list-gpus
2025-12-04T17:27:07.6812928Z ++ wc -l
2025-12-04T17:27:07.7033066Z + GPU_COUNT=1
2025-12-04T17:27:07.7033407Z + NVIDIA_SMI_STATUS=0
2025-12-04T17:27:07.7033704Z + '[' 0 -ne 0 ']'
2025-12-04T17:27:07.7033974Z + '[' 1 -le 8 ']'
2025-12-04T17:27:07.7034234Z + '[' 1 -ne 1 ']'
2025-12-04T17:27:07.7133497Z Post job cleanup.
2025-12-04T17:27:07.7243322Z Post job cleanup.
2025-12-04T17:27:07.7296856Z Post job cleanup.
2025-12-04T17:27:07.8492438Z [command]/usr/bin/git version
2025-12-04T17:27:07.8539429Z git version 2.50.1
2025-12-04T17:27:07.8582273Z Copying '/home/ec2-user/.gitconfig' to '/home/ec2-user/actions-runner/_work/_temp/84ee311e-a21c-45e6-bbc0-dd53cf2d8378/.gitconfig'
2025-12-04T17:27:07.8593246Z Temporarily overriding HOME='/home/ec2-user/actions-runner/_work/_temp/84ee311e-a21c-45e6-bbc0-dd53cf2d8378' before making global git config changes
2025-12-04T17:27:07.8594402Z Adding repository directory to the temporary git global config as a safe directory
2025-12-04T17:27:07.8598942Z [command]/usr/bin/git config --global --add safe.directory /home/ec2-user/actions-runner/_work/pytorch/pytorch
2025-12-04T17:27:07.8642425Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand
2025-12-04T17:27:07.8689087Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :"
2025-12-04T17:27:07.9045544Z Entering 'android/libs/fbjni'
2025-12-04T17:27:07.9116247Z Entering 'third_party/FP16'
2025-12-04T17:27:07.9181957Z Entering 'third_party/FXdiv'
2025-12-04T17:27:07.9247183Z Entering 'third_party/NNPACK'
2025-12-04T17:27:07.9315220Z Entering 'third_party/NVTX'
2025-12-04T17:27:07.9383350Z Entering 'third_party/VulkanMemoryAllocator'
2025-12-04T17:27:07.9450762Z Entering 'third_party/XNNPACK'
2025-12-04T17:27:07.9534515Z Entering 'third_party/aiter'
2025-12-04T17:27:07.9601844Z Entering 'third_party/aiter/3rdparty/composable_kernel'
2025-12-04T17:27:07.9676107Z Entering 'third_party/benchmark'
2025-12-04T17:27:07.9743146Z Entering 'third_party/composable_kernel'
2025-12-04T17:27:07.9821389Z Entering 'third_party/cpp-httplib'
2025-12-04T17:27:07.9886561Z Entering 'third_party/cpuinfo'
2025-12-04T17:27:07.9953867Z Entering 'third_party/cudnn_frontend'
2025-12-04T17:27:08.0020449Z Entering 'third_party/cutlass'
2025-12-04T17:27:08.0101588Z Entering 'third_party/fbgemm'
2025-12-04T17:27:08.0171283Z Entering 'third_party/fbgemm/external/asmjit'
2025-12-04T17:27:08.0239915Z Entering 'third_party/fbgemm/external/composable_kernel'
2025-12-04T17:27:08.0316591Z Entering 'third_party/fbgemm/external/cpuinfo'
2025-12-04T17:27:08.0383140Z Entering 'third_party/fbgemm/external/cutlass'
2025-12-04T17:27:08.0458476Z Entering 'third_party/fbgemm/external/googletest'
2025-12-04T17:27:08.0525095Z Entering 'third_party/fbgemm/external/hipify_torch'
2025-12-04T17:27:08.0590350Z Entering 'third_party/fbgemm/external/json'
2025-12-04T17:27:08.0660362Z Entering 'third_party/flash-attention'
2025-12-04T17:27:08.0729126Z Entering 'third_party/flash-attention/csrc/composable_kernel'
2025-12-04T17:27:08.0802721Z Entering 'third_party/flash-attention/csrc/cutlass'
2025-12-04T17:27:08.0879733Z Entering 'third_party/flatbuffers'
2025-12-04T17:27:08.0952382Z Entering 'third_party/fmt'
2025-12-04T17:27:08.1018158Z Entering 'third_party/gemmlowp/gemmlowp'
2025-12-04T17:27:08.1085811Z Entering 'third_party/gloo'
2025-12-04T17:27:08.1151573Z Entering 'third_party/googletest'
2025-12-04T17:27:08.1221150Z Entering 'third_party/ideep'
2025-12-04T17:27:08.1286820Z Entering 'third_party/ideep/mkl-dnn'
2025-12-04T17:27:08.1360685Z Entering 'third_party/ittapi'
2025-12-04T17:27:08.1429425Z Entering 'third_party/kineto'
2025-12-04T17:27:08.1497852Z Entering 'third_party/kineto/libkineto/third_party/dynolog'
2025-12-04T17:27:08.1562748Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'
2025-12-04T17:27:08.1629025Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'
2025-12-04T17:27:08.1696086Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'
2025-12-04T17:27:08.1763169Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'
2025-12-04T17:27:08.1827433Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'
2025-12-04T17:27:08.1896972Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog'
2025-12-04T17:27:08.1964840Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'
2025-12-04T17:27:08.2030428Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json'
2025-12-04T17:27:08.2097004Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'
2025-12-04T17:27:08.2163453Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'
2025-12-04T17:27:08.2230592Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T17:27:08.2299756Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T17:27:08.2372714Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2025-12-04T17:27:08.2438156Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2025-12-04T17:27:08.2508137Z Entering 'third_party/kleidiai'
2025-12-04T17:27:08.2577752Z Entering 'third_party/mimalloc'
2025-12-04T17:27:08.2643159Z Entering 'third_party/nlohmann'
2025-12-04T17:27:08.2713072Z Entering 'third_party/onnx'
2025-12-04T17:27:08.2800446Z Entering 'third_party/onnx/third_party/pybind11'
2025-12-04T17:27:08.2868617Z Entering 'third_party/opentelemetry-cpp'
2025-12-04T17:27:08.2937009Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark'
2025-12-04T17:27:08.3001983Z Entering 'third_party/opentelemetry-cpp/third_party/googletest'
2025-12-04T17:27:08.3066160Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl'
2025-12-04T17:27:08.3129820Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json'
2025-12-04T17:27:08.3196122Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto'
2025-12-04T17:27:08.3263405Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp'
2025-12-04T17:27:08.3330029Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp'
2025-12-04T17:27:08.3395749Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T17:27:08.3462044Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T17:27:08.3530369Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg'
2025-12-04T17:27:08.3619123Z Entering 'third_party/pocketfft'
2025-12-04T17:27:08.3686244Z Entering 'third_party/protobuf'
2025-12-04T17:27:08.3755350Z Entering 'third_party/protobuf/third_party/benchmark'
2025-12-04T17:27:08.3822391Z Entering 'third_party/protobuf/third_party/googletest'
2025-12-04T17:27:08.3889470Z Entering 'third_party/psimd'
2025-12-04T17:27:08.3955521Z Entering 'third_party/pthreadpool'
2025-12-04T17:27:08.4022022Z Entering 'third_party/pybind11'
2025-12-04T17:27:08.4088374Z Entering 'third_party/python-peachpy'
2025-12-04T17:27:08.4152970Z Entering 'third_party/sleef'
2025-12-04T17:27:08.4220251Z Entering 'third_party/tensorpipe'
2025-12-04T17:27:08.4286252Z Entering 'third_party/tensorpipe/third_party/googletest'
2025-12-04T17:27:08.4350413Z Entering 'third_party/tensorpipe/third_party/libnop'
2025-12-04T17:27:08.4414989Z Entering 'third_party/tensorpipe/third_party/libuv'
2025-12-04T17:27:08.4482427Z Entering 'third_party/tensorpipe/third_party/pybind11'
2025-12-04T17:27:08.4545072Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2025-12-04T17:27:08.4637058Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader
2025-12-04T17:27:08.4662158Z http.https://github.com/.extraheader
2025-12-04T17:27:08.4673429Z [command]/usr/bin/git config --local --unset-all http.https://github.com/.extraheader
2025-12-04T17:27:08.4708365Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :"
2025-12-04T17:27:08.5063561Z Entering 'android/libs/fbjni'
2025-12-04T17:27:08.5108870Z http.https://github.com/.extraheader
2025-12-04T17:27:08.5149263Z Entering 'third_party/FP16'
2025-12-04T17:27:08.5198679Z http.https://github.com/.extraheader
2025-12-04T17:27:08.5238714Z Entering 'third_party/FXdiv'
2025-12-04T17:27:08.5283864Z http.https://github.com/.extraheader
2025-12-04T17:27:08.5323747Z Entering 'third_party/NNPACK'
2025-12-04T17:27:08.5369098Z http.https://github.com/.extraheader
2025-12-04T17:27:08.5411732Z Entering 'third_party/NVTX'
2025-12-04T17:27:08.5456201Z http.https://github.com/.extraheader
2025-12-04T17:27:08.5499267Z Entering 'third_party/VulkanMemoryAllocator'
2025-12-04T17:27:08.5543891Z http.https://github.com/.extraheader
2025-12-04T17:27:08.5586812Z Entering 'third_party/XNNPACK'
2025-12-04T17:27:08.5631580Z http.https://github.com/.extraheader
2025-12-04T17:27:08.5691199Z Entering 'third_party/aiter'
2025-12-04T17:27:08.5735608Z http.https://github.com/.extraheader
2025-12-04T17:27:08.5776963Z Entering 'third_party/aiter/3rdparty/composable_kernel'
2025-12-04T17:27:08.5820453Z http.https://github.com/.extraheader
2025-12-04T17:27:08.5872058Z Entering 'third_party/benchmark'
2025-12-04T17:27:08.5916315Z http.https://github.com/.extraheader
2025-12-04T17:27:08.5955741Z Entering 'third_party/composable_kernel'
2025-12-04T17:27:08.6000313Z http.https://github.com/.extraheader
2025-12-04T17:27:08.6050111Z Entering 'third_party/cpp-httplib'
2025-12-04T17:27:08.6094943Z http.https://github.com/.extraheader
2025-12-04T17:27:08.6136249Z Entering 'third_party/cpuinfo'
2025-12-04T17:27:08.6181126Z http.https://github.com/.extraheader
2025-12-04T17:27:08.6222663Z Entering 'third_party/cudnn_frontend'
2025-12-04T17:27:08.6266772Z http.https://github.com/.extraheader
2025-12-04T17:27:08.6307956Z Entering 'third_party/cutlass'
2025-12-04T17:27:08.6352389Z http.https://github.com/.extraheader
2025-12-04T17:27:08.6405096Z Entering 'third_party/fbgemm'
2025-12-04T17:27:08.6450258Z http.https://github.com/.extraheader
2025-12-04T17:27:08.6495100Z Entering 'third_party/fbgemm/external/asmjit'
2025-12-04T17:27:08.6538609Z http.https://github.com/.extraheader
2025-12-04T17:27:08.6579113Z Entering 'third_party/fbgemm/external/composable_kernel'
2025-12-04T17:27:08.6622555Z http.https://github.com/.extraheader
2025-12-04T17:27:08.6672655Z Entering 'third_party/fbgemm/external/cpuinfo'
2025-12-04T17:27:08.6717321Z http.https://github.com/.extraheader
2025-12-04T17:27:08.6757215Z Entering 'third_party/fbgemm/external/cutlass'
2025-12-04T17:27:08.6802603Z http.https://github.com/.extraheader
2025-12-04T17:27:08.6852906Z Entering 'third_party/fbgemm/external/googletest'
2025-12-04T17:27:08.6897494Z http.https://github.com/.extraheader
2025-12-04T17:27:08.6936663Z Entering 'third_party/fbgemm/external/hipify_torch'
2025-12-04T17:27:08.6982054Z http.https://github.com/.extraheader
2025-12-04T17:27:08.7021021Z Entering 'third_party/fbgemm/external/json'
2025-12-04T17:27:08.7066336Z http.https://github.com/.extraheader
2025-12-04T17:27:08.7110710Z Entering 'third_party/flash-attention'
2025-12-04T17:27:08.7157168Z http.https://github.com/.extraheader
2025-12-04T17:27:08.7199393Z Entering 'third_party/flash-attention/csrc/composable_kernel'
2025-12-04T17:27:08.7242842Z http.https://github.com/.extraheader
2025-12-04T17:27:08.7292000Z Entering 'third_party/flash-attention/csrc/cutlass'
2025-12-04T17:27:08.7336737Z http.https://github.com/.extraheader
2025-12-04T17:27:08.7390662Z Entering 'third_party/flatbuffers'
2025-12-04T17:27:08.7437235Z http.https://github.com/.extraheader
2025-12-04T17:27:08.7483443Z Entering 'third_party/fmt'
2025-12-04T17:27:08.7529722Z http.https://github.com/.extraheader
2025-12-04T17:27:08.7574088Z Entering 'third_party/gemmlowp/gemmlowp'
2025-12-04T17:27:08.7620355Z http.https://github.com/.extraheader
2025-12-04T17:27:08.7661504Z Entering 'third_party/gloo'
2025-12-04T17:27:08.7707356Z http.https://github.com/.extraheader
2025-12-04T17:27:08.7748970Z Entering 'third_party/googletest'
2025-12-04T17:27:08.7794550Z http.https://github.com/.extraheader
2025-12-04T17:27:08.7836306Z Entering 'third_party/ideep'
2025-12-04T17:27:08.7880835Z http.https://github.com/.extraheader
2025-12-04T17:27:08.7919504Z Entering 'third_party/ideep/mkl-dnn'
2025-12-04T17:27:08.7964309Z http.https://github.com/.extraheader
2025-12-04T17:27:08.8014222Z Entering 'third_party/ittapi'
2025-12-04T17:27:08.8059687Z http.https://github.com/.extraheader
2025-12-04T17:27:08.8099695Z Entering 'third_party/kineto'
2025-12-04T17:27:08.8145382Z http.https://github.com/.extraheader
2025-12-04T17:27:08.8185650Z Entering 'third_party/kineto/libkineto/third_party/dynolog'
2025-12-04T17:27:08.8230005Z http.https://github.com/.extraheader
2025-12-04T17:27:08.8270697Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'
2025-12-04T17:27:08.8315032Z http.https://github.com/.extraheader
2025-12-04T17:27:08.8357349Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'
2025-12-04T17:27:08.8402474Z http.https://github.com/.extraheader
2025-12-04T17:27:08.8444525Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'
2025-12-04T17:27:08.8489925Z http.https://github.com/.extraheader
2025-12-04T17:27:08.8531024Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'
2025-12-04T17:27:08.8576392Z http.https://github.com/.extraheader
2025-12-04T17:27:08.8615317Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'
2025-12-04T17:27:08.8659548Z http.https://github.com/.extraheader
2025-12-04T17:27:08.8703151Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog'
2025-12-04T17:27:08.8745510Z http.https://github.com/.extraheader
2025-12-04T17:27:08.8785240Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'
2025-12-04T17:27:08.8827268Z http.https://github.com/.extraheader
2025-12-04T17:27:08.8866565Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json'
2025-12-04T17:27:08.8909681Z http.https://github.com/.extraheader
2025-12-04T17:27:08.8950050Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'
2025-12-04T17:27:08.8993428Z http.https://github.com/.extraheader
2025-12-04T17:27:08.9032797Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'
2025-12-04T17:27:08.9076596Z http.https://github.com/.extraheader
2025-12-04T17:27:08.9114679Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T17:27:08.9157090Z http.https://github.com/.extraheader
2025-12-04T17:27:08.9202339Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T17:27:08.9246019Z http.https://github.com/.extraheader
2025-12-04T17:27:08.9291524Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2025-12-04T17:27:08.9334933Z http.https://github.com/.extraheader
2025-12-04T17:27:08.9374881Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2025-12-04T17:27:08.9418636Z http.https://github.com/.extraheader
2025-12-04T17:27:08.9460175Z Entering 'third_party/kleidiai'
2025-12-04T17:27:08.9505284Z http.https://github.com/.extraheader
2025-12-04T17:27:08.9545724Z Entering 'third_party/mimalloc'
2025-12-04T17:27:08.9592424Z http.https://github.com/.extraheader
2025-12-04T17:27:08.9631172Z Entering 'third_party/nlohmann'
2025-12-04T17:27:08.9676144Z http.https://github.com/.extraheader
2025-12-04T17:27:08.9716117Z Entering 'third_party/onnx'
2025-12-04T17:27:08.9760169Z http.https://github.com/.extraheader
2025-12-04T17:27:08.9820977Z Entering 'third_party/onnx/third_party/pybind11'
2025-12-04T17:27:08.9864461Z http.https://github.com/.extraheader
2025-12-04T17:27:08.9906546Z Entering 'third_party/opentelemetry-cpp'
2025-12-04T17:27:08.9952435Z http.https://github.com/.extraheader
2025-12-04T17:27:08.9993468Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark'
2025-12-04T17:27:09.0035936Z http.https://github.com/.extraheader
2025-12-04T17:27:09.0074782Z Entering 'third_party/opentelemetry-cpp/third_party/googletest'
2025-12-04T17:27:09.0116187Z http.https://github.com/.extraheader
2025-12-04T17:27:09.0153951Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl'
2025-12-04T17:27:09.0197338Z http.https://github.com/.extraheader
2025-12-04T17:27:09.0235713Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json'
2025-12-04T17:27:09.0278754Z http.https://github.com/.extraheader
2025-12-04T17:27:09.0319855Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto'
2025-12-04T17:27:09.0362108Z http.https://github.com/.extraheader
2025-12-04T17:27:09.0401872Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp'
2025-12-04T17:27:09.0446343Z http.https://github.com/.extraheader
2025-12-04T17:27:09.0485836Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp'
2025-12-04T17:27:09.0528638Z http.https://github.com/.extraheader
2025-12-04T17:27:09.0566134Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T17:27:09.0608802Z http.https://github.com/.extraheader
2025-12-04T17:27:09.0649511Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T17:27:09.0693055Z http.https://github.com/.extraheader
2025-12-04T17:27:09.0735348Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg'
2025-12-04T17:27:09.0778738Z http.https://github.com/.extraheader
2025-12-04T17:27:09.0840687Z Entering 'third_party/pocketfft'
2025-12-04T17:27:09.0886633Z http.https://github.com/.extraheader
2025-12-04T17:27:09.0925184Z Entering 'third_party/protobuf'
2025-12-04T17:27:09.0968649Z http.https://github.com/.extraheader
2025-12-04T17:27:09.1010456Z Entering 'third_party/protobuf/third_party/benchmark'
2025-12-04T17:27:09.1054093Z http.https://github.com/.extraheader
2025-12-04T17:27:09.1094295Z Entering 'third_party/protobuf/third_party/googletest'
2025-12-04T17:27:09.1136027Z http.https://github.com/.extraheader
2025-12-04T17:27:09.1179089Z Entering 'third_party/psimd'
2025-12-04T17:27:09.1224428Z http.https://github.com/.extraheader
2025-12-04T17:27:09.1265023Z Entering 'third_party/pthreadpool'
2025-12-04T17:27:09.1310452Z http.https://github.com/.extraheader
2025-12-04T17:27:09.1350646Z Entering 'third_party/pybind11'
2025-12-04T17:27:09.1394150Z http.https://github.com/.extraheader
2025-12-04T17:27:09.1432896Z Entering 'third_party/python-peachpy'
2025-12-04T17:27:09.1476318Z http.https://github.com/.extraheader
2025-12-04T17:27:09.1515413Z Entering 'third_party/sleef'
2025-12-04T17:27:09.1558989Z http.https://github.com/.extraheader
2025-12-04T17:27:09.1597779Z Entering 'third_party/tensorpipe'
2025-12-04T17:27:09.1641623Z http.https://github.com/.extraheader
2025-12-04T17:27:09.1680867Z Entering 'third_party/tensorpipe/third_party/googletest'
2025-12-04T17:27:09.1723394Z http.https://github.com/.extraheader
2025-12-04T17:27:09.1766513Z Entering 'third_party/tensorpipe/third_party/libnop'
2025-12-04T17:27:09.1810883Z http.https://github.com/.extraheader
2025-12-04T17:27:09.1848260Z Entering 'third_party/tensorpipe/third_party/libuv'
2025-12-04T17:27:09.1892502Z http.https://github.com/.extraheader
2025-12-04T17:27:09.1929951Z Entering 'third_party/tensorpipe/third_party/pybind11'
2025-12-04T17:27:09.1972520Z http.https://github.com/.extraheader
2025-12-04T17:27:09.2009374Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2025-12-04T17:27:09.2051924Z http.https://github.com/.extraheader
2025-12-04T17:27:09.2118213Z [command]/usr/bin/git config --local --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.2155036Z [command]/usr/bin/git submodule foreach --recursive git config --local --show-origin --name-only --get-regexp remote.origin.url
2025-12-04T17:27:09.2514570Z Entering 'android/libs/fbjni'
2025-12-04T17:27:09.2543868Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config	remote.origin.url
2025-12-04T17:27:09.2563477Z Entering 'third_party/FP16'
2025-12-04T17:27:09.2594196Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config	remote.origin.url
2025-12-04T17:27:09.2612657Z Entering 'third_party/FXdiv'
2025-12-04T17:27:09.2641652Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config	remote.origin.url
2025-12-04T17:27:09.2660359Z Entering 'third_party/NNPACK'
2025-12-04T17:27:09.2689594Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config	remote.origin.url
2025-12-04T17:27:09.2708516Z Entering 'third_party/NVTX'
2025-12-04T17:27:09.2737056Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config	remote.origin.url
2025-12-04T17:27:09.2756610Z Entering 'third_party/VulkanMemoryAllocator'
2025-12-04T17:27:09.2785573Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config	remote.origin.url
2025-12-04T17:27:09.2804252Z Entering 'third_party/XNNPACK'
2025-12-04T17:27:09.2832749Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config	remote.origin.url
2025-12-04T17:27:09.2868946Z Entering 'third_party/aiter'
2025-12-04T17:27:09.2897949Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config	remote.origin.url
2025-12-04T17:27:09.2917154Z Entering 'third_party/aiter/3rdparty/composable_kernel'
2025-12-04T17:27:09.2945418Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config	remote.origin.url
2025-12-04T17:27:09.2974468Z Entering 'third_party/benchmark'
2025-12-04T17:27:09.3003307Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config	remote.origin.url
2025-12-04T17:27:09.3022825Z Entering 'third_party/composable_kernel'
2025-12-04T17:27:09.3052286Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config	remote.origin.url
2025-12-04T17:27:09.3081836Z Entering 'third_party/cpp-httplib'
2025-12-04T17:27:09.3111023Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config	remote.origin.url
2025-12-04T17:27:09.3129922Z Entering 'third_party/cpuinfo'
2025-12-04T17:27:09.3159097Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config	remote.origin.url
2025-12-04T17:27:09.3179880Z Entering 'third_party/cudnn_frontend'
2025-12-04T17:27:09.3212542Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config	remote.origin.url
2025-12-04T17:27:09.3231893Z Entering 'third_party/cutlass'
2025-12-04T17:27:09.3261070Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config	remote.origin.url
2025-12-04T17:27:09.3291693Z Entering 'third_party/fbgemm'
2025-12-04T17:27:09.3320672Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config	remote.origin.url
2025-12-04T17:27:09.3342513Z Entering 'third_party/fbgemm/external/asmjit'
2025-12-04T17:27:09.3371836Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config	remote.origin.url
2025-12-04T17:27:09.3389804Z Entering 'third_party/fbgemm/external/composable_kernel'
2025-12-04T17:27:09.3419270Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config	remote.origin.url
2025-12-04T17:27:09.3450156Z Entering 'third_party/fbgemm/external/cpuinfo'
2025-12-04T17:27:09.3479759Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config	remote.origin.url
2025-12-04T17:27:09.3499228Z Entering 'third_party/fbgemm/external/cutlass'
2025-12-04T17:27:09.3527511Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config	remote.origin.url
2025-12-04T17:27:09.3557193Z Entering 'third_party/fbgemm/external/googletest'
2025-12-04T17:27:09.3586295Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config	remote.origin.url
2025-12-04T17:27:09.3604618Z Entering 'third_party/fbgemm/external/hipify_torch'
2025-12-04T17:27:09.3632981Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config	remote.origin.url
2025-12-04T17:27:09.3650516Z Entering 'third_party/fbgemm/external/json'
2025-12-04T17:27:09.3680078Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config	remote.origin.url
2025-12-04T17:27:09.3703514Z Entering 'third_party/flash-attention'
2025-12-04T17:27:09.3733500Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config	remote.origin.url
2025-12-04T17:27:09.3753876Z Entering 'third_party/flash-attention/csrc/composable_kernel'
2025-12-04T17:27:09.3785195Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config	remote.origin.url
2025-12-04T17:27:09.3810590Z Entering 'third_party/flash-attention/csrc/cutlass'
2025-12-04T17:27:09.3839333Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config	remote.origin.url
2025-12-04T17:27:09.3869279Z Entering 'third_party/flatbuffers'
2025-12-04T17:27:09.3898942Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config	remote.origin.url
2025-12-04T17:27:09.3921163Z Entering 'third_party/fmt'
2025-12-04T17:27:09.3950600Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config	remote.origin.url
2025-12-04T17:27:09.3970012Z Entering 'third_party/gemmlowp/gemmlowp'
2025-12-04T17:27:09.3999352Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config	remote.origin.url
2025-12-04T17:27:09.4018650Z Entering 'third_party/gloo'
2025-12-04T17:27:09.4047835Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config	remote.origin.url
2025-12-04T17:27:09.4067423Z Entering 'third_party/googletest'
2025-12-04T17:27:09.4096614Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config	remote.origin.url
2025-12-04T17:27:09.4115380Z Entering 'third_party/ideep'
2025-12-04T17:27:09.4144782Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config	remote.origin.url
2025-12-04T17:27:09.4162560Z Entering 'third_party/ideep/mkl-dnn'
2025-12-04T17:27:09.4192297Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config	remote.origin.url
2025-12-04T17:27:09.4220455Z Entering 'third_party/ittapi'
2025-12-04T17:27:09.4251457Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config	remote.origin.url
2025-12-04T17:27:09.4269415Z Entering 'third_party/kineto'
2025-12-04T17:27:09.4299895Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config	remote.origin.url
2025-12-04T17:27:09.4318797Z Entering 'third_party/kineto/libkineto/third_party/dynolog'
2025-12-04T17:27:09.4347559Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config	remote.origin.url
2025-12-04T17:27:09.4364971Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'
2025-12-04T17:27:09.4395003Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config	remote.origin.url
2025-12-04T17:27:09.4414238Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'
2025-12-04T17:27:09.4444771Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config	remote.origin.url
2025-12-04T17:27:09.4462887Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'
2025-12-04T17:27:09.4492041Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config	remote.origin.url
2025-12-04T17:27:09.4510461Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'
2025-12-04T17:27:09.4539730Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config	remote.origin.url
2025-12-04T17:27:09.4556554Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'
2025-12-04T17:27:09.4585918Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config	remote.origin.url
2025-12-04T17:27:09.4606051Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog'
2025-12-04T17:27:09.4634650Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config	remote.origin.url
2025-12-04T17:27:09.4653919Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'
2025-12-04T17:27:09.4683189Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config	remote.origin.url
2025-12-04T17:27:09.4701940Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json'
2025-12-04T17:27:09.4731521Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config	remote.origin.url
2025-12-04T17:27:09.4751074Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'
2025-12-04T17:27:09.4781865Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config	remote.origin.url
2025-12-04T17:27:09.4800227Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'
2025-12-04T17:27:09.4828500Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config	remote.origin.url
2025-12-04T17:27:09.4845897Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T17:27:09.4875645Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config	remote.origin.url
2025-12-04T17:27:09.4895892Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T17:27:09.4925242Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config	remote.origin.url
2025-12-04T17:27:09.4948016Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2025-12-04T17:27:09.4976731Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config	remote.origin.url
2025-12-04T17:27:09.4994593Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2025-12-04T17:27:09.5023331Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config	remote.origin.url
2025-12-04T17:27:09.5043199Z Entering 'third_party/kleidiai'
2025-12-04T17:27:09.5073189Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config	remote.origin.url
2025-12-04T17:27:09.5092451Z Entering 'third_party/mimalloc'
2025-12-04T17:27:09.5122001Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config	remote.origin.url
2025-12-04T17:27:09.5141343Z Entering 'third_party/nlohmann'
2025-12-04T17:27:09.5169871Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config	remote.origin.url
2025-12-04T17:27:09.5196973Z Entering 'third_party/onnx'
2025-12-04T17:27:09.5226270Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config	remote.origin.url
2025-12-04T17:27:09.5266354Z Entering 'third_party/onnx/third_party/pybind11'
2025-12-04T17:27:09.5296788Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config	remote.origin.url
2025-12-04T17:27:09.5318276Z Entering 'third_party/opentelemetry-cpp'
2025-12-04T17:27:09.5349374Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config	remote.origin.url
2025-12-04T17:27:09.5370009Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark'
2025-12-04T17:27:09.5400854Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config	remote.origin.url
2025-12-04T17:27:09.5419437Z Entering 'third_party/opentelemetry-cpp/third_party/googletest'
2025-12-04T17:27:09.5447480Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config	remote.origin.url
2025-12-04T17:27:09.5466057Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl'
2025-12-04T17:27:09.5495744Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config	remote.origin.url
2025-12-04T17:27:09.5513362Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json'
2025-12-04T17:27:09.5541138Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config	remote.origin.url
2025-12-04T17:27:09.5560430Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto'
2025-12-04T17:27:09.5596291Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config	remote.origin.url
2025-12-04T17:27:09.5613377Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp'
2025-12-04T17:27:09.5643725Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config	remote.origin.url
2025-12-04T17:27:09.5660692Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp'
2025-12-04T17:27:09.5690608Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config	remote.origin.url
2025-12-04T17:27:09.5707495Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T17:27:09.5736381Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config	remote.origin.url
2025-12-04T17:27:09.5756634Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T17:27:09.5785069Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config	remote.origin.url
2025-12-04T17:27:09.5804720Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg'
2025-12-04T17:27:09.5833581Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config	remote.origin.url
2025-12-04T17:27:09.5877058Z Entering 'third_party/pocketfft'
2025-12-04T17:27:09.5907856Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config	remote.origin.url
2025-12-04T17:27:09.5926153Z Entering 'third_party/protobuf'
2025-12-04T17:27:09.5956344Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config	remote.origin.url
2025-12-04T17:27:09.5980515Z Entering 'third_party/protobuf/third_party/benchmark'
2025-12-04T17:27:09.6008425Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config	remote.origin.url
2025-12-04T17:27:09.6026535Z Entering 'third_party/protobuf/third_party/googletest'
2025-12-04T17:27:09.6055705Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config	remote.origin.url
2025-12-04T17:27:09.6077041Z Entering 'third_party/psimd'
2025-12-04T17:27:09.6183653Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config	remote.origin.url
2025-12-04T17:27:09.6202348Z Entering 'third_party/pthreadpool'
2025-12-04T17:27:09.6231693Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config	remote.origin.url
2025-12-04T17:27:09.6250167Z Entering 'third_party/pybind11'
2025-12-04T17:27:09.6279170Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config	remote.origin.url
2025-12-04T17:27:09.6298166Z Entering 'third_party/python-peachpy'
2025-12-04T17:27:09.6326368Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config	remote.origin.url
2025-12-04T17:27:09.6346476Z Entering 'third_party/sleef'
2025-12-04T17:27:09.6376395Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config	remote.origin.url
2025-12-04T17:27:09.6394987Z Entering 'third_party/tensorpipe'
2025-12-04T17:27:09.6425386Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config	remote.origin.url
2025-12-04T17:27:09.6443017Z Entering 'third_party/tensorpipe/third_party/googletest'
2025-12-04T17:27:09.6472457Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config	remote.origin.url
2025-12-04T17:27:09.6490358Z Entering 'third_party/tensorpipe/third_party/libnop'
2025-12-04T17:27:09.6518804Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config	remote.origin.url
2025-12-04T17:27:09.6537178Z Entering 'third_party/tensorpipe/third_party/libuv'
2025-12-04T17:27:09.6565186Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config	remote.origin.url
2025-12-04T17:27:09.6584672Z Entering 'third_party/tensorpipe/third_party/pybind11'
2025-12-04T17:27:09.6613853Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config	remote.origin.url
2025-12-04T17:27:09.6630678Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2025-12-04T17:27:09.6660355Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config	remote.origin.url
2025-12-04T17:27:09.6702037Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.6731835Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.6758630Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.6788167Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.6813638Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.6843583Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.6871771Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.6897189Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.6923552Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.6950839Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.6978053Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.7005181Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.7033671Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.7060657Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.7087360Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.7113822Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.7140209Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.7166759Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.7194739Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.7220309Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.7247138Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.7274472Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.7301723Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.7327620Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.7355618Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.7382692Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.7408141Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.7434235Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.7463940Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.7492061Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.7520143Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.7549654Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.7580538Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.7610893Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.7639368Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.7671497Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.7701562Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.7731770Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.7760360Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.7800983Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.7819291Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.7847622Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.7883329Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.7912674Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.7942318Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.7972418Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.8003662Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.8034333Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.8062682Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.8091799Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.8117017Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.8143544Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.8169317Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.8195469Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.8221607Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.8248230Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.8275628Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.8302161Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.8327063Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.8353609Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.8382775Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.8409155Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.8436838Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.8464495Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.8500656Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.8527137Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.8552884Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.8580460Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.8610735Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.8638401Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.8663797Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.8692519Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.8719596Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.8746294Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.8773560Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.8800910Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.8827504Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.8854719Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.8881213Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.8911204Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.8942046Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T17:27:09.9053598Z A job completed hook has been configured by the self-hosted runner administrator
2025-12-04T17:27:09.9069245Z ##[group]Run '/home/ec2-user/runner-scripts/after_job.sh'
2025-12-04T17:27:09.9075593Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T17:27:09.9076071Z ##[endgroup]
2025-12-04T17:27:09.9169373Z [!ALERT!] Swap in detected! [!ALERT!]
2025-12-04T17:27:22.9963070Z [!ALERT!] Swap out detected [!ALERT!]
2025-12-04T17:27:44.8692407Z Cleaning up orphan processes